Research Area: | Peer-to-peer computing | |
---|---|---|---
Status: | Finished | Degree: | PhD
Directors: | | |
Students: | | |
Proposed start date: | 2004-09-12 | Proposed end date: | 2009-01-12
Digital version | | |
Attachments: | | |
Description: | | |
A peer-to-peer (P2P) overlay network is a logical network, built on top of an underlying physical network, that facilitates the location of distributed resources without centralized control. These systems have emerged on the edges of the Internet thanks to the widespread growth of broadband Internet connections.
There exist two main families of P2P systems: unstructured and structured. In the former, the distribution of resources across the network and the overlay structure are independent. They are easier to set up, but resource location requires exhaustive search over the network. In the latter, the distribution of resources and the network structure are strongly coupled, and hence searches for resources are more efficient, although their construction and maintenance are more complex than in unstructured overlay networks. In return, structured overlays, also called Distributed Hash Tables (DHTs), overcome the scaling problems of unstructured systems. Essentially, DHTs provide the same functionality as a hash table, namely the standard Put(key, value) and Get(key) interface, but associate key-value pairs with users rather than with hash buckets. Owing to its scalability, the DHT abstraction has received a great deal of attention in recent years. However, many of the existing proposals have been conceived under the assumption that logical proximity in the overlay does not need to match physical proximity in the Internet. This mismatch has become a major barrier to the deployment and performance optimization of P2P applications. In addition, many of the existing designs have assumed that communication is uniform, while, in practice, users communicate more frequently with other users in the same administrative domain. The traditional approach to tackling these flaws revolves around the organization of users into a hierarchy of domains: partitioning the network into smaller domains also reduces complexity. DNS and group multicast are typical examples of systems that achieve scalability through a hierarchical design. The basic problem is that most of the existing DHTs have been devised as flat structures and hence cannot enjoy the scalability of a hierarchical design. In this dissertation, we attempt to address this issue from three different perspectives.

Seduced by the scalability of hierarchical designs, we start by presenting a hierarchical framework for DHTs. The main aim of our framework is to provide a generic methodology to transform a flat DHT into a hierarchical one made up of telescoping clusters: clusters of ... of clusters of peers. Our idea is to exploit the recursive structure that many DHTs have in order to embed a hierarchy of domains. This strategy has an important benefit: our hierarchical construction of a DHT inherits the homogeneity of load and functionality of the original design, but with the advantages of a hierarchical structure. Further, we provide the hierarchical version of Chord and give some hints on how to transform six other DHTs. With our hierarchical construction of Chord, we investigate the potential reduction in query latency offered by our framework.

Then, we proceed to study an interesting question: what makes our hierarchical constructions better than existing hierarchical DHTs? To address this issue, we introduce an analytic cost model. Our idea is to provide the means for identifying the optimal hierarchical design in terms of communication cost. In an effort to show the lack of a universally optimal design, we analyze the two main families of hierarchical designs: the superpeer design and the homogeneous design. Our hierarchical constructions belong to the latter family.

In general, our hierarchical designs have many interesting applications. For example, the introduction of multiple separate domains in a flat structure can be used to improve its performance. One example is latency optimization.
By adapting the clusters to the physical network, the search latency within each cluster could be reduced, yielding important savings. The problem is how to divide peers into low-latency clusters in a decentralized and accurate manner. To answer this question, we finally introduce a clustering algorithm that attempts to organize peers into clusters, so that peers in a cluster are closer, in terms of round-trip time, to each other than to peers outside their cluster. To assess the quality of our clusterings, we propose a novel metric called the false clustering rate. This metric measures the proportion of falsely clustered peers in a clustering, where falsely clustered peers are distant nodes that have been clustered together. Using our metric, we provide evidence that our algorithm achieves important improvements.
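The description above characterizes a DHT by the standard Put(key, value) and Get(key) interface, with each key mapped onto a responsible peer rather than a local hash bucket. The following is a minimal, illustrative sketch of that flat interface using a Chord-style successor rule on a circular identifier space; the class and peer names are hypothetical, and this is not the dissertation's hierarchical construction.

```python
import hashlib
from bisect import bisect_right


class FlatDHT:
    """Toy flat DHT: peers and keys are hashed onto one circular identifier
    space, and each key lives on the first peer clockwise from its ID
    (the Chord-style successor rule). Hypothetical sketch, not the
    dissertation's design."""

    def __init__(self, peer_names):
        # Place every peer on the identifier ring.
        self.ring = sorted((self._hash(name), name) for name in peer_names)
        self.store = {name: {} for name in peer_names}

    @staticmethod
    def _hash(text):
        return int(hashlib.sha1(text.encode()).hexdigest(), 16)

    def _successor(self, key_id):
        # First peer whose identifier follows the key's, wrapping around the ring.
        ids = [peer_id for peer_id, _ in self.ring]
        return self.ring[bisect_right(ids, key_id) % len(self.ring)][1]

    def put(self, key, value):
        self.store[self._successor(self._hash(key))][key] = value

    def get(self, key):
        return self.store[self._successor(self._hash(key))].get(key)


dht = FlatDHT(["peer-a", "peer-b", "peer-c"])
dht.put("report.pdf", "stored on the successor of hash('report.pdf')")
print(dht.get("report.pdf"))
```

A hierarchical construction in the spirit of the framework would additionally confine lookups to a cluster before escalating them to higher levels, but that logic is specific to the dissertation and is not reproduced here.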
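The description also defines the false clustering rate as the proportion of falsely clustered peers, i.e. distant nodes that ended up in the same cluster. The sketch below computes such a rate under two illustrative assumptions that are not taken from the dissertation: a peer counts as falsely clustered when its mean round-trip time to its cluster-mates exceeds a fixed threshold, and the full pairwise RTT matrix is known.

```python
def false_clustering_rate(clusters, rtt, threshold_ms=80.0):
    """Proportion of falsely clustered peers: peers whose mean round-trip
    time to their cluster-mates exceeds a distance threshold. The mean-RTT
    rule and the threshold are illustrative assumptions; the dissertation's
    exact definition may differ."""
    falsely_clustered = 0
    total_peers = 0
    for cluster in clusters:
        for peer in cluster:
            total_peers += 1
            mates = [other for other in cluster if other != peer]
            if not mates:
                continue  # a singleton cluster cannot be falsely clustered
            mean_rtt = sum(rtt[peer][other] for other in mates) / len(mates)
            if mean_rtt > threshold_ms:
                falsely_clustered += 1
    return falsely_clustered / total_peers if total_peers else 0.0


# Peer "c" is distant from its cluster-mates "a" and "b", so 1 of the 4
# peers is falsely clustered and the rate is 0.25.
rtt = {
    "a": {"b": 10, "c": 130, "d": 140},
    "b": {"a": 10, "c": 120, "d": 150},
    "c": {"a": 130, "b": 120, "d": 15},
    "d": {"a": 140, "b": 150, "c": 15},
}
clusters = [{"a", "b", "c"}, {"d"}]
print(false_clustering_rate(clusters, rtt))  # 0.25
```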