Summary: | Master's === National Chiao Tung University === Institute of Computer Science and Engineering === 101 === This thesis presents an empirical study of the effect of "churn" on real-world unstructured Peer-to-Peer (P2P) applications. Over the past decade, P2P applications have grown rapidly on the Internet. In P2P networks, peers (also referred to as servents) may connect to different neighbor peers over time because peers join and leave dynamically. This dynamic interconnection phenomenon is referred to as "churn" and may seriously affect the performance of P2P applications. The effect of churn on the performance of structured P2P networks has already been well studied. However, little research has been conducted on the effect of churn on unstructured P2P networks because of the operational complexity stemming from their unstructured topologies. This thesis therefore focuses on investigating the effect of churn on unstructured P2P networks.
To analyze the correlation between churn and the performance of a P2P network, we need to collect a dataset of P2P network topology snapshots and the associated performance metrics. Dataset collection has three requirements: a large number of representative peers, a short collection time, and little interference with the P2P network. These three requirements may conflict with one another at times. In general, millions of peers exist on a P2P network, and each snapshot must contain a sufficient number of representative peers. However, because P2P topologies change over time, the collection time of each snapshot must be short enough for the snapshot to be accurate. We therefore propose a Third-party-to-servent Crawling with Servent-to-servent Sampling (TCSS) system to collect the datasets for analyzing the effects of churn on the performance of a P2P network.
TCSS uses a third-party crawling technique to collect a P2P network topology without disturbing the topology under investigation. Furthermore, TCSS adopts a distributed and parallel crawling system to speed up the crawling process. The crawling system consists of one central repository, one dispatching server, and eleven crawling clients. The central repository stores all peers discovered by the crawling clients. The dispatching server dispatches new peer crawling jobs so as to balance the load across the crawling clients. The crawling clients perform the crawling work using asynchronous I/O to increase parallelism. In addition, each crawling client forks five worker processes that identify newly discovered peers and store them in the central repository. The workers also use a cache to speed up new peer discovery. TCSS also employs a Servent-to-servent Sampling technique, in which many customized lightweight servents contact representative servents selected from the target P2P network to gather the corresponding performance metrics of the P2P network simultaneously. Servent-to-servent Sampling adopts Metropolized Random Walk with Backtracking (MRWB) to select the representative servents from the mass population of peers, which significantly reduces the number of servents that the customized servents need to contact.
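The sampling step can be illustrated with a minimal sketch of a Metropolized random walk with a simplified form of backtracking. This is not the thesis implementation: the function name `mrwb_sample`, the `query_neighbors` callback, and the rule "on an unresponsive peer, stay at the current peer and try another neighbor" are assumptions made for illustration. The acceptance probability min(1, deg(current)/deg(candidate)) is the standard Metropolis-Hastings correction that steers the walk toward a uniform distribution over peers instead of favoring high-degree peers.

```python
import random


def mrwb_sample(query_neighbors, start, hops, rng):
    """Return one peer selected by a Metropolized random walk.

    query_neighbors(peer) is a hypothetical callback that returns the peer's
    neighbor list, or None if the peer does not respond (e.g. it has left).
    """
    nbrs = query_neighbors(start)
    if not nbrs:
        raise ValueError("start peer is unreachable or has no neighbors")
    current = start
    for _ in range(hops):
        candidate = rng.choice(nbrs)
        cand_nbrs = query_neighbors(candidate)
        if not cand_nbrs:
            # Simplified backtracking: the candidate is gone, so remain at
            # the current peer and pick another neighbor on the next hop.
            continue
        # Metropolis-Hastings acceptance: always move to lower-degree peers,
        # move to higher-degree peers only with probability deg(x)/deg(y).
        if rng.random() < min(1.0, len(nbrs) / len(cand_nbrs)):
            current, nbrs = candidate, cand_nbrs
    return current
```

In this sketch a failed query or a rejected move still consumes one hop, so `hops` bounds the number of queries sent into the network per sample.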
With the highly parallel crawling system and the Servent-to-servent Sampling technique, TCSS overcomes a limitation of current crawling systems for unstructured P2P networks, whose topology snapshots include no performance metrics. Empirical results show that TCSS can capture an accurate topology snapshot of the P2P network in a short period (around 7 minutes per snapshot).
We conducted a broad measurement of our target P2P network, Gnutella, continuously for 20 hours and collected a dataset of 163 snapshots. Each snapshot contains a topology of more than one million peers and the corresponding performance metrics. We used a peer elimination policy to remove peers with inaccurate information, then thoroughly investigated the collected dataset and obtained three significant findings. First, churn is in fact the combined effect of peer arrivals/departures and the neighbor selection algorithm. In the dataset, we observe that peers may change their neighbors even if their previous neighbors still exist in the P2P network. Second, as the number of peers increases, the number of very long-lived peers remains nearly constant, and the P2P network exhibits a small-world property (the shortest paths between any two peers are mostly four or five hops). Third, as churn intensifies, the booting time of each individual peer does not necessarily decline. On average, however, the booting times of peers do decline, while their variation increases as churn intensifies. As for the response time of keyword searches, searches for top-ranked keywords are almost unaffected by the degree of churn, whereas the response times for lower-ranked keywords increase significantly as churn intensifies.
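The first finding suggests separating two causes of a lost neighbor link between consecutive snapshots: the neighbor departed, or the neighbor is still present but was no longer selected. A minimal sketch of such a comparison follows. The snapshot representation (a dict mapping each peer to its set of neighbors) and the function name `neighbor_changes` are assumptions for illustration, not the data format used in the thesis.

```python
def neighbor_changes(prev, curr):
    """Count lost neighbor links between two snapshots, split by cause.

    prev, curr: dict mapping a peer ID to the set of its neighbor IDs.
    Returns (departed, dropped_alive):
      departed      - lost links whose neighbor is absent from curr
      dropped_alive - lost links whose neighbor still appears in curr
    """
    departed = 0
    dropped_alive = 0
    for peer, old_nbrs in prev.items():
        if peer not in curr:
            continue  # the peer itself left; its links are not compared
        for nbr in old_nbrs - curr[peer]:
            if nbr in curr:
                dropped_alive += 1
            else:
                departed += 1
    return departed, dropped_alive
```

A nonzero `dropped_alive` count corresponds to peers changing neighbors even though the previous neighbors are still in the network, which is the behavior attributed above to the neighbor selection algorithm.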
In summary, this thesis presents the design and implementation of a framework that can be used to collect datasets from unstructured P2P networks. It also provides thorough empirical results on the effects of churn on unstructured P2P networks that have not been reported previously.
|