P2P Characterization

Unbiased Sampling

Unbiased Sampling for Unstructured Peer-to-Peer Networks

This study presents the Metropolized Random Walk with Backtracking (MRWB) as a viable and promising technique for collecting nearly unbiased samples in peer-to-peer systems.

This study addresses the difficult problem of selecting repre- sentative samples of peer properties (e.g., degree, link bandwidth, number of files shared) in unstructured peer-to-peer systems. Due to the large size and dynamic nature of these systems, measuring the quantities of interest on every peer is often prohibitively expensive, while sampling provides a natural means for estimating system-wide behavior efficiently. However, commonly-used sampling techniques for measur- ing peer-to-peer systems tend to introduce considerable bias for two reasons. First, the dynamic nature of peers can bias results towards short-lived peers, much as naively sampling flows in a router can lead to bias towards short-lived flows. Second, the heterogeneous nature of the overlay topology can lead to bias towards high-degree peers. We present a detailed examination of the ways that the behavior of peer-to-peer systems can introduce bias and suggest the Metropolized Random Walk with Backtracking (MRWB) as a viable and promising technique for collecting nearly unbiased samples. We conduct an extensive simulation study to demonstrate that the proposed technique works well for a wide variety of common peer-to-peer net- work conditions. Using the Gnutella network, we empiri- cally show that our implementation of the MRWB technique yields more accurate samples than relying on commonly- used sampling techniques. Furthermore, we provide insights into the causes of the observed differences. The tool we have developed, ion-sampler, selects peer addresses uniformly at random using the MRWB technique. These addresses may then be used as input to another measurement tool to collect data on a particular property.

Publications

  • On Unbiased Sampling for Unstructured Peer-to-Peer Networks
    Daniel Stutzbach, Reza Rejaie, Nick Duffield, Subhabrata Sen, Walter Willinger
    IEEE/ACM Transactions on Networking, Volume 17, Number 2, April 2009
    [PAPER]

  • On Unbiased Sampling for Unstructured Peer-to-Peer Networks
    Daniel Stutzbach, Reza Rejaie, Nick Duffield, Subhabrata Sen, Walter Willinger
    Proceedings of ACM SIGCOMM/USENIX Internet Measurement Conference, pp. 27-40, Rio de Janeiro, Brazil, October 2006 [acceptance rate 22%]
    [PAPER] [PPT]

  • Sampling Techniques for Large, Dynamics Graphs
    Daniel Stutzbach, Reza Rejaie, Nick Duffield, Subhabrata Sen, Walter Willinger
    Proceedings of IEEE Global Internet Symposium, Barcelona, Spain, April 2006 [acceptance rate 36% of 63]
    [PAPER]

Peer Dynamics (Churn)

Understanding Churn in Peer-to-Peer Networks

This study presents the most authoritative analysis on churn in peer-to-peer systems.

The dynamics of peer participation, or churn, are an inher- ent property of Peer-to-Peer (P2P) systems and critical for design and evaluation. Accurately characterizing churn re- quires precise and unbiased information about the arrival and departure of peers, which is challenging to acquire. Prior studies show that peer participation is highly dynamic but with conflicting characteristics. Therefore, churn re- mains poorly understood, despite its significance. In this study, we identify several common pitfalls that lead to measurement error. We carefully address these dif- ficulties and present a detailed study using three widely- deployed P2P systems: an unstructured file-sharing system (Gnutella), a content-distribution system (BitTorrent), and a Distributed Hash Table (Kad). Our analysis reveals several properties of churn: (i) overall dynamics are surprisingly similar across different systems, (ii) session lengths are not exponential, (iii) a large portion of active peers are highly stable while the remaining peers turn over quickly, and (iv) peer session lengths across consecutive appearances are cor- related. In summary, this studyadvances our understanding of churn by improving accuracy, comparing different P2P file sharing/distribution systems, and exploring new aspects of churn.

Publications

  • Understanding Churn in Peer-to-Peer Networks
    Daniel Stutzbach, Reza Rejaie
    Proceedings of ACM SIGCOMM/USENIX Internet Measurement Conference, pp. 189-202, Rio de Janeiro, Brazil, October 2006 [acceptance rate 22%]
    [PAPER] [PPT]

  • Towards a Better Understanding of Churn in Peer-to-Peer Networks
    Daniel Stutzbach, Reza Rejaie
    Technical Report CIS-TR-04-06, University of Oregon, November 2004
    [PAPER]

Team

  • Daniel Stutzbach
  • Reza Rejaie

Funding

This material is based upon work supported in part by the National Science Foundation (NSF) under Grant No. Nets- NBD-0627202 and an unrestricted gift from Cisco Systems. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF or Cisco.

DHT Lookup Performance

Lookup Performance over a Widely-deployed DHT.

This empirical study examines the performance of the key DHT operation, lookup, over Kad.

During recent years, Distributed Hash Tables (DHTs) have been extensively studied through simulation and analysis. However, due to their limited deployment, it has not been possible to observe the behavior of a widely-deployed DHT in practice. Recently, the popular eMule file-sharing software incorporated a Kademlia-based DHT, called Kad, which currently has around one million simultaneous users. In this paper, we empirically study the performance of the key DHT operation, lookup, over Kad. First, we analytically derive the benefits of different ways to increase the richness of routing tables in Kademlia-based DHTs. Second, we empirically characterize two aspects of the accuracy of routing tables in Kad, namely completeness and freshness, and characterize their impact on Kad’s lookup performance. Finally, we investigate how the efficiency and consistency of lookup in Kad can be improved by performing parallel lookup and maintaining multiple replicas, respectively. Our results pinpoint the best operating point for the degree of lookup parallelism and the degree of replication for Kad.

Publications

  • Improving Lookup Performance over a Widely-Deployed DHT
    Daniel Stutzbach, Reza Rejaie
    Proceedings of IEEE INFOCOM, pp. 1-12, Barcelona, Spain, April 2006 [acceptance rate 18%]
    [PAPER] [PPT]

Team

  • Daniel Stutzbach
  • Reza Rejaie

Funding

This material is based upon work supported in part by the National Science Foundation (NSF) under Grant No. Nets- NBD-0627202.

Tsunami, A DHT Botnet

A Parasitic, Indestructible Botnet on Kad

We identify and investigate a new type of botnet, called Tsunami, in Kad.

While current botnets rely on a central server or bootstrap nodes for their operations, in this study we identify and investigate a new type of botnet, called Tsunami, in which no such bottleneck nodes exist. In particular, we study how a Tsunami botnet can build a parasitic relationship with a widely deployed P2P system, Kad, to successfully issue commands to its bots, launch various attacks, including distributed denial of service (DDoS) and spam, at ease, as well as receive responses from the bots. Our evaluation shows that in a Kad network with four million nodes, even with only 6 % nodes being Tsunami bots, Tsunami can reach 75 % of its bots in less than 4 min and receive responses from 99 % of bots. We further propose how we may defend against Tsunami and evaluate the defense solution.

Tsunami

Publications

  • Tsunami: A Parasitic, Indestructible Botnet on Kad
    Ghulam Memon, Jun Li, Reza Rejaie
    Peer-to-Peer Networking and Applications, Vol. 7, Issue 4, pp. 444 - 455, April 2013
    [PAPER]

Team

  • Ghulam Memon
  • Jun Li
  • Reza Rejaie

Funding

This material is based upon work supported in part by the NSF under Grant No. NeTs-NBD-0627202.

Shared Files

Characterizing Files in the Modern Gnutella Network.

The Internet has witnessed an explosive increase in the popularity of Peer-to-Peer (P2P) file-sharing applications during the past few years. As these applications become more popular, it becomes increasingly important to characterize their behavior in order to improve their performance and quantify their impact on the network. In this paper, we present a measurement study on characteristics of available files in the modern Gnutella system. We developed a new methodology to capture accurate “snapshots” of available files in a large scale P2P system. This methodology was implemented in a parallel crawler that captures the entire overlay topology of the system where each peer in the overlay is annotated with its available files. We have captured tens of snapshots of the Gnutella system and conducted three types of analysis on available files: (i) Static analysis, (ii) Topological analysis and (iii) Dynamic analysis. Our results reveal several interesting properties of available files in Gnutella that can be leveraged to improve the design and evaluations of P2P file-sharing applications.

Publications

Team

  • Shanyu Zhao
  • Daniel Stutzbach
  • Reza Rejaie

Funding

This material is based upon work supported in part by the National Science Foundation (NSF) under Grant No. Nets- NBD-0627202

Monitoring DHT Traffic

This study presents a new, scalable technique for accurately capturing traffic in a widely deployed DHT.

This study presents a new technique, called Montra, for accurately capturing traffic in a widely deployed DHT. The basic idea is to make the traffic monitors minimally visible to participating peers to avoid disruption in the system. We describe how Montra leverages the required redundancy in published content and routing to minimize disruption on the system. Validations of Montra over two widely deployed DHTs, namely Kad and Azureus, show that it can accurately capture more than 90% of traffic destined to monitored peers. Furthermore, the lightweight nature of Montra allows it to monitor a large number of peers with a moderate amount of resources. We use Montra to characterize several aspects of traffic in our two target DHTs.

Publications

  • Montra: A Large-Scale DHT Traffic Monitor
    Ghulam Memon, Reza Rejaie, Yang Guo, Daniel Stutzbach
    Elsevier Computer Networks, Special Issue on Measurement-based Optimization of P2P Networking and Applications, Volume 56, Issue 3, pp. 1080-1091, February 2012
    [PAPER]

  • Large-Scale Monitoring of DHT Traffic
    Ghulam Memon, Reza Rejaie, yang Guo, Daniel Stutzbach
    Proceedings of International Workshop on Peer-to-Peer Systems (IPTPS), Boston, MA, April 2009 [acceptance rate 20%]
    [PAPER] [HTML] [PPT]

Team

  • Ghulam Memon
  • Yang Guo (Thompson Lab)
  • Daniel Stutzbach
  • Reza Rejaie

Funding

This material is based upon work supported in part by the NSF under Grant No. NeTs-NBD-0627202.

Overlay Topology

Respondent-driven Sampling for Characterizing Unstructured Overlays.

This study presents Respondent-Driven Sampling (RDS) as a promising technique to derive unbiased estimates of node properties in unstructured overlay networks.

This study presents Respondent-Driven Sampling (RDS) as a promising technique to derive unbiased estimates of node properties in unstructured overlay networks such as Gnutella. Using RDS and a previously proposed technique, namely Metropolized Random Walk (MRW) sampling, we exam- ine the efficiency of estimating node properties in unstructured overlays and identify some of the key factors that determine the accuracy of sampling techniques. We evaluate the RDS and MRW techniques using simulation over a wide range of static and dynamic graphs as well as experiments over a widely deployed Gnutella network. Our study sheds light on how the connectivity structure among nodes and its dynamics affect the accuracy and efficiency of the two sampling techniques. Both techniques exhibit a rather similar performance over a wide range of scenarios. However, RDS significantly outperforms MRW when the overlay structure exhibits a combination of highly skewed node degrees and highly skewed (local) clustering coefficients.

Publications

  • Respondent-driven Sampling for Characterizing Unstructured Overlays
    Amir H. Rasti, Mojtaba Torkjazi, Reza Rejaie, Nick Duffield, Walter Willinger. Daniel Stutzbach
    Proceedings of IEEE INFOCOM Mini-conference, Rio de Janeiro, Brazil, April 2009 [acceptance rate including INFOCOM 26.6%]
    [PAPER] [PPT]

Team

  • Amir H. Rasti
  • Mojtaba Torkjazi
  • Reza Rejaie
  • Nick Duffield
  • Walter Willinger
  • Daniel Stutzbach

Impact on [AS] Underlay

Characterizing the Global Impact of P2P Overlays on the AS-Level Underlay.

This study maps the routes associated with a P2P overlay to the AS-level underlay and characterizes their impact in particular the propagation of traffic through the AS level hierarchy.

This studyexamines the problem of characterizing and as- sessing the global impact of the load imposed by a Peer-to-Peer (P2P) overlay on the AS-level underlay. In particular, we capture Gnutella snap- shots for four consecutive years, obtain the corresponding AS-level topology snapshots of the Internet and infer the AS-paths associated with each overlay connection. Assuming a simple model of overlay traffic, we analyze the observed load imposed by these Gnutella snapshots on the AS-level underlay using metrics that characterize the load seen on individual AS-paths and by the transit ASes, illustrate the churn among the top transit ASes during this 4-year period, and describe the propagation of traffic within the AS-level hierarchy.

Publications

  • Characterizing the Global Impact of the P2P Overlay on the AS-level Underlay
    Amir H. Rasti, Reza Rejaie, Walter Willinger
    Technical Report CIS-TR-10-01, University of Oregon, January 2010
    [PAPER]

Team

  • Amir H. Rasti
  • Reza Rejaie
  • Walter Willinger

Funding

This work is supported in part by the National Science Foundation (NSF) under Grant No. Nets-NBD-0627202 and an unrestricted gift from Cisco Systems.