DHT Traffic Characterization
Distributed Hash Tables (DHTs) are Peer-to-Peer (P2P) networks that provide hash-table-like functionality. In a DHT, participating peers form a structured overlay in which each peer is responsible for storing, and answering queries for, a portion of the hash table's identifier space. Because both the hash table and the query workload are distributed among peers, a DHT can provide a lookup service with little or no infrastructure support.
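As one concrete illustration of how a structured overlay divides the identifier space, the sketch below uses the XOR distance metric of Kademlia (one of the designs named below): the peers whose IDs are closest to a key under XOR are the ones expected to store it and answer lookups for it. This is a simplified sketch, not Kad or Azureus code. The 8-bit identifiers, the peer list, and the names `xor_distance` and `responsible_peers` are hypothetical; the original Kademlia design uses 160-bit identifiers.

```python
from typing import List


def xor_distance(a: int, b: int) -> int:
    """Kademlia-style distance between two identifiers: their bitwise XOR."""
    return a ^ b


def responsible_peers(key: int, peer_ids: List[int], k: int = 2) -> List[int]:
    """Return the k peer IDs closest to `key` under the XOR metric.

    In a Kademlia-style DHT, these peers are the ones expected to store
    the value for `key` and to answer lookups for it.
    """
    return sorted(peer_ids, key=lambda pid: xor_distance(pid, key))[:k]


# Hypothetical 8-bit identifier space with five peers.
peers = [0b00010110, 0b01100001, 0b10011010, 0b10110011, 0b11110000]
key = 0b10010000

print(responsible_peers(key, peers))  # prints [154, 179]
```

Here the key 144 is closest to peers 154 (distance 10) and 179 (distance 35), so those two peers would hold it when k = 2.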
These properties have attracted interest from both academia and industry since DHTs were first introduced. Later designs, such as OpenDHT, Accordion, and Kademlia, focused on making DHTs better suited to real-world deployment. More recently, Kad and Azureus, both implementations of Kademlia, have been successfully deployed.
A better understanding of existing systems not only helps improve their performance but also sheds light on the design of future DHT systems. While extensive simulations, analyses, and small-scale deployments have been conducted in the past, few studies have examined deployed systems with large numbers of peers. As a result, the real-world properties of large-scale DHTs are not well understood. Our research focuses on understanding the traffic characteristics of DHTs in the wild.
Montra: Large-Scale DHT Traffic Monitor
Accurately measuring traffic in a DHT is challenging. The sheer size of the problem is daunting: a sufficiently large fraction of peers must be monitored to collect enough data to draw meaningful conclusions. An instrumented peer can be deployed to record the traffic it receives, but deploying a large number of such monitoring peers requires significant computing resources and network bandwidth. Furthermore, monitoring peers must participate in the DHT in order to monitor it, so adding many of them may drastically alter the traffic pattern inside the DHT, potentially biasing or even invalidating the measurement results.
We present a DHT traffic-monitoring technique, called Montra, that can monitor queries to tens of thousands of peers using modest resources and without significantly disturbing the DHT's normal operation. We implement Montra as a highly parallel, scalable Python-based client that monitors Kad and Azureus, two large-scale DHTs, and we rigorously evaluate this implementation on the live Kad and Azureus networks. Our evaluation shows that Montra can accurately monitor around 32,000 Kad peers and 37,000 Azureus peers using a moderately configured PC (an Intel Core 2 Duo with 1GB RAM), capturing more than 90% of the query traffic observed by the monitored peers. Montra can also identify the destination peers for 90% of the captured traffic. Finally, we conduct a detailed characterization study of the captured traffic.
MONKey: Monitoring Publish Traffic in Kad
MONKey is an extension of Montra. It aims to monitor global publish traffic in the Kad DHT in order to understand user publishing behavior and the content (files and keywords) in the DHT. We use Montra's captured traffic to identify the most popular keywords, and then monitor those keywords to observe the majority of publish activity in Kad.
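The first MONKey step, ranking keywords by how often they appear in Montra's captured traffic, can be illustrated with a simple frequency count. This is a minimal sketch under stated assumptions, not the MONKey implementation: the record format (one list of keyword strings per captured query), the case folding, the function name `top_keywords`, and the cutoff `n` are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_keywords(captured_queries: Iterable[List[str]], n: int) -> List[Tuple[str, int]]:
    """Return the n most frequent keywords across captured keyword queries.

    Each element of `captured_queries` is assumed (hypothetically) to be the
    list of keywords carried by one captured query message.
    """
    counts = Counter()
    for keywords in captured_queries:
        # Lowercase so that "Linux" and "linux" count as the same keyword.
        counts.update(kw.lower() for kw in keywords)
    return counts.most_common(n)


# Hypothetical captured traffic: one keyword list per query.
captured = [
    ["linux", "iso"],
    ["Linux", "ubuntu"],
    ["music", "mp3"],
    ["linux", "debian"],
    ["mp3"],
]

print(top_keywords(captured, 2))  # prints [('linux', 3), ('mp3', 2)]
```

The resulting keyword list would then be the set of targets whose publish traffic is monitored in the second step.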
- Ghulam Memon (Department of Computer and Information Science, University of Oregon)
- Daniel Stutzbach (Stutzbach Enterprises)
- Yang Guo (Corporate Research, Thomson)
- Reza Rejaie (Department of Computer and Information Science, University of Oregon)
Large-Scale Monitoring of DHT Traffic
Ghulam Memon, Reza Rejaie, Yang Guo, Daniel Stutzbach
Proceedings of 8th International Workshop on Peer-to-Peer Systems (IPTPS '09)
Montra: Large-Scale DHT Traffic Monitor
Ghulam Memon, Daniel Stutzbach, Yang Guo, Reza Rejaie
Journal Version in Submission
How to place DHT traffic monitors? Experiences from large-scale DHT traffic monitoring
Ghulam Memon, Reza Rejaie
Code and Data
Please contact Ghulam Memon for data and code.