DHT Traffic Characterization



Overview

Distributed Hash Tables (DHTs) are Peer-to-Peer (P2P) networks that provide hash table-like functionality. In a DHT, participating peers form a structured overlay in which each peer is responsible for storing, and answering queries for, a portion of the hash table's identifier space. Because the hash table and the query workload are distributed among peers, a DHT can provide a lookup service with little to no infrastructure support.
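As a rough illustration of how responsibility for the identifier space can be divided, the sketch below assigns a key to the peers whose identifiers are closest to it under the XOR metric used by Kademlia. The hash function, the identifier width, and the names `make_id`, `xor_distance`, and `responsible_peers` are assumptions made for this example; they are not taken from Kad, Azureus, or Montra.

```python
import hashlib

ID_BITS = 160  # assumption: SHA-1-sized identifiers, for illustration only


def make_id(name: str) -> int:
    """Map an arbitrary string to an integer in the identifier space."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia-style distance between two identifiers."""
    return a ^ b


def responsible_peers(key_id: int, peer_ids: list[int], k: int = 1) -> list[int]:
    """Return the k peer identifiers closest to key_id under XOR distance."""
    return sorted(peer_ids, key=lambda p: xor_distance(p, key_id))[:k]


if __name__ == "__main__":
    # Hypothetical peers and key, only to show how the functions are called.
    peers = [make_id(f"peer-{i}") for i in range(8)]
    key = make_id("example-keyword")
    print(responsible_peers(key, peers, k=2))
```

A real DHT never sorts a global list of peers; each peer keeps a partial routing table and a lookup reaches the closest peers iteratively. The sketch only shows which peers a key maps to.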

These properties have attracted interest from both academia and industry since the inception of DHTs. Later DHT designs, such as OpenDHT, Accordion, and Kademlia, focused on making DHTs better suited to real-world deployment. More recently, Kad and Azureus, both implementations of Kademlia, have been successfully deployed.

A better understanding of existing systems not only allows us to improve their performance, but also sheds light on the design of future DHT systems. While extensive simulations, analyses, and small-scale deployments have been conducted in the past, few studies have examined deployed systems with large numbers of peers. As a result, the real-world properties of large-scale DHTs are not well understood. Our research effort focuses on understanding the traffic characteristics of DHTs in the wild.

Projects

Montra: Large-Scale DHT Traffic Monitor

Accurate measurement of traffic in a DHT is a challenging task. The sheer size of the problem is daunting: a sufficiently large fraction of peers needs to be monitored in order to collect enough data to draw meaningful conclusions. An instrumented peer can be deployed to collect the traffic it receives. However, deploying a large number of monitoring peers requires a significant amount of computing resources and network bandwidth. Furthermore, monitoring peers must participate in the DHT in order to monitor it. Adding a large number of monitoring peers may drastically alter the traffic pattern inside the DHT, potentially making the measurement results biased or even invalid.

We present a DHT traffic-monitoring technique, called Montra, that can monitor queries to tens of thousands of peers using modest resources, without significantly disturbing the DHT's normal operation. We implement Montra as a highly parallel, scalable, Python-based client to monitor Kad and Azureus, two large-scale DHTs. We rigorously evaluate our implementation of Montra on the live Kad and Azureus networks. We find that it can accurately monitor around 32,000 Kad peers and 37,000 Azureus peers using a moderately configured PC (an Intel Core 2 Duo with 1 GB of RAM) and can capture more than 90% of the query traffic observed by monitored peers. In addition, Montra can identify the destination peers for 90% of captured traffic. Finally, we conduct a detailed characterization study of the captured traffic.

MONKey: Monitoring Publish Traffic in Kad

MONKey is an extension of Montra. It aims to monitor global publish traffic in the Kad DHT in order to understand user publish behavior and the content (files and keywords) in the DHT. We use Montra's captured traffic to find the most popular keywords. We then monitor those keywords to observe the majority of publish behavior in Kad.
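The keyword-selection step can be pictured with the minimal sketch below, which counts how often each keyword appears in a stream of captured queries and returns the most frequent ones. The input format (one keyword string per captured query) and the function name `top_keywords` are hypothetical; the sketch does not reflect Montra's actual data format or MONKey's implementation.

```python
from collections import Counter
from typing import Iterable


def top_keywords(queried_keywords: Iterable[str], n: int) -> list[str]:
    """Return the n most frequently queried keywords, most popular first."""
    counts = Counter(queried_keywords)
    return [keyword for keyword, _ in counts.most_common(n)]


if __name__ == "__main__":
    # Hypothetical captured keywords, one entry per observed query.
    captured = ["music", "video", "music", "linux", "video", "music"]
    print(top_keywords(captured, n=2))
```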

Team Members

Publications

  • Large-Scale Monitoring of DHT Traffic
    Ghulam Memon, Reza Rejaie, Yang Guo, Daniel Stutzbach
    Proceedings of 8th International Workshop on Peer-to-Peer Systems (IPTPS '09)
    [pdf] [slides]

  • Montra: Large-Scale DHT Traffic Monitor
    Ghulam Memon, Daniel Stutzbach, Yang Guo, Reza Rejaie
    Journal Version in Submission

  • How to place DHT traffic monitors? Experiences from large-scale DHT traffic monitoring
    Ghulam Memon, Reza Rejaie
    In preparation

Code and Data

Please contact Ghulam Memon for data and code.