Kademlia
|
Kademlia is a P2P overlay protocol designed by Petar Maymounkov and David Mazières, for decentralized peer to peer computer networks. It specifies the structure of the network, regulates communication between nodes and how the exchange of information has to take place. Kademlia nodes communicate among themselves using the transport protocol UDP (see OSI model). Kademlia nodes store data by implementing a Distributed Hash Table (DHT). Over an existing LAN/WAN (like the Internet) a new, virtual network, is created in which each network node is identified by a number ("Node ID"). This number serves not only as its identification, but the Kademlia algorithm uses it for further purposes.
A node that would like to join the net must first go through a bootstrap process. In this phase, the node needs to know the IP address of another node (obtained from the user, or from a stored list) that is already participating in the Kademlia network. If the bootstrapping node has not yet participated in the network, it computes a random ID number that is not already assigned to any other node. It uses this ID until leaving the network.
The Kademlia algorithm is based on the calculation of the "distance" between two nodes. This distance is computed as the exclusive or of the two node IDs, taking the result as an integer number.
This "distance" does not have anything to do with geographical conditions, but designates the distance within the ID range. Thus it can and does happen that, for example, a node from Germany and one from Australia are "neighbours".
Information within Kademlia is stored in so called "values", every value being attached to a "key".
When searching for some key, the algorithm explores the network in several steps, each step approaching closer to the searched-for key, until the contacted node returns the value, or no more closer nodes are found. The number of nodes contacted during the search is only marginally dependent on the size of the network: If the number of participants in the net doubles in number, then a user's node must query only one more node per search, not twice as many.
Further advantages are found particularly in the decentralized structure, which clearly increases the resistance against a denial of service attack. Even if a whole set of nodes are flooded, this will have limited effect on network availability, which will recover itself by knitting the network around these "holes".
Use In File-Sharing Networks
Kademlia has been used for file sharing. By making Kademlia keyword searches, one can find information in the file-sharing network so it can be downloaded. Since there is no central instance to store an index of existing files, this task is divided evenly among all clients: The node that possesses a file it wants to share, processes the contents of the file, calculating from it a number (hash) that will identify this file within the file-sharing network. The hashes and the node IDs must be of the same length. He searches for several nodes whose ID is close to the hash, and has his own IP address stored at those nodes. A searching client will use Kademlia to search the network for the node whose ID has the smallest distance to the filehash, then will retrieve the contacts list that is stored in that node. The contacts stored in the network are under constant change as nodes connect and disconnect. Due to built-in redundancy, these contacts are replicated in several nodes.
The filehash is usually obtained from a specially formed internet link found elsewhere, or included within an indexing file obtained from other sources.
Searching for filenames is implemented using keywords. The filename is divided into its constituing words. Each of these keywords is hashed and stored into the network in the same way as a filehash, together with the corresponding filename and filehash. A searcher will choose one of the keywords, contact the node with ID closest to that keyword hash, and retrieve the list of filenames that contain the keyword. Since every filename in the list has its hash attached, the chosen file can then be obtained in the normal way.
Public clients using the Kademlia algorithm (networks are incompatible with one another):
- Overnet network: Overnet(now integrated in eDonkey2000 1.0), eDonkeyHybrid and mlDonkey
- Kad Network: eMule (starting from v0.40) - mlDonkey (starting from 2.5-28)
- RevConnect - see also [1] (http://www.revconnect.com/) (starting with version 0.403)
- KadC: [2] (http://kadc.sourceforge.net/) C library to publish and retrieve information to/from the Overnet network
- Azureus: Since version 2.3.0.0, uses a Kademlia network as an alternate method of locating peers if BitTorrent trackers fail, or can use it exclusively for some torrents.
- Original BitTorrent: Beta 4.1.0 now has a Kademlia network for trackerless torrents.
- Bitspirit (as of 3.0)
Further Information
- "Kademlia: The optimized Peer network", http://kademlia.scs.cs.nyu.edu/ (possibly dead, mirror (http://web.archive.org/web/20031206222134/http%3a//kademlia.scs.cs.nyu.edu/))
- "Kademlia: A of Peer to peer information system Based on the XOR Metric", http://www.cs.rice.edu/Conferences/IPTPS02/109.pdfde:Kademlia