BEP: 5
Title: DHT Protocol
Version: $Revision$
Last-Modified: $Date$
Author:  Andrew Loewenstern <drue@bittorrent.com>
Status:  Draft
Type:    Standards Track
Created: 31-Jan-2008
Post-History:

BitTorrent uses a "distributed sloppy hash table" (DHT) for storing
peer contact information for "trackerless" torrents. In effect, each
peer becomes a tracker. The protocol is based on Kademlia [#Kademlia]_ and is
implemented over UDP.

Please note the terminology used in this document to avoid
confusion. A "peer" is a client/server listening on a TCP port that
implements the BitTorrent protocol. A "node" is a client/server
listening on a UDP port implementing the distributed hash table
protocol. The DHT is composed of nodes and stores the location of
peers. BitTorrent clients include a DHT node, which is used to contact
other nodes in the DHT to get the location of peers to download from
using the BitTorrent protocol.


Overview
========

Each node has a globally unique identifier known as the "node ID."
Node IDs are chosen at random from the same 160-bit space as
BitTorrent infohashes [#entropy]_.  A "distance metric" is used to
compare two node IDs or a node ID and an infohash for "closeness."
Nodes must maintain a routing table containing the contact information
for a small number of other nodes.  The routing table becomes more
detailed as IDs get closer to the node's own ID. Nodes know about many
other nodes in the DHT that have IDs that are "close" to their own but
have only a handful of contacts with IDs that are very far away from
their own.

In Kademlia, the distance metric is XOR and the result is interpreted
as an unsigned integer. ``distance(A,B) = |A xor B|`` Smaller values
are closer.

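The distance metric can be sketched in a few lines of Python. This is
an illustration only: the function names ``distance`` and ``closest``
are not part of the protocol, and IDs are assumed to be held as
20-byte strings read in network (big-endian) byte order.

```python
def distance(a: bytes, b: bytes) -> int:
    """XOR two 160-bit IDs and read the result as an unsigned integer."""
    if len(a) != 20 or len(b) != 20:
        raise ValueError("IDs must be exactly 20 bytes (160 bits)")
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def closest(target: bytes, ids: list[bytes], k: int = 8) -> list[bytes]:
    """Return up to k IDs ordered from nearest to farthest from target."""
    return sorted(ids, key=lambda node_id: distance(target, node_id))[:k]
```
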
When a node wants to find peers for a torrent, it uses the distance
metric to compare the infohash of the torrent with the IDs of the
nodes in its own routing table. It then contacts the nodes it knows
about with IDs closest to the infohash and asks them for the contact
information of peers currently downloading the torrent. If a contacted
node knows about peers for the torrent, the peer contact information
is returned with the response. Otherwise, the contacted node must
respond with the contact information of the nodes in its routing table
that are closest to the infohash of the torrent. The original node
iteratively queries nodes that are closer to the target infohash until
it cannot find any closer nodes. After the search is exhausted, the
client then inserts the peer contact information for itself onto the
responding nodes with IDs closest to the infohash of the torrent.

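The iterative search described above might be sketched as follows.
This is a simplified, sequential model and not the reference
implementation: the ``query`` callback is a hypothetical stand-in for
one get_peers round trip, unresponsive nodes and parallel requests are
not modelled, and the callback is allowed to return both peers and
nodes at once for brevity. The search stops when every one of the k
closest known nodes has already been queried; the returned node list
is where the client would then announce itself.

```python
from typing import Callable, Iterable, List, Set, Tuple

Peer = Tuple[str, int]
# Hypothetical stand-in for a get_peers round trip: given a node ID and the
# target infohash, return (peers, node_ids) reported by that node.
Query = Callable[[bytes, bytes], Tuple[Iterable[Peer], Iterable[bytes]]]


def distance(a: bytes, b: bytes) -> int:
    """XOR metric, interpreted as an unsigned integer."""
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")


def iterative_lookup(target: bytes, start: Iterable[bytes], query: Query,
                     k: int = 8) -> Tuple[Set[Peer], List[bytes]]:
    """Query ever closer nodes until no unqueried node is among the k closest."""
    known: Set[bytes] = set(start)
    queried: Set[bytes] = set()
    peers: Set[Peer] = set()

    def by_distance(ids):
        return sorted(ids, key=lambda node_id: distance(node_id, target))

    while True:
        # Only the k closest known nodes are worth asking.
        pending = [n for n in by_distance(known)[:k] if n not in queried]
        if not pending:
            break
        node = pending[0]
        queried.add(node)
        found, closer = query(node, target)
        peers.update(found)
        known.update(closer)
    # The closest responders are the nodes to announce to afterwards.
    return peers, by_distance(queried)[:k]
```
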
The return value for a query for peers includes an opaque value known
as the "token." For a node to announce that its controlling peer is
downloading a torrent, it must present the token received from the
same queried node in a recent query for peers. When a node attempts to
"announce" a torrent, the queried node checks the token against the
querying node's IP address. This is to prevent malicious hosts from
signing up other hosts for torrents. Since the token is merely
returned by the querying node to the same node it received the token
from, the implementation is not defined. Tokens must be accepted for a
reasonable amount of time after they have been distributed. The
BitTorrent implementation uses the SHA1 hash of the IP address
concatenated onto a secret that changes every five minutes, and tokens
up to ten minutes old are accepted.


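One way to realise such a token scheme is sketched below. The class
name ``TokenIssuer``, the 16-byte secret length, and the
secret-then-IP concatenation order are assumptions made for
illustration; the specification leaves the token format to the
implementation. Keeping the current and the previous secret means a
token stays valid for between five and ten minutes, provided
``rotate`` is called every five minutes.

```python
import hashlib
import hmac
import os


class TokenIssuer:
    """Issues and checks write tokens bound to the requester's IP address."""

    def __init__(self) -> None:
        self.current = os.urandom(16)
        self.previous = self.current

    def rotate(self) -> None:
        """Replace the secret; intended to run every five minutes."""
        self.previous, self.current = self.current, os.urandom(16)

    @staticmethod
    def _make(ip: str, secret: bytes) -> bytes:
        # Assumed layout: SHA1(secret + IP address).
        return hashlib.sha1(secret + ip.encode("ascii")).digest()

    def issue(self, ip: str) -> bytes:
        """Token to return in a get_peers response to this IP."""
        return self._make(ip, self.current)

    def is_valid(self, ip: str, token: bytes) -> bool:
        """Accept tokens made with the current or the previous secret."""
        return any(
            hmac.compare_digest(token, self._make(ip, secret))
            for secret in (self.current, self.previous)
        )
```
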
Routing Table
=============

Every node maintains a routing table of known good nodes. The nodes in
the routing table are used as starting points for queries in the
DHT. Nodes from the routing table are returned in response to queries
from other nodes.

Not all nodes that we learn about are equal. Some are "good" and some
are not. Many nodes using the DHT are able to send queries and receive
responses, but are not able to respond to queries from other nodes. It
is important that each node's routing table contain only known
good nodes. A good node is a node that has responded to one of our
queries within the last 15 minutes. A node is also good if it has ever
responded to one of our queries and has sent us a query within the
last 15 minutes. After 15 minutes of inactivity, a node becomes
questionable. Nodes become bad when they fail to respond to multiple
queries in a row. Nodes that we know are good are given priority over
nodes with unknown status.

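The good, questionable, and bad states could be tracked roughly as
shown here. The ``Contact`` record and its field names are
hypothetical, and the failure threshold of two is an assumption, since
the text above only says "multiple queries in a row". A node that has
never responded is reported as "unknown".

```python
import time
from dataclasses import dataclass
from typing import Optional

FRESH_SECONDS = 15 * 60
MAX_FAILED_QUERIES = 2  # assumption: the spec only says "multiple"


@dataclass
class Contact:
    last_response: Optional[float] = None  # when it last answered our query
    last_query: Optional[float] = None     # when it last sent us a query
    failed_in_a_row: int = 0               # consecutive unanswered queries


def status(contact: Contact, now: Optional[float] = None) -> str:
    """Classify a contact as 'good', 'questionable', 'bad', or 'unknown'."""
    now = time.time() if now is None else now
    if contact.failed_in_a_row >= MAX_FAILED_QUERIES:
        return "bad"
    if contact.last_response is None:
        return "unknown"  # it has never responded to us
    if now - contact.last_response <= FRESH_SECONDS:
        return "good"
    if contact.last_query is not None and now - contact.last_query <= FRESH_SECONDS:
        return "good"  # responded at some point and queried us recently
    return "questionable"
```
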
The routing table covers the entire node ID space from 0 to
2\ :sup:`160`\ .  The routing table is subdivided into "buckets" that
each cover a portion of the space. An empty table has one bucket with
an ID space range of min=0, max=2\ :sup:`160`\ . When a node with ID
"N" is inserted into the table, it is placed within the bucket that
has min <= N < max. An empty table has only one bucket so any
node must fit within it. Each bucket can only hold K nodes, currently
eight, before becoming "full." When a bucket is full of known good
nodes, no more nodes may be added unless our own node ID falls within
the range of the bucket. In that case, the bucket is replaced by two
new buckets each with half the range of the old bucket and the nodes
from the old bucket are distributed among the two new ones. For a new
table with only one bucket, the full bucket is always split into two
new buckets covering the ranges 0..2\ :sup:`159`\  and
2\ :sup:`159`\ ..2\ :sup:`160`\ .

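The splitting rule can be illustrated with a small sketch. The
``Bucket`` and ``RoutingTable`` classes are hypothetical, node IDs are
held as plain integers, and every stored node is assumed to be good,
so the replacement of bad or questionable nodes is left out.

```python
from dataclasses import dataclass, field
from typing import List

K = 8  # bucket capacity


@dataclass
class Bucket:
    lo: int  # inclusive lower bound of the ID range
    hi: int  # exclusive upper bound of the ID range
    nodes: List[int] = field(default_factory=list)

    def covers(self, node_id: int) -> bool:
        return self.lo <= node_id < self.hi


class RoutingTable:
    def __init__(self, own_id: int) -> None:
        self.own_id = own_id
        self.buckets = [Bucket(0, 2**160)]

    def bucket_for(self, node_id: int) -> Bucket:
        return next(b for b in self.buckets if b.covers(node_id))

    def insert(self, node_id: int) -> bool:
        """Add a node, splitting only the bucket that holds our own ID."""
        while True:
            bucket = self.bucket_for(node_id)
            if node_id in bucket.nodes:
                return True
            if len(bucket.nodes) < K:
                bucket.nodes.append(node_id)
                return True
            if not bucket.covers(self.own_id):
                return False  # full bucket that does not contain our ID
            self._split(bucket)

    def _split(self, bucket: Bucket) -> None:
        """Replace a bucket by two halves and redistribute its nodes."""
        mid = (bucket.lo + bucket.hi) // 2
        low, high = Bucket(bucket.lo, mid), Bucket(mid, bucket.hi)
        for node_id in bucket.nodes:
            (low if low.covers(node_id) else high).nodes.append(node_id)
        index = self.buckets.index(bucket)
        self.buckets[index:index + 1] = [low, high]
```
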
When the bucket is full of good nodes, the new node is simply
discarded. If any nodes in the bucket are known to have become bad,
then one is replaced by the new node. If there are any questionable
nodes in the bucket that have not been seen in the last 15 minutes, the
least recently seen node is pinged. If the pinged node responds, then
the next least recently seen questionable node is pinged, until one
fails to respond or all of the nodes in the bucket are known to be
good. If a node in the bucket fails to respond to a ping, it is
suggested to try once more before discarding the node and replacing it
with a new good node. In this way, the table fills with stable,
long-running nodes.

Each bucket should maintain a "last changed" property to
indicate how "fresh" the contents are. When a node in a bucket is
pinged and it responds, or a node is added to a bucket, or a node in a
bucket is replaced with another node, the bucket's last changed
property should be updated. Buckets that have not been changed in 15
minutes should be "refreshed." This is done by picking a random ID in
the range of the bucket and performing a find_node search on it. Nodes
that are able to receive queries from other nodes usually do not need
to refresh buckets often. Nodes that are not able to receive queries
from other nodes usually will need to refresh all buckets periodically
to ensure there are good nodes in their table when the DHT is needed.

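Picking a refresh target is straightforward once a bucket's range is
known. The helper names below are illustrative, and the bucket range
is assumed to be given as integers with an inclusive lower and an
exclusive upper bound, as in the min <= N < max rule.

```python
import secrets
import time
from typing import Optional

REFRESH_AFTER = 15 * 60  # seconds without change before a refresh is due


def needs_refresh(last_changed: float, now: Optional[float] = None) -> bool:
    """True when the bucket has not changed for 15 minutes or more."""
    now = time.time() if now is None else now
    return now - last_changed >= REFRESH_AFTER


def random_id_in_range(lo: int, hi: int) -> bytes:
    """Return a random 20-byte ID N with lo <= N < hi, for a find_node search."""
    return (lo + secrets.randbelow(hi - lo)).to_bytes(20, "big")
```
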
Upon inserting the first node into its routing table and when starting
up thereafter, the node should attempt to find the closest nodes in
the DHT to itself. It does this by issuing find_node messages to
closer and closer nodes until it cannot find any closer. The routing
table should be saved between invocations of the client software.


BitTorrent Protocol Extension
=============================

The BitTorrent protocol has been extended to exchange node UDP port
numbers between peers that are introduced by a tracker. In this way,
clients can get their routing tables seeded automatically through the
download of regular torrents. Newly installed clients who attempt to
download a trackerless torrent on the first try will not have any
nodes in their routing table and will need the contacts included in
the torrent file.

Peers supporting the DHT set the last bit of the 8-byte reserved flags
exchanged in the BitTorrent protocol handshake. A peer receiving a
handshake indicating the remote peer supports the DHT should send a
PORT message. It begins with byte 0x09 and has a two byte payload
containing the UDP port of the DHT node in network byte order.  Peers
that receive this message should attempt to ping the node on the
received port and IP address of the remote peer. If a response to the
ping is received, the node should attempt to insert the new contact
information into its routing table according to the usual rules.


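The reserved-flag check and the PORT message might be handled as in
the sketch below. The function names are illustrative. The 4-byte
big-endian length prefix is the ordinary peer-wire message framing
from the base BitTorrent protocol rather than something defined in
this document, so treat it as an assumption about the surrounding
transport.

```python
import struct

DHT_FLAG = 0x01          # last bit of the last reserved byte
PORT_MESSAGE_ID = 0x09


def reserved_with_dht(reserved: bytes = bytes(8)) -> bytes:
    """Return the 8 reserved handshake bytes with the DHT bit set."""
    if len(reserved) != 8:
        raise ValueError("reserved flags must be 8 bytes")
    return reserved[:7] + bytes([reserved[7] | DHT_FLAG])


def supports_dht(reserved: bytes) -> bool:
    """Check the DHT bit in a remote peer's reserved flags."""
    return len(reserved) == 8 and bool(reserved[7] & DHT_FLAG)


def encode_port_message(udp_port: int) -> bytes:
    """Message ID 0x09 followed by the UDP port in network byte order."""
    body = struct.pack(">BH", PORT_MESSAGE_ID, udp_port)
    # Assumed framing: peer-wire messages carry a 4-byte length prefix.
    return struct.pack(">I", len(body)) + body


def decode_port_payload(payload: bytes) -> int:
    """Extract the DHT node's UDP port from the two-byte payload."""
    (port,) = struct.unpack(">H", payload)
    return port
```
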
Torrent File Extensions
=======================

A trackerless torrent dictionary does not have an "announce" key.
Instead, a trackerless torrent has a "nodes" key. This key should be
set to the K closest nodes in the torrent generating client's routing
table. Alternatively, the key could be set to a known good node such
as one operated by the person generating the torrent. Please do not
automatically add "router.bittorrent.com" to torrent files or
automatically add this node to clients' routing tables.

::

  nodes = [["<host>", <port>], ["<host>", <port>], ...]
  nodes = [["127.0.0.1", 6881], ["your.router.node", 4804]]


KRPC Protocol
=============

The KRPC protocol is a simple RPC mechanism consisting of bencoded
dictionaries sent over UDP. A single query packet is sent out and a
single packet is sent in response. There is no retry. There are three
message types: query, response, and error. For the DHT protocol, there
are four queries: ping, find_node, get_peers, and announce_peer.

A KRPC message is a single dictionary with two keys common to
every message and additional keys depending on the type of message.
Every message has a key "t" with a string value representing a transaction
ID. This transaction ID is generated by the querying node and is echoed
in the response, so responses may be correlated with multiple queries
to the same node. The transaction ID should be encoded as a short string
of binary numbers; typically 2 characters are enough, as they cover 2^16
outstanding queries. The other key contained in every KRPC message is "y"
with a single character value describing the type of message. The value
of the "y" key is one of "q" for query, "r" for response, or "e" for
error.

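Since every KRPC message is a bencoded dictionary, a minimal encoder
is enough to build packets by hand. The ``bencode`` function below is
an illustrative sketch, not a library API. It accepts integers, byte
or text strings, lists, and dictionaries, and writes dictionary keys
in sorted order as bencoding requires.

```python
def bencode(value) -> bytes:
    """Encode ints, strings, lists, and dicts in bencoded form."""
    if isinstance(value, bool):
        raise TypeError("booleans have no bencoded form")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings and must appear in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda pair: pair[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))
```
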
Contact Encoding
----------------

Contact information for peers is encoded as a 6-byte string. Also
known as "Compact IP-address/port info" the 4-byte IP address is in
network byte order with the 2 byte port in network byte order
concatenated onto the end.

Contact information for nodes is encoded as a 26-byte string.
Also known as "Compact node info" the 20-byte Node ID in network byte
order has the compact IP-address/port info concatenated to the end.

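Both compact formats can be packed and unpacked with the standard
library alone. The helper names are illustrative, and only IPv4
addresses are handled, matching the 4-byte address described above.

```python
import socket
import struct
from typing import List, Tuple


def pack_peer(ip: str, port: int) -> bytes:
    """6 bytes: IPv4 address then port, both in network byte order."""
    return socket.inet_aton(ip) + struct.pack(">H", port)


def unpack_peer(data: bytes) -> Tuple[str, int]:
    if len(data) != 6:
        raise ValueError("compact peer info must be 6 bytes")
    return socket.inet_ntoa(data[:4]), struct.unpack(">H", data[4:])[0]


def pack_node(node_id: bytes, ip: str, port: int) -> bytes:
    """26 bytes: 20-byte node ID followed by compact peer info."""
    if len(node_id) != 20:
        raise ValueError("node ID must be 20 bytes")
    return node_id + pack_peer(ip, port)


def unpack_nodes(data: bytes) -> List[Tuple[bytes, str, int]]:
    """Split a "nodes" string into (node_id, ip, port) entries."""
    if len(data) % 26:
        raise ValueError("compact node info must be a multiple of 26 bytes")
    return [
        (data[i:i + 20], *unpack_peer(data[i + 20:i + 26]))
        for i in range(0, len(data), 26)
    ]
```
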
Queries
-------

Queries, or KRPC message dictionaries with a "y" value of "q",
contain two additional keys; "q" and "a". Key "q" has a string value
containing the method name of the query. Key "a" has a dictionary value
containing named arguments to the query.

Responses
---------

Responses, or KRPC message dictionaries with a "y" value of "r",
contain one additional key "r". The value of "r" is a dictionary
containing named return values. Response messages are sent upon
successful completion of a query.

Errors
------

Errors, or KRPC message dictionaries with a "y" value of "e",
contain one additional key "e". The value of "e" is a list. The first
element is an integer representing the error code. The second element
is a string containing the error message. Errors are sent when a query
cannot be fulfilled. The following table describes the possible error
codes:

+----------+------------------------------------------+
|  Code    | Description                              |
+----------+------------------------------------------+
|  201     |   Generic Error                          |
+----------+------------------------------------------+
|  202     |   Server Error                           |
+----------+------------------------------------------+
|  203     | Protocol Error, such as a malformed      |
|          | packet, invalid arguments, or bad token  |
+----------+------------------------------------------+
|  204     |   Method Unknown                         |
+----------+------------------------------------------+

Example Error Packets:

::

  generic error = {"t":"aa", "y":"e", "e":[201, "A Generic Error Ocurred"]}
  bencoded = d1:eli201e23:A Generic Error Ocurrede1:t2:aa1:y1:ee


DHT Queries
===========

All queries have an "id" key and value containing the node ID of the
querying node. All responses have an "id" key and value containing the
node ID of the responding node.

ping
----

The most basic query is a ping ("q" = "ping"). A ping query has a
single argument, "id", whose value is a 20-byte string containing the
sender's node ID in network byte order. The appropriate response to a
ping has a single key "id" containing the node ID of the responding
node.

::

  arguments:  {"id" : "<querying nodes id>"}

  response: {"id" : "<queried nodes id>"}


Example Packets
::

  ping Query = {"t":"aa", "y":"q", "q":"ping", "a":{"id":"abcdefghij0123456789"}}
  bencoded = d1:ad2:id20:abcdefghij0123456789e1:q4:ping1:t2:aa1:y1:qe


::

  Response = {"t":"aa", "y":"r", "r": {"id":"mnopqrstuvwxyz123456"}}
  bencoded = d1:rd2:id20:mnopqrstuvwxyz123456e1:t2:aa1:y1:re


find_node
---------

Find node is used to find the contact information for a node given
its ID ("q" = "find_node"). A find_node query has two arguments: "id",
containing the node ID of the querying node, and "target", containing
the ID of the node sought by the querier. When a node receives a
find_node query, it should respond with a key "nodes" and value of a
string containing the compact node info for the target node or the K
(8) closest good nodes in its own routing table.

::

  arguments:  {"id" : "<querying nodes id>", "target" : "<id of target node>"}

  response: {"id" : "<queried nodes id>", "nodes" : "<compact node info>"}


Example Packets
::

  find_node Query = {"t":"aa", "y":"q", "q":"find_node", "a": {"id":"abcdefghij0123456789", "target":"mnopqrstuvwxyz123456"}}
  bencoded = d1:ad2:id20:abcdefghij01234567896:target20:mnopqrstuvwxyz123456e1:q9:find_node1:t2:aa1:y1:qe


::

  Response = {"t":"aa", "y":"r", "r": {"id":"0123456789abcdefghij", "nodes": "def456..."}}
  bencoded = d1:rd2:id20:0123456789abcdefghij5:nodes9:def456...e1:t2:aa1:y1:re


get_peers
---------

Get peers associated with a torrent infohash ("q" = "get_peers"). A
get_peers query has two arguments: "id", containing the node ID of the
querying node, and "info_hash", containing the infohash of the torrent.
If the queried node has peers for the infohash, they are returned in a
key "values" as a list of strings. Each string contains "compact" format
peer information for a single peer. If the queried node has no
peers for the infohash, a key "nodes" is returned containing the K
nodes in the queried node's routing table closest to the infohash
supplied in the query. In either case a "token" key is also included in
the return value. The token value is a required argument for a future
announce_peer query. The token value should be a short binary string.

::

  arguments:  {"id" : "<querying nodes id>", "info_hash" : "<20-byte infohash of target torrent>"}

  response: {"id" : "<queried nodes id>", "token" :"<opaque write token>", "values" : ["<peer 1 info string>", "<peer 2 info string>"]}

  or: {"id" : "<queried nodes id>", "token" :"<opaque write token>", "nodes" : "<compact node info>"}


Example Packets:
::

  get_peers Query = {"t":"aa", "y":"q", "q":"get_peers", "a": {"id":"abcdefghij0123456789", "info_hash":"mnopqrstuvwxyz123456"}}
  bencoded = d1:ad2:id20:abcdefghij01234567899:info_hash20:mnopqrstuvwxyz123456e1:q9:get_peers1:t2:aa1:y1:qe


::

  Response with peers = {"t":"aa", "y":"r", "r": {"id":"abcdefghij0123456789", "token":"aoeusnth", "values": ["axje.u", "idhtnm"]}}
  bencoded = d1:rd2:id20:abcdefghij01234567895:token8:aoeusnth6:valuesl6:axje.u6:idhtnmee1:t2:aa1:y1:re


::

  Response with closest nodes = {"t":"aa", "y":"r", "r": {"id":"abcdefghij0123456789", "token":"aoeusnth", "nodes": "def456..."}}
  bencoded = d1:rd2:id20:abcdefghij01234567895:nodes9:def456...5:token8:aoeusnthe1:t2:aa1:y1:re


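A querying node has to tell the two get_peers response shapes apart.
The sketch below pairs a minimal bencode decoder with a hypothetical
``parse_get_peers`` helper; neither is a library API, and error
responses as well as decoding of the compact strings are left to the
caller. Applied to the "Response with peers" example above, it yields
the token ``aoeusnth`` and the two 6-byte peer strings.

```python
def _decode(data: bytes, i: int):
    """Decode one bencoded value starting at offset i; return (value, next)."""
    lead = data[i:i + 1]
    if lead == b"i":
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if lead == b"l":
        items, i = [], i + 1
        while data[i:i + 1] != b"e":
            item, i = _decode(data, i)
            items.append(item)
        return items, i + 1
    if lead == b"d":
        result, i = {}, i + 1
        while data[i:i + 1] != b"e":
            key, i = _decode(data, i)
            value, i = _decode(data, i)
            result[key] = value
        return result, i + 1
    if lead.isdigit():
        colon = data.index(b":", i)
        start = colon + 1
        end = start + int(data[i:colon])
        if end > len(data):
            raise ValueError("string runs past end of packet")
        return data[start:end], end
    raise ValueError("malformed bencoded data at offset %d" % i)


def bdecode(data: bytes):
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


def parse_get_peers(packet: bytes) -> dict:
    """Pull the token plus either peers or closest nodes from a response."""
    message = bdecode(packet)
    if message.get(b"y") != b"r":
        raise ValueError("not a response message")
    body = message[b"r"]
    return {
        "token": body[b"token"],
        "peers": body.get(b"values", []),  # list of 6-byte compact strings
        "nodes": body.get(b"nodes", b""),  # concatenated 26-byte entries
    }
```
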
announce_peer
-------------

Announce that the peer controlling the querying node is downloading
a torrent on a port. announce_peer has four arguments: "id" containing the node ID of the
querying node, "info_hash" containing the infohash of the torrent,
"port" containing the port as an integer, and the "token" received in
response to a previous get_peers query. The queried node must verify
that the token was previously sent to the same IP address as the
querying node. Then the queried node should store the IP address of the
querying node and the supplied port number under the infohash in its
store of peer contact information.

::

  arguments:  {"id" : "<querying nodes id>", "info_hash" : "<20-byte infohash of target torrent>", "port" : <port number>, "token" : "<opaque token>"}

  response: {"id" : "<queried nodes id>"}


Example Packets:
::

  announce_peer Query = {"t":"aa", "y":"q", "q":"announce_peer", "a": {"id":"abcdefghij0123456789", "info_hash":"mnopqrstuvwxyz123456", "port": 6881, "token": "aoeusnth"}}
  bencoded = d1:ad2:id20:abcdefghij01234567899:info_hash20:mnopqrstuvwxyz1234564:porti6881e5:token8:aoeusnthe1:q13:announce_peer1:t2:aa1:y1:qe


::

  Response = {"t":"aa", "y":"r", "r": {"id":"mnopqrstuvwxyz123456"}}
  bencoded = d1:rd2:id20:mnopqrstuvwxyz123456e1:t2:aa1:y1:re

References
==========

.. [#Kademlia] Petar Maymounkov, David Mazieres, "Kademlia: A Peer-to-peer Information System Based on the XOR Metric", *IPTPS 2002*. http://www.cs.rice.edu/Conferences/IPTPS02/109.pdf

.. [#entropy] Use SHA1 and plenty of entropy to ensure a unique ID.



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: