| 11 | | BitTorrent uses a "distributed sloppy hash table" (DHT) for storing |
| 12 | | peer contact information for "trackerless" torrents. In effect, each |
| 13 | | peer becomes a tracker. The protocol is based on Kademila [#Kademlia]_ and is |
| 14 | | implemented over UDP. |
| 15 | | |
| 16 | | Please note the terminology used in this document to avoid |
| 17 | | confusion. A "peer" is a client/server listening on a TCP port that |
| 18 | | implements the BitTorrent protocol. A "node" is a client/server |
| 19 | | listening on a UDP port implementing the distributed hash table |
| 20 | | protocol. The DHT is composed of nodes and stores the location of |
| 21 | | peers. BitTorrent clients include a DHT node, which is used to contact |
| 22 | | other nodes in the DHT to get the location of peers to download from |
| 23 | | using the BitTorrent protocol. |
| 24 | | |
| 25 | | |
| 26 | | Overview |
| 27 | | ======== |
| 28 | | |
| 29 | | Each node has a globally unique identifier known as the "node ID." |
| 30 | | Node IDs are chosen at random from the same 160-bit space as |
| 31 | | BitTorrent infohashes [#entropy]_. A "distance metric" is used to |
| 32 | | compare two node IDs or a node ID and an infohash for "closeness." |
| 33 | | Nodes must maintain a routing table containing the contact information |
| 34 | | for a small number of other nodes. The routing table becomes more |
| 35 | | detailed as IDs get closer to the node's own ID. Nodes know about many |
| 36 | | other nodes in the DHT that have IDs that are "close" to their own but |
| 37 | | have only a handful of contacts with IDs that are very far away from |
| 38 | | their own. |
| 39 | | |
| 40 | | In Kademlia, the distance metric is XOR and the result is interpreted |
| 41 | | as an unsigned integer. ``distance(A,B) = |A xor B|`` Smaller values |
| 42 | | are closer. |
| 43 | | |
| 44 | | When a node wants to find peers for a torrent, it uses the distance |
| 45 | | metric to compare the infohash of the torrent with the IDs of the |
| 46 | | nodes in its own routing table. It then contacts the nodes it knows |
| 47 | | about with IDs closest to the infohash and asks them for the contact |
| 48 | | information of peers currently downloading the torrent. If a contacted |
| 49 | | node knows about peers for the torrent, the peer contact information |
| 50 | | is returned with the response. Otherwise, the contacted node must |
| 51 | | respond with the contact information of the nodes in its routing table |
| 52 | | that are closest to the infohash of the torrent. The original node |
| 53 | | iteratively queries nodes that are closer to the target infohash until |
| 54 | | it cannot find any closer nodes. After the search is exhausted, the |
| 55 | | client then inserts the peer contact information for itself onto the |
| 56 | | responding nodes with IDs closest to the infohash of the torrent. |
| 57 | | |
| 58 | | The return value for a query for peers includes an opaque value known |
| 59 | | as the "token." For a node to announce that its controlling peer is |
| 60 | | downloading a torrent, it must present the token received from the |
| 61 | | same queried node in a recent query for peers. When a node attempts to |
| 62 | | "announce" a torrent, the queried node checks the token against the |
| 63 | | querying node's IP address. This is to prevent malicious hosts from |
| 64 | | signing up other hosts for torrents. Since the token is merely |
| 65 | | returned by the querying node to the same node it received the token |
| 66 | | from, the implementation is not defined. Tokens must be accepted for a |
| 67 | | reasonable amount of time after they have been distributed. The |
| 68 | | BitTorrent implementation uses the SHA1 hash of the IP address |
| 69 | | concatenated onto a secret that changes every five minutes and tokens |
| 70 | | up to ten minutes old are accepted. |
| 71 | | |
| 72 | | |
| 73 | | Routing Table |
| | 11 | The Fast Extension packages several extensions: *Have None/Have All*, |
| | 12 | *Reject Requests*, *Suggestions* and *Allowed Fast.* |
| | 13 | These are enabled by setting the third least significant bit of the |
| | 14 | last reserved byte in the BitTorrent handshake: |
| | 15 | |
| | 16 | :: |
| | 17 | |
| | 18 | reserved[7] |= 0x04 |
| | 19 | |
| | 20 | The extension is enabled only if both ends of the connection set this bit. |
| | 21 | |
| | 22 | The following proposed messages adhere to the syntax of messages found |
| | 23 | in v1.0 of the BitTorrent protocol. All integers are four bytes |
| | 24 | big-endian. All messages start with an integer message length. All |
| | 25 | messages but the Keep-Alive follow the message length with a single |
| | 26 | byte opcode and zero or more opcode-dependant arguments. |
| | 27 | |
| | 28 | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL |
| | 29 | NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and |
| | 30 | "OPTIONAL" in this document are to be interpreted as described in |
| | 31 | IETF `RFC 2119`_. |
| | 32 | |
| | 33 | Modifications to Semantics of Existing Messages |
| | 34 | =============================================== |
| | 35 | |
| | 36 | The Fast Extension modifies the semantics of the |
| | 37 | *Request*, *Choke*, *Unchoke*, and *Cancel* |
| | 38 | messages, and adds a *Reject Request.* Now, every request |
| | 39 | is guaranteed to result in EXACTLY ONE response |
| | 40 | which is either the corresponding reject or corresponding piece |
| | 41 | message. Even when a request is cancelled, the peer receiving |
| | 42 | the cancel should respond with either the corresponding reject or |
| | 43 | the corresponding piece: requests that are being processed are |
| | 44 | allowed to complete. |
| | 45 | |
| | 46 | Choke no longer implicitly rejects all pending requests, |
| | 47 | thus eliminating some race conditions which could cause pieces |
| | 48 | to be needlessly requested multiple times. |
| | 49 | |
| | 50 | Additionally, if a peer receives a piece that was never requested, |
| | 51 | the peer MUST close the connection. |
| | 52 | |
| | 53 | |
| | 54 | Have All/Have None |
| | 55 | ================== |
| | 56 | |
| | 57 | :: |
| | 58 | |
| | 59 | *Have All*: <len=0x0001> <op=0x0E> |
| | 60 | |
| | 61 | :: |
| | 62 | |
| | 63 | *Have None*: <len=0x0001><op=0x0F> |
| | 64 | |
| | 65 | *Have All* and *Have None* specify that the message sender |
| | 66 | has all or none of the pieces respectively. When present, *Have All* |
| | 67 | or *Have None* replace the *Have Bitfield.* Exactly one of *Have All*, |
| | 68 | *Have None*, or *Have Bitfield* MUST appear and only immediately after |
| | 69 | the handshake. The reason for these messages is to save bandwidth. Also |
| | 70 | slightly to remove the idiosyncrasy of sending no message when a peer |
| | 71 | has no pieces. |
| | 72 | |
| | 73 | When the fast extension is disabled, if a peer receives *Have All* or |
| | 74 | *Have None* then the peer MUST close the connection. |
| | 75 | |
| | 76 | |
| | 77 | Suggest Piece |
| 76 | | Every node maintains a routing table of known good nodes. The nodes in |
| 77 | | the routing table are used as starting points for queries in the |
| 78 | | DHT. Nodes from the routing table are returned in response to queries |
| 79 | | from other nodes. |
| 80 | | |
| 81 | | Not all nodes that we learn about are equal. Some are "good" and some |
| 82 | | are not. Many nodes using the DHT are able to send queries and receive |
| 83 | | responses, but are not able to respond to queries from other nodes. It |
| 84 | | is important that each node's routing table must contain only known |
| 85 | | good nodes. A good node is a node has responded to one of our queries |
| 86 | | within the last 15 minutes. A node is also good if it has ever |
| 87 | | responded to one of our queries and has sent us a query within the |
| 88 | | last 15 minutes. After 15 minutes of inactivity, a node becomes |
| 89 | | questionable. Nodes become bad when they fail to respond to multiple |
| 90 | | queries in a row. Nodes that we know are good are given priority over |
| 91 | | nodes with unknown status. |
| 92 | | |
| 93 | | The routing table covers the entire node ID space from 0 to |
| 94 | | 2\ :sup:`160`\ . The routing table is subdivided into "buckets" that |
| 95 | | each cover a portion of the space. An empty table has one bucket with |
| 96 | | an ID space range of min=0, max=2\ :sup:`160`\ . When a node with ID |
| 97 | | "N" is inserted into the table, it is placed within the bucket that |
| 98 | | has min <= N < max. An empty table has only one bucket so any |
| 99 | | node must fit within it. Each bucket can only hold K nodes, currently |
| 100 | | eight, before becoming "full." When a bucket is full of known good |
| 101 | | nodes, no more nodes may be added unless our own node ID falls within |
| 102 | | the range of the bucket. In that case, the bucket is replaced by two |
| 103 | | new buckets each with half the range of the old bucket and the nodes |
| 104 | | from the old bucket are distributed among the two new ones. For a new |
| 105 | | table with only one bucket, the full bucket is always split into two |
| 106 | | new buckets covering the ranges 0..2\ :sup:`159`\ and |
| 107 | | 2\ :sup:`159`\ ..2\ :sup:`160`\ . |
| 108 | | |
| 109 | | When the bucket is full of good nodes, the new node is simply |
| 110 | | discarded. If any nodes in the bucket are known to have become bad, |
| 111 | | then one is replaced by the new node. If there are any questionable |
| 112 | | nodes in the bucket have not been seen in the last 15 minutes, the |
| 113 | | least recently seen node is pinged. If the pinged node responds then |
| 114 | | the next least recently seen questionable node is pinged until one |
| 115 | | fails to respond or all of the nodes in the bucket are known to be |
| 116 | | good. If a node in the bucket fails to respond to a ping, it is |
| 117 | | suggested to try once more before discarding the node and replacing it |
| 118 | | with a new good node. In this way, the table fills with stable long |
| 119 | | running nodes. |
| 120 | | |
| 121 | | Each bucket should maintain a "last changed" property to |
| 122 | | indicate how "fresh" the contents are. When a node in a bucket is |
| 123 | | pinged and it responds, or a node is added to a bucket, or a node in a |
| 124 | | bucket is replaced with another node, the bucket's last changed |
| 125 | | property should be updated. Buckets that have not been changed in 15 |
| 126 | | minutes should be "refreshed." This is done by picking a random ID in |
| 127 | | the range of the bucket and performing a find_nodes search on it. Nodes |
| 128 | | that are able to receive queries from other nodes usually do not need |
| 129 | | to refresh buckets often. Nodes that are not able to receive queries |
| 130 | | from other nodes usually will need to refresh all buckets periodically |
| 131 | | to ensure there are good nodes in their table when the DHT is needed. |
| 132 | | |
| 133 | | Upon inserting the first node into its routing table and when starting |
| 134 | | up thereafter, the node should attempt to find the closest nodes in |
| 135 | | the DHT to itself. It does this by issuing find_node messages to |
| 136 | | closer and closer nodes until it cannot find any closer. The routing |
| 137 | | table should be saved between invocations of the client software. |
| 138 | | |
| 139 | | |
| 140 | | BitTorrent Protocol Extension |
| 141 | | ============================= |
| 142 | | |
| 143 | | The BitTorrent protocol has been extended to exchange node UDP port |
| 144 | | numbers between peers that are introduced by a tracker. In this way, |
| 145 | | clients can get their routing tables seeded automatically through the |
| 146 | | download of regular torrents. Newly installed clients who attempt to |
| 147 | | download a trackerless torrent on the first try will not have any |
| 148 | | nodes in their routing table and will need the contacts included in |
| 149 | | the torrent file. |
| 150 | | |
| 151 | | Peers supporting the DHT set the last bit of the 8-byte reserved flags |
| 152 | | exchanged in the BitTorrent protocol handshake. Peer receiving a |
| 153 | | handshake indicating the remote peer supports the DHT should send a |
| 154 | | PORT message. It begins with byte 0x09 and has a two byte payload |
| 155 | | containing the UDP port of the DHT node in network byte order. Peers |
| 156 | | that receive this message should attempt to ping the node on the |
| 157 | | received port and IP address of the remote peer. If a response to the |
| 158 | | ping is recieved, the node should attempt to insert the new contact |
| 159 | | information into their routing table according to the usual rules. |
| 160 | | |
| 161 | | |
| 162 | | Torrent File Extensions |
| 163 | | ======================= |
| 164 | | |
| 165 | | A trackerless torrent dictionary does not have an "announce" key. |
| 166 | | Instead, a trackerless torrent has a "nodes" key. This key should be |
| 167 | | set to the K closest nodes in the torrent generating client's routing |
| 168 | | table. Alternatively, the key could be set to a known good node such |
| 169 | | as one operated by the person generating the torrent. Please do not |
| 170 | | automatically add "router.bittorrent.com" to torrent files or |
| 171 | | automatically add this node to clients routing tables. |
| 172 | | |
| 173 | | :: |
| 174 | | |
| 175 | | nodes = [["<host>", <port>], ["<host>", <port>], ...] |
| 176 | | nodes = [["127.0.0.1", 6881], ["your.router.node", 4804]] |
| 177 | | |
| 178 | | |
| 179 | | |
| 180 | | KRPC Protocol |
| 181 | | ============= |
| 182 | | |
| 183 | | The KRPC protocol is a simple RPC mechanism consisting of bencoded |
| 184 | | dictionaries sent over UDP. A single query packet is sent out and a |
| 185 | | single packet is sent in response. There is no retry. There are three |
| 186 | | message types: query, response, and error. For the DHT protocol, there |
| 187 | | are four queries: ping, find_node, get_peers, and announce_peer. |
| 188 | | |
| 189 | | A KRPC message is a single dictionary with two keys common to |
| 190 | | every message and additional keys depending on the type of message. |
| 191 | | Every message has a key "t" with a string value representing a transaction |
| 192 | | ID. This transaction ID is generated by the querying node and is echoed |
| 193 | | in the response, so responses may be correlated with multiple queries |
| 194 | | to the same node. The transaction ID should be encoded as a short string |
| 195 | | of binary numbers, typically 2 characters are enough as they cover 2^16 |
| 196 | | outstanding queries. The other key contained in every KRPC message is "y" |
| 197 | | with a single character value describing the type of message. The value |
| 198 | | of the "y" key is one of "q" for query, "r" for response, or "e" for |
| 199 | | error. |
| 200 | | |
| 201 | | Contact Encoding |
| 202 | | ---------------- |
| 203 | | |
| 204 | | Contact information for peers is encoded as a 6-byte string. Also |
| 205 | | known as "Compact IP-address/port info" the 4-byte IP address is in |
| 206 | | network byte order with the 2 byte port in network byte order |
| 207 | | concatenated onto the end. |
| 208 | | |
| 209 | | Contact information for nodes is encoded as a 26-byte string. |
| 210 | | Also known as "Compact node info" the 20-byte Node ID in network byte |
| 211 | | order has the compact IP-address/port info concatenated to the end. |
| 212 | | |
| 213 | | Queries |
| 214 | | ------- |
| 215 | | |
| 216 | | Queries, or KRPC message dictionaries with a "y" value of "q", |
| 217 | | contain two additional keys; "q" and "a". Key "q" has a string value |
| 218 | | containing the method name of the query. Key "a" has a dictionary value |
| 219 | | containing named arguments to the query. |
| 220 | | |
| 221 | | Responses |
| 222 | | --------- |
| 223 | | |
| 224 | | Responses, or KRPC message dictionaries with a "y" value of "r", |
| 225 | | contain one additional key "r". The value of "r" is a dictionary |
| 226 | | containing named return values. Response messages are sent upon |
| 227 | | successful completion of a query. |
| 228 | | |
| 229 | | Errors |
| 230 | | ------ |
| 231 | | |
| 232 | | Errors, or KRPC message dictionaries with a "y" value of "e", |
| 233 | | contain one additional key "e". The value of "e" is a list. The first |
| 234 | | element is an integer representing the error code. The second element |
| 235 | | is a string containing the error message. Errors are sent when a query |
| 236 | | cannot be fulfilled. The following table describes the possible error |
| 237 | | codes: |
| 238 | | |
| 239 | | +----------+------------------------------------------+ |
| 240 | | | Code | Description | |
| 241 | | +----------+------------------------------------------+ |
| 242 | | | 201 | Generic Error | |
| 243 | | +----------+------------------------------------------+ |
| 244 | | | 202 | Server Error | |
| 245 | | +----------+------------------------------------------+ |
| 246 | | | 203 | Protocol Error, such as a malformed | |
| 247 | | | | packet, invalid arguments, or bad token | |
| 248 | | +----------+------------------------------------------+ |
| 249 | | | 204 | Method Unknown | |
| 250 | | +----------+------------------------------------------+ |
| 251 | | |
| 252 | | Example Error Packets: |
| 253 | | |
| 254 | | :: |
| 255 | | |
| 256 | | generic error = {"t":"aa", "y":"e", "e":[201, "A Generic Error Ocurred"]} |
| 257 | | bencoded = d1:eli201e23:A Generic Error Ocurrede1:t2:aa1:y1:ee |
| 258 | | |
| 259 | | |
| 260 | | DHT Queries |
| 261 | | =========== |
| 262 | | |
| 263 | | All queries have an "id" key and value containing the node ID of the |
| 264 | | querying node. All responses have an "id" key and value containing the |
| 265 | | node ID of the responding node. |
| 266 | | |
| 267 | | ping |
| 268 | | ---- |
| 269 | | |
| 270 | | The most basic query is a ping. "q" = "ping" A ping query has a |
| 271 | | single argument, "id" the value is a 20-byte string containing the |
| 272 | | senders node ID in network byte order. The appropriate response to a |
| 273 | | ping has a single key "id" containing the node ID of the responding |
| 274 | | node. |
| 275 | | |
| 276 | | :: |
| 277 | | |
| 278 | | arguments: {"id" : "<querying nodes id>"} |
| 279 | | |
| 280 | | response: {"id" : "<queried nodes id>"} |
| 281 | | |
| 282 | | |
| 283 | | Example Packets |
| 284 | | :: |
| 285 | | |
| 286 | | ping Query = {"t":"aa", "y":"q", "q":"ping", "a":{"id":"abcdefghij0123456789"}} |
| 287 | | bencoded = d1:ad2:id20:abcdefghij0123456789e1:q4:ping1:t2:aa1:y1:qe |
| 288 | | |
| 289 | | |
| 290 | | :: |
| 291 | | |
| 292 | | Response = {"t":"aa", "y":"r", "r": {"id":"mnopqrstuvwxyz123456"}} |
| 293 | | bencoded = d1:rd2:id20:mnopqrstuvwxyz123456e1:t2:aa1:y1:re |
| 294 | | |
| 295 | | |
| 296 | | find_node |
| 297 | | --------- |
| 298 | | |
| 299 | | Find node is used to find the contact information for a node given |
| 300 | | its ID. "q" == "find_node" A find_node query has two arguments, "id" |
| 301 | | containing the node ID of the querying node, and "target" containing |
| 302 | | the ID of the node sought by the queryer. When a node receives a |
| 303 | | find_node query, it should respond with a key "nodes" and value of a |
| 304 | | string containing the compact node info for the target node or the K |
| 305 | | (8) closest good nodes in its own routing table. |
| 306 | | |
| 307 | | :: |
| 308 | | |
| 309 | | arguments: {"id" : "<querying nodes id>", "target" : "<id of target node>"} |
| 310 | | |
| 311 | | response: {"id" : "<queried nodes id>", "nodes" : "<compact node info>"} |
| 312 | | |
| 313 | | |
| 314 | | Example Packets |
| 315 | | :: |
| 316 | | |
| 317 | | find_node Query = {"t":"aa", "y":"q", "q":"find_node", "a": {"id":"abcdefghij0123456789", "target":"mnopqrstuvwxyz123456"}} |
| 318 | | bencoded = d1:ad2:id20:abcdefghij01234567896:target20:mnopqrstuvwxyz123456e1:q9:find_node1:t2:aa1:y1:qe |
| 319 | | |
| 320 | | |
| 321 | | :: |
| 322 | | |
| 323 | | Response = {"t":"aa", "y":"r", "r": {"id":"0123456789abcdefghij", "nodes": "def456..."}} |
| 324 | | bencoded = d1:rd2:id20:0123456789abcdefghij5:nodes9:def456...e1:t2:aa1:y1:re |
| 325 | | |
| 326 | | |
| 327 | | get_peers |
| 328 | | --------- |
| 329 | | |
| 330 | | Get peers associated with a torrent infohash. "q" = "get_peers" A |
| 331 | | get_peers query has two arguments, "id" containing the node ID of the |
| 332 | | querying node, and "info_hash" containing the infohash of the torrent. |
| 333 | | If the queried node has peers for the infohash, they are returned in a |
| 334 | | key "values" as a list of strings. Each string containing "compact" format |
| 335 | | peer information for a single peer. If the queried node has no |
| 336 | | peers for the infohash, a key "nodes" is returned containing the K |
| 337 | | nodes in the queried nodes routing table closest to the infohash |
| 338 | | supplied in the query. In either case a "token" key is also included in |
| 339 | | the return value. The token value is a required argument for a future |
| 340 | | announce_peer query. The token value should be a short binary string. |
| 341 | | |
| 342 | | :: |
| 343 | | |
| 344 | | arguments: {"id" : "<querying nodes id>", "info_hash" : "<20-byte infohash of target torrent>"} |
| 345 | | |
| 346 | | response: {"id" : "<queried nodes id>", "token" :"<opaque write token>", "values" : ["<peer 1 info string>", "<peer 2 info string>"]} |
| 347 | | |
| 348 | | or: {"id" : "<queried nodes id>", "token" :"<opaque write token>", "nodes" : "<compact node info>"} |
| 349 | | |
| 350 | | |
| 351 | | Example Packets: |
| 352 | | :: |
| 353 | | |
| 354 | | get_peers Query = {"t":"aa", "y":"q", "q":"get_peers", "a": {"id":"abcdefghij0123456789", "info_hash":"mnopqrstuvwxyz123456"}} |
| 355 | | bencoded = d1:ad2:id20:abcdefghij01234567899:info_hash20:mnopqrstuvwxyz123456e1:q9:get_peers1:t2:aa1:y1:qe |
| 356 | | |
| 357 | | |
| 358 | | :: |
| 359 | | |
| 360 | | Response with peers = {"t":"aa", "y":"r", "r": {"id":"abcdefghij0123456789", "token":"aoeusnth", "values": ["axje.u", "idhtnm"]}} |
| 361 | | bencoded = d1:rd2:id20:abcdefghij01234567895:token8:aoeusnth6:valuesl6:axje.u6:idhtnmee1:t2:aa1:y1:re |
| 362 | | |
| 363 | | |
| 364 | | :: |
| 365 | | |
| 366 | | Response with closest nodes = {"t":"aa", "y":"r", "r": {"id":"abcdefghij0123456789", "token":"aoeusnth", "nodes": "def456..."}} |
| 367 | | bencoded = d1:rd2:id20:abcdefghij01234567895:nodes9:def456...5:token8:aoeusnthe1:t2:aa1:y1:re |
| 368 | | |
| 369 | | |
| 370 | | announce_peer |
| 371 | | ------------- |
| 372 | | |
| 373 | | Announce that the peer, controlling the querying node, is downloading |
| 374 | | a torrent on a port. announce_peer has four arguments: "id" containing the node ID of the |
| 375 | | querying node, "info_hash" containing the infohash of the torrent, |
| 376 | | "port" containing the port as an integer, and the "token" received in |
| 377 | | response to a previous get_peers query. The queried node must verify |
| 378 | | that the token was previously sent to the same IP address as the |
| 379 | | querying node. Then the queried node should store the IP address of the |
| 380 | | querying node and the supplied port number under the infohash in its |
| 381 | | store of peer contact information. |
| 382 | | |
| 383 | | :: |
| 384 | | |
| 385 | | arguments: {"id" : "<querying nodes id>", "info_hash" : "<20-byte infohash of target torrent>", "port" : <port number>, "token" : "<opaque token>"} |
| 386 | | |
| 387 | | response: {"id" : "<queried nodes id>"} |
| 388 | | |
| 389 | | |
| 390 | | Example Packets: |
| 391 | | :: |
| 392 | | |
| 393 | | announce_peers Query = {"t":"aa", "y":"q", "q":"announce_peer", "a": {"id":"abcdefghij0123456789", "info_hash":"mnopqrstuvwxyz123456", "port": 6881, "token": "aoeusnth"}} |
| 394 | | bencoded = d1:ad2:id20:abcdefghij01234567899:info_hash20:<br /> |
| 395 | | mnopqrstuvwxyz1234564:porti6881e5:token8:aoeusnthe1:q13:announce_peer1:t2:aa1:y1:qe |
| 396 | | |
| 397 | | |
| 398 | | :: |
| 399 | | |
| 400 | | Response = {"t":"aa", "y":"r", "r": {"id":"mnopqrstuvwxyz123456"}} |
| 401 | | bencoded = d1:rd2:id20:mnopqrstuvwxyz123456e1:t2:aa1:y1:re |
| 402 | | |
| 403 | | References |
| 404 | | ========== |
| 405 | | |
| 406 | | .. [#Kademlia] Peter Maymounkov, David Mazieres, "Kademlia: A Peer-to-peer Information System Based on the XOR Metric", *IPTPS 2002*. http://www.cs.rice.edu/Conferences/IPTPS02/109.pdf |
| 407 | | |
| 408 | | .. [#entropy] Use SHA1 and plenty of entropy to ensure a unique ID. |
| | 80 | :: |
| | 81 | |
| | 82 | *Suggest Piece*: <len=0x0005><op=0x0D><index> |
| | 83 | |
| | 84 | *Suggest Piece* is an advisory message meaning "you might like to |
| | 85 | download this piece." The intended usage is for 'super-seeding' |
| | 86 | without throughput reduction, to avoid redundant downloads, and so that |
| | 87 | a seed which is disk I/O bound can upload continguous or identical |
| | 88 | pieces to avoid excessive disk seeks. In all cases, the seed SHOULD |
| | 89 | operate to maintain a roughly equal number of copies of each piece in |
| | 90 | the network. A peer MAY send more than one *suggest piece* message at |
| | 91 | any given time. A peer receiving multiple *suggest piece* messages |
| | 92 | MAY interpret this as meaning that all of the suggested pieces |
| | 93 | are equally appropriate. |
| | 94 | |
| | 95 | When the fast extension is disabled, if a peer receives a |
| | 96 | *Suggest Piece* message, the peer MUST close the connection. |
| | 97 | |
| | 98 | |
| | 99 | Reject Request |
| | 100 | ============== |
| | 101 | |
| | 102 | :: |
| | 103 | |
| | 104 | *Reject Request*: <len=0x000D><op=0x10><index><begin><offset> |
| | 105 | |
| | 106 | *Reject Request* notifies a requesting peer that its request will not be satisfied. |
| | 107 | |
| | 108 | If the fast extension is disabled and a peer receives a reject |
| | 109 | request then the peer MUST close the connection. |
| | 110 | |
| | 111 | When the fast extension is enabled: |
| | 112 | |
| | 113 | - If a peer receives a reject for a request that was never sent then |
| | 114 | the peer SHOULD close the connection. |
| | 115 | |
| | 116 | - If a peer sends a choke, it MUST reject all requests from the peer |
| | 117 | to whom the choke was sent except it SHOULD NOT reject requests for |
| | 118 | pieces that are in the *allowed fast set.* A peer SHOULD choke first |
| | 119 | and then reject requests so that the peer receiving the choke does not |
| | 120 | re-request the pieces. |
| | 121 | |
| | 122 | - If a peer receives a request from a peer its choking, the peer |
| | 123 | receiving the request SHOULD send a reject unless the piece is in the |
| | 124 | *allowed fast set.* |
| | 125 | |
| | 126 | - If a peer receives an excessive number of requests from a peer it is |
| | 127 | choking, the peer receiving the requests MAY close the connection |
| | 128 | rather than reject the request. However, consider that it can take |
| | 129 | several seconds for buffers to drain and messages to propagate once a |
| | 130 | peer is choked. |
| | 131 | |
| | 132 | Allowed Fast |
| | 133 | ============ |
| | 134 | |
| | 135 | :: |
| | 136 | |
| | 137 | * Allowed Fast: <len=0x0005><op=0x11><index>* |
| | 138 | |
| | 139 | With the BitTorrent protocol specified in `BEP 0002`_, new peers take |
| | 140 | several minutes to ramp up before they can effectively engage in |
| | 141 | BitTorrent's tit-for-tat. The reason is simple: starting peers have |
| | 142 | few pieces to trade. |
| | 143 | |
| | 144 | *Allowed Fast* is an advisory message which means "if you ask for this |
| | 145 | piece, I'll give it to you even if you're choked." *Allowed Fast* thus |
| | 146 | shortens the awkward stage during which the peer obtains occasional |
| | 147 | optimistic unchokes but cannot sufficiently reciprocate to remain |
| | 148 | unchoked. |
| | 149 | |
| | 150 | The pieces that can be downloaded when choked constitute a peer's |
| | 151 | *allowed fast set.* The set is generated using a canonical algorithm |
| | 152 | that produces piece indices unique to the message receiver so that if |
| | 153 | two peers offer *k* pieces fast it will be the same *k*, and if one |
| | 154 | offers *k+1* it will be the same *k* plus one more. *k* should be |
| | 155 | small enough to avoid abuse, but large enough to ramp up |
| | 156 | tit-for-tat. We currently set *k* to 10, but peers are free to change |
| | 157 | this number, e.g., to suit load. |
| | 158 | |
| | 159 | The message sender MAY list pieces that the message sender does not |
| | 160 | have. The receiver MUST NOT interpret an Allowed Fast message as |
| | 161 | meaning that the message sender has the piece. This allows peers to |
| | 162 | generate and communicate allowed fast sets at the beginning of a |
| | 163 | connection. However, a peer MAY send Allowed Fast messages at any |
| | 164 | time. |
| | 165 | |
| | 166 | A peer SHOULD send Allowed Fast messages to any starting peer unless |
| | 167 | the local peer lacks sufficient resources. A peer MAY reject requests |
| | 168 | for already Allowed Fast pieces if the local peer lacks sufficient |
| | 169 | resources, if the requested piece has already been sent to the |
| | 170 | requesting peer, or if the requesting peer is not a starting peer. Our |
| | 171 | current implementation rejects requests for Allowed Fast messages |
| | 172 | whenever the requesting peer has more than * k * pieces. |
| | 173 | |
| | 174 | When the fast extension is disabled, if a peer recieves an Allowed |
| | 175 | Fast message then the peer MUST close the connection. |
| | 176 | |
| | 177 | Allowed Fast Set Generation |
| | 178 | =========================== |
| | 179 | |
| | 180 | The canonical algorithm for computing a peer *P'*s *allowed fast set* |
| | 181 | follows. All integers in this pseudocode are four bytes represented |
| | 182 | in network (big-endian) byte order. *[a:b]* denotes the sequence of |
| | 183 | consecutive bytes from *a* to *b* excluding *b*, i.e., *(a, a+1, |
| | 184 | a+2,..., b-1)*. *x[a:b]* denotes a subsequence of elements in an array |
| | 185 | *x* starting from index *a* to but not including index *b*. |
| | 186 | |
| | 187 | Let *ip* denote *P'*s IPv4 address. We currently have no |
| | 188 | provisions for IPv6. If a peer is behind a Network Address Translator |
| | 189 | (NAT) then *ip* should be the externally facing IP address of the |
| | 190 | NAT. Since the node sending the *Allowed Fast* messages computes |
| | 191 | the set, the correct *ip* is usually the *ip* address on the other |
| | 192 | end of the connection. The host computing the set MAY use the *ip* |
| | 193 | address on the other end of the connection regardless |
| | 194 | |
| | 195 | Let *sz* denote the number of pieces in the torrent. |
| | 196 | |
| | 197 | Let *a* denote the allowed fast set. |
| | 198 | |
| | 199 | Let *k* denote the final number of pieces in the allowed fast set. |
| | 200 | |
| | 201 | |
| | 202 | :: |
| | 203 | |
| | 204 | x = 0xFFFFFF00 & ip (1) |
| | 205 | x.append(infohash) (2) |
| | 206 | while |a| < k: |
| | 207 | x = SHA1(x) (3) |
| | 208 | for i in [0:5] and |a| < k: (4) |
| | 209 | j = i*4 (5) |
| | 210 | y = x[j:j+4] (6) |
| | 211 | index = y % sz (7) |
| | 212 | if index not in a: (8) |
| | 213 | add index to a (9) |
| | 214 | |
| | 215 | Step (1) selects the most significant octets in peer *P'*s |
| | 216 | ip address. We do this to prevent a user that obtains more than one |
| | 217 | IP address on the same network from obtaining more than one |
| | 218 | *allowed fast set.* Use of three bytes is heuristic and |
| | 219 | historical. |
| | 220 | |
| | 221 | Step (3) generates a 20-byte random number on each call. By |
| | 222 | performing a SHA-1 hash on the previous iteration's hash, we can |
| | 223 | generate an arbitrarily long pseudorandom sequence. |
| | 224 | |
| | 225 | Steps (4) through (9) partition the 20-byte hash into piece indices |
| | 226 | and add them to the allowed fast set. |
| | 227 | |
| | 228 | Example Implementation |
| | 229 | ====================== |
| | 230 | |
| | 231 | The following C++ implementation was provided by CacheLogic: |
| | 232 | |
| | 233 | |
| | 234 | :: |
| | 235 | |
| | 236 | void generate_fast_set( |
| | 237 | uint32 k, // number of pieces in set |
| | 238 | uint32 sz, // number of pieces in torrent |
| | 239 | const char infohash[20], // infohash of torrent |
| | 240 | uint32 ip, // in host byte order, ie localhost is 0x7f000001 |
| | 241 | std:: |
| | 242 | vector<uint32> &a) // generated set of piece indices |
| | 243 | { |
| | 244 | a.clear(); |
| | 245 | std::string x; |
| | 246 | char buf[4]; |
| | 247 | *(uint32*)buf = htonl(ip & 0xffffff00); |
| | 248 | x.assign(buf, 4); |
| | 249 | x.append(infohash, 20); // (3) |
| | 250 | while (a.size()<k) { |
| | 251 | x = SHA1(x); // (4) |
| | 252 | for ( int i=0 ; i<5 && a.size()<k; i++ ) { // (5) |
| | 253 | int j = i*4; // (6) |
| | 254 | uint32 y = ntohl(*(uint32*)(x.data()+j)); // (7) |
| | 255 | uint32 index = y % sz; // (8) |
| | 256 | if (std::find(a.begin(), a.end(), index)==a.end()) { // (9) |
| | 257 | a.push_back(index); // (10) |
| | 258 | } |
| | 259 | } |
| | 260 | } |
| | 261 | } |
| | 262 | |
| | 263 | Example results generated by this function: |
| | 264 | |
| | 265 | |
| | 266 | :: |
| | 267 | |
| | 268 | 7 piece allowed fast set for torrent with 1313 pieces and hex infohash |
| | 269 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa for node with IP 80.4.4.200: |
| | 270 | 1059,431,808,1217,287,376,1188 |
| | 271 | 9 piece allowed fast set for torrent with 1313 pieces and hex infohash |
| | 272 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa for node with IP 80.4.4.200: |
| | 273 | 1059,431,808,1217,287,376,1188,353,508 |
| | 274 | |
| | 275 | .. _`RFC 2119`: http://www.ietf.org/rfc/rfc2119.txt |
| | 276 | .. _`BEP 0002`: http://www.bittorrent.org/beps/bep_0002.html |
| | 277 | |