Changeset 11165
- Timestamp:
- 06/25/09 11:54:14 (9 months ago)
- Location:
- dotorg/trunk/html/beps
- Files:
-
- 2 modified
-
bep_0003.html (modified) (5 diffs)
-
bep_0029.html (modified) (20 diffs)
Legend:
- Unmodified
- Added
- Removed
-
dotorg/trunk/html/beps/bep_0003.html
r11097 r11165 41 41 <tr class="field"><th class="field-name">Version:</th><td class="field-body">11031</td> 42 42 </tr> 43 <tr class="field"><th class="field-name">Last-Modified:</th><td class="field-body"><a class="reference external" href="http://bittorrent.org/trac/browser/dotorg/trunk/html/beps/bep_0003.rst">2008-02-28 16:43:58 -0800 (Thu, 28 Feb 2008)</a></td>43 <tr class="field"><th class="field-name">Last-Modified:</th><td class="field-body"><a class="reference" href="http://bittorrent.org/trac/browser/dotorg/trunk/html/beps/bep_0003.rst">2008-02-28 16:43:58 -0800 (Thu, 28 Feb 2008)</a></td> 44 44 </tr> 45 45 <tr class="field"><th class="field-name">Author:</th><td class="field-body">Bram Cohen <bram at bittorrent.com></td> … … 51 51 <tr class="field"><th class="field-name">Created:</th><td class="field-body">10-Jan-2008</td> 52 52 </tr> 53 <tr class="field"><th class="field-name">Post-History:</th><td class="field-body"> </td>53 <tr class="field"><th class="field-name">Post-History:</th><td class="field-body">24-Jun-2009, clarified the encoding of strings in torrent files</td> 54 54 </tr> 55 55 </tbody> … … 59 59 <p class="topic-title first">Contents</p> 60 60 <ul class="simple"> 61 <li><a class="reference internal" href="#a-bittorrent-file-distribution-consists-of-these-entities" id="id2">A BitTorrent file distribution consists of these entities:</a></li>62 <li><a class="reference internal" href="#to-start-serving-a-host-goes-through-the-following-steps" id="id3">To start serving, a host goes through the following steps:</a></li>63 <li><a class="reference internal" href="#to-start-downloading-a-user-does-the-following" id="id4">To start downloading, a user does the following:</a></li>64 <li><a class="reference internal" href="#the-connectivity-is-as-follows" id="id5">The connectivity is as follows:</a></li>65 <li><a class="reference internal" href="#metainfo-files-are-bencoded-dictionaries-with-the-following-keys" id="id6">Metainfo files are bencoded dictionaries with the following keys:</a></li>66 <li><a class="reference internal" href="#tracker-get-requests-have-the-following-keys" id="id7">Tracker GET requests have the following keys:</a></li>67 <li><a class="reference internal" href="#all-non-keepalive-messages-start-with-a-single-byte-which-gives-their-type" id="id8">All non-keepalive messages start with a single byte which gives their type.</a></li>68 <li><a class="reference internal" href="#the-possible-values-are" id="id9">The possible values are:</a></li>69 <li><a class="reference internal" href="#copyright" id="id10">Copyright</a></li>61 <li><a class="reference" href="#a-bittorrent-file-distribution-consists-of-these-entities" id="id2">A BitTorrent file distribution consists of these entities:</a></li> 62 <li><a class="reference" href="#to-start-serving-a-host-goes-through-the-following-steps" id="id3">To start serving, a host goes through the following steps:</a></li> 63 <li><a class="reference" href="#to-start-downloading-a-user-does-the-following" id="id4">To start downloading, a user does the following:</a></li> 64 <li><a class="reference" href="#the-connectivity-is-as-follows" id="id5">The connectivity is as follows:</a></li> 65 <li><a class="reference" href="#metainfo-files-are-bencoded-dictionaries-with-the-following-keys" id="id6">Metainfo files are bencoded dictionaries with the following keys:</a></li> 66 <li><a class="reference" href="#tracker-get-requests-have-the-following-keys" id="id7">Tracker GET requests have the following keys:</a></li> 67 <li><a class="reference" href="#all-non-keepalive-messages-start-with-a-single-byte-which-gives-their-type" id="id8">All non-keepalive messages start with a single byte which gives their type.</a></li> 68 <li><a class="reference" href="#the-possible-values-are" id="id9">The possible values are:</a></li> 69 <li><a class="reference" href="#copyright" id="id10">Copyright</a></li> 70 70 </ul> 71 71 </div> … … 139 139 <dt>info</dt> 140 140 <dd><p class="first">This maps to a dictionary, with keys described below.</p> 141 <p>The <tt class="docutils literal"><span class="pre">name</span></tt> key maps to a string which is the suggested name142 to save the file (or directory) as. It is purely advisory.</p>141 <p>The <tt class="docutils literal"><span class="pre">name</span></tt> key maps to a UTF-8 encoded string which is the 142 suggested name to save the file (or directory) as. It is purely advisory.</p> 143 143 <p><tt class="docutils literal"><span class="pre">piece</span> <span class="pre">length</span></tt> maps to the number of bytes in each piece 144 144 the file is split into. For the purposes of transfer, files are … … 163 163 the following keys:</p> 164 164 <p><tt class="docutils literal"><span class="pre">length</span></tt> - The length of the file, in bytes.</p> 165 <p><tt class="docutils literal"><span class="pre">path</span></tt> - A list of strings corresponding to subdirectory165 <p><tt class="docutils literal"><span class="pre">path</span></tt> - A list of UTF-8 encoded strings corresponding to subdirectory 166 166 names, the last of which is the actual file name (a zero length list 167 167 is an error case).</p> 168 <p class="last">In the single file case, the name key is the name of a file, in the168 <p>In the single file case, the name key is the name of a file, in the 169 169 muliple file case, it's the name of a directory.</p> 170 <p class="last">All strings in a .torrent file that contains text must be UTF-8 171 encoded.</p> 170 172 </dd> 171 173 </dl> -
dotorg/trunk/html/beps/bep_0029.html
r11160 r11165 39 39 <tr class="field"><th class="field-name">Title:</th><td class="field-body">uTorrent transport protocol</td> 40 40 </tr> 41 <tr class="field"><th class="field-name">Version:</th><td class="field-body">111 59</td>42 </tr> 43 <tr class="field"><th class="field-name">Last-Modified:</th><td class="field-body"><a class="reference" href="http://bittorrent.org/trac/browser/dotorg/trunk/html/beps/bep_0029.rst">2009-06-2 3 13:45:47 -0700 (Tue, 23Jun 2009)</a></td>41 <tr class="field"><th class="field-name">Version:</th><td class="field-body">11164</td> 42 </tr> 43 <tr class="field"><th class="field-name">Last-Modified:</th><td class="field-body"><a class="reference" href="http://bittorrent.org/trac/browser/dotorg/trunk/html/beps/bep_0029.rst">2009-06-25 11:53:53 -0700 (Thu, 25 Jun 2009)</a></td> 44 44 </tr> 45 45 <tr class="field"><th class="field-name">Author:</th><td class="field-body">Arvid Norberg <arvid at bittorrent.com></td> … … 68 68 <li><a class="reference" href="#version" id="id7">version</a></li> 69 69 <li><a class="reference" href="#connection-id" id="id8">connection_id</a></li> 70 <li><a class="reference" href="#timestamp- seconds-timestamp-microseconds" id="id9">timestamp_seconds,timestamp_microseconds</a></li>70 <li><a class="reference" href="#timestamp-microseconds" id="id9">timestamp_microseconds</a></li> 71 71 <li><a class="reference" href="#timestamp-difference-microseconds" id="id10">timestamp_difference_microseconds</a></li> 72 72 <li><a class="reference" href="#wnd-size" id="id11">wnd_size</a></li> … … 111 111 <p>The fact that BitTorrent uses multiple TCP connections gives it an 112 112 unfair advantage when competing with other services for bandwidth, 113 which exaggerates the effect of BitTorrent filling the upload pipe.</p> 113 which exaggerates the effect of BitTorrent filling the upload pipe. 114 The reason for this is because TCP distributes the available bandwidth 115 evenly across connections, and the more connections one application 116 uses, the larger share of the bandwidth it gets.</p> 114 117 <p>The traditional solution to this problem is to cap the upload rate 115 of the bittorrent client to 80% of the uplink capacity. 80% leaves118 of the BitTorrent client to 80% of the up-link capacity. 80% leaves 116 119 some head room for interactive traffic.</p> 117 120 <p>The main drawbacks with this solution are:</p> 118 121 <ol class="arabic simple"> 119 <li>The user needs to configure his/her bittorrent client, it won't122 <li>The user needs to configure his/her BitTorrent client, it won't 120 123 work out-of-the-box.</li> 121 124 <li>The user needs to know his/her internet connection's upload … … 123 126 may connect to a large number of different networks.</li> 124 127 <li>The headroom of 20% is arbitrary and wastes bandwidth. Whenever 125 there is no interactive traffic competing with bittorrent, the128 there is no interactive traffic competing with BitTorrent, the 126 129 extra 20% are wasted. Whenever there is competing interactive 127 130 traffic, it cannot use more than 20% of the capacity.</li> … … 137 140 <p>This document assumes some knowledge of how TCP and window based 138 141 congestion control works.</p> 139 <p>The main addition compared to TCP is the delay based congestion 142 <p>uTP is a transport protocol layered on top of UDP. As such, it must 143 (and has the ability to) implement its own congestion control.</p> 144 <p>The main difference compared to TCP is the delay based congestion 140 145 control. See the <a class="reference" href="#congestion-control">congestion control</a> section.</p> 141 146 <p>Like TCP, uTP uses window based congestion control. Each socket … … 186 191 <div class="section" id="version"> 187 192 <h3>version</h3> 188 <p>This is the prot col version. The current version is 1.</p>193 <p>This is the protocol version. The current version is 1.</p> 189 194 </div> 190 195 <div class="section" id="connection-id"> … … 195 200 connection decides which ID to use, and the return path has the same ID + 1.</p> 196 201 </div> 197 <div class="section" id="timestamp-seconds-timestamp-microseconds"> 198 <h3>timestamp_seconds, timestamp_microseconds</h3> 199 <p>This is the 'seconds' and 'microseconds' parts of the timestamp of when this 200 packet was sent. This is set using gettimeofday() on posix and QueryPerformanceTimer() 201 on windows. The higher resultion this timstamp has, the better.</p> 202 <div class="section" id="timestamp-microseconds"> 203 <h3>timestamp_microseconds</h3> 204 <p>This is the 'microseconds' parts of the timestamp of when this packet was sent. 205 This is set using gettimeofday() on posix and QueryPerformanceTimer() 206 on windows. The higher resolution this timestamp has, the better. The closer 207 to the actual transmit time it is set, the better.</p> 202 208 </div> 203 209 <div class="section" id="timestamp-difference-microseconds"> … … 238 244 <div class="section" id="selective-ack"> 239 245 <h4>Selective ACK</h4> 240 <p>Selective ack is an extension that can selectively ACK packets non-sequentially. 241 Its payload is a bitmask of 32 bits, representing the first 32 packets in the 242 send window. A set bit specifies that packet has been received, a cleared bit 246 <p>Selective ACK is an extension that can selectively ACK packets non-sequentially. 247 Its payload is a bitmask of at least 32 bits, in multiples of 32 bits. Each bit 248 represents one packet in the send window. Bits that are outside of the send window 249 are ignored. A set bit specifies that packet has been received, a cleared bit 243 250 specifies that the packet has not been received. The header looks like this:</p> 244 251 <pre class="literal-block"> … … 250 257 +---------------+---------------+ 251 258 </pre> 252 <p>The selective ack is only sent when at least one sequence number was skipped in 259 <p>Note that the len field of extensions refer to bytes, which in this extension 260 must be at least 4, and in multiples of 4.</p> 261 <p>The selective ACK is only sent when at least one sequence number was skipped in 253 262 the received stream. The first bit in the mask therefore represents ack_nr + 2. 254 ack_nr + 1 is assumed to have been dropped or be missing when this packet was sent.263 ack_nr + 1 is assumed to have been dropped or be missing when this packet was sent. 255 264 A set bit represents a packet that has been received, a cleared bit represents 256 265 a packet that has not yet been received.</p> 257 <p>The bitmask has reverse byte order. The first byte represents packets [ack_nr + 2,266 <p>The bitmask has reverse byte order. The first byte represents packets [ack_nr + 2, 258 267 ack_nr + 2 + 7] in reverse order. The least significant bit in the byte represents 259 268 ack_nr + 2, the most significant bit in the byte represents ack_nr + 2 + 7. The 260 269 next byte in the mask represents [ack_nr + 2 + 8, ack_nr + 2 + 15] in reverse order, 261 270 and so on. The bitmask is not limited to 32 bits but can be of any size.</p> 271 <p>Here is the layout of a bitmask representing the first 32 packet acks 272 represented in a selective ACK bitfield:</p> 273 <pre class="literal-block"> 274 0 8 16 275 +---------------+---------------+---------------+---------------+ 276 | 9 8 ... 3 2 | 17 ... 10 | 25 ... 18 | 33 ... 26 | 277 +---------------+---------------+---------------+---------------+ 278 </pre> 279 <p>The number in the diagram maps the bit in the bitmask to the offset to add to 280 <tt class="docutils literal"><span class="pre">ack_nr</span></tt> in order to calculate the sequence number that the bit is ACKing.</p> 262 281 </div> 263 282 <div class="section" id="extension-bits"> … … 295 314 the ST_FIN packet.</dd> 296 315 <dt>ST_STATE = 2</dt> 297 <dd>State packet. Used to transmit an ACK with no data.</dd> 316 <dd>State packet. Used to transmit an ACK with no data. Packets that don't 317 include any payload do not increase the <tt class="docutils literal"><span class="pre">seq_nr</span></tt>.</dd> 298 318 <dt>ST_RESET = 3</dt> 299 319 <dd>Terminate connection forcefully. Similar to TCP RST flag. The remote … … 304 324 The sequence number is initialized to 1. The connection ID is initialized 305 325 to a random number. The syn packet is special, all subsequent packets sent 306 on this connection (except for re sends of the ST_SYN) are sent with the326 on this connection (except for re-sends of the ST_SYN) are sent with the 307 327 connection ID + 1. The connection ID is what the other end is expected to 308 328 use in its responses.</p> … … 310 330 ID in the packet header. The send ID for the socket should be initialized 311 331 to the ID + 1. The sequence number for the return channel is initialized 312 to a random number. The other end expects an ST_STATE packet (only an ack)332 to a random number. The other end expects an ST_STATE packet (only an ACK) 313 333 in response.</p> 314 334 </dd> … … 329 349 <div class="section" id="connection-setup"> 330 350 <h2>connection setup</h2> 331 <p><em>describe the initial connection setup sequence</em></p> 351 <p>Here is a diagram illustrating the exchanges and states to initiate 352 a connection. The c.* refers to a state in the socket itself, pkt.* 353 refers to a field in the packet header.</p> 354 <pre class="literal-block"> 355 initiating endpoint accepting endpoint 356 357 | c.state = CS_SYN_SENT | 358 | c.seq_nr = 1 | 359 | c.conn_id_recv = rand() | 360 | c.conn_id_send = c.conn_id_recv + 1 | 361 | | 362 | | 363 | ST_SYN | 364 | seq_nr=c.seq_nr++ | 365 | ack_nr=* | 366 | conn_id=c.rcv_conn_id | 367 | >-------------------------------------------> | 368 | c.receive_conn_id = pkt.conn_id+1 | 369 | c.send_conn_id = pkt.conn_id | 370 | c.seq_nr = rand() | 371 | c.ack_nr = pkt.seq_nr | 372 | c.state = CS_CONNECTED | 373 | | 374 | | 375 | | 376 | | 377 | ST_STATE | 378 | seq_nr=c.seq_nr++ | 379 | ack_nr=c.ack_nr | 380 | conn_id=c.send_conn_id | 381 | <------------------------------------------< | 382 | c.state = CS_CONNECTED | 383 | c.ack_nr = pkt.seq_nr | 384 | | 385 | | connection established 386 .. ..|.. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..|.. .. 387 | | 388 | | 389 | ST_DATA | 390 | seq_nr=c.seq_nr++ | 391 | ack_nr=c.ack_nr | 392 | conn_id=c.conn_id_send | 393 | >-------------------------------------------> | 394 | c.ack_nr = pkt.seq_nr | 395 | | 396 | | 397 | | 398 | ST_DATA | 399 | seq_nr=c.seq_nr++ | 400 | ack_nr=c.ack_nr | 401 | conn_id=c.send_conn_id | 402 | <------------------------------------------< | 403 | c.ack_nr = pkt.seq_nr | 404 | | 405 | | 406 V V 407 </pre> 408 <p>Connections are identified by their <tt class="docutils literal"><span class="pre">conn_id</span></tt> header. If the connection ID of a new 409 connection collides with an existing connection, the connection attempt will fails, since 410 the ST_SYN packet will be unexpected in the existing stream, and ignored.</p> 332 411 </div> 333 412 <div class="section" id="packet-loss"> … … 335 414 <p>If the packet with sequence number (<tt class="docutils literal"><span class="pre">seq_nr</span></tt> - <tt class="docutils literal"><span class="pre">cur_window</span></tt>) has not been acked 336 415 (this is the oldest packet in the send buffer, and the next one expected to be acked) 337 has not been acked, but 3 or more packets have been acked pas sedit (through Selective416 has not been acked, but 3 or more packets have been acked past it (through Selective 338 417 ACK), the packet is assumed to have been lost. Similarly, when receiving 3 duplicate 339 418 acks, <tt class="docutils literal"><span class="pre">ack_nr</span></tt> + 1 is assumed to have been lost (if a packet with that sequence number 340 419 has been sent).</p> 341 <p>When a packet is lost, the <tt class="docutils literal"><span class="pre">max_window</span></tt> is multiplied by 0.78.</p> 420 <p>When a packet is lost, the <tt class="docutils literal"><span class="pre">max_window</span></tt> is multiplied by 0.78. TCP multiplies by 421 0.5, but since this is a much less likely event in uTP, and since the uTP ramp-up 422 is slower than TCP, this is a reasonable optimization.</p> 342 423 </div> 343 424 <div class="section" id="timeouts"> … … 348 429 last_ack_nr here is the last ack_nr received on the socket before the current packet, 349 430 and ack_nr is the field in the currently received packet.</p> 350 <p>The <tt class="docutils literal"><span class="pre">rtt</span></tt> and <tt class="docutils literal"><span class="pre">rtt_var</span></tt> is only updated for packets that w here sent only once.431 <p>The <tt class="docutils literal"><span class="pre">rtt</span></tt> and <tt class="docutils literal"><span class="pre">rtt_var</span></tt> is only updated for packets that were sent only once. 351 432 This avoids problems with figuring out which packet was acked, the first or the 352 433 second one.</p> … … 372 453 goes down to zero.</p> 373 454 <p>The initial timeout is set to 1000 milliseconds, and later updated according to 374 the formula above. For e ach packet that times out in a row, the timeout is375 doubled.</p>455 the formula above. For every packet consecutive subsequent packet that times out, 456 the timeout is doubled.</p> 376 457 </div> 377 458 <div class="section" id="packet-sizes"> … … 379 460 <p>In order to have as little impact as possible on slow congested links, uTP adjusts 380 461 its packet size down to as small as 150 bytes per packet. Using packets that small 381 has the benefit of not clogging up a slow uplink, with long serialization delay.462 has the benefit of not clogging a slow up-link, with long serialization delay. 382 463 The cost of using packets that small is that the overhead from the packet headers 383 464 become significant. At high rates, large packet sizes are used, at slow rates, … … 388 469 <p>The overall goal of the uTP congestion control is to use one way buffer delay as the 389 470 main congestion measurement, as well as packet loss, like TCP. The point is to avoid 390 running atfull send buffers whenever data is being sent. This is specifically a471 running with full send buffers whenever data is being sent. This is specifically a 391 472 problem for DSL/Cable modems, where the send buffer in the modem often has room for 392 473 multiple seconds worth of data. The ideal buffer utilization for uTP (or any background … … 401 482 original sender of the packet (timestamp_difference_microseconds). This value is not meaningful 402 483 as an absolute value. The clocks in the machines are most likely not synchronized, 403 especially not down to mic orsecond resolution, and the time the packet is in transit is484 especially not down to microsecond resolution, and the time the packet is in transit is 404 485 also included in the difference of these timestamps. However, the value is useful in 405 486 comparison to previous values.</p> 406 487 <p>Each socket keeps a sliding minimum of the lowest value for the last two minutes. This value 407 is called <em>base_delay</em>, and is used as a baseline, the min umum delay between the hosts.488 is called <em>base_delay</em>, and is used as a baseline, the minimum delay between the hosts. 408 489 When subtracting the base_delay from the timestamp difference in each packet you get a 409 490 measurement of the current buffering delay on the socket. This measurement is called <em>our_delay</em>. 410 491 It has a lot of noise it it, but is used as the driver to determine whether to increase or 411 492 decrease the send window (which controls the send rate).</p> 412 <p>The <em>CCONTROL_TARGET</em> is the buffering delay that the uTP accepts on the up link. Currently the493 <p>The <em>CCONTROL_TARGET</em> is the buffering delay that the uTP accepts on the up-link. Currently the 413 494 delay target is set to 100 ms. <em>off_target</em> is how far the actual measured delay is from the 414 495 target delay (calculated from CCONTROL_TARGET - our_delay).</p> … … 418 499 <tt class="docutils literal"><span class="pre">max_window</span></tt>. Its size is controlled, roughly, by the following expression:</p> 419 500 <pre class="literal-block"> 420 scaled_gain = (off_target / CCONTROL_TARGET) 421 * (outstanding_packet * MAX_CWND_INCREASE_PACKETS_PER_RTT / max_window); 501 delay_factor = off_target / CCONTROL_TARGET; 502 window_factor = outstanding_packet / max_window; 503 scaled_gain = MAX_CWND_INCREASE_PACKETS_PER_RTT * delay_factor * window_factor; 422 504 </pre> 423 505 <p>Where the first factor scales the <em>off_target</em> to units of target delays.</p>