root/dotorg/trunk/html/beps/bep_0003.rst @ 11162

Revision 11162, 15.1 KB (checked in by arvid, 5 months ago)

clarified encoding of strings in .torrent files

Line 
1BEP: 3
2Title: The BitTorrent Protocol Specification
3Version: $Revision$
4Last-Modified: $Date$
5Author:  Bram Cohen <bram@bittorrent.com>
6Status:  Final
7Type:    Standard
8Created: 10-Jan-2008
9Post-History: 24-Jun-2009, clarified the encoding of strings in torrent files
10
11BitTorrent is a protocol for distributing files. It identifies content
12by URL and is designed to integrate seamlessly with the web. Its
13advantage over plain HTTP is that when multiple downloads of the same
14file happen concurrently, the downloaders upload to each other, making
15it possible for the file source to support very large numbers of
16downloaders with only a modest increase in its load.
17
18A BitTorrent file distribution consists of these entities:
19----------------------------------------------------------
20
21- An ordinary web server
22- A static 'metainfo' file
23- A BitTorrent tracker
24- An 'original' downloader
25- The end user web browsers
26- The end user downloaders
27
28There are ideally many end users for a single file.
29
30To start serving, a host goes through the following steps:
31----------------------------------------------------------
32
33#. Start running a tracker (or, more likely, have one running already).
34#. Start running an ordinary web server, such as apache, or have one already.
35#. Associate the extension .torrent with mimetype application/x-bittorrent on their web server (or have done so already).
36#. Generate a metainfo (.torrent) file using the complete file to be served and the URL of the tracker.
37#. Put the metainfo file on the web server.
38#. Link to the metainfo (.torrent) file from some other web page.
39#. Start a downloader which already has the complete file (the 'origin').
40
41To start downloading, a user does the following:
42------------------------------------------------
43
44#. Install BitTorrent (or have done so already).
45#. Surf the web.
46#. Click on a link to a .torrent file.
47#. Select where to save the file locally, or select a partial download to resume.
48#. Wait for download to complete.
49#. Tell downloader to exit (it keeps uploading until this happens).
50
51The connectivity is as follows:
52-------------------------------
53
54- Strings are length-prefixed base ten followed by a colon and the string. For example ``4:spam`` corresponds to 'spam'.
55
56- Integers are represented by an 'i' followed by the number in base 10
57  followed by an 'e'. For example ``i3e`` corresponds to 3 and
58  ``i-3e`` corresponds to -3. Integers have no size
59  limitation. ``i-0e`` is invalid. All encodings with a leading
60  zero, such as ``i03e``, are invalid, other than
61  ``i0e``, which of course corresponds to 0.
62
63- Lists are encoded as an 'l' followed by their elements (also
64  bencoded) followed by an 'e'. For example ``l4:spam4:eggse``
65  corresponds to ['spam', 'eggs'].
66
67- Dictionaries are encoded as a 'd' followed by a list of alternating
68  keys and their corresponding values followed by an 'e'. For example,
69  ``d3:cow3:moo4:spam4:eggse`` corresponds to {'cow': 'moo',
70  'spam': 'eggs'} and ``d4:spaml1:a1:bee`` corresponds to
71  {'spam': ['a', 'b']}. Keys must be strings and appear in sorted order
72  (sorted as raw strings, not alphanumerics).
73
74
75Metainfo files are bencoded dictionaries with the following keys:
76-----------------------------------------------------------------
77
78announce
79  The URL of the tracker.
80
81info
82  This maps to a dictionary, with keys described below.
83
84  The ``name`` key maps to a UTF-8 encoded string which is the
85  suggested name to save the file (or directory) as. It is purely advisory.
86
87  ``piece length`` maps to the number of bytes in each piece
88  the file is split into. For the purposes of transfer, files are
89  split into fixed-size pieces which are all the same length except for
90  possibly the last one which may be truncated. ``piece
91  length`` is almost always a power of two, most commonly 2 18 =
92  256 K (BitTorrent prior to version 3.2 uses 2 20 = 1 M as
93  default).
94
95  ``pieces`` maps to a string whose length is a multiple of
96  20. It is to be subdivided into strings of length 20, each of which is
97  the SHA1 hash of the piece at the corresponding index.
98
99  There is also a key ``length`` or a key ``files``,
100  but not both or neither. If ``length`` is present then the
101  download represents a single file, otherwise it represents a set of
102  files which go in a directory structure.
103
104  In the single file case, ``length`` maps to the length of
105  the file in bytes.
106
107  For the purposes of the other keys, the multi-file case is treated as
108  only having a single file by concatenating the files in the order they
109  appear in the files list. The files list is the value
110  ``files`` maps to, and is a list of dictionaries containing
111  the following keys:
112 
113  ``length`` - The length of the file, in bytes.
114
115  ``path`` - A list of UTF-8 encoded strings corresponding to subdirectory
116  names, the last of which is the actual file name (a zero length list
117  is an error case).
118
119  In the single file case, the name key is the name of a file, in the
120  muliple file case, it's the name of a directory.
121
122  All strings in a .torrent file that contains text must be UTF-8
123  encoded.
124
125
126Tracker GET requests have the following keys:
127---------------------------------------------
128
129info_hash
130  The 20 byte sha1 hash of the bencoded form of the info value from the
131  metainfo file. Note that this is a substring of the metainfo
132  file. This value will almost certainly have to be escaped.
133
134peer_id
135  A string of length 20 which this downloader uses as its id. Each
136  downloader generates its own id at random at the start of a new
137  download. This value will also almost certainly have to be escaped.
138
139ip
140  An optional parameter giving the IP (or dns name) which this peer is
141  at. Generally used for the origin if it's on the same machine as the
142  tracker.
143
144port
145  The port number this peer is listening on. Common behavior is for a
146  downloader to try to listen on port 6881 and if that port is taken try
147  6882, then 6883, etc. and give up after 6889.
148
149uploaded
150  The total amount uploaded so far, encoded in base ten ascii.
151
152downloaded
153  The total amount downloaded so far, encoded in base ten ascii.
154
155left
156  The number of bytes this peer still has to download, encoded in
157  base ten ascii. Note that this can't be computed from downloaded and
158  the file length since it might be a resume, and there's a chance that
159  some of the downloaded data failed an integrity check and had to be
160  re-downloaded.
161
162event
163  This is an optional key which maps to ``started``,
164  ``completed``, or ``stopped`` (or
165  ``empty``, which is the same as not being present). If not
166  present, this is one of the announcements done at regular
167  intervals. An announcement using ``started`` is sent when a
168  download first begins, and one using ``completed`` is sent
169  when the download is complete. No ``completed`` is sent if
170  the file was complete when started. Downloaders send an announcement
171  using ``stopped`` when they cease downloading.
172
173Tracker responses are bencoded dictionaries. If a tracker response
174has a key ``failure reason``, then that maps to a human
175readable string which explains why the query failed, and no other keys
176are required. Otherwise, it must have two keys: ``interval``,
177which maps to the number of seconds the downloader should wait between
178regular rerequests, and ``peers``. ``peers`` maps to
179a list of dictionaries corresponding to ``peers``, each of
180which contains the keys ``peer id``, ``ip``, and
181``port``, which map to the peer's self-selected ID, IP
182address or dns name as a string, and port number, respectively. Note
183that downloaders may rerequest on nonscheduled times if an event
184happens or they need more peers.
185
186If you want to make any extensions to metainfo files or tracker
187queries, please coordinate with Bram Cohen to make sure that all
188extensions are done compatibly.
189
190BitTorrent's peer protocol operates over TCP. It performs efficiently
191without setting any socket options.
192
193Peer connections are symmetrical. Messages sent in both directions
194look the same, and data can flow in either direction.
195
196The peer protocol refers to pieces of the file by index as
197described in the metainfo file, starting at zero. When a peer finishes
198downloading a piece and checks that the hash matches, it announces
199that it has that piece to all of its peers.
200
201Connections contain two bits of state on either end: choked or not,
202and interested or not. Choking is a notification that no data will be
203sent until unchoking happens. The reasoning and common techniques
204behind choking are explained later in this document.
205
206Data transfer takes place whenever one side is interested and the
207other side is not choking. Interest state must be kept up to date at
208all times - whenever a downloader doesn't have something they
209currently would ask a peer for in unchoked, they must express lack of
210interest, despite being choked. Implementing this properly is tricky,
211but makes it possible for downloaders to know which peers will start
212downloading immediately if unchoked.
213
214Connections start out choked and not interested.
215
216When data is being transferred, downloaders should keep several
217piece requests queued up at once in order to get good TCP performance
218(this is called 'pipelining'.) On the other side, requests which can't
219be written out to the TCP buffer immediately should be queued up in
220memory rather than kept in an application-level network buffer, so
221they can all be thrown out when a choke happens.
222
223The peer wire protocol consists of a handshake followed by a
224never-ending stream of length-prefixed messages. The handshake starts
225with character ninteen (decimal) followed by the string 'BitTorrent
226protocol'. The leading character is a length prefix, put there in the
227hope that other new protocols may do the same and thus be trivially
228distinguishable from each other.
229
230All later integers sent in the protocol are encoded as four bytes
231big-endian.
232
233After the fixed headers come eight reserved bytes, which are all
234zero in all current implementations. If you wish to extend the
235protocol using these bytes, please coordinate with Bram Cohen to make
236sure all extensions are done compatibly.
237
238Next comes the 20 byte sha1 hash of the bencoded form of the info
239value from the metainfo file. (This is the same value which is
240announced as ``info_hash`` to the tracker, only here it's raw
241instead of quoted here). If both sides don't send the same value, they
242sever the connection. The one possible exception is if a downloader
243wants to do multiple downloads over a single port, they may wait for
244incoming connections to give a download hash first, and respond with
245the same one if it's in their list.
246
247After the download hash comes the 20-byte peer id which is reported
248in tracker requests and contained in peer lists in tracker
249responses. If the receiving side's peer id doesn't match the one the
250initiating side expects, it severs the connection.
251
252That's it for handshaking, next comes an alternating stream of
253length prefixes and messages. Messages of length zero are keepalives,
254and ignored. Keepalives are generally sent once every two minutes, but
255note that timeouts can be done much more quickly when data is
256expected.
257
258All non-keepalive messages start with a single byte which gives their type.
259---------------------------------------------------------------------------
260The possible values are:
261------------------------
262
263- 0 - choke
264- 1 - unchoke
265- 2 - interested
266- 3 - not interested
267- 4 - have
268- 5 - bitfield
269- 6 - request
270- 7 - piece
271- 8 - cancel
272
273'choke', 'unchoke', 'interested', and 'not interested' have no payload.
274
275'bitfield' is only ever sent as the first message. Its payload is a
276bitfield with each index that downloader has sent set to one and the
277rest set to zero. Downloaders which don't have anything yet may skip
278the 'bitfield' message. The first byte of the bitfield corresponds to
279indices 0 - 7 from high bit to low bit, respectively. The next one
2808-15, etc. Spare bits at the end are set to zero.
281
282The 'have' message's payload is a single number, the index which
283that downloader just completed and checked the hash of.
284
285'request' messages contain an index, begin, and length. The last
286two are byte offsets. Length is generally a power of two unless it
287gets truncated by the end of the file. All current implementations use
2882 15 , and close connections which request an amount greater than 2
28917.
290
291'cancel' messages have the same payload as request messages. They
292are generally only sent towards the end of a download, during what's
293called 'endgame mode'. When a download is almost complete, there's a
294tendency for the last few pieces to all be downloaded off a single
295hosed modem line, taking a very long time. To make sure the last few
296pieces come in quickly, once requests for all pieces a given
297downloader doesn't have yet are currently pending, it sends requests
298for everything to everyone it's downloading from. To keep this from
299becoming horribly inefficient, it sends cancels to everyone else every
300time a piece arrives.
301
302'piece' messages contain an index, begin, and piece. Note that they
303are correlated with request messages implicitly. It's possible for an
304unexpected piece to arrive if choke and unchoke messages are sent in
305quick succession and/or transfer is going very slowly.
306
307Downloaders generally download pieces in random order, which does a
308reasonably good job of keeping them from having a strict subset or
309superset of the pieces of any of their peers.
310
311Choking is done for several reasons. TCP congestion control behaves
312very poorly when sending over many connections at once. Also, choking
313lets each peer use a tit-for-tat-ish algorithm to ensure that they get
314a consistent download rate.
315
316The choking algorithm described below is the currently deployed
317one. It is very important that all new algorithms work well both in a
318network consisting entirely of themselves and in a network consisting
319mostly of this one.
320
321There are several criteria a good choking algorithm should meet. It
322should cap the number of simultaneous uploads for good TCP
323performance. It should avoid choking and unchoking quickly, known as
324'fibrillation'. It should reciprocate to peers who let it
325download. Finally, it should try out unused connections once in a
326while to find out if they might be better than the currently used
327ones, known as optimistic unchoking.
328
329The currently deployed choking algorithm avoids fibrillation by
330only changing who's choked once every ten seconds. It does
331reciprocation and number of uploads capping by unchoking the four
332peers which it has the best download rates from and are
333interested. Peers which have a better upload rate but aren't
334interested get unchoked and if they become interested the worst
335uploader gets choked. If a downloader has a complete file, it uses its
336upload rate rather than its download rate to decide who to
337unchoke.
338
339
340For optimistic unchoking, at any one time there is a single peer
341which is unchoked regardless of it's upload rate (if interested, it
342counts as one of the four allowed downloaders.) Which peer is
343optimistically unchoked rotates every 30 seconds. To give them a
344decent chance of getting a complete piece to upload, new connections
345are three times as likely to start as the current optimistic unchoke
346as anywhere else in the rotation.
347
348Copyright
349---------
350
351This document has been placed in the public domain.
352
353
354..
355   Local Variables:
356   mode: indented-text
357   indent-tabs-mode: nil
358   sentence-end-double-space: t
359   fill-column: 70
360   coding: utf-8
361   End:
362
Note: See TracBrowser for help on using the browser.