root / dotorg / v6 / html / beps / bep_0008.rst

Revision 10811, 19.7 kB (checked in by dave, 11 months ago)

slight text change.

Line 
1BEP: 8
2Title: Tracker Peer Obfuscation
3Version: $Revision$
4Last-Modified: $Date$
5Author:  David Harrison <dave@bittorrent.com>, Anthony Ciani <tony@ciani.phy.uic.edu>, Arvid Norberg <arvid@bittorrent.com>, Greg Hazel <greg@bittorrent.com>
6Status:  Draft
7Type:    Standards Track
8Created: 31-Jan-2008
9Post-History:
10
11This extends the tracker protocol to support simple obfuscation of the
12peers it returns, using the infohash as a shared secret between the
13peer and the tracker. The obfuscation does not provide any security
14against eavesdroppers that know the infohash of the torrent.  The goal
15is to prevent internet service providers and other network
16administrators from blocking or disrupting bittorrent traffic
17connections that span between the receiver of a tracker response and
18any peer IP-port appearing in that tracker response.
19
20The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
21"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are
22to be interpreted as described in IETF `RFC 2119`_.
23
24
25Announce Parameter
26==================
27
28When using this extension, instead of passing the ``info_hash`` parameter
29to the tracker, a ``sha_ih`` is passed.
30
31The value of ``sha_ih`` MUST be the info-hash of the torrent, with a second
32SHA-1 applied to it.
33
34For example if a torrent has infohash with hex representation
35``aaf4c61ddcc5e8a2dabedef3b482cd9aea9434d`` then its ``sha_ih`` is
36``sha1(infohash)='6b4f89a54e2d27ecd7e8da5b4ab8fd9d1d8b119'``.
37
38The value MUST be url encoded, just like the ``info_hash``.  Thus the
39``sha_ih`` above when url encoded becomes
40``kO%89%A5N-%27%EC%D7%E8%DA%05%B4%AB%8F%D9%D1%D8%B1%19``.
41
42This extension does not change the semantics of any parameter passed
43in the peer's announce.
44
45Announce Response
46=================
47
48If the tracker supports this extension, the response should be exactly
49the same as if the ``info_hash`` had been passed, except that any
50field that contains peer information (such as ``peers``, ``peers6`` or
51any other field defined by another extension) MUST be obfuscated as
52described in the next section.
53
54There are additional parameters the tracker may OPTIONALLY return.
55These are discussed in the optimizations_ section.
56
57Peer List Obfuscation
58=====================
59
60We distinguish between the *tracker peer list* and the *returned peer
61list*.  The *tracker peer list* contains the ip-port pairs of all
62known peers in a given torrent, i.e., those peers that have reported
63to the tracker that they are transferring the file with a given
64infohash.  The tracker may store this peer list however it wishes.
65The *returned peer list* contains a packed array of ip-port pairs
66conforming to the BitTorrent protocol specification.  If the swarm is
67sufficiently large then the returned ip-port pairs constitute a subset
68of the ip-port pairs in the *tracker peer list*.
69
70The returned peer list is encrypted using RC4-drop768 encryption using
71the infohash as a shared secret and optionally employing an
72initialization vector.
73
74For the remainder of this document RC4 refers to RC4-drop768.  In the
75process of encryption, RC4 generates a pseudorandom string that is
76XOR'd with the plaintext to generate the ciphertext.  The receiver
77recovers the plaintext by generating the same pseudorandom string and
78XOR'ing it with the ciphertext.  In generating the pseudorandom
79string, the tracker and client MUST discard the first 768 bytes.  The
80next 8 bytes in the pseudorandom string are reserved for optimizations
81discussed in the next section.
82
83To communicate an initialization vector, the tracker includes in the
84bencoded response the key ``iv`` with value set to a byte string
85containing the initialization vector.  The initialization vector can
86be of arbitrary length and is sent in plaintext.
87
88If the tracker sends no initialization vector then the infohash is
89used as the RC4 key (160 bit key).  If the tracker provides an
90initialization vector then the RC4 key is generated by appending the
91vector to the infohash and then hashing with SHA-1.  The resulting
92hash is then used as the RC4 key.  The string from which the RC4 key
93is derived whether it be the infohash or the SHA-1 of the
94initialization vector appended to the infohash is called the
95*intermediate string*.
96
97For example, given infohash ``aaf4c61ddcc5e8a2dabedef3b482cd9aea9434d``
98and initialization vector ``abcd`` both represented in hex, the RC4 key
99is derived as follows:
100
101::
102 
103   key = sha1( 'aaf4c61ddcc5e8a2dabedef3b482cd9aea9434dabcd' )
104
105where [i:j] denotes the *ith* through *jth* bit including the *ith*
106but excluding the *jth*.  The resulting key in hex is
107``f36e9cae87cf33e07645ef5ca745a8a83469f31e``.
108
109It is RECOMMENDED that the tracker use the initialization vector, and
110that it change the ``iv`` on roughly the same period as the rerequest
111interval.  The reasoning for this is contained in the rationale.
112
113
114Optimizations
115=============
116
117The described optimizations are OPTIONAL for the tracker, but the
118corresponding client-side MUST be implemented by clients that support
119this extension.  These optimizations hobble the strength of the RC4
120encryption in order to improve tracker performance.  In the rationale_
121section we discuss why hobbling RC4 is reasonable and in many cases
122has negligible foreseen effect on security.
123
124For the purpose of these optimizations we assume that the tracker
125stores the tracker peer list for each infohash as a packed array that
126can be copied directly into the response.  We further assume that the
127packed array is reused many times and that with each request the
128tracker either returns the entire packed array or copies a single
129contiguous substring from the tracker peer list into the response.
130
131If the peerlist is represented and used as assumed then to improve
132randomness in the set of peers handed out by the tracker, it is
133RECOMMENDED that the tracker periodically reshuffle the peerlist with
134period similar to the rerequest interval.  After each reshuffle the
135tracker reperforms the operations described in this section.
136
137To reduce computation the tracker MAY cache the pseudorandom string
138generated by RC4 and reuse it as peers arrive and depart.
139
140The tracker MAY also cache the encrypted tracker peer list.  To
141support this the tracker MUST pass two additional keys *i* and *n*
142each with 32-bit integer values, except the tracker MAY omit *i* and
143*n* when *i=0* and the *returned peer list* is the entire *tracker peer
144list*.  Whether the tracker returns *i* and *n*, the first 8 bytes of
145the RC4 psuedorandom string are reserved for obscuring *i* and *n*.
146We come back to this momentarily.  Decryption starts by XORing from
147*6i* bytes for ipv4 (or *18i* for ipv6) into the pseudorandom string
148after the discarded and reserved bytes.  Assuming that the tracker
149encrypted the tracker peer list starting from the first byte after the
150discarded and reserved bytes in the pseudorandom string then *i* also
151corresponds to the *ith* ip-port pair in the tracker peer list.
152
153So that the client and the tracker do not have to generate an
154arbitrarily long pseudorandom string to support large swarms, we
155assume the tracker bounds the length of the pseudorandom string and
156reports the length in ip-port pairs as the value to key *n**n*
157excludes reserved and discarded bytes.  We RECOMMEND that *n* be equal
158to the length of the tracker peer list or random but within constant
159factor of the longest peerlist returned by the tracker, whichever is
160smaller.  Thus the tracker encrypts the *jth* byte of the *ith*
161ip-port pair in an ipv4 tracker peer list by XORing with the byte
162*(6i+j)* `mod` *n* bytes into the pseudorandom string.
163
164Transmitting *i* and *n* as plaintext would significantly reduce the
165cost for an attacker to recover the pseudorandom string.  The tracker
166MUST XOR the value of *i* with the first 32 bits of the pseudorandom
167string.  The tracker then XORs *n* with the next 32 bits from the
168pseudorandom string (see Figure 1).
169
170.. figure:: bep_0008_pseudo.png
171
172   **Figure 1:** The first 768 bytes of the RC4 pseudorandom
173   string are discarded.  The key *i* in the tracker response has
174   value ``x xor i``.  The key *n* has value ``y xor n``.
175
176We describe encryption in the following example for an ipv4 tracker peer
177list consisting of 3 ip-port pairs, and using an RC4 pseudorandom string
178of length *n=2*. *n* is small for purposes of illustration.  Also, for the
179purpose of illustration, the tracker returns only 2 peers at a time.
180
181::
182
183  Given the following peer list
184  (208.72.193.86, 6881), (209.81.173.15,14321), (128.213.6.8, 6881)
185
186  As a packed array represented in hex it becomes
187               
188  d048c1561ae1d151ad0f37f180d506081ae1
189
190  which we XOR with an RC4 pseudorandom string excluding discarded and
191  reserved bytes, e.g.,
192
193  a496e5f9b83e835013d42226
194
195  to generate
196
197  74de24afa2df5201bedb15d72443e3f1a2df
198
199Because the RC4 pseudorandom string is shorter than the tracker
200peer list, we wrap to the beginning of the pseudorandom string.
201
202A tracker returning the first two peers would return the bencoded
203equivalent of::
204
205  peers=74de24afa2df5201bedb15d7, i=0, n=2
206
207A tracker returning the second and third peer would return the
208bencoded equivalent of::
209
210  peers=5201bedb15d72443e3f1a2df, i=1, n=2
211
212In each response the tracker includes additional parameters such as
213the rerequest ``interval`` and the initialization vector ``iv``.
214
215The tracker response MUST remain a valid bencoded message.
216
217
218Backwards Compatibility
219=======================
220
221Trackers that support obfuscation are identified in the .torrent file
222by the inclusion of an ``obfuscate-announce-list`` which otherwise has the
223same semantics as the ``announce-list`` key.  Peers that do not support
224obfuscation simply ignore the ``obfuscate-announce-list``
225
226A client that is configured to use this extension should always send
227the ``sha_ih`` to any tracker supporting obfuscation.  The client
228SHOULD only contact trackers in the ``announce-list`` once the client
229has attempted all trackers in the ``obfuscate-announce-list`` and all failed.
230
231If a tracker that supports obfuscation wishes to allow legacy peers to
232connect to the tracker then the announce URL should appear in both the
233``obfuscate-announce-list`` and the ``announce-list``.
234
235If a tracker URL appears in both lists running on the same port, and
236the tracker failed to respond when selected from the
237``obfuscate-announce-list`` then the client MAY treat the tracker in
238the ``announce-list`` as if it were temporarily unreachable and defer
239trying it until it has tried other trackers in the ``announce-list``.
240
241Peers MUST never send both the ``info_hash`` and ``sha_ih`` parameters
242in the same request, since that would defeat the purpose of the shared
243secret.
244
245Any peer that requests with a ``sha_ih`` SHOULD implement Message
246Stream Encryption (MSE) [#MSE]_.  Any peer returned from the tracker
247in response to a request with a ``sha_ih`` SHOULD be assumed to
248support Message Stream Encryption.  We include these provisions
249because if a peer communicates with another peer without using MSE
250then the BitTorrent protocol is trivially identified from the first
251twenty bytes of the BitTorrent header and the ``info_hash`` appears in
252plaintext as the next twenty bytes, hence also defeating the purpose
253of the shared secret.
254
255If the tracker does not know enough peers assumed to support MSE to
256return the desired number of peers then it MAY include peers that are
257not assumed to support MSE.  If a peer closes a connection in response
258to an encrypted header then the initiating peer SHOULD assume that the
259peer does not support MSE.  The initiating peer however SHOULD ONLY
260initiate unencrypted connections when all peers have been tried and
261those that support MSE fail to provide "adequate performance."  We
262intentionally omit any definition of "adequate performance."
263
264
265Rationale
266=========
267
268This extension directly addresses a known attack on the BitTorrent
269protocol performed by some deployed network hardware.  By obscuring
270the ip-port pairs network hardware can no longer easily identify
271ip-port pairs that are running BitTorrent by observing peer-to-tracker
272communications.  This deployed hardware under some conditions disrupts
273BitTorrent connections by injecting forged TCP reset packets.  Once a
274BitTorrent connection has been identified, other attacks could also be
275performed such as severely rate limiting or blocking these
276connections.
277
278This hardware was presumably deployed to get around BitTorrent
279Message Stream Encryption [#MSE]_.  Peers implementing BitTorrent Message Stream
280Encryption obfuscate peer-to-peer connections by employing RC4
281encryption on every byte from the first byte transferred. BitTorrent
282Message Stream Encryption thus increases the difficulty for a device
283observing passing packets to identify BitTorrent peer-to-peer
284connections.
285
286By using the SHA-1 of the infohash, the tracker is able to identify
287torrents without sending the plaintext infohash and without requiring
288an additional prior exchange of a shared secret.  Where trackers now
289maintain mappings from infohash to the corresponding torrent's
290peerlist and other torrent-specific state, obfuscated trackers would need
291one additional mapping from ``sha_ih`` to the torrent's state.
292Tracker may also store encrypted versions of each torrent's peer list,
293to increase computation performance at the expense of increasing
294memory footprint by a constant factor.
295
296The obfuscation method meets the following criteria:
297
298- The entire plaintext of the peer list is not easily obtained even if
299  an eavesdropper identifies ip-port pairs from subsequent connections
300  initiated by a peer that has received a tracker response.
301
302- Even when a subsequent connection from a peer that has received a
303  tracker response is observed by an eavesdropper, it is difficult to
304  map the ip-port pair to specific ciphertext to verify that the
305  connection is using BitTorrent.
306
307When the optimizations_ are used,
308 
309- Few computations are performed at request time.
310
311- Encryption may be performed at the time a peer is added.
312  The encrypted peer ip and port may be handed out hundreds of times.
313
314- Security is minimally impacted.
315
316The objective is NOT to create a cryptographically secure protocol
317that can survive unlimited observation of passing packets and
318substantial computational resources on network timescales.  The objective
319is to raise the bar sufficiently to deter attacks based on observing
320ip-port numbers in peer-to-tracker communications.
321
322If a tracker observes a large number of tracker requests and responses
323and subsequent connections, it is possible to attack the encryption.
324RC4 is known to have a number of weaknesses especially in the way it
325was used with WEP [#Borisov]_ [#Scott]_ [#Stubblefeld]_.  However,
326with tracker peer obfuscation, the number of bytes transferred between
327the tracker and a client is likely significantly smaller than transferred
328between a wireless computer and a basestation.  An attacker faces a
329much larger task in obtaining sufficient probable plaintext to
330directly break the encryption.
331
332Hobbling the RC4 encryption by using a bounded-length RC4 pseudorandom
333string for small swarms is likely to have negilgible impact on
334security over any other encyption method since the pseudorandom string
335is probably equal to or longer than the plaintext and thus no part of
336it is repeated in the XOR except as peers arrive or leave the swarm.
337Thus on the timescales of rerequest intervals, nearly the same
338ciphertext is handed to every peer requesting the same infohash.
339Intercepting the same ciphertext multiple times provides no additional
340information to the attacker.  The attacker could correlate ip-port
341pairs in connections following tracker responses, but an attacker
342could do this regardless of the encryption method employed.
343Furthermore more direct methods of traffic analysis applied to
344peer-to-peer communication is available to network operators.
345
346For larger swarms, hobbling RC4 may more significantly impact breaking
347the encryption since the same pseudorandom string is used repeatedly
348across the peer list.  Some study is in order on this point taking
349into account that the tracker can periodically change intiailization
350vectors.
351
352We know from experience that periodically reshuffling peer lists on
353the order of the rerequest interval negligibly impacts tracker
354performance even with swarms containing millions of peers.  Generating
355a new pseudorandom string using RC4 on this same time interval is
356likely to incur negligible performance penalty because 1) RC4 is a
357small constant factor more expensive than a shuffle on an input string
358of equal length, 2) the generated pseudorandom string is only *n*
359ip-port pairs long where recommended *n* is within a small constant
360factor larger than the largest *returned peer list* and thus much
361smaller than the *tracker peer list* for large swarms, and 3) the cost
362of the XOR operation is lighter weight than performing a random
363shuffle.
364
365
366References
367==========
368
369.. _`RFC 2119`: http://tools.ietf.org/html/rfc2119
370
371.. [#MSE] BitTorrent Message Stream Encryption
372   (http://www.azureuswiki.com/index.php/Message_Stream_Encryption)
373
374.. [#Borisov] Nikita Borisov, Ian Goldberg, and David Wagner. Intercepting
375   mobile communications: the insecurity of 802.11. In ACM MobiCom 2001,
376   pages 180-189. ACM Press, 2001.
377
378.. [#Scott] Scott R. Fluhrer, Itsik Mantin, and Adi
379   Shamir. Weaknesses in the key scheduling algorithm of RC4. In Serge
380   Vaudenay and Amr M. Youssef, editors, Selected Areas in
381   Cryptography 2001, volume 2259 of Lecture Notes in Computer
382   Science, pages 1-24. Springer, 2001.
383
384.. [#Stubblefeld] Adam Stubblefeld, John Ioannidis, and Aviel
385   D. Rubin. A key recovery attack on the 802.11b wired equivalent
386   privacy protocol (WEP). ACM Transactions on Information and System
387   Security, 7(2):319-332, May 2004.
388
389
390Example Python Code
391===================
392
393Request handling in a dummy tracker implementing tracker peer obfuscation::
394
395  from sha import sha
396  from random import randint
397  from struct import unpack
398  from rc4 import rc4  # rc4(k) generates k RC4 pseudorandom bytes.
399 
400  rand = open("/dev/random","r").read
401  rc4 = rc4()
402 
403  # tracker configuration
404  MAX_PEERS = 100
405 
406  # per torrent state.
407  infohash = sha("dummy_info").digest()
408  pseudo = ''                        # pseudorandom RC4 string.
409  num_peers = 1000                   # current swarm size.
410  tracker_peer_list = rand(6) * num_peers
411  obfuscated_tracker_peer_list = ''
412 
413  def xor(plaintext,pseudo):
414    isint = False
415    if type(plaintext) == int: # convert to byte string.
416      plaintext = "".join([chr(int(x,16)) for x in "%.4x" % plaintext])
417      isint = True
418    n = len(pseudo)
419    ciphertext = "".join(
420      [chr(ord(pseudo[i%n])^ord(plaintext[i])) for i in xrange(len(plaintext))])
421    if isint:
422      ciphertext = unpack("!I", ciphertext)[0]   # convert back to unsigned int
423    return ciphertext
424 
425  def init():  # called once per rerequest interval.
426    global iv, x, n, n_xor_y, obfuscated_tracker_peer_list
427    iv = rand(20)
428    rc4.key = sha(infohash + iv).digest()
429    rc4(768)                         # discard first 768
430    x = rc4(4)
431    y = rc4(4)
432    n = min(num_peers, randint(MAX_PEERS * 2, MAX_PEERS * 4))
433    n_xor_y = xor(n,y)
434    pseudo = rc4(n*6)
435    obfuscated_tracker_peer_list = xor(tracker_peer_list,pseudo)
436 
437  def getpeers( numwant ):
438    global iv, x, n, n_xor_y, obfuscated_tracker_peer_list
439    response = {}
440    response['iv'] = iv
441    numwant = min(numwant, MAX_PEERS)
442    if numwant > num_peers:
443      response['peers'] = obfuscated_tracker_peer_list
444      return response
445    i = randint(0,num_peers)
446    response['i'] = xor(i,x)
447    response['n'] = n_xor_y
448    response['peers'] = obfuscated_tracker_peer_list[i*6:(i+numwant)*6]
449    if len(response['peers']) < numwant * 6:
450      r = numwant - len(response['peers']) / 6
451      response['peers'] = response['peers'] + obfuscated_tracker_peer_list[:r]
452    return response
453 
454  init()
455  print getpeers(20)
456
457
458..
459   Local Variables:
460   mode: indented-text
461   indent-tabs-mode: nil
462   sentence-end-double-space: t
463   fill-column: 70
464   coding: utf-8
465   End:
Note: See TracBrowser for help on using the browser.