root / dotorg / v7 / html / beps / bep_0008.rst

Revision 10836, 19.3 kB (checked in by dave, 9 months ago)

remove text to improve readability

Line 
1BEP: 8
2Title: Tracker Peer Obfuscation
3Version: $Revision$
4Last-Modified: $Date$
5Author:  David Harrison <dave@bittorrent.com>, Anthony Ciani <tony@ciani.phy.uic.edu>, Arvid Norberg <arvid@bittorrent.com>, Greg Hazel <greg@bittorrent.com>
6Status:  Draft
7Type:    Standards Track
8Created: 31-Jan-2008
9Post-History:
10
11This extends the tracker protocol to support simple obfuscation of the
12peers it returns, using the infohash as a shared secret between the
13peer and the tracker. The obfuscation does not provide any security
14against eavesdroppers that know the infohash of the torrent.  The goal
15is to prevent internet service providers and other network
16administrators from blocking or disrupting bittorrent traffic
17connections that span between the receiver of a tracker response and
18any peer IP-port appearing in that tracker response.
19
20The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
21"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are
22to be interpreted as described in IETF `RFC 2119`_.
23
24
25Announce Parameter
26==================
27
28When using this extension, instead of passing the ``info_hash`` parameter
29to the tracker, a ``sha_ih`` is passed.
30
31The value of ``sha_ih`` MUST be the info-hash of the torrent, with a second
32SHA-1 applied to it.
33
34For example if a torrent has infohash with hex representation
35``aaf4c61ddcc5e8a2dabedef3b482cd9aea9434d`` then its ``sha_ih`` is
36``sha1(infohash)='6b4f89a54e2d27ecd7e8da5b4ab8fd9d1d8b119'``.
37
38The value MUST be url encoded, just like the ``info_hash``.  Thus the
39``sha_ih`` above when url encoded becomes
40``kO%89%A5N-%27%EC%D7%E8%DA%05%B4%AB%8F%D9%D1%D8%B1%19``.
41
42This extension does not change the semantics of any parameter passed
43in the peer's announce.
44
45Announce Response
46=================
47
48If the tracker supports this extension, the response should be exactly
49the same as if the ``info_hash`` had been passed, except that any
50field that contains peer information (such as ``peers``, ``peers6`` or
51any other field defined by another extension) MUST be obfuscated as
52described in the next section.
53
54There are additional parameters the tracker may OPTIONALLY return.
55These are discussed in the optimizations_ section.
56
57Peer List Obfuscation
58=====================
59
60We distinguish between the *tracker peer list* and the *returned peer
61list*.  The *tracker peer list* contains the ip-port pairs of all
62known peers in a given torrent, i.e., those peers that have reported
63to the tracker that they are transferring the file with a given
64infohash.  The tracker may store this peer list however it wishes.
65The *returned peer list* contains a packed array of ip-port pairs
66conforming to the BitTorrent protocol specification.  If the swarm is
67sufficiently large then the returned ip-port pairs constitute a subset
68of the ip-port pairs in the *tracker peer list*.
69
70The returned peer list is encrypted using RC4-drop768 encryption using
71the infohash as a shared secret and optionally employing an
72initialization vector.
73
74For the remainder of this document RC4 refers to RC4-drop768.  In the
75process of encryption, RC4 generates a pseudorandom string that is
76XOR'd with the plaintext to generate the ciphertext.  The receiver
77recovers the plaintext by generating the same pseudorandom string and
78XOR'ing it with the ciphertext.  In generating the pseudorandom
79string, the tracker and client MUST discard the first 768 bytes.  The
80next 8 bytes in the pseudorandom string are reserved for optimizations
81discussed in the next section.
82
83To communicate an initialization vector, the tracker includes in the
84bencoded response the key ``iv`` with value set to a byte string
85containing the initialization vector.  The initialization vector can
86be of arbitrary length and is sent in plaintext.
87
88If the tracker sends no initialization vector then the infohash is
89used as the RC4 key (160 bit key).  If the tracker provides an
90initialization vector then the RC4 key is generated by appending the
91vector to the infohash and then hashing with SHA-1.  The resulting
92hash is then used as the RC4 key. 
93
94For example, given infohash ``aaf4c61ddcc5e8a2dabedef3b482cd9aea9434d``
95and initialization vector ``abcd`` both represented in hex, the RC4 key
96is derived as follows:
97
98::
99 
100   key = sha1( 'aaf4c61ddcc5e8a2dabedef3b482cd9aea9434dabcd' )
101
102The resulting key in hex is ``f36e9cae87cf33e07645ef5ca745a8a83469f31e``.
103
104It is RECOMMENDED that the tracker use the initialization vector, and
105that it change the ``iv`` on roughly the same period as the rerequest
106interval.  The reasoning for this is contained in the rationale.
107
108
109Optimizations
110=============
111
112The described optimizations are OPTIONAL for the tracker, but the
113corresponding client-side MUST be implemented by clients that support
114this extension.  These optimizations hobble the strength of the RC4
115encryption in order to improve tracker performance.  In the rationale_
116section we discuss why hobbling RC4 is reasonable and in many cases
117has negligible foreseen effect on security.
118
119For the purpose of these optimizations we assume that the tracker
120stores the tracker peer list for each infohash as a packed array that
121can be copied directly into the response.  We further assume that the
122packed array is reused many times and that with each request the
123tracker either returns the entire packed array or copies a single
124contiguous substring from the tracker peer list into the response.
125
126If the peerlist is represented and used as assumed then to improve
127randomness in the set of peers handed out by the tracker, it is
128RECOMMENDED that the tracker periodically reshuffle the peerlist with
129period similar to the rerequest interval.  After each reshuffle the
130tracker reperforms the operations described in this section.
131
132To reduce computation the tracker MAY cache the pseudorandom string
133generated by RC4 and reuse it as peers arrive and depart.
134
135The tracker MAY also cache the encrypted tracker peer list.  To
136support this the tracker MUST pass two additional keys *i* and *n*
137each with 32-bit integer values, except the tracker MAY omit *i* and
138*n* when *i=0* and the *returned peer list* is the entire *tracker peer
139list*.  Whether the tracker returns *i* and *n*, the first 8 bytes of
140the RC4 psuedorandom string are reserved for obscuring *i* and *n*.
141We come back to this momentarily.  Decryption starts by XORing from
142*6i* bytes for ipv4 (or *18i* for ipv6) into the pseudorandom string
143after the discarded and reserved bytes.  Assuming that the tracker
144encrypted the tracker peer list starting from the first byte after the
145discarded and reserved bytes in the pseudorandom string then *i* also
146corresponds to the *ith* ip-port pair in the tracker peer list.
147
148So that the client and the tracker do not have to generate an
149arbitrarily long pseudorandom string to support large swarms, we
150assume the tracker bounds the length of the pseudorandom string and
151reports the length in ip-port pairs as the value to key *n**n*
152excludes reserved and discarded bytes.  We RECOMMEND that *n* be equal
153to the length of the tracker peer list or random but within constant
154factor of the longest peerlist returned by the tracker, whichever is
155smaller.  Thus the tracker encrypts the *jth* byte of the *ith*
156ip-port pair in an ipv4 tracker peer list by XORing with the byte
157*(6i+j)* `mod` *n* bytes into the pseudorandom string.
158
159Transmitting *i* and *n* as plaintext would significantly reduce the
160cost for an attacker to recover the pseudorandom string.  The tracker
161MUST XOR the value of *i* with the first 32 bits of the pseudorandom
162string.  The tracker then XORs *n* with the next 32 bits from the
163pseudorandom string (see Figure 1).
164
165.. figure:: bep_0008_pseudo.png
166
167   **Figure 1:** The first 768 bytes of the RC4 pseudorandom
168   string are discarded.  The key *i* in the tracker response has
169   value ``x xor i``.  The key *n* has value ``y xor n``.
170
171We describe encryption in the following example for an ipv4 tracker peer
172list consisting of 3 ip-port pairs, and using an RC4 pseudorandom string
173of length *n=2*. *n* is small for purposes of illustration.  Also, for the
174purpose of illustration, the tracker returns only 2 peers at a time.
175
176::
177
178  Given the following peer list
179  (208.72.193.86, 6881), (209.81.173.15,14321), (128.213.6.8, 6881)
180
181  As a packed array represented in hex it becomes
182               
183  d048c1561ae1d151ad0f37f180d506081ae1
184
185  which we XOR with an RC4 pseudorandom string excluding discarded and
186  reserved bytes, e.g.,
187
188  a496e5f9b83e835013d42226
189
190  to generate
191
192  74de24afa2df5201bedb15d72443e3f1a2df
193
194Because the RC4 pseudorandom string is shorter than the tracker
195peer list, we wrap to the beginning of the pseudorandom string.
196
197A tracker returning the first two peers would return the bencoded
198equivalent of::
199
200  peers=74de24afa2df5201bedb15d7, i=0, n=2
201
202A tracker returning the second and third peer would return the
203bencoded equivalent of::
204
205  peers=5201bedb15d72443e3f1a2df, i=1, n=2
206
207In each response the tracker includes additional parameters such as
208the rerequest ``interval`` and the initialization vector ``iv``.
209
210The tracker response MUST remain a valid bencoded message.
211
212
213Backwards Compatibility
214=======================
215
216Trackers that support obfuscation are identified in the .torrent file
217by the inclusion of an ``obfuscate-announce-list`` which otherwise has the
218same semantics as the ``announce-list`` key.  Peers that do not support
219obfuscation simply ignore the ``obfuscate-announce-list``
220
221A client that is configured to use this extension should always send
222the ``sha_ih`` to any tracker supporting obfuscation.  The client
223SHOULD only contact trackers in the ``announce-list`` once the client
224has attempted all trackers in the ``obfuscate-announce-list`` and all failed.
225
226If a tracker that supports obfuscation wishes to allow legacy peers to
227connect to the tracker then the announce URL should appear in both the
228``obfuscate-announce-list`` and the ``announce-list``.
229
230If a tracker URL appears in both lists running on the same port, and
231the tracker failed to respond when selected from the
232``obfuscate-announce-list`` then the client MAY treat the tracker in
233the ``announce-list`` as if it were temporarily unreachable and defer
234trying it until it has tried other trackers in the ``announce-list``.
235
236Peers MUST never send both the ``info_hash`` and ``sha_ih`` parameters
237in the same request, since that would defeat the purpose of the shared
238secret.
239
240Any peer that requests with a ``sha_ih`` SHOULD implement Message
241Stream Encryption (MSE) [#MSE]_.  Any peer returned from the tracker
242in response to a request with a ``sha_ih`` SHOULD be assumed to
243support Message Stream Encryption.  We include these provisions
244because if a peer communicates with another peer without using MSE
245then the BitTorrent protocol is trivially identified from the first
246twenty bytes of the BitTorrent header and the ``info_hash`` appears in
247plaintext as the next twenty bytes, hence also defeating the purpose
248of the shared secret.
249
250If the tracker does not know enough peers assumed to support MSE to
251return the desired number of peers then it MAY include peers that are
252not assumed to support MSE.  If a peer closes a connection in response
253to an encrypted header then the initiating peer SHOULD assume that the
254peer does not support MSE.  The initiating peer however SHOULD ONLY
255initiate unencrypted connections when all peers have been tried and
256those that support MSE fail to provide "adequate performance."  We
257intentionally omit any definition of "adequate performance."
258
259
260Rationale
261=========
262
263This extension directly addresses a known attack on the BitTorrent
264protocol performed by some deployed network hardware.  By obscuring
265the ip-port pairs network hardware can no longer easily identify
266ip-port pairs that are running BitTorrent by observing peer-to-tracker
267communications.  This deployed hardware under some conditions disrupts
268BitTorrent connections by injecting forged TCP reset packets. 
269
270This hardware was presumably deployed to get around BitTorrent
271Message Stream Encryption [#MSE]_.  Peers implementing BitTorrent Message Stream
272Encryption obfuscate peer-to-peer connections by employing RC4
273encryption on every byte from the first byte transferred. BitTorrent
274Message Stream Encryption thus increases the difficulty for a device
275observing passing packets to identify BitTorrent peer-to-peer
276connections.
277
278By using the SHA-1 of the infohash, the tracker is able to identify
279torrents without sending the plaintext infohash and without requiring
280an additional prior exchange of a shared secret.  Where trackers now
281maintain mappings from infohash to the corresponding torrent's
282peerlist and other torrent-specific state, obfuscated trackers would
283need one additional mapping from ``sha_ih`` to the torrent's state.
284Trackers may also cache the encrypted version of each torrent's
285tracker peer list, to increase computational performance at the
286expense of increasing memory footprint by a constant factor.
287
288The obfuscation method meets the following criteria:
289
290- The entire plaintext of the peer list is not easily obtained even if
291  an eavesdropper identifies ip-port pairs from subsequent connections
292  initiated by a peer that has received a tracker response.
293
294- Even when a subsequent connection from a peer that has received a
295  tracker response is observed by an eavesdropper, it is difficult to
296  map the ip-port pair to specific ciphertext to verify that the
297  connection is using BitTorrent.
298
299When the optimizations_ are used,
300 
301- Few computations are performed at request time.
302
303- Encryption may be performed at the time a peer is added.
304  The encrypted peer ip and port may be handed out hundreds of times.
305
306- Security is minimally impacted.
307
308The objective is NOT to create a cryptographically secure protocol
309that can survive unlimited observation of passing packets and
310substantial computational resources on network timescales.  The objective
311is to raise the bar sufficiently to deter attacks based on observing
312ip-port numbers in peer-to-tracker communications.
313
314If a tracker observes a large number of tracker requests and responses
315and subsequent connections, it is possible to attack the encryption.
316RC4 is known to have a number of weaknesses especially in the way it
317was used with WEP [#Borisov]_ [#Scott]_ [#Stubblefeld]_.  However,
318with tracker peer obfuscation, the number of bytes transferred between
319the tracker and a client is likely significantly smaller than transferred
320between a wireless computer and a basestation.  An attacker faces a
321much larger task in obtaining sufficient probable plaintext to
322directly break the encryption.
323
324Hobbling the RC4 encryption by using a bounded-length RC4 pseudorandom
325string for small swarms is likely to have negilgible impact on
326security over any other encyption method since the pseudorandom string
327is probably equal to or longer than the plaintext and thus no part of
328it is repeated in the XOR except as peers arrive or leave the swarm.
329Thus on the timescales of rerequest intervals, nearly the same
330ciphertext is handed to every peer requesting the same infohash.
331Intercepting the same ciphertext multiple times provides no additional
332information to the attacker.  The attacker could correlate ip-port
333pairs in connections following tracker responses, but an attacker
334could do this regardless of the encryption method employed.
335Furthermore more direct methods of traffic analysis applied to
336peer-to-peer communication is available to network operators.
337
338For larger swarms, hobbling RC4 may more significantly impact breaking
339the encryption since the same pseudorandom string is used repeatedly
340across the peer list.  Some study is in order on this point taking
341into account that the tracker can periodically change intiailization
342vectors.
343
344We know from experience that periodically reshuffling peer lists on
345the order of the rerequest interval negligibly impacts tracker
346performance even with swarms containing millions of peers.  Generating
347a new pseudorandom string using RC4 on this same time interval is
348likely to incur negligible performance penalty because 1) RC4 is a
349small constant factor more expensive than a shuffle on an input string
350of equal length, 2) the generated pseudorandom string is only *n*
351ip-port pairs long where recommended *n* is within a small constant
352factor larger than the largest *returned peer list* and thus much
353smaller than the *tracker peer list* for large swarms, and 3) the cost
354of the XOR operation is lighter weight than performing a random
355shuffle.
356
357
358References
359==========
360
361.. _`RFC 2119`: http://tools.ietf.org/html/rfc2119
362
363.. [#MSE] BitTorrent Message Stream Encryption
364   (http://www.azureuswiki.com/index.php/Message_Stream_Encryption)
365
366.. [#Borisov] Nikita Borisov, Ian Goldberg, and David Wagner. Intercepting
367   mobile communications: the insecurity of 802.11. In ACM MobiCom 2001,
368   pages 180-189. ACM Press, 2001.
369
370.. [#Scott] Scott R. Fluhrer, Itsik Mantin, and Adi
371   Shamir. Weaknesses in the key scheduling algorithm of RC4. In Serge
372   Vaudenay and Amr M. Youssef, editors, Selected Areas in
373   Cryptography 2001, volume 2259 of Lecture Notes in Computer
374   Science, pages 1-24. Springer, 2001.
375
376.. [#Stubblefeld] Adam Stubblefeld, John Ioannidis, and Aviel
377   D. Rubin. A key recovery attack on the 802.11b wired equivalent
378   privacy protocol (WEP). ACM Transactions on Information and System
379   Security, 7(2):319-332, May 2004.
380
381
382Example Python Code
383===================
384
385Request handling in a dummy tracker implementing tracker peer obfuscation::
386
387  from sha import sha
388  from random import randint
389  from struct import unpack
390  from rc4 import rc4  # rc4(k) generates k RC4 pseudorandom bytes.
391 
392  rand = open("/dev/random","r").read
393  rc4 = rc4()
394 
395  # tracker configuration
396  MAX_PEERS = 100
397 
398  # per torrent state.
399  infohash = sha("dummy_info").digest()
400  pseudo = ''                        # pseudorandom RC4 string.
401  num_peers = 1000                   # current swarm size.
402  tracker_peer_list = rand(6) * num_peers
403  obfuscated_tracker_peer_list = ''
404 
405  def xor(plaintext,pseudo):
406    isint = False
407    if type(plaintext) == int: # convert to byte string.
408      plaintext = "".join([chr(int(x,16)) for x in "%.4x" % plaintext])
409      isint = True
410    n = len(pseudo)
411    ciphertext = "".join(
412      [chr(ord(pseudo[i%n])^ord(plaintext[i])) for i in xrange(len(plaintext))])
413    if isint:
414      ciphertext = unpack("!I", ciphertext)[0]   # convert back to unsigned int
415    return ciphertext
416 
417  def init():  # called once per rerequest interval.
418    global iv, x, n, n_xor_y, obfuscated_tracker_peer_list
419    iv = rand(20)
420    rc4.key = sha(infohash + iv).digest()
421    rc4(768)                         # discard first 768
422    x = rc4(4)
423    y = rc4(4)
424    n = min(num_peers, randint(MAX_PEERS * 2, MAX_PEERS * 4))
425    n_xor_y = xor(n,y)
426    pseudo = rc4(n*6)
427    obfuscated_tracker_peer_list = xor(tracker_peer_list,pseudo)
428 
429  def getpeers( numwant ):
430    global iv, x, n, n_xor_y, obfuscated_tracker_peer_list
431    response = {}
432    response['iv'] = iv
433    numwant = min(numwant, MAX_PEERS)
434    if numwant > num_peers:
435      response['peers'] = obfuscated_tracker_peer_list
436      return response
437    i = randint(0,num_peers)
438    response['i'] = xor(i,x)
439    response['n'] = n_xor_y
440    response['peers'] = obfuscated_tracker_peer_list[i*6:(i+numwant)*6]
441    if len(response['peers']) < numwant * 6:
442      r = numwant - len(response['peers']) / 6
443      response['peers'] = response['peers'] + obfuscated_tracker_peer_list[:r]
444    return response
445 
446  init()
447  print getpeers(20)
448
449
450..
451   Local Variables:
452   mode: indented-text
453   indent-tabs-mode: nil
454   sentence-end-double-space: t
455   fill-column: 70
456   coding: utf-8
457   End:
Note: See TracBrowser for help on using the browser.