root/dotorg/trunk/html/beps/bep_0008.rst @ 11049

Revision 11049, 20.0 KB (checked in by dave, 19 months ago)

Move tracker peer obfuscation from draft to deferred status.

RevLine 
[10528]1BEP: 8
[10505]2Title: Tracker Peer Obfuscation
[10528]3Version: $Revision$
4Last-Modified: $Date$
[10505]5Author:  David Harrison <dave@bittorrent.com>, Anthony Ciani <tony@ciani.phy.uic.edu>, Arvid Norberg <arvid@bittorrent.com>, Greg Hazel <greg@bittorrent.com>
[11049]6Status:  Deferred
[10505]7Type:    Standards Track
8Created: 31-Jan-2008
9Post-History:
[9850]10
[10410]11This extends the tracker protocol to support simple obfuscation of the
[10560]12peers it returns, using the infohash as a shared secret between the
[10410]13peer and the tracker. The obfuscation does not provide any security
14against eavesdroppers that know the infohash of the torrent.  The goal
15is to prevent internet service providers and other network
16administrators from blocking or disrupting bittorrent traffic
17connections that span between the receiver of a tracker response and
18any peer IP-port appearing in that tracker response.
[9850]19
20The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
21"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are
22to be interpreted as described in IETF `RFC 2119`_.
23
24
[10894]25Announce
26========
[9850]27
28When using this extension, instead of passing the ``info_hash`` parameter
[10807]29to the tracker, a ``sha_ih`` is passed.
[9850]30
[10410]31The value of ``sha_ih`` MUST be the info-hash of the torrent, with a second
32SHA-1 applied to it.
[9850]33
[10410]34For example if a torrent has infohash with hex representation
35``aaf4c61ddcc5e8a2dabedef3b482cd9aea9434d`` then its ``sha_ih`` is
36``sha1(infohash)='6b4f89a54e2d27ecd7e8da5b4ab8fd9d1d8b119'``.
[9850]37
[10410]38The value MUST be url encoded, just like the ``info_hash``.  Thus the
[10463]39``sha_ih`` above when url encoded becomes
[10410]40``kO%89%A5N-%27%EC%D7%E8%DA%05%B4%AB%8F%D9%D1%D8%B1%19``.
[9850]41
[10891]42If the ``sha_ih`` is passed then the value for the ``port`` parameter
[10894]43should be treated as a 16 bit integer and MUST be obscured as
44described in the `Obfuscation Method`_ section.  Similarly if the
45optional ``ip`` parameter is passed in the announce then its value
46MUST also be so obscured.
[10891]47
[10410]48This extension does not change the semantics of any parameter passed
49in the peer's announce.
50
[10560]51Announce Response
[10528]52=================
[9850]53
[10807]54If the tracker supports this extension, the response should be exactly
55the same as if the ``info_hash`` had been passed, except that any
56field that contains peer information (such as ``peers``, ``peers6`` or
57any other field defined by another extension) MUST be obfuscated as
58described in the next section.
[9850]59
[10410]60There are additional parameters the tracker may OPTIONALLY return.
61These are discussed in the optimizations_ section.
[9850]62
[10891]63Obfuscation Method
64==================
[10410]65
[10894]66The values for the ``ip`` and ``port`` announce parameters, the
[10891]67*returned peer list* and any other values that contain peer
68information are obscured using the method described in this section.
69
[10410]70We distinguish between the *tracker peer list* and the *returned peer
71list*.  The *tracker peer list* contains the ip-port pairs of all
[10807]72known peers in a given torrent, i.e., those peers that have reported
73to the tracker that they are transferring the file with a given
74infohash.  The tracker may store this peer list however it wishes.
75The *returned peer list* contains a packed array of ip-port pairs
76conforming to the BitTorrent protocol specification.  If the swarm is
77sufficiently large then the returned ip-port pairs constitute a subset
78of the ip-port pairs in the *tracker peer list*.
[10410]79
[10891]80When a parameter is obscured, it is encrypted using RC4-drop768
81encryption using the infohash as a shared secret and optionally
82employing an initialization vector. 
[10467]83
84For the remainder of this document RC4 refers to RC4-drop768.  In the
85process of encryption, RC4 generates a pseudorandom string that is
86XOR'd with the plaintext to generate the ciphertext.  The receiver
87recovers the plaintext by generating the same pseudorandom string and
88XOR'ing it with the ciphertext.  In generating the pseudorandom
[10597]89string, the tracker and client MUST discard the first 768 bytes.  The
90next 8 bytes in the pseudorandom string are reserved for optimizations
91discussed in the next section.
[10467]92
[10464]93To communicate an initialization vector, the tracker includes in the
[10891]94bencoded response the parameter ``iv`` with value set to a byte string
[10464]95containing the initialization vector.  The initialization vector can
[10891]96be of arbitrary length and is sent in plaintext.  Initialization
97vectors can only be applied to parameters in tracker responses and NOT
98to announces.
[10410]99
[10808]100If the tracker sends no initialization vector then the infohash is
101used as the RC4 key (160 bit key).  If the tracker provides an
[10410]102initialization vector then the RC4 key is generated by appending the
[10808]103vector to the infohash and then hashing with SHA-1.  The resulting
[10814]104hash is then used as the RC4 key. 
[10410]105
106For example, given infohash ``aaf4c61ddcc5e8a2dabedef3b482cd9aea9434d``
107and initialization vector ``abcd`` both represented in hex, the RC4 key
108is derived as follows:
109
110::
111 
[10808]112   key = sha1( 'aaf4c61ddcc5e8a2dabedef3b482cd9aea9434dabcd' )
[10410]113
[10832]114The resulting key in hex is ``f36e9cae87cf33e07645ef5ca745a8a83469f31e``.
[10410]115
[10463]116It is RECOMMENDED that the tracker use the initialization vector, and
117that it change the ``iv`` on roughly the same period as the rerequest
118interval.  The reasoning for this is contained in the rationale.
119
[10579]120
[10560]121Optimizations
[10528]122=============
[10410]123
124The described optimizations are OPTIONAL for the tracker, but the
[10560]125corresponding client-side MUST be implemented by clients that support
126this extension.  These optimizations hobble the strength of the RC4
[10410]127encryption in order to improve tracker performance.  In the rationale_
128section we discuss why hobbling RC4 is reasonable and in many cases
129has negligible foreseen effect on security.
130
131For the purpose of these optimizations we assume that the tracker
132stores the tracker peer list for each infohash as a packed array that
133can be copied directly into the response.  We further assume that the
134packed array is reused many times and that with each request the
135tracker either returns the entire packed array or copies a single
136contiguous substring from the tracker peer list into the response.
137
138If the peerlist is represented and used as assumed then to improve
139randomness in the set of peers handed out by the tracker, it is
140RECOMMENDED that the tracker periodically reshuffle the peerlist with
141period similar to the rerequest interval.  After each reshuffle the
142tracker reperforms the operations described in this section.
143
[10597]144To reduce computation the tracker MAY cache the pseudorandom string
[10467]145generated by RC4 and reuse it as peers arrive and depart.
[10410]146
147The tracker MAY also cache the encrypted tracker peer list.  To
[10891]148support this the tracker MUST pass two additional parameters *i* and *n*
[10762]149each with 32-bit integer values, except the tracker MAY omit *i* and
150*n* when *i=0* and the *returned peer list* is the entire *tracker peer
151list*.  Whether the tracker returns *i* and *n*, the first 8 bytes of
[10597]152the RC4 psuedorandom string are reserved for obscuring *i* and *n*.
153We come back to this momentarily.  Decryption starts by XORing from
154*6i* bytes for ipv4 (or *18i* for ipv6) into the pseudorandom string
155after the discarded and reserved bytes.  Assuming that the tracker
156encrypted the tracker peer list starting from the first byte after the
157discarded and reserved bytes in the pseudorandom string then *i* also
[10762]158corresponds to the *ith* ip-port pair in the tracker peer list.
[10410]159
160So that the client and the tracker do not have to generate an
161arbitrarily long pseudorandom string to support large swarms, we
162assume the tracker bounds the length of the pseudorandom string and
[10891]163reports the length in ip-port pairs as the value to parameter *n*.  *n*
[10560]164excludes reserved and discarded bytes.  We RECOMMEND that *n* be equal
165to the length of the tracker peer list or random but within constant
166factor of the longest peerlist returned by the tracker, whichever is
167smaller.  Thus the tracker encrypts the *jth* byte of the *ith*
168ip-port pair in an ipv4 tracker peer list by XORing with the byte
169*(6i+j)* `mod` *n* bytes into the pseudorandom string.
[10410]170
[10560]171Transmitting *i* and *n* as plaintext would significantly reduce the
172cost for an attacker to recover the pseudorandom string.  The tracker
173MUST XOR the value of *i* with the first 32 bits of the pseudorandom
174string.  The tracker then XORs *n* with the next 32 bits from the
175pseudorandom string (see Figure 1).
176
177.. figure:: bep_0008_pseudo.png
178
179   **Figure 1:** The first 768 bytes of the RC4 pseudorandom
[10891]180   string are discarded.  The parameter *i* in the tracker response has
181   value ``x xor i``.  The parameter *n* has value ``y xor n``.
[10560]182
[10463]183We describe encryption in the following example for an ipv4 tracker peer
184list consisting of 3 ip-port pairs, and using an RC4 pseudorandom string
185of length *n=2*. *n* is small for purposes of illustration.  Also, for the
[10762]186purpose of illustration, the tracker returns only 2 peers at a time.
[10410]187
188::
[10560]189
[10410]190  Given the following peer list
191  (208.72.193.86, 6881), (209.81.173.15,14321), (128.213.6.8, 6881)
192
193  As a packed array represented in hex it becomes
194               
195  d048c1561ae1d151ad0f37f180d506081ae1
196
[10560]197  which we XOR with an RC4 pseudorandom string excluding discarded and
198  reserved bytes, e.g.,
[10410]199
200  a496e5f9b83e835013d42226
201
202  to generate
203
204  74de24afa2df5201bedb15d72443e3f1a2df
205
[10463]206Because the RC4 pseudorandom string is shorter than the tracker
[10410]207peer list, we wrap to the beginning of the pseudorandom string.
208
[10762]209A tracker returning the first two peers would return the bencoded
210equivalent of::
[10463]211
[10560]212  peers=74de24afa2df5201bedb15d7, i=0, n=2
[10463]213
[10762]214A tracker returning the second and third peer would return the
215bencoded equivalent of::
[10463]216
[10560]217  peers=5201bedb15d72443e3f1a2df, i=1, n=2
[10463]218
[10810]219In each response the tracker includes additional parameters such as
220the rerequest ``interval`` and the initialization vector ``iv``.
[10762]221
[9850]222The tracker response MUST remain a valid bencoded message.
223
[10560]224
225Backwards Compatibility
[10528]226=======================
[9850]227
[10165]228Trackers that support obfuscation are identified in the .torrent file
[10410]229by the inclusion of an ``obfuscate-announce-list`` which otherwise has the
[10891]230same semantics as the ``announce-list`` parameter.  Peers that do not support
[10410]231obfuscation simply ignore the ``obfuscate-announce-list``. 
[10165]232
[10410]233A client that is configured to use this extension should always send
234the ``sha_ih`` to any tracker supporting obfuscation.  The client
235SHOULD only contact trackers in the ``announce-list`` once the client
236has attempted all trackers in the ``obfuscate-announce-list`` and all failed.
[9850]237
[10410]238If a tracker that supports obfuscation wishes to allow legacy peers to
239connect to the tracker then the announce URL should appear in both the
240``obfuscate-announce-list`` and the ``announce-list``.
[10165]241
[10410]242If a tracker URL appears in both lists running on the same port, and
243the tracker failed to respond when selected from the
244``obfuscate-announce-list`` then the client MAY treat the tracker in
245the ``announce-list`` as if it were temporarily unreachable and defer
246trying it until it has tried other trackers in the ``announce-list``.
247
248Peers MUST never send both the ``info_hash`` and ``sha_ih`` parameters
249in the same request, since that would defeat the purpose of the shared
250secret.
251
[10560]252Any peer that requests with a ``sha_ih`` SHOULD implement Message
253Stream Encryption (MSE) [#MSE]_.  Any peer returned from the tracker
254in response to a request with a ``sha_ih`` SHOULD be assumed to
255support Message Stream Encryption.  We include these provisions
256because if a peer communicates with another peer without using MSE
257then the BitTorrent protocol is trivially identified from the first
258twenty bytes of the BitTorrent header and the ``info_hash`` appears in
259plaintext as the next twenty bytes, hence also defeating the purpose
260of the shared secret.
[10491]261
[10762]262If the tracker does not know enough peers assumed to support MSE to
263return the desired number of peers then it MAY include peers that are
264not assumed to support MSE.  If a peer closes a connection in response
[10808]265to an encrypted header then the initiating peer SHOULD assume that the
266peer does not support MSE.  The initiating peer however SHOULD ONLY
267initiate unencrypted connections when all peers have been tried and
268those that support MSE fail to provide "adequate performance."  We
[10560]269intentionally omit any definition of "adequate performance."
270
271
272Rationale
[10528]273=========
[9850]274
[10410]275This extension directly addresses a known attack on the BitTorrent
276protocol performed by some deployed network hardware.  By obscuring
277the ip-port pairs network hardware can no longer easily identify
278ip-port pairs that are running BitTorrent by observing peer-to-tracker
279communications.  This deployed hardware under some conditions disrupts
[10836]280BitTorrent connections by injecting forged TCP reset packets. 
[9850]281
[10560]282This hardware was presumably deployed to get around BitTorrent
283Message Stream Encryption [#MSE]_.  Peers implementing BitTorrent Message Stream
[10491]284Encryption obfuscate peer-to-peer connections by employing RC4
285encryption on every byte from the first byte transferred. BitTorrent
[10560]286Message Stream Encryption thus increases the difficulty for a device
[10491]287observing passing packets to identify BitTorrent peer-to-peer
288connections.
[10165]289
[10410]290By using the SHA-1 of the infohash, the tracker is able to identify
291torrents without sending the plaintext infohash and without requiring
292an additional prior exchange of a shared secret.  Where trackers now
293maintain mappings from infohash to the corresponding torrent's
[10815]294peerlist and other torrent-specific state, obfuscated trackers would
295need one additional mapping from ``sha_ih`` to the torrent's state.
296Trackers may also cache the encrypted version of each torrent's
297tracker peer list, to increase computational performance at the
298expense of increasing memory footprint by a constant factor.
[10165]299
[10410]300The obfuscation method meets the following criteria:
301
[10560]302- The entire plaintext of the peer list is not easily obtained even if
[10891]303  an eavesdropper identifies one or more subsequent connections as
304  using BitTorrent and the corresponding ip-port pairs appeared in the
305  ciphertext of the tracker response.
[10410]306
307- Even when a subsequent connection from a peer that has received a
308  tracker response is observed by an eavesdropper, it is difficult to
309  map the ip-port pair to specific ciphertext to verify that the
310  connection is using BitTorrent.
311
312When the optimizations_ are used,
313 
314- Few computations are performed at request time.
315
316- Encryption may be performed at the time a peer is added.
317  The encrypted peer ip and port may be handed out hundreds of times.
318
319- Security is minimally impacted.
320
321The objective is NOT to create a cryptographically secure protocol
322that can survive unlimited observation of passing packets and
[10560]323substantial computational resources on network timescales.  The objective
[10410]324is to raise the bar sufficiently to deter attacks based on observing
325ip-port numbers in peer-to-tracker communications.
326
327If a tracker observes a large number of tracker requests and responses
328and subsequent connections, it is possible to attack the encryption.
329RC4 is known to have a number of weaknesses especially in the way it
[10891]330is used with WEP [#Borisov]_ [#Scott]_ [#Stubblefeld]_.  However, with
331tracker peer obfuscation, the number of bytes transferred between the
332tracker and a client is likely significantly smaller than transferred
[10560]333between a wireless computer and a basestation.  An attacker faces a
[10891]334much larger task in obtaining sufficient ciphertext to directly break
335the encryption.
[10410]336
337Hobbling the RC4 encryption by using a bounded-length RC4 pseudorandom
338string for small swarms is likely to have negilgible impact on
339security over any other encyption method since the pseudorandom string
340is probably equal to or longer than the plaintext and thus no part of
341it is repeated in the XOR except as peers arrive or leave the swarm.
342Thus on the timescales of rerequest intervals, nearly the same
343ciphertext is handed to every peer requesting the same infohash.
344Intercepting the same ciphertext multiple times provides no additional
345information to the attacker.  The attacker could correlate ip-port
346pairs in connections following tracker responses, but an attacker
347could do this regardless of the encryption method employed.
348Furthermore more direct methods of traffic analysis applied to
349peer-to-peer communication is available to network operators.
350
[10891]351For larger swarms, hobbling RC4 may simplify breaking the encryption
352since the same pseudorandom string is used repeatedly across the peer
353list.  Some study is in order taking into account that the tracker can
354periodically change intiailization vectors.
[10410]355
356We know from experience that periodically reshuffling peer lists on
357the order of the rerequest interval negligibly impacts tracker
358performance even with swarms containing millions of peers.  Generating
359a new pseudorandom string using RC4 on this same time interval is
[10466]360likely to incur negligible performance penalty because 1) RC4 is a
[10410]361small constant factor more expensive than a shuffle on an input string
[10466]362of equal length, 2) the generated pseudorandom string is only *n*
[10560]363ip-port pairs long where recommended *n* is within a small constant
364factor larger than the largest *returned peer list* and thus much
365smaller than the *tracker peer list* for large swarms, and 3) the cost
366of the XOR operation is lighter weight than performing a random
367shuffle.
[10410]368
[10596]369
[10560]370References
371==========
[10410]372
[10491]373.. _`RFC 2119`: http://tools.ietf.org/html/rfc2119
[9850]374
[10560]375.. [#MSE] BitTorrent Message Stream Encryption
376   (http://www.azureuswiki.com/index.php/Message_Stream_Encryption)
377
378.. [#Borisov] Nikita Borisov, Ian Goldberg, and David Wagner. Intercepting
379   mobile communications: the insecurity of 802.11. In ACM MobiCom 2001,
380   pages 180-189. ACM Press, 2001.
381
382.. [#Scott] Scott R. Fluhrer, Itsik Mantin, and Adi
383   Shamir. Weaknesses in the key scheduling algorithm of RC4. In Serge
384   Vaudenay and Amr M. Youssef, editors, Selected Areas in
385   Cryptography 2001, volume 2259 of Lecture Notes in Computer
386   Science, pages 1-24. Springer, 2001.
387
388.. [#Stubblefeld] Adam Stubblefeld, John Ioannidis, and Aviel
389   D. Rubin. A key recovery attack on the 802.11b wired equivalent
390   privacy protocol (WEP). ACM Transactions on Information and System
391   Security, 7(2):319-332, May 2004.
392
[10579]393
394Example Python Code
395===================
396
[10594]397Request handling in a dummy tracker implementing tracker peer obfuscation::
[10579]398
399  from sha import sha
400  from random import randint
401  from struct import unpack
402  from rc4 import rc4  # rc4(k) generates k RC4 pseudorandom bytes.
403 
404  rand = open("/dev/random","r").read
[10596]405  rc4 = rc4()
[10579]406 
407  # tracker configuration
408  MAX_PEERS = 100
409 
410  # per torrent state.
411  infohash = sha("dummy_info").digest()
412  pseudo = ''                        # pseudorandom RC4 string.
413  num_peers = 1000                   # current swarm size.
414  tracker_peer_list = rand(6) * num_peers
415  obfuscated_tracker_peer_list = ''
416 
417  def xor(plaintext,pseudo):
418    isint = False
419    if type(plaintext) == int: # convert to byte string.
420      plaintext = "".join([chr(int(x,16)) for x in "%.4x" % plaintext])
421      isint = True
422    n = len(pseudo)
423    ciphertext = "".join(
424      [chr(ord(pseudo[i%n])^ord(plaintext[i])) for i in xrange(len(plaintext))])
425    if isint:
426      ciphertext = unpack("!I", ciphertext)[0]   # convert back to unsigned int
427    return ciphertext
428 
429  def init():  # called once per rerequest interval.
430    global iv, x, n, n_xor_y, obfuscated_tracker_peer_list
431    iv = rand(20)
[10808]432    rc4.key = sha(infohash + iv).digest()
[10579]433    rc4(768)                         # discard first 768
434    x = rc4(4)
435    y = rc4(4)
436    n = min(num_peers, randint(MAX_PEERS * 2, MAX_PEERS * 4))
437    n_xor_y = xor(n,y)
438    pseudo = rc4(n*6)
439    obfuscated_tracker_peer_list = xor(tracker_peer_list,pseudo)
440 
441  def getpeers( numwant ):
442    global iv, x, n, n_xor_y, obfuscated_tracker_peer_list
443    response = {}
444    response['iv'] = iv
445    numwant = min(numwant, MAX_PEERS)
[10851]446    if numwant >= num_peers:
[10579]447      response['peers'] = obfuscated_tracker_peer_list
448      return response
[10851]449 
450    i = randint(0,num_peers-numwant)
451    response['i'] = xor(i,x)
[10579]452    response['n'] = n_xor_y
[10851]453    # peers at end of tracker peer list have lower probability of being picked,
454    # but this requires only one copy.
[10579]455    response['peers'] = obfuscated_tracker_peer_list[i*6:(i+numwant)*6]
[10851]456    return response
[10579]457 
458  init()
459  print getpeers(20)
460
[11031]461
462Copyright
463=========
464
465This document has been placed in the public domain.
466
[10528]467
468..
469   Local Variables:
470   mode: indented-text
471   indent-tabs-mode: nil
472   sentence-end-double-space: t
473   fill-column: 70
474   coding: utf-8
475   End:
Note: See TracBrowser for help on using the browser.