root / dotorg / v4 / html / beps / bep_0008.rst

Revision 10762, 19.8 kB (checked in by dave, 9 months ago)

correct some errors.

Line 
1BEP: 8
2Title: Tracker Peer Obfuscation
3Version: $Revision$
4Last-Modified: $Date$
5Author:  David Harrison <dave@bittorrent.com>, Anthony Ciani <tony@ciani.phy.uic.edu>, Arvid Norberg <arvid@bittorrent.com>, Greg Hazel <greg@bittorrent.com>
6Status:  Draft
7Type:    Standards Track
8Created: 31-Jan-2008
9Post-History:
10
11This extends the tracker protocol to support simple obfuscation of the
12peers it returns, using the infohash as a shared secret between the
13peer and the tracker. The obfuscation does not provide any security
14against eavesdroppers that know the infohash of the torrent.  The goal
15is to prevent internet service providers and other network
16administrators from blocking or disrupting bittorrent traffic
17connections that span between the receiver of a tracker response and
18any peer IP-port appearing in that tracker response.
19
20The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
21"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are
22to be interpreted as described in IETF `RFC 2119`_.
23
24
25Announce Parameter
26==================
27
28When using this extension, instead of passing the ``info_hash`` parameter
29to the tracker, an ``sha_ih`` is passed.
30
31The value of ``sha_ih`` MUST be the info-hash of the torrent, with a second
32SHA-1 applied to it.
33
34For example if a torrent has infohash with hex representation
35``aaf4c61ddcc5e8a2dabedef3b482cd9aea9434d`` then its ``sha_ih`` is
36``sha1(infohash)='6b4f89a54e2d27ecd7e8da5b4ab8fd9d1d8b119'``.
37
38The value MUST be url encoded, just like the ``info_hash``.  Thus the
39``sha_ih`` above when url encoded becomes
40``kO%89%A5N-%27%EC%D7%E8%DA%05%B4%AB%8F%D9%D1%D8%B1%19``.
41
42This extension does not change the semantics of any parameter passed
43in the peer's announce.
44
45Announce Response
46=================
47
48If the tracker supports this extension, the response should be exactly the
49same as if the ``info_hash`` was passed, except that any field that contains
50peer information (such as ``peers``, ``peers6`` or any other field defined
51by another extension) MUST be obfuscated as described in the next section.
52
53There are additional parameters the tracker may OPTIONALLY return.
54These are discussed in the optimizations_ section.
55
56Peer List Obfuscation
57=====================
58
59We distinguish between the *tracker peer list* and the *returned peer
60list*.  The *tracker peer list* contains the ip-port pairs of all
61known peers in a given torrent, i.e., those peers that have reported to
62the tracker that they are downloading or seeding a given file with a
63given infohash.  The tracker may store this peer list however it
64wishes.  The *returned peer list* contains a packed array of ip-port
65pairs conforming to the BitTorrent protocol specification.  If the
66swarm is sufficiently large then the returned ip-port pairs constitute
67a subset of the ip-port pairs in the *tracker peer list*.
68
69The returned peer list is encrypted using RC4-drop768 encryption using
70the infohash as a shared secret and optionally employing an
71initialization vector.
72
73For the remainder of this document RC4 refers to RC4-drop768.  In the
74process of encryption, RC4 generates a pseudorandom string that is
75XOR'd with the plaintext to generate the ciphertext.  The receiver
76recovers the plaintext by generating the same pseudorandom string and
77XOR'ing it with the ciphertext.  In generating the pseudorandom
78string, the tracker and client MUST discard the first 768 bytes.  The
79next 8 bytes in the pseudorandom string are reserved for optimizations
80discussed in the next section.
81
82To communicate an initialization vector, the tracker includes in the
83bencoded response the key ``iv`` with value set to a byte string
84containing the initialization vector.  The initialization vector can
85be of arbitrary length and is sent in plaintext.
86
87If the tracker sends no initialization vector then the first 64-bits
88of the infohash are used as the RC4 key.  If the tracker provides an
89initialization vector then the RC4 key is generated by appending the
90vector to the infohash and then hashing with SHA-1.  The first 64 bits
91of the resulting hash are then used as the RC4 key.  The string from
92which the RC4 key is derived whether it be the infohash or the SHA-1 of
93the initialization vector appended to the infohash is called the
94*intermediate string*.
95
96For example, given infohash ``aaf4c61ddcc5e8a2dabedef3b482cd9aea9434d``
97and initialization vector ``abcd`` both represented in hex, the RC4 key
98is derived as follows:
99
100::
101 
102   intermediate = sha1( 'aaf4c61ddcc5e8a2dabedef3b482cd9aea9434dabcd' )
103   key = intermediate[0:64]
104
105where [i:j] denotes the ith through *jth* bit including the *ith* but
106excluding the *jth*.  The resulting key in hex is ``f36e9cae87cf33e0``.
107
108A 64-bit key is used to avoid U.S. export restrictions.
109
110It is RECOMMENDED that the tracker use the initialization vector, and
111that it change the ``iv`` on roughly the same period as the rerequest
112interval.  The reasoning for this is contained in the rationale.
113
114
115Optimizations
116=============
117
118The described optimizations are OPTIONAL for the tracker, but the
119corresponding client-side MUST be implemented by clients that support
120this extension.  These optimizations hobble the strength of the RC4
121encryption in order to improve tracker performance.  In the rationale_
122section we discuss why hobbling RC4 is reasonable and in many cases
123has negligible foreseen effect on security.
124
125For the purpose of these optimizations we assume that the tracker
126stores the tracker peer list for each infohash as a packed array that
127can be copied directly into the response.  We further assume that the
128packed array is reused many times and that with each request the
129tracker either returns the entire packed array or copies a single
130contiguous substring from the tracker peer list into the response.
131
132If the peerlist is represented and used as assumed then to improve
133randomness in the set of peers handed out by the tracker, it is
134RECOMMENDED that the tracker periodically reshuffle the peerlist with
135period similar to the rerequest interval.  After each reshuffle the
136tracker reperforms the operations described in this section.
137
138To reduce computation the tracker MAY cache the pseudorandom string
139generated by RC4 and reuse it as peers arrive and depart.
140
141The tracker MAY also cache the encrypted tracker peer list.  To
142support this the tracker MUST pass two additional keys *i* and *n*
143each with 32-bit integer values, except the tracker MAY omit *i* and
144*n* when *i=0* and the *returned peer list* is the entire *tracker peer
145list*.  Whether the tracker returns *i* and *n*, the first 8 bytes of
146the RC4 psuedorandom string are reserved for obscuring *i* and *n*.
147We come back to this momentarily.  Decryption starts by XORing from
148*6i* bytes for ipv4 (or *18i* for ipv6) into the pseudorandom string
149after the discarded and reserved bytes.  Assuming that the tracker
150encrypted the tracker peer list starting from the first byte after the
151discarded and reserved bytes in the pseudorandom string then *i* also
152corresponds to the *ith* ip-port pair in the tracker peer list.
153
154So that the client and the tracker do not have to generate an
155arbitrarily long pseudorandom string to support large swarms, we
156assume the tracker bounds the length of the pseudorandom string and
157reports the length in ip-port pairs as the value to key *n**n*
158excludes reserved and discarded bytes.  We RECOMMEND that *n* be equal
159to the length of the tracker peer list or random but within constant
160factor of the longest peerlist returned by the tracker, whichever is
161smaller.  Thus the tracker encrypts the *jth* byte of the *ith*
162ip-port pair in an ipv4 tracker peer list by XORing with the byte
163*(6i+j)* `mod` *n* bytes into the pseudorandom string.
164
165Transmitting *i* and *n* as plaintext would significantly reduce the
166cost for an attacker to recover the pseudorandom string.  The tracker
167MUST XOR the value of *i* with the first 32 bits of the pseudorandom
168string.  The tracker then XORs *n* with the next 32 bits from the
169pseudorandom string (see Figure 1).
170
171.. figure:: bep_0008_pseudo.png
172
173   **Figure 1:** The first 768 bytes of the RC4 pseudorandom
174   string are discarded.  The key *i* in the tracker response has
175   value ``x xor i``.  The key *n* has value ``y xor n``.
176
177We describe encryption in the following example for an ipv4 tracker peer
178list consisting of 3 ip-port pairs, and using an RC4 pseudorandom string
179of length *n=2*. *n* is small for purposes of illustration.  Also, for the
180purpose of illustration, the tracker returns only 2 peers at a time.
181
182::
183
184  Given the following peer list
185  (208.72.193.86, 6881), (209.81.173.15,14321), (128.213.6.8, 6881)
186
187  As a packed array represented in hex it becomes
188               
189  d048c1561ae1d151ad0f37f180d506081ae1
190
191  which we XOR with an RC4 pseudorandom string excluding discarded and
192  reserved bytes, e.g.,
193
194  a496e5f9b83e835013d42226
195
196  to generate
197
198  74de24afa2df5201bedb15d72443e3f1a2df
199
200Because the RC4 pseudorandom string is shorter than the tracker
201peer list, we wrap to the beginning of the pseudorandom string.
202
203A tracker returning the first two peers would return the bencoded
204equivalent of::
205
206  peers=74de24afa2df5201bedb15d7, i=0, n=2
207
208A tracker returning the second and third peer would return the
209bencoded equivalent of::
210
211  peers=5201bedb15d72443e3f1a2df, i=1, n=2
212
213In each response the tracker includes additional keys such as the
214rerequest ``interval`` and the initialization vector ``iv``.
215
216The tracker response MUST remain a valid bencoded message.
217
218
219Backwards Compatibility
220=======================
221
222Trackers that support obfuscation are identified in the .torrent file
223by the inclusion of an ``obfuscate-announce-list`` which otherwise has the
224same semantics as the ``announce-list`` key.  Peers that do not support
225obfuscation simply ignore the ``obfuscate-announce-list``
226
227A client that is configured to use this extension should always send
228the ``sha_ih`` to any tracker supporting obfuscation.  The client
229SHOULD only contact trackers in the ``announce-list`` once the client
230has attempted all trackers in the ``obfuscate-announce-list`` and all failed.
231
232If a tracker that supports obfuscation wishes to allow legacy peers to
233connect to the tracker then the announce URL should appear in both the
234``obfuscate-announce-list`` and the ``announce-list``.
235
236If a tracker URL appears in both lists running on the same port, and
237the tracker failed to respond when selected from the
238``obfuscate-announce-list`` then the client MAY treat the tracker in
239the ``announce-list`` as if it were temporarily unreachable and defer
240trying it until it has tried other trackers in the ``announce-list``.
241
242Peers MUST never send both the ``info_hash`` and ``sha_ih`` parameters
243in the same request, since that would defeat the purpose of the shared
244secret.
245
246Any peer that requests with a ``sha_ih`` SHOULD implement Message
247Stream Encryption (MSE) [#MSE]_.  Any peer returned from the tracker
248in response to a request with a ``sha_ih`` SHOULD be assumed to
249support Message Stream Encryption.  We include these provisions
250because if a peer communicates with another peer without using MSE
251then the BitTorrent protocol is trivially identified from the first
252twenty bytes of the BitTorrent header and the ``info_hash`` appears in
253plaintext as the next twenty bytes, hence also defeating the purpose
254of the shared secret.
255
256If the tracker does not know enough peers assumed to support MSE to
257return the desired number of peers then it MAY include peers that are
258not assumed to support MSE.  If a peer closes a connection in response
259to an encrypted header then the initiating peer SHOULD try other peers
260in the peer list returning to the peer that closed the connection only
261when all other peers known or not yet known to support MSE have been
262tried and have failed to provide "adequate performance."  We
263intentionally omit any definition of "adequate performance."
264
265
266Rationale
267=========
268
269This extension directly addresses a known attack on the BitTorrent
270protocol performed by some deployed network hardware.  By obscuring
271the ip-port pairs network hardware can no longer easily identify
272ip-port pairs that are running BitTorrent by observing peer-to-tracker
273communications.  This deployed hardware under some conditions disrupts
274BitTorrent connections by injecting forged TCP reset packets.  Once a
275BitTorrent connection has been identified, other attacks could be
276performed such as severely rate limiting or blocking these
277connections.
278
279This hardware was presumably deployed to get around BitTorrent
280Message Stream Encryption [#MSE]_.  Peers implementing BitTorrent Message Stream
281Encryption obfuscate peer-to-peer connections by employing RC4
282encryption on every byte from the first byte transferred. BitTorrent
283Message Stream Encryption thus increases the difficulty for a device
284observing passing packets to identify BitTorrent peer-to-peer
285connections.
286
287By using the SHA-1 of the infohash, the tracker is able to identify
288torrents without sending the plaintext infohash and without requiring
289an additional prior exchange of a shared secret.  Where trackers now
290maintain mappings from infohash to the corresponding torrent's
291peerlist and other torrent-specific state, obfuscated trackers would need
292one additional mapping from ``sha_ih`` to the torrent's state.
293Tracker may also store encrypted versions of each torrent's peer list,
294to increase computation performance at the expense of increasing
295memory footprint by a constant factor.
296
297The obfuscation method meets the following criteria:
298
299- The entire plaintext of the peer list is not easily obtained even if
300  an eavesdropper identifies ip-port pairs from subsequent connections
301  initiated by a peer that has received a tracker response.
302
303- Even when a subsequent connection from a peer that has received a
304  tracker response is observed by an eavesdropper, it is difficult to
305  map the ip-port pair to specific ciphertext to verify that the
306  connection is using BitTorrent.
307
308When the optimizations_ are used,
309 
310- Few computations are performed at request time.
311
312- Encryption may be performed at the time a peer is added.
313  The encrypted peer ip and port may be handed out hundreds of times.
314
315- Security is minimally impacted.
316
317The objective is NOT to create a cryptographically secure protocol
318that can survive unlimited observation of passing packets and
319substantial computational resources on network timescales.  The objective
320is to raise the bar sufficiently to deter attacks based on observing
321ip-port numbers in peer-to-tracker communications.
322
323If a tracker observes a large number of tracker requests and responses
324and subsequent connections, it is possible to attack the encryption.
325RC4 is known to have a number of weaknesses especially in the way it
326was used with WEP [#Borisov]_ [#Scott]_ [#Stubblefeld]_.  However,
327with tracker peer obfuscation, the number of bytes transferred between
328the tracker and a client is likely significantly smaller than transferred
329between a wireless computer and a basestation.  An attacker faces a
330much larger task in obtaining sufficient probable plaintext to
331directly break the encryption.
332
333Hobbling the RC4 encryption by using a bounded-length RC4 pseudorandom
334string for small swarms is likely to have negilgible impact on
335security over any other encyption method since the pseudorandom string
336is probably equal to or longer than the plaintext and thus no part of
337it is repeated in the XOR except as peers arrive or leave the swarm.
338Thus on the timescales of rerequest intervals, nearly the same
339ciphertext is handed to every peer requesting the same infohash.
340Intercepting the same ciphertext multiple times provides no additional
341information to the attacker.  The attacker could correlate ip-port
342pairs in connections following tracker responses, but an attacker
343could do this regardless of the encryption method employed.
344Furthermore more direct methods of traffic analysis applied to
345peer-to-peer communication is available to network operators.
346
347For larger swarms, hobbling RC4 may more significantly impact breaking
348the encryption since the same pseudorandom string is used repeatedly
349across the peer list.  Some study is in order on this point taking
350into account that the tracker can periodically change intiailization
351vectors.
352
353We know from experience that periodically reshuffling peer lists on
354the order of the rerequest interval negligibly impacts tracker
355performance even with swarms containing millions of peers.  Generating
356a new pseudorandom string using RC4 on this same time interval is
357likely to incur negligible performance penalty because 1) RC4 is a
358small constant factor more expensive than a shuffle on an input string
359of equal length, 2) the generated pseudorandom string is only *n*
360ip-port pairs long where recommended *n* is within a small constant
361factor larger than the largest *returned peer list* and thus much
362smaller than the *tracker peer list* for large swarms, and 3) the cost
363of the XOR operation is lighter weight than performing a random
364shuffle.
365
366
367References
368==========
369
370.. _`RFC 2119`: http://tools.ietf.org/html/rfc2119
371
372.. [#MSE] BitTorrent Message Stream Encryption
373   (http://www.azureuswiki.com/index.php/Message_Stream_Encryption)
374
375.. [#Borisov] Nikita Borisov, Ian Goldberg, and David Wagner. Intercepting
376   mobile communications: the insecurity of 802.11. In ACM MobiCom 2001,
377   pages 180-189. ACM Press, 2001.
378
379.. [#Scott] Scott R. Fluhrer, Itsik Mantin, and Adi
380   Shamir. Weaknesses in the key scheduling algorithm of RC4. In Serge
381   Vaudenay and Amr M. Youssef, editors, Selected Areas in
382   Cryptography 2001, volume 2259 of Lecture Notes in Computer
383   Science, pages 1-24. Springer, 2001.
384
385.. [#Stubblefeld] Adam Stubblefeld, John Ioannidis, and Aviel
386   D. Rubin. A key recovery attack on the 802.11b wired equivalent
387   privacy protocol (WEP). ACM Transactions on Information and System
388   Security, 7(2):319-332, May 2004.
389
390
391Example Python Code
392===================
393
394Request handling in a dummy tracker implementing tracker peer obfuscation::
395
396  from sha import sha
397  from random import randint
398  from struct import unpack
399  from rc4 import rc4  # rc4(k) generates k RC4 pseudorandom bytes.
400 
401  rand = open("/dev/random","r").read
402  rc4 = rc4()
403 
404  # tracker configuration
405  MAX_PEERS = 100
406 
407  # per torrent state.
408  infohash = sha("dummy_info").digest()
409  pseudo = ''                        # pseudorandom RC4 string.
410  num_peers = 1000                   # current swarm size.
411  tracker_peer_list = rand(6) * num_peers
412  obfuscated_tracker_peer_list = ''
413 
414  def xor(plaintext,pseudo):
415    isint = False
416    if type(plaintext) == int: # convert to byte string.
417      plaintext = "".join([chr(int(x,16)) for x in "%.4x" % plaintext])
418      isint = True
419    n = len(pseudo)
420    ciphertext = "".join(
421      [chr(ord(pseudo[i%n])^ord(plaintext[i])) for i in xrange(len(plaintext))])
422    if isint:
423      ciphertext = unpack("!I", ciphertext)[0]   # convert back to unsigned int
424    return ciphertext
425 
426  def init():  # called once per rerequest interval.
427    global iv, x, n, n_xor_y, obfuscated_tracker_peer_list
428    iv = rand(20)
429    rc4.key = sha(infohash + iv).digest()[0:8]
430    rc4(768)                         # discard first 768
431    x = rc4(4)
432    y = rc4(4)
433    n = min(num_peers, randint(MAX_PEERS * 2, MAX_PEERS * 4))
434    n_xor_y = xor(n,y)
435    pseudo = rc4(n*6)
436    obfuscated_tracker_peer_list = xor(tracker_peer_list,pseudo)
437 
438  def getpeers( numwant ):
439    global iv, x, n, n_xor_y, obfuscated_tracker_peer_list
440    response = {}
441    response['iv'] = iv
442    numwant = min(numwant, MAX_PEERS)
443    if numwant > num_peers:
444      response['peers'] = obfuscated_tracker_peer_list
445      return response
446    i = randint(0,num_peers)
447    response['i'] = xor(i,x)
448    response['n'] = n_xor_y
449    response['peers'] = obfuscated_tracker_peer_list[i*6:(i+numwant)*6]
450    if len(response['peers']) < numwant * 6:
451      r = numwant - len(response['peers']) / 6
452      response['peers'] = response['peers'] + obfuscated_tracker_peer_list[:r]
453    return response
454 
455  init()
456  print getpeers(20)
457
458
459..
460   Local Variables:
461   mode: indented-text
462   indent-tabs-mode: nil
463   sentence-end-double-space: t
464   fill-column: 70
465   coding: utf-8
466   End:
Note: See TracBrowser for help on using the browser.