root/dotorg/trunk/html/beps/bep_0030.rst

Revision 11173, 10.1 KB (checked in by arvid, 7 months ago)

fixed typos to BEP 30

Line 
1BEP: 30
2Title: Merkle hash torrent extension
3Version: $Revision$
4Last-Modified: $Date$
5Author:  Arno Bakker <arno@cs.vu.nl>
6Status:  Draft
7Type:    Standards Track
8Requires: 10
9Content-Type: text/x-rst
10Created: 11-Mar-2009
11Post-History:
12
13Abstract
14========
15
16BitTorrent requires a torrent file containing a cryptographic digest of
17every piece of the content to allow the verification of pieces during the
18download. Large torrent files put a strain on the Web servers distributing
19them, and cannot be directly included in RSS feeds or gossiped around.
20
21A related problem is the use of large piece sizes. To keep the size of a
22torrent file small (as to not overload the Web servers) the number of hashes
23for a content file is being kept small. For large files this implies that the
24piece size over which digests are calculated must go up (up to 2MB pieces are
25used). The large piece sizes affect the ability of peers to barter pieces.
26Only when a piece has been completely received and verified using the digest
27may it be traded with other peers. This means that it may be some time
28before a node starts bartering with others.
29
30Our solution to these two problems is to replace the list of digests with a
31single Merkle hash [1]_.  A Merkle hash can be used to verify the integrity
32of the total content file as well as the individual blocks via a hierarchical
33scheme. It works by constructing a hash tree of the content and using just
34the root hash as data integrity protection. The simple root hash value also
35allows for smaller piece sizes to be used. A common form of hash trees is the
36Merkle hash tree, hence the name.
37
38
39Simple Merkle Hashes
40====================
41
42We propose a minimalistic design that does not affect the existing BitTorrent
43protocol and clients very much. The design is backwards compatible in the
44sense that clients supporting the Simple Merkle Hash extension can still be
45made to process regular torrent files easily.
46
47>From the content we construct a hash tree as follows. Given a piece size,
48we calculate the hashes of all the pieces in the set of content files. Next,
49we create a binary tree of sufficient height. Sufficient height means that the
50lowest level in the tree has enough nodes to hold all piece hashes in the set.
51We place all piece hashes in the tree, starting at the left-most leaf, see
52figure. The remaining leaves in the tree are assigned a *filler hash value* of
530 (see Discussion). Finally, we calculate the hash values of the higher levels
54in the tree, by concatenating the hash values of the two children (again left
55to right) and computing the hash of that aggregate. This process ends in a
56hash value for the root node, which we call the *root hash*. The hashing
57algorithm used is SHA1, as in normal torrents.
58
59The root hash along with the total size of the content-file set and the piece
60size are now the only information in the system that needs to come from a
61trusted source. A client that has only the root hash of a file set can check
62any piece as follows (see figure). It first calculates the hash of the piece
63it received. Along with this piece it should have received the hashes of the
64piece's sibling and of its *uncles*, that is the sibling Y of its parent X,
65and the uncle of that Y until the root is reached (uncles are marked with \*
66in the figure). Using this information the client recalculates the root hash
67of the tree, and compares it to the root hash it received from the trusted
68source.
69
70::
71   
72                                          0* = root hash
73                                       /     \
74                                   /            \
75                               /                   \
76                           /                          \
77                       /                                 \
78                     1*                                     2
79                    / \                                    / \
80                  /     \                                /     \
81                /         \                            /         \
82              /             \                        /             \
83            /                 \                    /                 \
84           3                   4                  5                   6* = uncle
85          / \                 / \                / \                 / \
86         /   \               /   \              /   \               /   \
87        /     \             /     \            /     \             /     \
88      7         8         9        10        11        12*       13        14
89     / \       / \       / \       / \       / \       / \       / \       / \
90   15   16   17   18   19   20   21   22   23   24   25   26   27   28   29   30
91   
92   P0   P1   P2   P3   P4   P5   P6   P7   P8*  P9*  P10  P11  P12   X    X    X
93   = piece index                            =    =                   = filler hash
94                                            p    s                   
95                                            i    i                   
96                                            e    b                   
97                                            c    l
98                                            e    i
99                                                 n
100                                                 g
101
102
103Inclusion in BitTorrent
104=======================
105
106The original publisher of the content-file set creates a so-called *Merkle
107torrent* which is a torrent file that contains a ``root hash`` key in its
108``info`` part instead of a ``pieces`` key, see BEP 3 [#BEP-3]_.
109
110When a seeder starts it uses the information in the Merkle torrent and the
111file set to reconstruct the hash tree and registers itself with the tracker
112using the hash value of the ``info`` part of the Merkle torrent, as usual
113(see Discussion).
114
115A BitTorrent client that supports the Simple Merkle Hash extension must also
116support the Extension protocol (BEP 10) [#BEP-10]_. In particular, it must add
117a  ``Tr_hashpiece`` message name in the ``m`` field of the Extension
118protocol's handshake message. Such a client must not send ``piece`` messages
119but must use Extension protocol messages with type ``Tr_hashpiece`` to send
120pieces.
121
122A ``Tr_hashpiece`` message consists of an index, begin, hashlist and piece.
123The hashlist consists of the piece's own hash, the piece's sibling hash, and
124the uncles of the piece up until and including the root hash (see above and
125Discussion). In particular, the hashlist is a list of 2-element lists. The
126first element denotes the node offset in the tree, the second element is the
127hash value. The node offset is the number of the node when numbered in a
128breadth-first fashion (i.e., going left to right starting at the top).
129
130Only the ``Tr_hashpiece`` message with begin field equal to 0 must contain a
131filled hashlist, for all other begin values the hashlist must be empty. In
132other words, the message containing the first subpiece should have a filled
133hashlist, subsequent subpieces should not.
134
135Formally, a ``Tr_hashpiece`` message has the following payload:
136
1371. 4-byte index
1382. 4-byte begin
1393. 4-byte length of bencoded hashlist
1404. the bencoded hashlist
1415. the subpiece data
142
143Upon receipt of a ``Tr_hashpiece`` message, the receiver recomputes the root
144hash using the hashlist and compares it to the root hash in the Merkle
145torrent. If they match, all the hash values are saved in the receiver's own
146hash tree, such that they can be passed on to others when the piece is
147downloaded from this receiver. When all subpieces have come in, the piece is
148checked using the hash from the hash tree.
149
150
151Discussion
152==========
153
154We chose a binary tree for simplicity. Trees with larger degrees are also
155possible. However, the number of hashes that need to be sent with each
156piece is already small at about 2log of the file-set size.
157
158Using the hash of the ``info`` part for registering at the tracker means
159that for a given content-file set, the swarm that use a conventional torrent
160file and the swarm that uses a Merkle torrent will be disjunct. The infohash
161value is different, hence the swarms are known under different identifiers at
162the trackers.
163
164In theory we can create one swarm. In that swarm, new clients could serve
165pieces to old clients. For the new clients to benefit from the old clients,
166however, we need to add some way for the new to obtain the hashes required to
167check a piece. Designing a fool proof solution for this problem is not
168trivial.
169
170Because we let the initial seeders recalculate the hash tree, this
171extension is incompatible with the proposed HTTP Seeding extensions in
172BEP 17 [#BEP-17]_ and 19 [#BEP-19]_ .
173
174Including the root hash in a ``Tr_hashpiece`` message allows a quick sanity
175check.
176
177This extension paves the way for BitTorrent URLs. The only information
178required for a client to commence sharing are the root hash, the total size,
179the piece size, and a source of peer addresses (tracker, DHT).
180
181
182Acknowledgements
183================
184
185Development of this extension was supported by funding from:
186
187 * BSIK Freeband Communication I-Share project (Dutch Ministry of Economic
188   Affairs)
189 * The European Community's Seventh Framework Programme in the P2P-Next
190   project under grant agreement no 216217.
191
192Thanks to Olaf van der Spek and Johan Pouwelse for ideas and suggestions.
193
194
195References
196==========
197
198.. [1] MERKLE, R. A Digital Signature Based on a Conventional Encryption
199   Function. In Proceedings CRYPTOâ™87 (Santa Barbara, CA, USA, Aug. 1987),
200   C. Pomerance, Ed., no. 293 in Lecture Notes in Computer Science,
201   Springer-Verlag, pp. 369â“378.
202
203.. [#BEP-3] BEP_0003. The BitTorrent Protocol Specification, Cohen
204   (http://www.bittorrent.org/beps/bep_0003.html)
205
206.. [#BEP-10] BEP_0010. Extension Protocol, Norberg, Strigeus, Hazel
207   (http://www.bittorrent.org/beps/bep_0010.html)
208
209.. [#BEP-17] BEP_0017. HTTP Seeding, Hoffman
210   (http://www.bittorrent.org/beps/bep_0017.html)
211
212.. [#BEP-19] BEP_0019. WebSeed - HTTP/FTP Seeding (GetRight style), Burford
213   (http://www.bittorrent.org/beps/bep_0019.html)
214
215
216Copyright
217=========
218
219This document has been placed in the public domain.
220
221
222
223..
224   Local Variables:
225   mode: indented-text
226   indent-tabs-mode: nil
227   sentence-end-double-space: t
228   fill-column: 70
229   coding: utf-8
230   End:
231
Note: See TracBrowser for help on using the browser.