| [11013] | 1 | BEP: 19 |
|---|
| 2 | Title: WebSeed - HTTP/FTP Seeding (GetRight style) |
|---|
| 3 | Version: $Revision$ |
|---|
| 4 | Last-Modified: $Date$ |
|---|
| 5 | Author: Michael Burford <michael @at@ getright.com> |
|---|
| 6 | Status: Draft |
|---|
| 7 | Type: Standards Track |
|---|
| 8 | Content-Type: text/x-rst |
|---|
| 9 | Created: 21-Feb-2008 |
|---|
| 10 | Post-History: |
|---|
| 11 | |
|---|
| 12 | |
|---|
| 13 | Abstract |
|---|
| 14 | ======== |
|---|
| 15 | |
|---|
| 16 | Using HTTP or FTP servers as seeds for BitTorrent downloads. |
|---|
| 17 | |
|---|
| 18 | |
|---|
| 19 | Rationale |
|---|
| 20 | ========= |
|---|
| 21 | |
|---|
| 22 | Many websites that list a BitTorrent download also provide a |
|---|
| 23 | HTTP or FTP URL for the same file. The files are identical. |
|---|
| 24 | A WebSeeding BitTorrent client can download from either source, |
|---|
| 25 | putting all the parts together into one complete file. |
|---|
| 26 | |
|---|
| 27 | The HTTP or FTP server acts as a permanently unchoked seed. |
|---|
| 28 | |
|---|
| 29 | |
|---|
| 30 | Advantages |
|---|
| 31 | ========== |
|---|
| 32 | |
|---|
| 33 | There is always an unchoked seed where anybody can get started. |
|---|
| 34 | |
|---|
| 35 | It does not break or change anything for clients that do not |
|---|
| 36 | recognize the addition to the metadata. |
|---|
| 37 | |
|---|
| 38 | Clients that do not support HTTP/FTP Seeding would still get |
|---|
| 39 | the benefit from peers sharing pieces originally from an HTTP/FTP |
|---|
| 40 | server. |
|---|
| 41 | |
|---|
| 42 | The metadata does not have to be changed. Download Manager tools |
|---|
| 43 | (such as GetRight) commonly can add multiple HTTP/FTP mirror URLs |
|---|
| 44 | for a file, usually by the user action of clicking multiple links |
|---|
| 45 | on a web page and recognizing the same file name. |
|---|
| 46 | |
|---|
| 47 | Everybody is quite familiar with HTTP/FTP servers and their |
|---|
| 48 | protocols. |
|---|
| 49 | |
|---|
| 50 | It has already been implemented in GetRight, the Mainline client, |
|---|
| 51 | uTorrent, Azureus, libTorrent, and likely others. |
|---|
| 52 | |
|---|
| 53 | Changes are only needed in BitTorrent clients. No changes to |
|---|
| 54 | trackers, HTTP, or FTP servers are needed. No scripts are |
|---|
| 55 | needed on the HTTP or FTP server. |
|---|
| 56 | |
|---|
| 57 | As this is already supported in many very common clients (notably |
|---|
| 58 | the Mainline client itself) seeding for a BitTorrent download |
|---|
| 59 | could be done entirely with a company or individual's existing |
|---|
| 60 | HTTP/FTP servers. |
|---|
| 61 | |
|---|
| 62 | |
|---|
| 63 | Metadata Extension |
|---|
| 64 | ================== |
|---|
| 65 | |
|---|
| 66 | In the main area of the metadata file and not part of the "info" |
|---|
| 67 | section, will be a new key, "url-list". This key will refer to |
|---|
| 68 | a one or more URLs, and will contain a list of web addresses where |
|---|
| 69 | torrent data can be retrieved. This key may be safely ignored |
|---|
| 70 | if the client is not capable of using it. |
|---|
| 71 | |
|---|
| 72 | For example (on multiple lines for readability): |
|---|
| 73 | d |
|---|
| 74 | 8:announce27:http://tracker.com/announce |
|---|
| 75 | 8:url-list26:http://mirror.com/file.exe |
|---|
| 76 | 4:info... |
|---|
| 77 | |
|---|
| 78 | If the "url-list" URL ends in a slash, "/" the client must add |
|---|
| 79 | the "name" from the torrent to make the full URL. This allows |
|---|
| 80 | .torrent generators to treat this field same for single file and |
|---|
| 81 | multi-file torrents. |
|---|
| 82 | |
|---|
| 83 | |
|---|
| 84 | Multi-File Torrents |
|---|
| 85 | ------------------- |
|---|
| 86 | |
|---|
| 87 | BitTorrent clients normally use the "name" from the torrent info |
|---|
| 88 | section to make a folder, then use the "path/file" items from the |
|---|
| 89 | info section within that folder. For the case of Multi-File |
|---|
| 90 | torrents, the "url-list" must be a root folder where a client |
|---|
| 91 | could add the same "name" and "path/file" to create the URL for |
|---|
| 92 | the request. |
|---|
| 93 | |
|---|
| 94 | For example: |
|---|
| 95 | ... |
|---|
| 96 | 8:url-list22:http://mirror.com/pub/ |
|---|
| 97 | 4:infod5:filesld6:lengthi949e4:pathl10:Readme.txte |
|---|
| 98 | e4:name7:michael |
|---|
| 99 | |
|---|
| 100 | A client would use all that to build a url: |
|---|
| 101 | http://mirror.com/pub/michael/Readme.txt |
|---|
| 102 | |
|---|
| 103 | |
|---|
| 104 | Client Implementation Overview |
|---|
| 105 | ============================== |
|---|
| 106 | |
|---|
| 107 | A client should ignore any protocols it does not know from the url-list. |
|---|
| 108 | A client may choose to implement HTTP but not FTP, or vice-versa. |
|---|
| 109 | |
|---|
| 110 | HTTP/FTP are "streaming" type protocols, and do not have BitTorrent's |
|---|
| 111 | concept of blocks. For HTTP you can use byte-ranges to resume |
|---|
| 112 | anywhere or download specific ranges you specify, but with FTP you |
|---|
| 113 | can only say where to start the download. |
|---|
| 114 | |
|---|
| 115 | I wanted to have many sequential blocks ("gaps") in the data |
|---|
| 116 | downloaded from BitTorrent peers so a HTTP/FTP connection would |
|---|
| 117 | have big spaces to fill in. You could use the byte-ranges with |
|---|
| 118 | HTTP to request individual blocks--but each request will show |
|---|
| 119 | in the server's logs, and somebody is going to think it is a |
|---|
| 120 | denial of service attack on them if their shows 100's of connections |
|---|
| 121 | in their log from the same IP. |
|---|
| 122 | |
|---|
| 123 | In GetRight's implementation, I made a couple changes to the usual |
|---|
| 124 | "rarest first" piece-selection method to better allow "gaps" to develop |
|---|
| 125 | between pieces. That way there are longer spaces in the file for HTTP |
|---|
| 126 | and FTP threads to fill. They can start at the beginning of a gap and |
|---|
| 127 | download until they get to the end. |
|---|
| 128 | |
|---|
| 129 | |
|---|
| 130 | Client Implementation Notes |
|---|
| 131 | =========================== |
|---|
| 132 | |
|---|
| 133 | Changes to the standard BitTorrent algorithms to optimize the ability |
|---|
| 134 | of HTTP and FTP connections to fill in many sequential blocks in the file. |
|---|
| 135 | |
|---|
| 136 | |
|---|
| 137 | Definition of Gaps |
|---|
| 138 | ------------------ |
|---|
| 139 | |
|---|
| 140 | Gaps are spaces of multiple pieces in a row that the client does not have. |
|---|
| 141 | |
|---|
| 142 | Given a bitfield "YYnnnnYnnY" where Y is pieces it has and n are ones it |
|---|
| 143 | does not have, there are two "gaps", one of 4 pieces, and another of 2. |
|---|
| 144 | |
|---|
| 145 | |
|---|
| 146 | Piece Selection |
|---|
| 147 | --------------- |
|---|
| 148 | |
|---|
| 149 | The major change is to the Piece Selection. |
|---|
| 150 | |
|---|
| 151 | If everything else is more-or-less equivalent, it is better to pick a |
|---|
| 152 | piece to do from the gap of 2 when requesting a piece from a BitTorrent peer. |
|---|
| 153 | |
|---|
| 154 | In any gap, it is best to fill in from the end (ie, the highest piece number |
|---|
| 155 | first). |
|---|
| 156 | |
|---|
| 157 | So given bitfield "YYnnnnYnnY" if the rareness of all the n pieces is similar |
|---|
| 158 | it is better to select one of the # pieces "YYnnnnY##Y" and the best |
|---|
| 159 | would be to select piece $ "YYnnnnYn$Y" |
|---|
| 160 | |
|---|
| 161 | These maximize the contiguous pieces for HTTP/FTP connections. This allows |
|---|
| 162 | a HTTP/FTP connection to start at the beginning of a large gap and download |
|---|
| 163 | the most data before it gets to a piece the client already has. |
|---|
| 164 | |
|---|
| 165 | |
|---|
| 166 | Changes to Piece Selection |
|---|
| 167 | -------------------------- |
|---|
| 168 | |
|---|
| 169 | Change the "rarest first" piece selection to a "pretty rare with biggest |
|---|
| 170 | distance from another completed piece". |
|---|
| 171 | |
|---|
| 172 | |
|---|
| 173 | Pretty Rare With Biggest Gap |
|---|
| 174 | '''''''''''''''''''''''''''' |
|---|
| 175 | |
|---|
| 176 | When scanning for the rarest piece, if the distance from another completed |
|---|
| 177 | piece is less than that for the current rarest piece, it must be "rare-X". |
|---|
| 178 | Or if the distance is greater than the current piece's, it can be rare+X |
|---|
| 179 | to be picked as the next piece. (For no better reason than it seemed to make |
|---|
| 180 | sense at the time and scale, X is the square root of the number of peers |
|---|
| 181 | minus 1.) |
|---|
| 182 | |
|---|
| 183 | So if 3 peers had the current rarest piece, the normal algorithm would |
|---|
| 184 | pick a piece where 2 peers had it. The changed algorithm would require |
|---|
| 185 | that only 1 peer has the piece if that piece's distance from a complete |
|---|
| 186 | piece was less than the gap for the current rarest piece. |
|---|
| 187 | |
|---|
| 188 | If the gap is bigger and the piece is the same "rareness" or the usual |
|---|
| 189 | "rare-1" that piece is selected. (So if the gap is bigger, a piece with |
|---|
| 190 | either 2 or 3 peers would be chosen.) |
|---|
| 191 | |
|---|
| 192 | So given "YYnnn1Yn2Y", unless 1 is a lot more rare than 2, it's better to |
|---|
| 193 | pick piece 2. |
|---|
| 194 | |
|---|
| 195 | Pseudo-Code logic: |
|---|
| 196 | |
|---|
| 197 | :: |
|---|
| [11016] | 198 | |
|---|
| [11013] | 199 | X = sqrt(Peers) - 1; |
|---|
| 200 | Gap = 0; |
|---|
| 201 | CurGap = 0; |
|---|
| 202 | CurRarest = MaxPieces+1; |
|---|
| 203 | for (i=0; i<MaxPieces; i++) { |
|---|
| 204 | if (IDoNotHavePiece(i)) { |
|---|
| 205 | Gap++; |
|---|
| 206 | if (PeerHasPiece(i)) { |
|---|
| 207 | PieceRareness = NumberOfPeersWithThePiece(); |
|---|
| 208 | if (PieceRareness<(CurRarest-X) || |
|---|
| 209 | (PieceRareness<=(CurRarest+X) && Gap>CurGap)) { |
|---|
| 210 | CurRarest = PieceRareness; |
|---|
| 211 | CurGap = Gap; |
|---|
| 212 | NextPiece = i; |
|---|
| 213 | } |
|---|
| 214 | } |
|---|
| 215 | } else { |
|---|
| 216 | Gap = 0; |
|---|
| 217 | } |
|---|
| 218 | } |
|---|
| 219 | |
|---|
| 220 | |
|---|
| 221 | Fill In The Gaps |
|---|
| 222 | '''''''''''''''' |
|---|
| 223 | |
|---|
| 224 | If a file is more than 50% complete, it uses a different method for piece |
|---|
| 225 | selection randomly. (With over 50%, you should have a large number of |
|---|
| 226 | pieces that other peers will want to download.) |
|---|
| 227 | |
|---|
| 228 | Every few pieces (in GetRight it is randomly 1 in 10), it will pick the |
|---|
| 229 | piece with the smallest gap from a complete piece, ignoring all rareness. |
|---|
| 230 | For the bitfield "YYnnnnYnnY" it would select piece # "YYnnnnYn#Y". This |
|---|
| 231 | helps fill in small gaps. |
|---|
| 232 | |
|---|
| 233 | Clients can choose whether to do this step or not, and if implementing |
|---|
| 234 | could use another percent of file completion. |
|---|
| 235 | |
|---|
| 236 | Pseudo-Code logic: |
|---|
| 237 | |
|---|
| 238 | :: |
|---|
| [11016] | 239 | |
|---|
| [11013] | 240 | Gap = 0; |
|---|
| 241 | Piece = -1; |
|---|
| 242 | CurGap = MaxPieces+1; |
|---|
| 243 | for (i=0; i<MaxPieces; i++) { |
|---|
| 244 | if (IDoNotHavePiece(i)) { |
|---|
| 245 | Gap++; |
|---|
| 246 | if (PeerHasPiece(i)) { |
|---|
| 247 | Piece = i; |
|---|
| 248 | } |
|---|
| 249 | } else {---- |
|---|
| 250 | if (Gap<CurGap && Gap>0 && Piece!=-1) { |
|---|
| 251 | CurGap = Gap; |
|---|
| 252 | NextPiece = Piece; |
|---|
| 253 | } |
|---|
| 254 | Gap = 0; |
|---|
| 255 | Piece = -1; |
|---|
| 256 | } |
|---|
| 257 | } |
|---|
| 258 | |
|---|
| 259 | |
|---|
| 260 | HTTP and FTP Optimizations |
|---|
| 261 | -------------------------- |
|---|
| 262 | |
|---|
| 263 | No changes are needed to the HTTP/FTP protocols or servers. |
|---|
| 264 | |
|---|
| 265 | If the client knows the HTTP/FTP download is part of a BitTorrent download, |
|---|
| 266 | when the very first connection is made it is better to start the HTTP/FTP |
|---|
| 267 | download somewhere randomly in the file. This way it is more likely the |
|---|
| 268 | first HTTP pieces it gets will be useful for sharing to the BitTorrent peers. |
|---|
| 269 | |
|---|
| 270 | If a BitTorrent download is already progressing when starting a HTTP/FTP |
|---|
| 271 | connection, the HTTP/FTP should start at the beginning of the biggest gap. |
|---|
| 272 | Given a bitfield "YYnnnnYnnY" it should start at #: "YY#nnnYnnY" |
|---|
| 273 | |
|---|
| 274 | If it successfully downloads a piece from a HTTP/FTP server, but the SHA |
|---|
| 275 | checksum does not match, the connection must be closed and that URL should |
|---|
| 276 | be discarded. |
|---|
| 277 | |
|---|
| 278 | A client does not need to discard a HTTP or FTP server URL if it gets |
|---|
| 279 | a "busy" reply. |
|---|
| 280 | |
|---|
| 281 | |
|---|
| 282 | Multi-File Torrents |
|---|
| 283 | =================== |
|---|
| 284 | |
|---|
| 285 | Additional selection algorithms will be needed when using HTTP/FTP servers |
|---|
| 286 | for a multi-file torrent. |
|---|
| 287 | |
|---|
| 288 | A client may select BitTorrent pieces to optimize so entire large files can |
|---|
| 289 | be downloaded from the HTTP/FTP servers. |
|---|
| 290 | |
|---|
| 291 | For torrents containing small files, several HTTP/FTP transfers may be needed |
|---|
| 292 | for one Piece. In this case, it may make more sense to do those using |
|---|
| 293 | BitTorrent. For example, if there were 100 1KB files, assuming even the worst |
|---|
| 294 | case of 32KB Pieces, it would take 100 HTTP/FTP transfers to do the files, |
|---|
| 295 | but only 4 BitTorrent piece requests. |
|---|
| 296 | |
|---|
| 297 | Giving BitTorrent piece selection a higher priority for smaller files, |
|---|
| 298 | and HTTP/FTP a higher priority for larger files would work well. |
|---|
| 299 | |
|---|
| 300 | |
|---|
| 301 | Another Possible Client Implementation |
|---|
| 302 | ====================================== |
|---|
| 303 | |
|---|
| 304 | If a client only supported HTTP and not FTP, it could take advantage of |
|---|
| 305 | HTTP's byte-range requests, but request more than one piece at a time. |
|---|
| 306 | |
|---|
| 307 | Blocks of pieces could treated as a single set and a single byte-range |
|---|
| 308 | request to the HTTP server. This would reduce the number of HTTP connections, |
|---|
| 309 | and might work well for a client. |
|---|
| 310 | |
|---|
| 311 | Pieces could be treated as blocks of 10, 50, or 100. If I had done this way, |
|---|
| 312 | I might have chosen "Pieces Per Block" to be MaxPieces/20. So requesting |
|---|
| 313 | about 5% of the file at a time. |
|---|
| 314 | |
|---|
| 315 | |
|---|
| 316 | A Not Recommended Client Implementation |
|---|
| 317 | ======================================= |
|---|
| 318 | |
|---|
| 319 | A client could simply use HTTP byte-ranges to request individual pieces. |
|---|
| 320 | |
|---|
| 321 | Some server administrators would not like this, because there will be 100s |
|---|
| 322 | or 1000s of requests in their logs for a single file. Some may even think |
|---|
| 323 | it is a denial of service attack on them. |
|---|
| 324 | |
|---|
| 325 | |
|---|
| 326 | More On Other Protocols |
|---|
| 327 | ======================= |
|---|
| 328 | |
|---|
| 329 | HTTP and FTP are listed in this BEP as the seeding protocols, |
|---|
| 330 | but a client could use **any** other protocol that allows downloading |
|---|
| 331 | data. HTTPS, FTPS, or SFTP are obvious, but not likely supported |
|---|
| 332 | by many clients (GetRight can do HTTPS and FTPS, but not SFTP). |
|---|
| 333 | |
|---|
| 334 | I am sure RTSP and MMS are possible as well. Potentially even |
|---|
| 335 | Usenet's NNTP protocol could be used. |
|---|
| 336 | |
|---|
| 337 | Other protocols may have additional issues, such as not allowing |
|---|
| 338 | a download to start anywhere other than the beginning of the file. |
|---|
| 339 | |
|---|
| 340 | A client may choose to implement only one of the HTTP or FTP protocols |
|---|
| 341 | but not both. |
|---|
| 342 | |
|---|
| 343 | |
|---|
| 344 | Acknowledgements |
|---|
| 345 | ================ |
|---|
| 346 | |
|---|
| 347 | Thanks to Arvid Norberg (libtorrent.sf.net author) for helping clarify |
|---|
| 348 | the Multi-File torrent parts. And of course to Bram Cohen for creating |
|---|
| 349 | this whole BitTorrent protocol. |
|---|
| 350 | |
|---|
| 351 | |
|---|
| 352 | References |
|---|
| 353 | ========== |
|---|
| 354 | |
|---|
| 355 | .. Original WebSeeding document by Michael Burford |
|---|
| 356 | (http://www.getright.com/seedtorrent.html) |
|---|
| 357 | |
|---|
| 358 | |
|---|
| 359 | Copyright |
|---|
| 360 | ========= |
|---|
| 361 | |
|---|
| 362 | This document has been placed in the public domain. |
|---|
| 363 | |
|---|
| 364 | |
|---|
| 365 | |
|---|
| 366 | .. |
|---|
| 367 | Local Variables: |
|---|
| 368 | mode: indented-text |
|---|
| 369 | indent-tabs-mode: nil |
|---|
| 370 | sentence-end-double-space: t |
|---|
| 371 | fill-column: 70 |
|---|
| 372 | coding: utf-8 |
|---|
| 373 | End: |
|---|
| 374 | |
|---|