Academic Torrents is a platform for researchers to share data. It consists of two pieces: a site where users can search for datasets, and a BitTorrent backbone that makes sharing data scalable and fast. The goal is to facilitate the sharing of datasets among researchers. It was created by the Institute for Reproducible Research, a U.S. 501(c)(3) non-profit.
The site provides access to over 20 TB of data, including popular machine learning datasets such as all of UCI, ImageNet, and Wikipedia. Though some of these datasets are available elsewhere, Academic Torrents stitches multiple hosting locations together, so downloading is much faster and also fault-tolerant. For downloaders there are no sign-up or verification processes in the way, and the collection is more comprehensive than anywhere else. Many datasets, such as Netflix, whose original hosting location is no longer available, remain available through Academic Torrents.
As data gets bigger, peer-to-peer file transfer becomes increasingly attractive, since it is the only approach in which distribution capacity scales with the number of users. Academic Torrents currently facilitates the transfer of over 900 GB per day and serves over 30,000 users per month.
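To make the scaling argument concrete, here is a minimal sketch of the textbook lower bounds on distribution time for client-server versus peer-to-peer delivery under a simple fluid model. The function names and all bandwidth figures are hypothetical and chosen only for illustration; they are not measurements of Academic Torrents.

```python
def client_server_time(file_bits, n_users, server_up, peer_down):
    """Lower bound (seconds) when one server uploads a full copy to every user."""
    return max(n_users * file_bits / server_up,  # server must send N copies
               file_bits / peer_down)            # slowest download link

def p2p_time(file_bits, n_users, server_up, peer_up, peer_down):
    """Lower bound (seconds) when every user also uploads what it has received."""
    total_up = server_up + n_users * peer_up     # capacity grows with the swarm
    return max(file_bits / server_up,            # seed must send one copy
               file_bits / peer_down,            # slowest download link
               n_users * file_bits / total_up)   # N copies over total upload

# Hypothetical figures: 1 GB file, 1 Gbit/s server,
# 10 Mbit/s peer upload, 100 Mbit/s peer download.
FILE_BITS = 8e9
for n in (10, 1000, 100000):
    cs = client_server_time(FILE_BITS, n, 1e9, 1e8)
    p2p = p2p_time(FILE_BITS, n, 1e9, 1e7, 1e8)
    print(f"{n:>6} users: client-server >= {cs:.0f} s, peer-to-peer >= {p2p:.0f} s")
```

Under these assumptions the client-server bound grows linearly with the number of users, while the peer-to-peer bound levels off near the file size divided by a single peer's upload rate, because each new downloader also adds upload capacity.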
The guiding principle of Academic Torrents is to ensure that the data the community needs is always available and can be obtained quickly. To ensure that data is always available, it needs to be stored in more than one location in case the initial location is not available. Typically, when a user downloads data from a secondary website, it is unclear whether they found the correct data. BitTorrent allows data to be mirrored transparently in a peer-to-peer fashion while maintaining the correctness and authenticity of the data. A speed increase is gained because a user can download from all the mirrors at once.
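The correctness guarantee comes from per-piece hashing: in the original BitTorrent protocol, the torrent metadata lists a SHA-1 digest for every fixed-size piece of the data, and a client checks each piece it receives against that list regardless of which mirror supplied it. The following is a minimal sketch of that check, not a BitTorrent client. The helper names, the sample data, and the 4-byte piece size are hypothetical; real torrents use far larger pieces.

```python
import hashlib
from typing import List

PIECE_LENGTH = 4  # hypothetical tiny piece size, for illustration only

def piece_hashes(data: bytes, piece_length: int = PIECE_LENGTH) -> List[bytes]:
    """Split data into fixed-size pieces and return each piece's SHA-1 digest."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]

def verify_piece(index: int, piece: bytes, expected: List[bytes]) -> bool:
    """Accept a piece from any mirror only if its digest matches the published one."""
    return hashlib.sha1(piece).digest() == expected[index]

# The publisher computes the digests once and ships them in the torrent metadata.
original = b"academic dataset"
expected = piece_hashes(original)

# A downloader receives piece 1 from two different mirrors.
good_piece = original[4:8]   # honest mirror
bad_piece = b"XXXX"          # corrupted or tampered mirror

accepted_good = verify_piece(1, good_piece, expected)
accepted_bad = verify_piece(1, bad_piece, expected)
print(accepted_good, accepted_bad)
```

Because every piece is verified independently, a client can fetch different pieces from different mirrors in parallel and discard any piece that fails the check without restarting the whole download.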
- Lo, Henry Z. and Cohen, Joseph P., (2016). Academic Torrents: Scalable Data Distribution. Neural Information Processing Systems 2015 Challenges in Machine Learning (CiML) Workshop. http://arxiv.org/abs/1603.04395
- Cohen, Joseph P. and Lo, Henry Z., (2014). Academic Torrents: A Community-Maintained Distributed Repository (p. 2:1–2:2). New York, NY, USA: ACM. http://doi.org/10.1145/2616498.2616528
- KDnuggets – What is Academic Torrents and Where is Data Sharing Going?
- Hacker News Post 2016 – Academic Torrents: A distributed system for sharing enormous datasets
- Research Computing and Engineering Podcast Interview
- The Learning Library – Game Changer: Academic Torrents?
- Library Journal – Academic Torrents Offers New Means of Storing, Distributing Scholarly Content
- MyScienceWork – Academic Torrents: Bringing P2P Technology to the Academic World
- TorrentFreak – Academics Launch Torrent Site to Share Papers and Datasets
- Hacker News Post 2014 – Academic Torrents