There has always been a dark side to the Internet, and piracy was rampant from the earliest days. It began with message boards before the internet as we know it was even born, progressing to warez sites and private FTPs hosted on home computers. Finding pirated software and files used to be a slow and arduous task; it was more common to get music or software from a friend as a physical copy (the so-called “sneakernet”). P2P filesharing technology changed all that – but what does “peer to peer” even mean, and where did it all start?
Of course, peer to peer file sharing technology isn’t only used for piracy, but let’s be honest here: that’s the predominant usage, and that’s certainly where its roots began. There’s no need to flame away in the comments saying there are legitimate uses for it too; we know, but we’re not going to dress up the truth. Today we’ll be talking mostly about the filesharing aspect, but this certainly isn’t the only use case.
It’ll help to give some context on what “peer to peer” isn’t, first. The internet is traditionally what’s known as a client-server environment. Web services sit on a powerful server somewhere remote, and your computer, the client, requests information from it.
A single server can host files for hundreds of simultaneous clients, but scaling is difficult for a number of reasons.
Firstly, there are the physical hardware requirements. This isn’t such an issue when you’re just hosting files, but if computation is required – such as when you’re hosting a dynamic website like MakeUseOf – then the CPU must work to customize those pages for individual users. Massive amounts of memory are needed, and since hardware has physical limits, more servers must ultimately be brought in to cope with demand.
Secondly, each client takes up a small slice of the connection. As a theoretical example, if the server has a 100Mb/second connection, then 100 simultaneous users will get at most 1Mb/second each. Scale that up to 1,000 users and the speed drops tenfold, to roughly 100kb/second. The more users you have, the less speed each of them is able to utilize.
Data transfer is another concern. A single 1MB file requested by 1,000 people means you’ve used 1GB of data transfer. When you pay per gigabyte, that can really add up; bear in mind that a single HD movie can often be around 4GB in size. Pushing huge files out to thousands of users is an expensive business.
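The back-of-the-envelope numbers above are easy to reproduce. The figures below are the same illustrative assumptions used in the text (a 100Mb/s link, a 1MB file), not measurements of any real server:

```python
# Illustrative client-server scaling math: a fixed server link divided
# equally among clients, and the total transfer cost of a popular file.

def per_client_speed(link_mbit: float, clients: int) -> float:
    """Naive equal split of a server link among simultaneous clients (Mb/s each)."""
    return link_mbit / clients

def total_transfer_gb(file_mb: float, downloads: int) -> float:
    """Total data served, in GB, when a file is downloaded many times."""
    return file_mb * downloads / 1000

print(per_client_speed(100, 100))    # 1.0 Mb/s each for 100 users
print(per_client_speed(100, 1000))   # 0.1 Mb/s (~100 kb/s) each for 1,000 users
print(total_transfer_gb(1, 1000))    # 1.0 GB served for a 1 MB file
```

Real servers don’t split bandwidth quite this evenly, but the trend is the point: every extra user shrinks everyone’s slice.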
So what is peer to peer?
Peer to peer is a different model, in which everyone becomes a server. The server role is distributed to users; instead of simply taking files, peer to peer makes it a two way street – you could now give back. In fact, giving back (known as “seeding” nowadays) is critical to the success of peer to peer networks (which is exactly why downloading without seeding – or leeching – is looked down upon as a cardinal sin in filesharing circles).
Unlike the client-server model, in which performance degrades with more users, the peer to peer model actually works more effectively as more users join the network. The more users who make a particular file available from their hard drives, the easier it is for new users to acquire that file.
In some p2p networks, downloads get faster once a certain threshold of users is reached: instead of taking the whole file from one user, you take smaller pieces of the file from hundreds of other users, combining their connections to use your own at maximum efficiency. Not all p2p software works this way, though; BitTorrent was one of the first to aggregate connections in order to speed up the download by taking just a small part of the file from many different places simultaneously.
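The piece-wise idea can be sketched in a few lines. This is a toy illustration, not BitTorrent itself: the “peers” here are just dictionary entries, each pretending to serve one piece of the file:

```python
# Toy sketch of piece-wise downloading: fetch a different piece of the
# same file from each (pretend) peer, then reassemble locally.

FILE = b"the quick brown fox jumps over the lazy dog"
PIECE = 8  # piece size in bytes (tiny, purely for demonstration)

# Split into fixed-size pieces, as a client would using the piece length
pieces = [FILE[i:i + PIECE] for i in range(0, len(FILE), PIECE)]

# Pretend each piece arrives from a different peer
peers = {f"peer{i}": p for i, p in enumerate(pieces)}

# Reassemble in index order once all pieces have arrived
downloaded = b"".join(peers[f"peer{i}"] for i in range(len(pieces)))
assert downloaded == FILE
print("reassembled", len(pieces), "pieces from", len(peers), "peers")
```

A real client also verifies each piece against a hash before accepting it, and downloads pieces out of order depending on which peers respond fastest.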
Initially, p2p networks needed some form of central server to organize the network – to act as a database holding information on currently connected users, the files available in the system, and so on. Though the heavy lifting of file transfers was done directly between users, these networks were vulnerable: knocking out that central server disabled communications completely. This is no longer the case with recent developments; you can now ask peers directly whether they’ve seen a particular file. With no central server to attack, these networks are extremely difficult to shut down.
Now that you have an idea of why peer to peer networks were such a revolution compared to the client-server model, let’s take a quick look at the historical context.
Napster was the first widely available implementation of a peer to peer model, launching in 1999. A central database held information about all the music files held by members; when you searched for a song to download, you would actually connect to another online user and download from them. In turn, once you had that song in your Napster library, it would be available as a “source” for others on the network. You could also add your own files, which would then be indexed and added to the database, ready to propagate across the world. The implementation was limited in that you could only download from one person at a time – so although there was high availability of songs, speeds were not so great.
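Napster’s central-index design is simple enough to sketch. Everything below is invented for illustration (the peer names, the function names, the song titles); the point is that the server only stores *who has what*, while the actual transfer happens directly between peers:

```python
# Minimal sketch of a Napster-style central index (names are invented).
# The server maps each song title to the peers currently sharing it.

index: dict[str, list[str]] = {}

def share(peer: str, songs: list[str]) -> None:
    """A peer announces its library to the central index."""
    for song in songs:
        index.setdefault(song, []).append(peer)

def search(song: str) -> list[str]:
    """Ask the central server which peers can serve this song."""
    return index.get(song, [])

share("alice", ["song_a.mp3", "song_b.mp3"])
share("bob", ["song_b.mp3"])
print(search("song_b.mp3"))  # ['alice', 'bob'] — download directly from either
```

This also makes the weakness obvious: delete `index` (shut down the server) and nobody can find anything, even though every peer still holds its files.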
Napster was eventually shut down in 2001, but not before similar networks had arisen that offered more than just music: movies, software, and images would now also be made available on Morpheus, Kazaa, and Gnutella networks (Limewire is probably the most famous Gnutella client).
Over the years, various protocols and peer to peer file sharing software came and went, but one open protocol has really taken hold: BitTorrent.
Designed in 2001, BitTorrent is an open protocol whereby users create a “descriptor” file (a .torrent file) containing information about the download, but not the download itself. A tracker is needed to store these descriptors, along with a record of who currently holds the file, but the protocol is open in the sense that anyone can make a client and anyone can host a tracker. Even though the design needed a central tracker, multiple trackers could exist, and any single torrent descriptor file could be registered with multiple trackers – making the network incredibly robust. Knocking out one tracker wouldn’t necessarily make a file unavailable, and another tracker could simply pop up to take its place.
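What does a descriptor actually contain? .torrent files use a simple serialization format called bencoding (integers as `i…e`, strings as `length:data`, dictionaries as `d…e` with sorted keys). Here’s a sketch of a bencoder with invented placeholder field values – it follows the real encoding rules, but the result is not a usable torrent (a real one also needs piece hashes):

```python
# Sketch of bencoding, the serialization used by .torrent descriptor files.

def bencode(value) -> bytes:
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, str):
        return bencode(value.encode())
    if isinstance(value, dict):
        # Keys must be encoded in sorted order per the spec
        return b"d" + b"".join(
            bencode(k) + bencode(v) for k, v in sorted(value.items())
        ) + b"e"
    raise TypeError(f"cannot bencode {value!r}")

descriptor = {
    "announce": "http://tracker.example.com/announce",  # assumed tracker URL
    "info": {"name": "example.iso", "length": 4_000_000_000,
             "piece length": 262144},
}
print(bencode(descriptor)[:40])
```

Note how small the descriptor is compared to the 4GB file it describes – that’s the whole trick: you share the tiny map, not the territory.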
Since then, incredible advances have been made that essentially remove the need for a central tracker. DHT – a distributed hash table – is one such technology, implemented by BitTorrent, which allows the job of indexing files to also be distributed among all the users. Magnet links are another – Tim wrote all about these before, so be sure to read that for an in-depth overview of how magnets differ from traditional .torrent files.
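The core DHT idea can be shown in miniature. This is a heavily simplified, Kademlia-flavoured sketch with invented peer names – real BitTorrent DHT adds routing tables, node discovery, and replication – but it captures the key property: every node and every file key gets a hash ID, and any peer can independently compute which node should hold a key, with no tracker involved:

```python
import hashlib

# Toy DHT lookup: a key lives on the node whose ID is XOR-closest to
# the key's hash. Every peer computes the same answer independently.

def node_id(name: str) -> int:
    """Derive a 160-bit ID from a name, as real DHTs do with SHA-1."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

nodes = {n: node_id(n) for n in ["peerA", "peerB", "peerC", "peerD"]}

def responsible_node(key: str) -> str:
    """Find the node whose ID is XOR-closest to the key's hash."""
    k = node_id(key)
    return min(nodes, key=lambda n: nodes[n] ^ k)

print(responsible_node("ubuntu.iso"))  # every peer computes the same node
```

Because the answer is a pure function of the hashes, there is no central index to seize: take one node offline and the remaining nodes simply become closest for its keys.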
Finding content in the first place is still a work in progress: if you have the hash ID, you can find peers with the file available – but what if you don’t have the hash yet? Clients such as Tribler have attempted to solve this, but such search features are mostly client-specific and not part of the core BitTorrent protocol, so there’s certainly still room for improvement in this respect.
I hope this has shed some light on the meaning of peer to peer and where it began. I think it’s fair to say p2p software changed the internet and our lives forever; it has been estimated that p2p traffic accounts for between 40–70% of all internet traffic. The primary usage remains piracy, but there’s no reason media outlets shouldn’t embrace the protocol. The Linux community advocates using torrents to distribute large ISO images of the various OS flavours, thereby avoiding heavy hosting costs.
Did you get a chance to use Napster back in the day? Or was your first introduction to filesharing through the humble torrent? Tell us – where did your first mp3 come from?