How it works
To envision how Gnutella works, imagine a large circle of users (called nodes), each of whom runs Gnutella client software. On first use, the client software on a new node (call it A) must bootstrap by finding at least one other node. Different methods have been used for this, including a pre-existing list of possibly working node addresses shipped with the software, GWebCache sites on the web, and IRC. Chances are at least one node (call it B) will work. Once A has connected, node B sends node A its own list of working nodes. Node A then tries to connect to the nodes it was shipped with, as well as nodes it learns about from other nodes, until it reaches a certain quota, which is usually user-specifiable. It connects to only that many nodes, but keeps the addresses it has not yet tried as backups and discards the ones it tried that did not work.
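The bootstrap procedure described above can be sketched roughly as follows. This is only an illustration, not the code of any real client: the function name `bootstrap`, the `try_connect` callable, and the toy network are all hypothetical, and a real client would open network connections instead of looking up a dictionary.

```python
from collections import deque


def bootstrap(seed_addresses, try_connect, quota=5):
    """Connect to up to `quota` nodes, learning new addresses along the way.

    `try_connect(addr)` is a hypothetical callable that returns the list of
    node addresses advertised by `addr`, or None if the connection fails.
    """
    untried = deque(seed_addresses)
    seen = set(seed_addresses)
    connected = []
    while untried and len(connected) < quota:
        addr = untried.popleft()
        advertised = try_connect(addr)
        if advertised is None:
            continue  # discard addresses that were tried but did not work
        connected.append(addr)
        for new_addr in advertised:
            if new_addr not in seen:
                seen.add(new_addr)
                untried.append(new_addr)
    # Addresses never tried are kept as backups for later.
    return connected, list(untried)


# Toy network (assumption): address -> addresses it advertises, None = unreachable.
toy_network = {
    "seed1": None,
    "B": ["C", "D"],
    "C": ["B", "E"],
    "D": None,
    "E": [],
}
connected, backups = bootstrap(
    ["seed1", "B"], lambda addr: toy_network.get(addr), quota=2
)
```

In this toy run, node A ends up connected to B and C and keeps D and E as untried backups, while the dead address `seed1` is dropped.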
When user A wants to search, its client sends the request to each node it is actively connected to. Some of those nodes may no longer be reachable, in which case A's client tries to connect to the nodes it has saved as backups. The number of actively connected nodes for user A is usually quite small (around 5), so each node then forwards the request to all the nodes it is connected to, and they in turn forward the request, and so on. In theory, the request will eventually find its way to every user on the Gnutella network.
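The forwarding pattern described above can be simulated on a small graph. The sketch below is a simplified model under stated assumptions: the adjacency-list graph, the `has_file` predicate, and the function name `flood_query` are hypothetical, and the hop limit (`ttl`) and the rule that a node handles a given request only once are additions needed to make the simulation terminate, not details taken from the text above.

```python
from collections import deque


def flood_query(graph, origin, has_file, ttl=7):
    """Simulate query flooding and return the set of nodes reporting a hit.

    `graph` maps each node to the list of nodes it is connected to.
    `ttl` is the maximum number of hops a request may travel (assumption).
    """
    seen = {origin}
    seen.update(graph[origin])
    hits = set()
    # The searcher sends the request to every node it is connected to.
    frontier = deque((peer, ttl) for peer in graph[origin])
    while frontier:
        node, remaining = frontier.popleft()
        if has_file(node):
            hits.add(node)
        if remaining <= 1:
            continue  # hop limit reached, do not forward any further
        for peer in graph[node]:
            if peer not in seen:  # each node handles a request only once
                seen.add(peer)
                frontier.append((peer, remaining - 1))
    return hits


# Toy topology (assumption): A searches, D and E hold the file.
toy_graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
holders = {"D", "E"}
all_hits = flood_query(toy_graph, "A", lambda n: n in holders)
near_hits = flood_query(toy_graph, "A", lambda n: n in holders, ttl=2)
```

With the default hop limit the request reaches both D and E; with `ttl=2` it stops at D, which illustrates why a request may not reach every node in practice.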
If a search request turns up a result, the node that has the result contacts the searcher (whose IP address was included with the search request) directly. They negotiate the file transfer and the transfer proceeds. If more than one copy of the same file is found, the searcher can perform a "swarm" download, fetching pieces of the file from different nodes at the same time, which increases the download rate.
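As a rough illustration of how a swarm download might divide the work, the sketch below splits a file into byte ranges and assigns them to sources in turn. The round-robin assignment, the function name `plan_swarm`, and the node names are assumptions made for the example; the text above does not say how real clients choose which piece to request from which node.

```python
def plan_swarm(file_size, sources, piece_size):
    """Assign byte ranges of a file to sources in round-robin order.

    Returns a list of (source, start, end) tuples, where `end` is exclusive.
    """
    plan = []
    for index, start in enumerate(range(0, file_size, piece_size)):
        end = min(start + piece_size, file_size)  # last piece may be shorter
        plan.append((sources[index % len(sources)], start, end))
    return plan


# Hypothetical example: a 10-byte file, two sources, 4-byte pieces.
plan = plan_swarm(file_size=10, sources=["node1", "node2"], piece_size=4)
```

Here `node1` would serve bytes 0-3 and 8-9 while `node2` serves bytes 4-7, so both transfers can run in parallel.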
Finally, when user A disconnects, the client software saves both the list of nodes it was actively connected to and the list of nodes it was keeping as backups, for use the next time it connects.
In practice, searching on the Gnutella network is often slow and unreliable. Each node is run by a regular computer user, and such users are constantly connecting and disconnecting, so the network is never completely stable. Since individual users' connections are likely to be slow, it can take a very long time for a search request to traverse the entire network (which averages around 100,000 nodes at any time).
The real benefit of Gnutella being so decentralized is that it is very difficult to shut the network down. Unlike Napster, where the entire network relied on a central server, Gnutella cannot be shut down by shutting down any one node. As long as there are at least two users, Gnutella will continue to exist.
Protocol features and extensions
Gnutella operates on a query flooding protocol. The outdated Gnutella version 0.4 network protocol employs five different packet types, namely:
- ping: discover hosts on network
- pong: reply to ping
- query: search for a file
- query hit: reply to query
- push: download request (for firewalled servents)
These are mainly concerned with searching the Gnutella network. File transfers are handled using