Understanding IPFS

What is IPFS?

IPFS (InterPlanetary File System) is a peer-to-peer network for storing and sharing data in a distributed file system.

Simply put, IPFS is a peer-to-peer (p2p) storage network in which content is made accessible by peers (nodes) anywhere in the world that store and relay it. The protocol can be broken down into three systems: content addressing, content linking and content discovery.

Content Addressing

IPFS uses content-based addressing to identify content by what's inside it rather than by where it is stored. So instead of a typical address like https://en.wikipedia.org/wiki/Aardvark, you get something like /ipfs/QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco/wiki/Aardvark.html.

The awesome part is that IPFS can do this for any type of content, including databases. IPFS identifies content with a content identifier (CID), which is derived from a cryptographic hash of the underlying content (SHA-256 by default). Because the hash has a fixed length, CIDs stay short no matter how large the content is.
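To make this concrete, here is a minimal Python sketch of a CIDv0-style identifier: hash the bytes with SHA-256, prepend the multihash header (0x12 for sha2-256, 0x20 for a 32-byte digest) and encode the result in base58btc. It is an illustration under simplifying assumptions, not how IPFS computes real CIDs: ipfs add first chunks a file and wraps it in UnixFS/dag-pb blocks, so it will print a different CID for the same file. The function names are hypothetical.

```python
import hashlib

# base58btc alphabet (the Bitcoin alphabet), used for CIDv0 strings
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc_encode(data: bytes) -> str:
    """Encode bytes as base58btc, keeping leading zero bytes as '1's."""
    num = int.from_bytes(data, "big")
    encoded = ""
    while num > 0:
        num, rem = divmod(num, 58)
        encoded = B58_ALPHABET[rem] + encoded
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


def cidv0_style_id(content: bytes) -> str:
    """Hypothetical helper: a CIDv0-style id computed over the raw bytes.

    Real IPFS hashes the UnixFS/dag-pb encoding of the content instead,
    so this will NOT match the CID printed by `ipfs add`.
    """
    digest = hashlib.sha256(content).digest()
    multihash = bytes([0x12, 0x20]) + digest  # 0x12 = sha2-256, 0x20 = 32 bytes
    return base58btc_encode(multihash)


small = cidv0_style_id(b"aardvark")
large = cidv0_style_id(b"aardvark" * 1_000_000)
print(small, len(small))
print(large, len(large))
print(small == cidv0_style_id(b"aardvark"))  # True: same content, same id
```

Both identifiers start with "Qm" and are 46 characters long, even though the second input is a million times larger; that fixed size is what keeps content identifiers short.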

Content Linking

IPFS uses Merkle DAGs (directed acyclic graphs) as its data structure. A Merkle DAG is a graph in which each node is identified by a hash of its contents, and those contents include the hashes of the node's children. Sound familiar? Git, like many other distributed systems, uses the same data structure.
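The defining property of a Merkle DAG, that a node's hash covers its children's hashes, can be shown with a toy Python sketch. The Node class and its hashing scheme are hypothetical simplifications, not the dag-pb or dag-cbor encodings IPFS actually uses.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A toy Merkle DAG node: some data plus links to child nodes."""
    data: bytes
    links: List["Node"] = field(default_factory=list)

    def digest(self) -> bytes:
        # A node's hash covers its own data AND its children's hashes,
        # so any change below a node changes every hash above it.
        h = hashlib.sha256()
        h.update(hashlib.sha256(self.data).digest())
        for child in self.links:
            h.update(child.digest())
        return h.digest()


intro = Node(b"chapter 1")
body = Node(b"chapter 2")
book = Node(b"book", [intro, body])

before = book.digest().hex()
body.data = b"chapter 2, revised"
after = book.digest().hex()
print(before != after)  # True: the edit propagated up to the root
print(intro.digest() == Node(b"chapter 1").digest())  # True: same content, same hash
```

Because identical subtrees always hash to the same value, they can be stored once and shared. This is how both Git and IPFS deduplicate data.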

These hashes do more than identify content: they are also used to link pieces of content together. IPLD (InterPlanetary Linked Data) is a common data model for hash-linked data structures, giving a single way to represent and traverse links across different content-addressed systems. In other words, IPLD can resolve content identifiers from different types of distributed systems (e.g. Git links, IPFS links, Ethereum links, etc.). In summary, IPFS generates content identifiers and links that content together in IPLD Merkle DAGs.
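The idea of following hash links across blocks can be sketched in a few lines of Python. This is a simplified in-memory model under stated assumptions: the store, put, link and resolve names are hypothetical, keys are hex SHA-256 digests of canonical JSON rather than real CIDs, and only the {"/": ...} link shape is borrowed from the DAG-JSON convention.

```python
import hashlib
import json
from typing import Any, Dict

# Hypothetical in-memory block store: content id -> decoded object
store: Dict[str, Any] = {}


def put(obj: Any) -> str:
    """Store an object under the SHA-256 of its canonical JSON encoding."""
    encoded = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()
    key = hashlib.sha256(encoded).hexdigest()
    store[key] = obj
    return key


def link(key: str) -> Dict[str, str]:
    """A link in the DAG-JSON style: a map whose only key is "/"."""
    return {"/": key}


def is_link(value: Any) -> bool:
    return isinstance(value, dict) and list(value) == ["/"]


def resolve(path: str) -> Any:
    """Walk a path like '<id>/author/name', following links between blocks."""
    root, *segments = path.strip("/").split("/")
    node = store[root]
    for segment in segments:
        node = node[segment]
        if is_link(node):  # jump transparently into the linked block
            node = store[node["/"]]
    return node


author_id = put({"name": "Ada"})
post_id = put({"title": "Hello IPLD", "author": link(author_id)})
print(resolve(f"{post_id}/author/name"))  # Ada
```

The post block never embeds the author's data, only its hash, yet a path can cross from one block into the other. This is the same pattern that lets IPFS paths like /ipfs/<CID>/wiki/Aardvark.html walk through a directory DAG.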

Content Discovery

IPFS uses a distributed hash table (DHT), provided by libp2p, to discover which peers are hosting the content you are looking for. A DHT is a key-value store whose entries are spread across the peers in the network, with each peer holding only a portion of the table; for IPFS, the keys are content identifiers and the values record which peers can provide that content. To find a file, you connect to the network, ask peers which nodes have the content you want, look up the current network addresses of those nodes, then connect to them and retrieve the data.

The data exchange itself is handled by an IPFS module called Bitswap. Bitswap lets you establish connections with the peers that have the content you want and exchange blocks of data with them. After retrieving the blocks, you can verify them by hashing their content to get their content identifiers and comparing them to the content identifiers you requested.
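The IPFS DHT is based on Kademlia, where "nearness" between a key and a peer is the XOR of their hashes. The following Python sketch shows that routing idea plus the final verification step. It is a deliberate simplification: the peer IDs are made up, k is set to 3 for readability, and there are no routing tables, network calls or Bitswap messages.

```python
import hashlib
from typing import List


def to_key(data: bytes) -> int:
    """Map a peer ID or content ID into a shared 256-bit keyspace."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def closest_peers(target: bytes, peer_ids: List[bytes], k: int = 3) -> List[bytes]:
    """Return the k peers whose keys are nearest to the target by XOR distance.

    In a Kademlia-style DHT these are the peers that store (and are asked
    for) the provider records for the target key.
    """
    t = to_key(target)
    return sorted(peer_ids, key=lambda p: to_key(p) ^ t)[:k]


def verify_block(block: bytes, requested_digest: bytes) -> bool:
    """Re-hash a block received from a peer and compare with what we asked for."""
    return hashlib.sha256(block).digest() == requested_digest


peers = [f"peer-{i}".encode() for i in range(10)]  # hypothetical peer IDs
block = b"hello ipfs"
wanted = hashlib.sha256(block).digest()

print(closest_peers(wanted, peers))
print(verify_block(block, wanted))        # True
print(verify_block(b"tampered", wanted))  # False
```

Because every node computes the same XOR distances, a publisher and a searcher independently arrive at the same small set of peers for a given key, without any central index. And because the block is checked against the hash you requested, it does not matter whether the peer that sent it is trustworthy.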

Getting Started with IPFS

Any IPFS gateway can resolve any content identifier, as long as some peer on the network is providing that content. The main factor in choosing a gateway is therefore performance, and picking one close to you usually gives the best results.

Local Gateway

The simplest way to get an IPFS gateway up and running is to host one on your own computer, where it is served at localhost:8080 by default. You can run one with IPFS Desktop or from the command line, and the IPFS Companion browser extension can route your browser's IPFS requests through it.

Public Gateway

You can access the public gateway deployed by Protocol Labs or those run by third-party providers, which you can find here. Public gateways can be categorized by their features and limitations:

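Read/Write Support: whether the gateway only serves content (read-only) or also accepts uploads (writable)

Authentication Support: whether requests must carry credentials

Resolution Style: how the CID appears in the URL, either in the path (/ipfs/<CID>) or as a subdomain (<CID>.ipfs.<gateway>)

Service: who runs the gateway and on what terms, for example a free shared public service versus a dedicated one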

So keep these in mind when choosing which public gateway to use. You can read more on this here.
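The two common resolution styles differ only in where the CID goes in the URL. Here is a small Python sketch that builds both. The gateway_url helper is hypothetical, and ipfs.io and localhost:8080 are used only as example hosts. The CIDv0 check reflects the fact that hostnames are case-insensitive, so a case-sensitive base58 "Qm..." CID cannot serve as a subdomain label. Subdomain gateways instead use a CIDv1, typically in lowercase base32.

```python
from urllib.parse import quote


def gateway_url(cid: str, gateway: str, style: str = "path",
                subpath: str = "", scheme: str = "https") -> str:
    """Build a gateway URL for a CID in path or subdomain resolution style."""
    subpath = quote(subpath.lstrip("/"))
    suffix = f"/{subpath}" if subpath else ""
    if style == "path":
        return f"{scheme}://{gateway}/ipfs/{cid}{suffix}"
    if style == "subdomain":
        # Hostnames are case-insensitive, so a case-sensitive base58 CIDv0
        # ("Qm...") cannot be used as a subdomain label.
        if cid.startswith("Qm"):
            raise ValueError("subdomain style needs a CIDv1 (e.g. base32 'bafy...')")
        return f"{scheme}://{cid}.ipfs.{gateway}{suffix}"
    raise ValueError(f"unknown resolution style: {style}")


cid = "QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco"
print(gateway_url(cid, "ipfs.io", subpath="wiki/Aardvark.html"))
# https://ipfs.io/ipfs/QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco/wiki/Aardvark.html
print(gateway_url(cid, "localhost:8080", scheme="http", subpath="wiki/Aardvark.html"))
# http://localhost:8080/ipfs/QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco/wiki/Aardvark.html
```

Since any gateway can resolve any CID, switching from a public gateway to your local one is just a matter of changing the host. The content and its identifier stay the same.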

Why IPFS is Important

Now that we understand the fundamentals of IPFS, why is it important, and what are its use cases? The web today (Web 2.0) is built around ownership and access: files are stored on servers owned by other people, who grant us access to them. That is changing in the next version of the web (Web 3.0), and IPFS is part of that change in how networks (of computers and of people) store, access and share information. The protocol is being used as the decentralized storage layer for decentralized apps, a paradigm shift from the way the web works today.