The eight-layer IPFS sub-protocol stack

IPFS stands for InterPlanetary File System. It is a global, peer-to-peer distributed file system that aims to connect all computing devices through the same file system. IPFS is actually a large project rather than a single piece of software. It is composed of many modules, each of which has grown into an independent project with its own home page. Let's take a brief look at the members of the IPFS family.

1.1 The origin of IPFS

IPFS, known in Chinese as the "interplanetary file system", was started by Juan Benet in May 2014. Benet graduated from Stanford University, and before he created the IPFS project, his first company was acquired by Yahoo. In 2015, IPFS received substantial investment through the Y Combinator incubator, and he set up Protocol Labs at the same time. The lab team consists of 14 core developers, with hundreds of code contributors in the community.

1.2 IPFS concept

IPFS is essentially a content-addressable, versioned, peer-to-peer hypermedia protocol for distributed storage and transmission. Its goal is to supplement or even replace the Hypertext Transfer Protocol (HTTP) that has been in use for the past 20 years, in the hope of building a faster, safer, and more open Internet. Every day we use apps to scroll through WeChat Moments and Weibo. These apps use HTTP, an application-layer protocol built on TCP/IP. HTTP transfers hypertext data from a server to the local browser or app, which renders it and presents it to the user. This network environment gave rise to the client/server (C/S) and browser/server (B/S) architectures, and traffic ultimately concentrates in a few large providers such as BAT (Baidu, Alibaba, Tencent).

IPFS fundamentally changes how content is looked up, which is its most important feature. With HTTP we look for a location, while with IPFS we look for content. For example, suppose there is a resource website running on my GitHub server (https://yitaicloud.com). Under HTTP, the browser first finds the location of the server (its IP address) and then asks the server for the file at a given path. Where a file lives in this system is decided by the server's administrator, and users can only hope that the file is not moved and the server is not shut down.

IPFS does not care about the location of a central server or about the name and path of a file; it cares only about the content of the file. When a resource file is added to an IPFS node, it gets a new name such as qmxgtagwtt1utfsb2sbavarmevlk4rqecqg5bv7wwdzwu, which is a cryptographic hash computed from the file's content. The hash is derived directly from the content: even if only one bit is modified, the hash will be completely different.
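The effect of a one-bit change on a content hash can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the helper `content_id` is hypothetical and returns a plain hex SHA-256 digest, whereas real IPFS identifiers wrap the digest in additional encoding.

```python
import hashlib

def content_id(data: bytes) -> str:
    """Hypothetical stand-in for an IPFS name: the hex SHA-256 of the content."""
    return hashlib.sha256(data).hexdigest()

original = b"hello ipfs"
# Flip the lowest bit of the first byte, i.e. change exactly one bit.
modified = bytes([original[0] ^ 0x01]) + original[1:]

id_a = content_id(original)
id_b = content_id(modified)

# Count how many hex digits of the two identifiers differ.
differing = sum(a != b for a, b in zip(id_a, id_b))

print(id_a)
print(id_b)
print(f"{differing} of {len(id_a)} hex digits differ")
```

Hashing the same bytes again always gives the same identifier, which is what lets any node check that the content it received matches the name it asked for.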

IPFS is general-purpose infrastructure with almost no storage restrictions. Large files are split into small blocks, which can be fetched from multiple servers at the same time when downloading. The IPFS network is a dynamic, fine-grained, distributed network, which suits the requirements of a content delivery network (CDN) well. This design makes it easy to share all kinds of data, including images, video streams, distributed databases, entire operating systems, blockchains, backups of 8-inch floppy disks, and, most importantly, static websites.
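Splitting a file into blocks and collecting them from several sources might look roughly like the following sketch. Everything here is an assumption made for illustration: the 4-byte block size, the helper names, and the two in-memory dictionaries standing in for peers. Real IPFS chunking and block formats are more involved.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4  # tiny on purpose; real block sizes are far larger

def split_into_blocks(data: bytes, size: int = BLOCK_SIZE) -> List[bytes]:
    """Cut the data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def block_id(block: bytes) -> str:
    """Name a block by the hex SHA-256 of its bytes."""
    return hashlib.sha256(block).hexdigest()

def fetch(bid: str, peers: List[Dict[str, bytes]]) -> bytes:
    """Return the block from the first peer that has it and passes the hash check."""
    for peer in peers:
        block = peer.get(bid)
        if block is not None and block_id(block) == bid:
            return block
    raise KeyError(bid)

data = b"abcdabcdefgh"
blocks = split_into_blocks(data)
manifest = [block_id(b) for b in blocks]  # ordered list of block names

# Two hypothetical peers, each holding only part of the file.
peer_a = {block_id(b): b for b in blocks[:2]}
peer_b = {block_id(b): b for b in blocks[1:]}

rebuilt = b"".join(fetch(bid, [peer_a, peer_b]) for bid in manifest)
print(rebuilt == data, len(manifest), len(set(manifest)))
```

Because the first two blocks have identical content, they share one name, so a store keyed by hash keeps only one copy of them.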

Files in IPFS can also be wrapped in a special IPFS directory, which gives them human-readable file names (transparently mapped to IPFS hashes) and serves a directory index when accessed, just as HTTP does. Building a website on IPFS works the same way as before, and a single command adds the site to an IPFS node:

ipfs add -r yoursitedirectory

Links between web pages no longer need to be maintained by hand; IPFS's own lookup resolves them.

1.3 IPFS architecture

IPFS has at least eight layers of sub-protocols. From bottom to top they are identity, network, routing, exchange, objects, files, naming, and application. Each layer has its own responsibility, and the layers work together.



Identity and Routing: peer identity information is generated, and routing rules are defined, through the Kademlia protocol (hereinafter the Kad protocol). In essence, the Kad protocol builds a distributed sloppy hash table, referred to here as the DHT. Every peer that joins the DHT network must generate its own identity information; only then can it take responsibility for storing resource information and the contact information of other members in the network. As I understand it, this identity is a token for accessing the network: with it, you can determine your ID in the DHT network.
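The idea behind Kademlia-style routing is that IDs are compared with XOR, and a key is stored on the nodes whose IDs are "closest" to it. The sketch below is a simplified illustration, not the IPFS implementation: the placeholder byte strings stand in for real public keys, and `node_id`, `xor_distance`, and `closest_nodes` are hypothetical helper names.

```python
import hashlib
from typing import List

def node_id(public_key: bytes) -> int:
    """Derive a 256-bit ID by hashing the (placeholder) public key."""
    return int.from_bytes(hashlib.sha256(public_key).digest(), "big")

def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance metric: the bitwise XOR of two IDs."""
    return a ^ b

def closest_nodes(target: int, nodes: List[int], k: int = 2) -> List[int]:
    """Return the k known node IDs nearest to the target under XOR distance."""
    return sorted(nodes, key=lambda n: xor_distance(n, target))[:k]

# Five hypothetical peers identified by made-up "public keys".
peers = [node_id(f"peer-{i}".encode()) for i in range(5)]

# The key for a piece of content lives in the same ID space.
content_key = int.from_bytes(hashlib.sha256(b"some file").digest(), "big")

responsible = closest_nodes(content_key, peers, k=2)
print([hex(n)[:10] for n in responsible])
```

XOR distance is zero only between an ID and itself and is symmetric, which is what allows every node to agree on who is closest to a given key.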

Network: this layer is the core of the stack. The libp2p library it uses can support any transport-layer protocol. NAT lets the devices on an internal network share a single external IP address, which is how the home routers we have all used work.

Exchange: the exchange layer resembles a BitTorrent-style download tool such as Xunlei (Thunder). Xunlei actually simulates a P2P network while relying on a central server. The server records each user's request for a resource, and users requesting the same resource form a small cluster, a swarm, in which they share data. This approach has a drawback: because the server is maintained by Xunlei, any failure or downtime on it means downloads cannot proceed. IPFS's exchange protocol, BitSwap, is instead designed so that peers trade blocks directly with one another.

Objects and Files: the Objects and Files layers manage 80% of the data structures in IPFS. Most data objects are stored as a Merkle DAG, which makes content addressing and deduplication straightforward. The Files layer is a further set of data structures that sits alongside the DAG and uses the same object model as Git to support versioned snapshots.
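A Merkle DAG can be sketched as nodes that carry some data plus named links to other nodes, where each node is named by the hash of its own serialized form. The following is a minimal sketch under stated assumptions: JSON serialization, hex SHA-256 names, and an in-memory `store` dictionary are all choices made for illustration and are not the actual IPFS object format.

```python
import hashlib
import json
from typing import Dict

store: Dict[str, bytes] = {}  # hypothetical block store: hash -> serialized node

def put_node(data: str, links: Dict[str, str]) -> str:
    """Serialize a node deterministically, store it, and return its hash."""
    encoded = json.dumps({"data": data, "links": links}, sort_keys=True).encode()
    digest = hashlib.sha256(encoded).hexdigest()
    store[digest] = encoded  # identical content overwrites itself: deduplication
    return digest

# Two files with the same content get the same hash and are stored once.
readme = put_node("hello", {})
copy = put_node("hello", {})

# A directory node links to its children by hash.
root_v1 = put_node("", {"README": readme, "README.bak": copy})

# Editing one file yields a new leaf and therefore a new root: a new snapshot.
readme_v2 = put_node("hello, world", {})
root_v2 = put_node("", {"README": readme_v2, "README.bak": copy})

print(readme == copy, root_v1 != root_v2, len(store))
```

Both roots remain in the store and share the unchanged child, which is the property that makes Git-style version snapshots cheap.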

Naming: the naming layer is self-certifying. When other users fetch an object, they verify its signature with the publisher's public key and check that this public key matches the NodeId, which confirms that the object really was published by that user; this is also how mutable state is obtained. IPNS is added so that hash-based DAG object names can be given defined, human-readable names.
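The "public key must match the NodeId" check can be illustrated with a small sketch. It assumes the NodeId is simply the hex SHA-256 of the public key, uses placeholder byte strings instead of real keys, and deliberately leaves out the signature verification step, which would need a cryptography library. The names `derive_node_id`, `key_matches_name`, `publish`, and `records` are hypothetical.

```python
import hashlib
from typing import Dict

def derive_node_id(public_key: bytes) -> str:
    """Assumed rule for this sketch: NodeId = hex SHA-256 of the public key."""
    return hashlib.sha256(public_key).hexdigest()

def key_matches_name(name: str, public_key: bytes) -> bool:
    """Self-certification check: does the presented key hash to the name?"""
    return derive_node_id(public_key) == name

records: Dict[str, str] = {}  # mutable mapping: name -> latest object hash

def publish(name: str, public_key: bytes, object_hash: str) -> None:
    """Update the record only if the key belongs to the name (signature check omitted)."""
    if not key_matches_name(name, public_key):
        raise ValueError("public key does not match the name")
    records[name] = object_hash

alice_key = b"alice-public-key"  # placeholder, not a real key
alice_name = derive_node_id(alice_key)

publish(alice_name, alice_key, "hash-of-version-1")
publish(alice_name, alice_key, "hash-of-version-2")  # same name, new content

try:
    publish(alice_name, b"mallory-public-key", "hash-of-forgery")
    forged = True
except ValueError:
    forged = False

print(records[alice_name], forged)
```

The name stays fixed while the object hash it points to changes, which is how a stable name can refer to mutable state on top of immutable, hash-named objects.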

Application: finally, the application layer. The core value of IPFS lies in the applications that run on it. Applications can use its CDN-like capability to fetch the data they need at low bandwidth cost, which improves the efficiency of the whole application.