Cryptographic hashes are functions that take an arbitrary input and return a fixed-length value. The particular value depends on the hash algorithm in use, such as SHA-1 (used by git), SHA-256, or BLAKE2, but a given hash algorithm always returns the same value for a given input. Have a look at Wikipedia's full list of hash functions for more.
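As an example, the input:
Hello world
would be represented by SHA-1 as:
0x7b502c3a1f48c8609ae212cdfb639dee39673f5e
However, the exact same input generates the following output using SHA-256:
0x64ec88ca00b268e5ba1a35678a1b5316d212f4f366b2477232534a8aeca37f3c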
Notice that the second hash is longer than the first one. This is because SHA-1 creates a 160-bit hash, while SHA-256 creates a 256-bit hash. The prepended 0x indicates that the following hash is represented as a hexadecimal number.
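The comparison above can be reproduced with Python's standard `hashlib` module. This is a minimal sketch, assuming the input is the ASCII text "Hello world" with no trailing newline; the variable names are illustrative only.

```python
import hashlib

message = b"Hello world"  # the example input, encoded as bytes (no trailing newline)

sha1_hex = hashlib.sha1(message).hexdigest()
sha256_hex = hashlib.sha256(message).hexdigest()

# Each hexadecimal character encodes 4 bits, so bits = 4 * number of hex characters.
sha1_bits = len(sha1_hex) * 4
sha256_bits = len(sha256_hex) * 4

print(f"SHA-1   ({sha1_bits} bits): 0x{sha1_hex}")
print(f"SHA-256 ({sha256_bits} bits): 0x{sha256_hex}")
```

Note that `hexdigest()` returns the digits without the 0x prefix; the sketch adds it only when printing.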
Hashes can be represented in different bases (base16, base32, etc.). In fact, IPFS uses cryptographic hashes as part of its content identifiers and supports multiple base representations at the same time, using the Multibase protocol.
For example, the SHA-256 hash of "Hello world" from above can be represented in base32 as:
mtwirsqawjuoloq2gvtyug2tc3jbf5htm2zeo4rsknfiv3fdp46a
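To illustrate that a base is only a way of writing the same bytes, the sketch below encodes one SHA-256 digest as base16 and as base32 using Python's standard library. It assumes the lowercase, unpadded RFC 4648 base32 alphabet; it does not add the prefix character that Multibase uses to identify the base.

```python
import base64
import hashlib

digest = hashlib.sha256(b"Hello world").digest()  # 32 raw bytes

# base16 (hexadecimal) text form of the digest.
base16_text = digest.hex()

# RFC 4648 base32, lowercased, with the trailing "=" padding removed.
base32_text = base64.b32encode(digest).decode("ascii").lower().rstrip("=")

print(base16_text)
print(base32_text)

# Decoding either string restores the same 32 bytes: one hash, two spellings.
padding = "=" * (-len(base32_text) % 8)
from_base32 = base64.b32decode(base32_text.upper() + padding)
from_base16 = bytes.fromhex(base16_text)
assert from_base32 == from_base16 == digest
```

The base32 form is shorter than the hexadecimal one because each character carries 5 bits instead of 4.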
If you're interested in how cryptographic hashes fit into how IPFS works with files in general, check out this video from IPFS Camp 2019: Core Course: How IPFS Deals With Files.
# Important hash characteristics
Cryptographic hashes come with several important characteristics:
- deterministic - the same input message always returns exactly the same output hash
- uncorrelated - a small change in the message should generate a completely different hash
- unique - it's infeasible to generate the same hash from two different messages
- one-way - it's infeasible to guess or calculate the input message from its hash
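The first two characteristics are easy to observe directly. The following sketch, using SHA-256 from Python's `hashlib`, hashes the same message twice and then hashes a message that differs by a single character, counting how many output bits change. The helper `sha256_int` is a hypothetical name introduced for this illustration. Uniqueness and one-wayness cannot be shown by a short script, since they are claims about what is computationally infeasible.

```python
import hashlib


def sha256_int(data: bytes) -> int:
    """Return the SHA-256 digest of data as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


original = b"Hello world"
tweaked = b"Hello World"  # differs from the original by one character

# Deterministic: hashing the same bytes twice gives the same digest.
is_deterministic = sha256_int(original) == sha256_int(original)

# Uncorrelated: XOR the two digests and count the bits that differ.
differing_bits = bin(sha256_int(original) ^ sha256_int(tweaked)).count("1")

print(f"deterministic: {is_deterministic}")
print(f"{differing_bits} of 256 bits differ")
```

For a well-behaved hash function, roughly half of the output bits are expected to flip for any change to the input, however small.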
These features also mean we can use a cryptographic hash to identify any piece of data: the hash is effectively unique to the data we calculated it from, and it's short, so sending it around the network doesn't take up a lot of resources. A hash has a fixed length, so the SHA-256 hash of a one-gigabyte video file is still only 32 bytes.
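The fixed-length property can be checked with a short sketch. Here 16 MiB of zero bytes stands in for a large file (an assumption made to keep the example quick), and the data is fed to the hasher in chunks so it never has to sit in memory all at once. The function name `sha256_of_chunks` is illustrative.

```python
import hashlib


def sha256_of_chunks(chunks) -> bytes:
    """Hash an iterable of byte chunks incrementally and return the raw digest."""
    hasher = hashlib.sha256()
    for chunk in chunks:
        hasher.update(chunk)
    return hasher.digest()


small = sha256_of_chunks([b"Hello world"])

# 16 chunks of 1 MiB each stand in for a much larger file.
large = sha256_of_chunks(b"\x00" * (1024 * 1024) for _ in range(16))

# Both digests are 32 bytes long, regardless of the input size.
print(len(small), len(large))
```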
That's critical for a distributed system like IPFS, where we want to be able to store and retrieve data from many places. A computer running IPFS can ask all the peers it's connected to whether they have a file with a particular hash and, if one of them does, that peer sends back the whole file. Without a short, unique identifier like a cryptographic hash, content addressing wouldn't be possible.
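The lookup described above can be sketched as a toy model. The `Peer` class and `fetch` function below are hypothetical and keep everything in memory; they are not IPFS APIs, and real peers communicate over a network and use content identifiers rather than bare SHA-256 digests. The sketch also re-hashes whatever a peer returns, which illustrates one benefit of using the hash as the key: the requester can check that the data matches what it asked for.

```python
import hashlib
from typing import Dict, List, Optional


class Peer:
    """Hypothetical in-memory peer that stores data keyed by its SHA-256 hash."""

    def __init__(self) -> None:
        self.blocks: Dict[bytes, bytes] = {}

    def add(self, data: bytes) -> bytes:
        """Store data and return the hash that identifies it."""
        key = hashlib.sha256(data).digest()
        self.blocks[key] = data
        return key

    def get(self, key: bytes) -> Optional[bytes]:
        """Return the data for key, or None if this peer does not have it."""
        return self.blocks.get(key)


def fetch(key: bytes, peers: List[Peer]) -> Optional[bytes]:
    """Ask each peer for key and return the first response that hashes to key."""
    for peer in peers:
        data = peer.get(key)
        # Re-hashing lets the requester reject data that does not match the key.
        if data is not None and hashlib.sha256(data).digest() == key:
            return data
    return None


alice, bob = Peer(), Peer()
key = bob.add(b"Hello world")

found = fetch(key, [alice, bob])  # alice lacks the data, bob has it
missing = fetch(key, [alice])     # nobody asked has the data
print(found, missing)
```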