Hash Function VS Spreadsheet Function
The hash function is probably the most important piece of fundamental knowledge for understanding how the blockchain really works.
There are many families of hash functions, such as MD and SHA. In this post, we will focus on the cryptographic hash function.
First, let’s take a glance at what a function is. We all know what a spreadsheet is, because we use its simple built-in functions all the time, and sometimes even edit them.
Take an input like =sum(5,7). Here, sum() is a simple function within the spreadsheet. It takes a list of input values, 5 and 7, and adds the numbers together.
In short, the job of this function is to find the sum of the values. The principle of the hash function is the same as the principle of the spreadsheet function, but its output is not a simple numerical value like the sum.
The hash function outputs a so-called “hash” value: a long string of seemingly arbitrary characters.
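As a rough illustration, the two kinds of function can be put side by side in Python. This is only a sketch: it assumes SHA-256 from the standard `hashlib` module as the example hash function, which the post does not specify at this point.

```python
import hashlib

# A spreadsheet-style function: numbers in, a number out.
total = sum([5, 7])

# A hash function: bytes in, a fixed-length "hash" out.
# SHA-256 is an assumption here, chosen only as a familiar example.
digest = hashlib.sha256(b"5,7").hexdigest()

print(total)   # 12
print(digest)  # a string of 64 hexadecimal characters
```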
Properties Of Hash Function
A hash value looks like a bunch of randomly generated text. The hash function accepts digital data of any length or type, including numbers, text, images, videos, and e-books. Hashing converts it into a seemingly random output string of fixed length.
Hashing discards information, which is why we cannot reverse a hash value back into its input. For instance, suppose we hash a gigabyte file and get a 64-character hash value. Obviously, a huge amount of information was lost in the process. How could we “reverse” 64 characters into a gigabyte file?
By now, you may realize the advantages of using hash functions. We can use a short string of characters to verify the integrity of a gigabyte file. This is why the hash value is sometimes also called a digest.
Having said that, how important is the hash function? What is it for?
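To make the fixed-length property concrete, here is a minimal sketch, again assuming SHA-256 via Python's `hashlib`. The helper name `sha256_hex` is made up for this example, and a one-megabyte input stands in for the gigabyte file to keep the run quick.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

short_input = b"a"
long_input = b"a" * 1_000_000  # roughly one megabyte of data

short_digest = sha256_hex(short_input)
long_digest = sha256_hex(long_input)

# Both digests have the same length, whatever the input size.
print(len(short_digest), len(long_digest))
```

Whatever the size of the input, the digest printed here is always 64 hexadecimal characters long.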
We already know that the hash function generates a fixed-length string that cannot be reversed, so the cryptographic hash function can be used for password verification.
For example, when you log in to a website, the backend process checks whether the password you entered matches the password stored for your account.
If our passwords were stored directly in a text file on the server, then once the server is hacked, the intruder could read the file and obtain every user’s password for the entire system.
In fact, most websites run your password through a hash function and store only the resulting hash value in the server’s database. Note that this is hashing, not encryption: there is no key that turns the hash back into the password. For example, my password “R@ck3ton3” would be hashed with MD5, producing a 32-character hexadecimal hash value.
The website stores the hash value of my password in its database rather than my actual password.
When I log in, the backend process hashes the password I enter with the same cryptographic hash function that was used when the password was originally created, then compares the two hash values.
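The login check described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular website does it: the dictionary `stored_hashes` and the functions `register` and `check_password` are hypothetical names, and MD5 is used only because the post uses it. Production systems generally add a per-user salt and a deliberately slow password-hashing algorithm.

```python
import hashlib
import hmac

# Hypothetical in-memory "database": username -> hash of the password.
stored_hashes = {}

def hash_password(password: str) -> str:
    # MD5 mirrors the example in the post; it is not a recommendation.
    return hashlib.md5(password.encode("utf-8")).hexdigest()

def register(username: str, password: str) -> None:
    # Only the hash is stored, never the password itself.
    stored_hashes[username] = hash_password(password)

def check_password(username: str, attempt: str) -> bool:
    expected = stored_hashes.get(username)
    if expected is None:
        return False
    # Hash the attempt with the same function and compare the two hashes.
    return hmac.compare_digest(expected, hash_password(attempt))

register("alice", "R@ck3ton3")
print(check_password("alice", "R@ck3ton3"))  # True
print(check_password("alice", "r@ck3ton3"))  # False
```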
You might be thinking: if the server is hacked, can’t the intruder take the hashed password and reverse the hash function to recover my password?
The simple answer is no!
Just like the example of the sum function in the spreadsheet we mentioned earlier, if we only know that the final result is 12, we cannot know what the original combination was! It might be 7+5, 2+10, or -23+35, and so on.
What’s more, the math behind a cryptographic hash function is far more complicated. Combined with the output characteristics of the hash function, this makes it practically irreversible and therefore safer!
Since the hacker cannot reverse the hashes in the password file, their only way is to guess your password. Usually, they will carry out a so-called dictionary attack: they hash common words, phrases, or variant strings and look for matching hash values.
Assuming my password is “stone”, the hacker will quickly find it. This is why, when we create a password these days, the website requires a mixture of uppercase and lowercase letters, numbers, and symbols!
If I change the password to “St0n3”, a hacker working from a list of common words is far less likely to hash something like “St0n3”. In this way, my password will be harder to crack and more secure.
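A dictionary attack of this kind can be sketched as follows. The word list `common_words` and the function `dictionary_attack` are invented for illustration, and MD5 is assumed only because it is the function the post mentions for passwords.

```python
import hashlib

def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# Hypothetical word list; real attackers use far larger ones.
common_words = ["password", "123456", "stone", "qwerty", "letmein"]

def dictionary_attack(stolen_hash, candidates):
    """Hash each candidate and return the one whose hash matches, if any."""
    for word in candidates:
        if md5_hex(word) == stolen_hash:
            return word
    return None

weak = dictionary_attack(md5_hex("stone"), common_words)
strong = dictionary_attack(md5_hex("St0n3"), common_words)

print(weak)    # stone
print(strong)  # None, because "St0n3" is not in this small list
```

The attack never reverses the hash. It only succeeds when the password happens to be among the guesses, which is why the size and quality of the attacker's list matters.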
Hashing my password “St0n3” gives one hash value.
If a hacker guesses that my password is “st0n3” and hashes it, the result is another.
The two hash values are completely different!
Obviously, even if two inputs are very similar, their output hash values will be completely different. Therefore, the hash function is not only irreversible; changing even a single letter or symbol also changes the output hash value beyond recognition.
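The two passwords can be hashed and compared directly. The sketch below assumes MD5, as in the password example, and also counts how many of the 128 output bits differ. For a well-behaved hash function that count is typically around half, though the exact number is not claimed here.

```python
import hashlib

def md5_int(text: str) -> int:
    """Return the MD5 digest of text as a 128-bit integer."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest(), "big")

hash_a = hashlib.md5("St0n3".encode("utf-8")).hexdigest()
hash_b = hashlib.md5("st0n3".encode("utf-8")).hexdigest()

# XOR leaves a 1 wherever the two digests disagree; count those bits.
differing_bits = bin(md5_int("St0n3") ^ md5_int("st0n3")).count("1")

print(hash_a)
print(hash_b)
print(differing_bits, "of 128 bits differ")
```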
Digital Fingerprint
When two different inputs, such as two different passwords, produce the same hash value, the result is called a collision.
There are different types of cryptographic hash functions, and the lengths of their outputs differ. For example, the output of the MD5 hash function is 128 bits long, while the output of SHA-256 is 256 bits long.
Bits are binary 0s and 1s. Suppose hashing produced only a 16-bit value. There are just 65,536 possible values, so among 65,537 users a hash collision will inevitably occur. However, if the length of the hash value is extended to 32 bits, the chance that two given inputs collide drops to one in 4,294,967,296.
So, with 128 or 256 bits, an accidental hash collision is so unlikely that it is almost impossible in practice!
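The 16-bit case can be tried out with a toy hash. In this sketch, `tiny_hash` is a made-up function that keeps only the first two bytes of a SHA-256 digest, and the user names are invented. With 65,537 distinct inputs and only 65,536 possible values, a collision has to appear.

```python
import hashlib

def tiny_hash(text: str) -> int:
    """Toy 16-bit hash: the first two bytes of SHA-256 (illustration only)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "big")

def first_collision(count: int):
    """Hash `count` distinct names and return the first colliding pair."""
    seen = {}
    for i in range(count):
        name = f"user{i}"
        value = tiny_hash(name)
        if value in seen:
            return seen[value], name
        seen[value] = name
    return None

collision = first_collision(65_537)
print(collision)

print(2 ** 16)  # 65536 possible 16-bit values
print(2 ** 32)  # 4294967296 possible 32-bit values
```

In practice the first collision tends to show up long before all 65,537 names have been hashed, because any pair of inputs can collide, not just one particular pair.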
Hash functions are also extremely fast to compute and do not consume many computing resources.
Based on the properties of the hash function described above, for a fixed input we can regard its hash value as a practically unique “fingerprint”.
For example, imagine you prepared a contract for a client, then emailed it to the client to review and send back to you.
How do you confirm that the client has not made any changes? Of course, you could carefully read the returned contract line by line to make sure that it matches the original.
But this is often not the best way! You could hash the original contract and compare this hash value with the hash value of the returned contract.
This works because the hash value output by the hash function is effectively a unique “fingerprint” of its input data. If the contract has been modified, its hash value will not match; if it is unchanged, its hash value will definitely match!
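The contract check might look like this in Python. The contract text and the function name `fingerprint` are invented for the example, and SHA-256 is an assumed choice of hash function.

```python
import hashlib

def fingerprint(document: bytes) -> str:
    """Return the SHA-256 hash of a document as a hexadecimal string."""
    return hashlib.sha256(document).hexdigest()

original = b"The client agrees to pay 1,000 USD within 30 days."
returned_unchanged = b"The client agrees to pay 1,000 USD within 30 days."
returned_modified = b"The client agrees to pay 1,000 USD within 90 days."

print(fingerprint(original) == fingerprint(returned_unchanged))  # True
print(fingerprint(original) == fingerprint(returned_modified))   # False
```

For a real file, the bytes would be read from disk rather than written inline, but the comparison is the same.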
Hash Function Is Deterministic
The hash function can convert an input of any length into a hash value of a specific length. For a given function such as SHA-256, the same input will always yield the same hash value. This is another crucial feature of the hash function: its output is deterministic.
The output of the hash function is a long string of binary digits. For example, the hash of “1234” written out in binary is dazzling to read and inconvenient to use.
If we display it in hexadecimal instead, it contains letters as well as digits, but it becomes much shorter and easier to read.
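To compare the two displays, the sketch below hashes “1234” with SHA-256 (an assumed choice) and prints the same digest first in binary and then in hexadecimal.

```python
import hashlib

digest = hashlib.sha256(b"1234").digest()  # 32 raw bytes

# Each byte becomes 8 binary digits, or 2 hexadecimal digits.
binary_text = "".join(f"{byte:08b}" for byte in digest)
hex_text = digest.hex()

print(binary_text)  # 256 characters of 0s and 1s
print(hex_text)     # 64 hexadecimal characters
```

Both strings encode exactly the same 256 bits; hexadecimal simply packs four bits into each character.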
Based on the above, the attributes of the hash function are:
- Accepts digital input of any length or type
- Deterministic: the same input always yields the same output
- The output value is irreversible
- The output value is a small, fixed-length, practically unique digital fingerprint
- Even if the input data is changed slightly, the output value changes significantly
- Fast to compute, without consuming many computing resources
Now we should have a sufficient understanding of the hash function. Next, we will look at how it is used in a block.