Meandering the esoteric realms of Cryptography [Part-1]: Hashing

Cryptography is the ultimate form of non-violent direct action -Julian Assange.

The escalating quest for privacy and security involves certain abstruse concepts that the average geek encounters incessantly in his mundane lifestyle. Some of the terminology involved with it invokes a sense of ennui. Therefore only a modicum of aficionados understands the underlying postulates. Yes, I'm talking about cryptography. Without using the turgid vocabulary, let's dive straight into the basics.

Remember that hashing and encryption are the twin pillars of cryptography. In this post, we will be exploring cryptographic hashing, and the follow-up post will cover encryption.

So what is Hashing?

Hashing is a one-way function that transforms the input to produce a unique message digest. A good hashing algorithm is irreversible, and therefore it is impossible to retrieve the original input text. Securely storing passwords involves applying the hashing algorithm to them. One might wonder, how does password verification during authentication work then? The answer is simple and comparing the hashed value for two passwords is the appropriate solution. Let's see this with the help of an example:

The cryptographic hash value will be the same only if the passwords are the same. Notice how even a minor change completely changes the hash! That's how powerful good hashing algorithms are. This property is known as the Avalanche effect.

Wait, then does leaked password databases affect me?

If the server does not store plain-text passwords, then stealing the hashed password files is rarely beneficial technically. But there's a catch. Most people rarely use random passwords, and therefore, techniques such as using Rainbow Tables come into play. An attacker can run a collection of a million or so commonly used passwords through a hashing algorithm and get a list of associated message digests for these passwords. It is then easy to compare a file of stolen password hashes against a rainbow table. For every match, the table will show the password for that hash.

Then what's the appropriate way to counter this?

The effective strategy is to salt the hash, which implies adding a random number to each password before the hashing. The resulting message digest is the product of both the password and the salt value and will not match anything on the rainbow table.

Of course, the attacker can always try adding random values to common passwords to find a matching hash, but now the difficulty of guessing the password makes it impractical. The return on investment of such a process is so low that a stolen file of well-hashed and salted passwords is essentially worthless.

So are Cryptographic Hashing Functions different from Normal Hashing Functions?

Every cryptographic hashing algorithm is a hash function, but vice versa is false. A cryptographic hash function or collision-resistant hashing function must satisfy three basic properties:

  • Second preimage resistance: For a random value m1 chosen by an honest party, it's very costly for an attacker to find any value m2 ≠ m1 such that hash(m1) = hash(m2).
  • Preimage resistance: For a random value h chosen by an honest party, it's very costly for an attacker to find any value m such that hash(m) = h.
  • Collision resistance: It's difficult for an attacker to find any pair of values m1 ≠ m2 such that hash(m1) = hash(m2).

The new cryptographic hashing functions satisfy another property along with the previous three:

  • Random oracle indifferentiability: It's very costly for an attacker to find any non-chance correlations between the outputs of inputs of their choice.

One of the methods used for constructing collision-resistant cryptographic hash functions is Merkle–Damgård construction or Merkle–Damgård hash function. It utilizes collision-resistant one-way compression that can't handle inputs of arbitrary size. So a padding function (MD-compliant) creates an input whose size is a multiple of a fixed number. The hash function then breaks the result into blocks of fixed size. These blocks are then processed one at a time with the compression function, each time combining a block of the input with the output of the previous round. The in-depth details of the implementation are intricate and are far beyond the scope of this post.

What are some of the commonly used Cryptographic Hashing Algorithms?

Some of the commonly used cryptographic hashing algorithms include MD5 (No longer secure since collisions against MD5 can be calculated rapidly), SHA-1 (No longer secure since collisions against the algorithm can be produced rapidly), SHA-2 (Classified internal design but utilizes Merkle–Damgård structure), SHA-3 (Utilizes Sponge construction), and BLAKE-2 (Offers high efficiency and used in popular algorithms such as Argon2).

Where are these Cryptographic Hash Functions used?

Cryptographic Hash Functions are used for a wide range of applications such as File and Message Integrity verification, Password verification as mentioned earlier, and even crypto mining for implementing proof-of-work algorithms.

I would like to encourage the readers to explore more about hashing since the topic is extremely vast to encapsulate in a single post. Some of the interesting topics to explore include the famous birthday problem and its implication on cryptography. 

So, what do you think? What's waiting on the horizon for cryptographic hashing? Feel free to share your opinion in the comments below.

- Connect to me on LinkedIn: Apratim Shukla (aka Earthing)