PAN Hashing… and the misconception that collisions cannot occur
Several years ago when PCI was first being discussed many people within the payments industry were arguing the merits of using Cryptographic Hashes compared to traditional Symmetric Key Encryption.
The most obvious of these benefits seemed to be that the use of Cryptographic Hash algorithms fundamentally addressed the age-old challenge of implementing a robust key management policy, furthermore as these Cryptographic Hash algorithms are One-Way functions they have the added advantage of being totally secure… or do they?
What follows is a simple explanation as to why hashes can collide, and why our inherent need for ‘quick answers’ has tricked us into believing otherwise.
Other topics such as why using hashing on a relatively small data set such as PANs opens the door to brute force attacks, why a ‘salt’ should be protected in a similar fashion to a ‘symmetric key’, will be saved for a another day.
The difference between ‘Intentional’ and ‘Accidental’ Collisions
The web is full of research findings detailing just how difficult it would be to ‘intentionally’ create a hash collision for a modern Cryptographic Hash algorithm such as SHA-256.
A casual Google search using a phrase such as ‘SHA2 collision probability’ will return a multitude of results all explaining how difficult it is to ‘intentionally’ manufacture a collision but few if any that actually explain why collisions can occur by ‘accident’.
This is clearly reassuring as one of the main requirements of any Cryptographic Hash Algorithm is that it must be ‘infeasible for a potential attacker to find two messages with the same hash’.
But this is where our inherent need for ‘quick answers’ gets the better of us. We do a quick search, glance at the results, seeing what we expect or want to see, without taking the time to properly understand and digest the information that is presented to us.
Remember, the focus here is not on the ability of an attacker to ‘intentionally’ find a collision, but moreover the possibility of a collision ‘accidentally’ occurring.
If we were using a Hash Algorithm for the traditional purpose of creating a hash of a document then the security would revolve around the infeasibility of an attacker being able to engineer an ‘intentional’ collision, as we’re using the hash to prevent any tampering of the document contents.
However, in the context of ‘PAN Hashing’ the underlying intention is to render the PAN unreadable without incurring the overheads associated with symmetric key management. As such, the occurrence of collisions would prevent us from being able to prove with absolute certainty that a given ‘hash’ was the result of a given ‘PAN’, something that we would always be able to prove if we encrypted the PAN using a Symmetric algorithm (i.e. AES).
Why One-Way algorithms are designed to Collide
The real irony when discussing hash collisions lies in what is perceived to be one of the biggest strengths of Cryptographic Hash algorithms, the fact that they are One-Way irreversible algorithms.
Did you ever take the time to contemplate how these algorithms ensure that they are One-Way?
We don’t need to delve into any advanced mathematics here, as a simple example will suffice. Let’s consider the following One-Way function:
C = (A + B) MOD 10
The function is One-Way / Irreversible because simply by knowing A and C it is not possible to deduce B, and likewise by knowing B and C it is not possible to deduce A, as illustrated by the following examples:
A=5, C=9 –> B=4,14,24,34, …
B=2, C=1 –> A=9,19,29,39, …
The reason that the function is One-Way / Irreversible is quite simple… it produces ‘collisions’!
If you take a look at standard Cryptographic Hash Algorithms such as the SHA family of hashes, you’ll see the same technique (i.e. Addition modulo 2n) being used to ensure that the algorithm is One-Way / Irreversible.
Conclusion
Hopefully by this point you’re convinced that Cryptographic Hash Algorithms such as the SHA family do collide and that this is an intended and fundamental part of their design.
Whilst the likelihood of two or more valid PANs colliding and resulting in the same hash is probably extremely low, the point remains that it could theoretically happen.
Before using a Cryptographic Hash algorithm as a means of providing ‘pseudo encryption’ of PANs, it is important to first understand the fundamental properties of the algorithm and what it was originally intended for.