SHA stands for the secure hash algorithm. Which is interesting given that it has just kind of been broken
But I’m not going to talk specifically about the attack on SHA today
that’s for a different video, but
What I wanted to do was talk in a little bit more detail about hash functions and what they are
And how SHA in particular works so you can get an idea of a kind of way these things are implemented
let’s have a quick look at what a hash function is briefly because Tom’s got somebody covered it in this video and
He’s gone into a lot of the reasons you might use a hash function. The kind of hash functions
I’m talking about are not the ones that we’ve been talking about for
Hashing passwords. Those ones have to be quite slow in some sense because you need them to be secure
We’re going to talk mostly about the hash functions that are used routinely in cryptography for things like message authentication
Digital signatures, and so on. So they need to be fairly quick, both to verify and compute. A hash function
Take some string, right, let’s say “abc” and it turns it into some fixed length string
(that’s not usually three long) of random
So we know a bit string right, but so so you, 011001011…
Go forward this way for however many bits that hash function is. Now, there’s a few important properties of Hash functions
That we care about for cryptography, but the most important one perhaps. Is that it’s essentially pseudo-random
So that means that we put in “abc”
And we get out something that is in no way like “abc” and appears completely random to us and if we change this even slightly
Because it’s appearing random this has completely changed
So let’s have a quick look at SHA-1 as an example just so we can see this in action
I’m on some page that has a script that calculates hashes on the fly so I can put in “abc” and
You can see that the hash is a9993e and so on all the way up to d, right?
This is the SHA1 hash. A SHA1 hash is 160 bits long. If I change this
C to a D the hash is completely changed. So there’s the appearance of randomness – the idea that
This hash is actually not related to this at all
Even though it is and we know it is because if I put C back again
We’re back to a9993
So we can use this to verify messages haven’t been changed or verify signatures on certificates
And we can do it knowing that we have the appearance of randomness , but actually it’s not random at all. Today
We’re going to talk a bit about how you actually write a hash function to do this
How do we take something that essentially isn’t random with a very known structure and turn it into something that looks like nonsense
Such that we can use it. Now,
There’ll be people raising a few eyebrows that
I’m using SHA1 as an example to do this
But actually there’s fairly reasonable reason to do so.
First of all you know we might also talk about the weaknesses at some point
but also SHA-1 bears a striking similarity in structure to MD4 and MD5 which is see a lot of use historically and
SHA-256 and SHA-512 which is a SHA2
Which currently is in some sense a standard that everyone uses right SHA3 is quite different
And that’s is something else for another day
So SHA1 was developed by the NSA
And released and published in 1995 now a lot of people don’t trust things that the NSA do sort of by default
Which might be fair, but in this case actually SHA1 was quite good for a long long time
when there were some concerns … recently much more serious concerns, but
Originally the NSA weren’t doing it as a back door and stuff the NSA need cryptography
Just like everyone else and this is a good function
MD5 had a lot of problems and so what they basically did was extend it and make it better
SHA1 takes any length of string and outputs a
160 bit value. Alright, so that’s
160 zeros and ones. The question then becomes: I’ve got a string but could be “abc” or it could be an incredibly long file or
You know a whole movie right, and I want to calculate 160-bit signature of that
How do I even get started doing that well the answer is that I basically have a loop that takes
512 bit blocks of data one at a time until the file’s expended. Let’s look at an example of how SHA1 works on just
a single message of exactly the right length
And then we’ll just talk briefly about what you do when inevitably isn’t the right length
Which is almost always, right? So SHA1 takes a message of n blocks of 512 bits in length, and
If it’s only one block – if the message is exactly 512 bits in length, then we only run it once, in essence
And then we out put the hash at the end so SHA-1
Starts with an internal state then we bring in bits of our message one at a time
And we change that internal state and after we’ve done that at the very end when there’s no more message left
We just read what the internal state is and that’s our hash alright
so we’re going to basically be taking the internal state and updating it with the message until
We run out of message, and then as soon as that process stops we can read the results
That’s how the SHA family of hash functions works, so our internal state we call H so I’m going to say H0
H1 H2 H3
and H4
Now the internal state of SHA1 is exactly the same length as the hash that it produces. Which is 160 bits. Which is five
32-bit words four bytes each you know for 32-bit machine this would be an int
So this is initialized based on some constants. Which is defined in the standard
We might talk about that in a bit
but it’s not very important and what we’re going to do is we’re update these h’s as
We bring in our message and then at the end. We’ll see what the Hs are and that’s our hash function
So how do we do this? Well? We have something [called] a compression function?
It’s going to take in this data and a bit of message
Turn it into another set of h values and that’s going to repeat as we have message
But that’s only going to happen once this time because my message is exactly 5 12, which is very handy
So this is our compression function
And I’m going to rename these slightly just to confuse everyone to ABc dearly so at the beginning of our compression function
We copy B’s the internal state into a b c d and e
We then perform 80 rounds of char compression function, right?
Which is like this so x 80 now what that’s going to do is take in words from our
512 bit block of our message, so if this is our message here that message is
512 bits this is going to come in at this point and be mixed in with this ABc and D so well for now
We won’t talk about exactly what’s going on in this compression function
But the idea is that the bits of abcde are being combined together? They’re being shuffled
They’re being commuted around to make it look more and more random as we go and at the same time
We’re bringing in bits from this message to further increase the appearance of mandamus
But also to make sure that this shower function is calculating a digest on this specific message rather than just a general one
That’s the same every time for this message. We’re always going to perform the exact same algorithm
So if we put in this message a second time the shower function will produce exactly the same result
Now once we’ve done this and we shuffled up abcde will be left with a new
Abcde so a b c d and e
And then we finish this block by bringing our h values down here and adding [them] to these values here
to create a new
H naught
H1 H2 H3
H4 the State is now been updated by whatever we did in this compression function by just adding to it all right now all addition
In Char is done without any overflow modulo 2 to the 32 well that means is that if you ever go over?
The maximum value allowed by a 4-byte integer you lap back around again
Right which is one of the reasons why shark arm in reverse because you might lose information that way
This is not encryption. We don’t need to be able to reverse it, so this state is
Finished now if our message is exactly 512 bits
We need these 18 or h1 h2 h3 h4 values out that is our hash, so for short messages. We’re done
I could just you know go home
in actual fact the the principle of extending this to arbitrary length messages right in increments of
512 Bits is
We copy this state back up to the top and we repeat the process and then we copy back up and we repeat the process
For as many blocks as we have of our message 512 bits at a time of our message
We feed it in we alter the state using this approach here, and then we lead off a state when we’re done, right?
That’s basically how it works
So be the security of SHA is all in this compression function and what it’s doing. If it’s shorter than that, what
happens there? If it’s not a multiple of 512 bits. We’re going to have to do some padding right?
Char only works with 512 bit blocks, so what we do is if we have our message. Which is let’s say
1 0 1 1 0 1 it’s not a very long message. If we want to pad that up to 512 bits
We start with [a] one. We pad with 11 [alright], so I’m going to sort of mark, de-mark the padding here
so we know we go with
0000 and then we finish off the message with the length of the actual message in it
So we know where to sort of remove the padding which in this case is 7 so in Binary 1 1 1 so 1 11?
Obviously would a lot more bits for your length than I have done. You get the idea now this padding scheme
Ensures that messages of the same length and messages that end in the same way or in very similar ways don’t share the same padding
And don’t end up being the same that’s very important, so this approach to SHA [its] repetitive
Updating of the internal state with a compression function in essence is called a merkel down guard construction
This was sort of independently proved by murkland damn. [God]
But essentially what’s good about it is if your compression function is good and has good pSeudo-random properties then so does your shower function
Which is of course very useful right?
The problem is that the compression function of char one is not so good that the attacks are
To the 18 they can be reduced somewhat to about 2 to the 60 something like this which it becomes
Into the realm of possibility for people with a lot of money