Every YouTube video has a unique ID.
It’s up in the URL: a string of eleven characters
that uniquely identifies which video you want.
Now, YouTube has millions and millions of
videos.
The last stats that they released said they
have
400 hours of video being uploaded every minute.
So: are they ever going to run out of those
IDs?
Well, to find out, let’s talk about counting
systems.
People count in Base 10. 0 to 9.
That’ll be, hopefully, familiar to you.
Computers count in base 2, in binary,
but that’s difficult for humans to read,
it gets too long to write really, really quickly,
so often computers will display it in base 16, hexadecimal.
You have 0 to 9, and then A to F,
and then you start adding to the next column.
Humans can’t understand that easily,
but it’s efficient if we have to type it in
somewhere,
and 16 – 2 to the power of 4 – is also easy
for computers to deal with.
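To see those counting systems side by side, here's a small Python sketch. The number 2024 is just an arbitrary example value, not something from the video.

```python
# Show one number in base 10, base 2 and base 16.
n = 2024  # arbitrary example value

decimal = str(n)              # base 10: digits 0 to 9
binary = format(n, "b")       # base 2: digits 0 and 1
hexadecimal = format(n, "X")  # base 16: digits 0 to 9, then A to F

print(decimal, binary, hexadecimal)
# prints: 2024 11111101000 7E8
```

The binary form needs eleven digits where the base 16 form needs three, which is why the shorter one is what usually ends up on screen.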
So what about base 64?
That’d be a ridiculous counting system, right?
Except.
64 is another one of those easy numbers for
computers,
it is 2 to the power of 6.
And humans can get to 64 very easily:
0 to 9, then capital letters A to Z,
then small letters a to z, and two other characters.
Most Base 64 uses slash and plus,
but they don’t work so well in URLs,
so YouTube uses hyphen and underscore.
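As a rough sketch, here's how a number could be written out using those 64 characters. The ordering of the alphabet below follows the common URL-safe Base 64 convention (capitals, small letters, digits, then hyphen and underscore). Whether YouTube orders its digits this way internally is an assumption, and `to_base64_digits` is a made-up helper name for illustration.

```python
import string

# URL-safe Base 64 alphabet: 26 + 26 + 10 + 2 = 64 characters.
# The ordering is the common convention, assumed here for illustration.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_"


def to_base64_digits(n: int, width: int = 11) -> str:
    """Write a non-negative integer in base 64, left-padded to `width` characters."""
    if n < 0 or n >= 64 ** width:
        raise ValueError("number does not fit in the requested width")
    chars = []
    for _ in range(width):
        # Peel off the lowest base 64 digit each time round.
        n, remainder = divmod(n, 64)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


print(to_base64_digits(64, width=2))  # prints: BA
```

It works the same way as carrying into the next column in base 10: once you pass the 64th character, the next column ticks up by one.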
That YouTube URL, that unique ID,
is really just a random number in base 64.
They could have picked base 10 or base
16,
but they didn’t: they went with 64,
because it will let you cram a huge number
into a small space
and still make it vaguely human readable.
Author and programmer Sam Hughes, by the way,
pushed this to the limit, and invented Base
65,536,
which includes basically every character from
every language.
It is ridiculous and unnecessary,
but when has that ever stopped programmers?
So why didn’t YouTube just start counting
at 1 and work up?
Well, first, they would have to synchronise
their counting
between all the servers handling the video
or they’d have to assign each server a block
of numbers.
Either way, there’s a lot of tracking to do,
a lot of making sure that it’s never duplicated.
Instead, they just generate a random number
for each video,
see if it’s already taken, and if not, use
it.
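That generate-and-check approach might look something like this. It's a minimal sketch, not YouTube's actual code: the in-memory `taken` set stands in for whatever database a real site would use, and `new_video_id` is a hypothetical name. It also assumes that if the number is already taken, you simply try again.

```python
import secrets

# The 64 URL-safe characters (ordering assumed for illustration).
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"


def new_video_id(taken: set[str], length: int = 11) -> str:
    """Pick random IDs until one is not in `taken`, then reserve and return it."""
    while True:
        # secrets.choice draws from a cryptographically strong random source.
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if candidate not in taken:
            taken.add(candidate)
            return candidate


taken_ids: set[str] = set()
print(new_video_id(taken_ids))  # a random 11-character ID, different each run
```

With eleven characters, collisions are rare enough that the loop will almost always finish on its first try.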
And secondly, it is a really, really bad idea
to just count 1, 2, 3 and so on in URLs.
Incremental counters, as they’re called, can
be a big security flaw:
if you see video 283 up there, then you might
wonder:
what’s video 284? Or video 282?
It’s easy to enumerate, as it’s called,
to run through the entire list.
YouTube Unlisted videos, the ones that don’t
appear publicly
but that you can send the link to people,
those wouldn’t work.
And by the way? Lots of badly designed sites
do use incremental counters.
And it is a terrible idea.
It might tell your competitors exactly how
many customers you have,
‘cos they can just count them.
easily,
‘cos they can just run through them.
And on one site that someone in Florida emailed me about,
it lets you look at other people’s personal
details.
Don’t use incremental counters if you’re building
a web site. Use a random number.
Which brings me to the question:
just how big are the numbers that YouTube
uses?
Well, let’s work it out.
One character of base 64 lets you have 64
ID numbers.
Two characters? That’s 64 by 64, or 4,096.
Three characters? 64 times 64 times 64 — or
64 to the power of 3.
That is already more than a quarter of a million.
And if we go to four? Well, now we’re above
16 million.
If you use Base 64, then you can assign an
ID number
to everyone who lives in London down there
twice over,
and you’ll only need four characters.
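If you'd rather check that arithmetic than take it on trust, here's a short sketch. `id_count` is just an illustrative helper name.

```python
def id_count(length: int) -> int:
    """Number of distinct base 64 IDs with exactly `length` characters."""
    return 64 ** length


for length in (1, 2, 3, 4):
    print(length, f"{id_count(length):,}")
# prints:
# 1 64
# 2 4,096
# 3 262,144
# 4 16,777,216
```

Each extra character multiplies the total by 64, so the same function works for longer IDs too.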
This gets big fast. We can keep on doing this,
and by seven characters we’re already at four trillion.
Now, I assume that YouTube checks through
a dictionary,
and doesn’t allow any actual words to appear
up there —
particularly anything rude.
But that is going to be a tiny minority of
the URLs,
so for our purposes, we can pretty much just
ignore that.
At YouTube’s 11 characters, we are at 73 quintillion