On “Computerphile” we just love
provocative and mysterious titles and
carrying on from the last time we spoke
let’s say this is going to be a chat
about what came to be called the UNCOL
problem. “Universal Computer [Oriented] Language”
I think it stands for.
It was more specific
than just any old computer language. It
was: “… is there a unique intermediate
language which would suit everybody?”
You know, not as high as C even and not quite
down at the absolute binary level but
more like a sort of absolutely Universal
Assembler – a pseudo-assembler. It’s not
really hardware-implemented on any
machine but it’s one that we can
all work with and every compiler in the
whole world – all they would have to do is
produce the UNCOL language, if we can
only agree what it is, and then every
system could talk to every other system
at this agreed low level. Well, as you can
imagine, it doesn’t work like that. It
very soon became obvious that, yes, this
business of putting a level in there and
saying: “We’ll all compile to intermediate
code”, is fine but when you start looking
at what facilities it should have and what
facilities it shouldn’t have, you’re up
against the fact that computer hardware
designers like to do things their own
way. I mean, numbers of registers? Might be
16, might be fewer – that’s no big deal.
Some of them have arranged those registers
[as pointers into] a formal or informal stack.
Others don’t. Should we always assume that we
have stack capabilities? And I think
somebody pointed out to me – I think it
was Ron, who originally created these
notes – he said: “The thing in the end that
kills you is that they all do
input-output differently; there’s almost
no agreement about how you do I/O”.
So, fairly soon, the idea of finding one
unique intermediate
language had to be forgotten about.
But the idea of different intermediate
languages at different levels of
sophistication really did gain traction
in the 1980s. We mentioned that Steve
Bourne had his Z-code as part of his
Algol 68 project, way back in the 1970s.
A little bit later on – I think it
was in the early 90s; many of you will
know this one better, James Gosling
developed the language Java, in which
he decided that pointers were dangerous
and should be hidden (but therein lies
another story). But the big thing that
James made a feature of was to say:
I want my Java systems to compile down to
what he called ‘bytecode’. In other words
it was a sort of pseudo-assembler with
really, like, single-byte op codes like A
and Z, and whatever. And, yes, bytecode
became flavour of the month – we’d all go
down to bytecode. But then what do you do?
Well, you’ve got choices. You could either
write an interpreter for bytecode, which
will be easy to change but a little
slow and a bit big. Or, if you really
care passionately about having the ultimate
super-fast and efficient binary, you can
always compile the bytecode down to native
code and get it smaller, and all that. So you had options.
But the idea was that, yes, you would have
an intermediate code. Even so, it’s not
one-size-fits-all. It was
ideal for what James wanted to do, but
its extensibility to be a universal
panacea? Not so. You see, let me give you
another good example of why some people
might want to move the semantic barrier
a bit higher. I mean, bytecode is fairly
low-level. What if we move it up so
we’re getting more airy-fairy? Heaven
knows, we might encounter Haskell way
up there somewhere!
The classic example, of course, is the
development of C++ and, as many of you
know, as its name implies, it goes beyond
C. It adds classes and all sorts of
other features to C. And the idea from
Bjarne Stroustrup,
the inventor, was that to get something
going, in the first instance at least, he
would, of course, do the obvious thing.
His compiler would compile C++ down to
C. And then you could put it just through
an ordinary C compiler, for the back end.
So, you see, his ‘UNCOL’ is at a much
higher level of sophistication than
pseudo-assembler type bytecode level. And
you might say: “Oh well, that’s great,
I mean, it obviously suits C++ to do that”.
Yes, it did, but there are big problems
with this approach. Once you broadcast
the fact that your “Mark I” C++ compiler is
actually producing C under the hood, you will have
the devil’s own job stopping
benevolent hackers – who think they can
generate better C code than Bjarne
Stroustrup can – from getting in behind the
scenes and messing about with the way he
does classes, for example. So, I suppose
what I’m saying more generally about
this, is that very often you will have a
very good solution for a language system
to establish a bridge-head, and to get
something working. But, in the longer run,
you might want a more direct version
that isn’t as prone to hacker intrusion,
gross abuse or just things going a bit
wrong because of the nature of the
intermediate language being so rich and
having a mind of its own. Now, you might say:
“That can’t be an issue, can it?” Yes, it can.
Because this whole question of the ‘level’ of
your intermediate code matters. You might think:
“This thing gets me there, so why do
I need to go direct?”
Let me give you another example.
Not C++ this time. Another well-known
example for many of you is PDF. It’s been
so well-established for so long now,
since 1993,
that many of you using it will not know
that in the early days it came off the
back of Adobe’s very successful language
called PostScript. And PostScript was
there as, you know, the universal graphics language.
It drove laser printers; it drove whatever.
It was a wonderful achievement. In order to
get a PDF – the way you did it originally
was you compiled your PostScript with
an Adobe-provided utility called
Distiller. But the problem was that, although
in many ways it was very graphically
sophisticated, it was also Turing complete.
You could do anything in it [given enough memory]
and, indeed, I often thought: “Well, the next
program I write in PostScript, before I
do any typesetting as ordered by the
customer, I will get my program to compute
an Ackermann function first”. Can you
imagine the delay: “I’m sorry, I’m going to
compute ackermann(3,1) before I turn my
attention to doing your miserable little
piece of typesetting”. But in principle
you could have done that – as long as it
didn’t run out of memory. But, you know,
I’m just saying this to make the silly
point that that’s perfectly do-able!
You sometimes found that the PDF produced
by compiling PostScript with Distiller was
yards bigger than the input. Not very
often, but sometimes. So, there again, you
see, in order to stop abuse and to point
the way to the future, Adobe
very quickly said: “What we must do, for
those that don’t know about PostScript, and
have no need for it, is to give a direct
route to PDF”. And they called it PDFWriter,
back in the early days. And then,
of course, people, not wanting to be
beholden to Adobe, quite rightly said:
“Fabulous! What we need to do is to
replicate something of what Distiller
does. We’ll write utilities with names
like ‘ps2pdf’”, which you’ll typically
find in PostScript offerings on Linux
and all this kind of stuff. But it makes
the point that very often that
directness of approach gives you a good result
and stops people messing about under the
hood and doing things which are
ridiculous and expensive. You just say:
“No, from now on it’s much quicker
to go direct; we know how to do it. Let’s
do it; let’s keep it clean”. So that, I
guess, is a feature. Still, I keep
reading stories of people using
intermediate codes for compiling
programming languages who suddenly say:
“Well, 20 years down the line we think
intermediate codes are bad. It’s far
better to do it direct, in some other way”.
And all you can say, out of this, is that
every time you get into porting software
you learn something every time about the
pros and cons of having an intermediate
representation. Or do you jump over it
and go direct? There is no universal
right answer. The more you look at the
scene at the moment, around programming
language implementation, the more you
realize that a huge number of the
offerings out there might look to you
like straightforward point-to-point
compilers, you know: “… I’m running on …
whatever I’m running on at the moment … I’m running
on an ARM chip. It’s all self-hosted on the
ARM chip. It compiles ARM code for
further use on further ARM chips. It
doesn’t do anything else!” Not true. If you
look under the hood of gcc – of course,
Stallman and the GNU effort did such a
wonderful job in creating for us a new
version of ‘cc’ – when you look in there at
the possible back-ends for different
architectures you realize it’s really a
cross-compilation system. You can compile
from anything to anything. Now other
people, other than the GNU effort, got
there eventually and realized the same
thing. I mean, I think Microsoft, around
2000, actually had the nerve to
develop something that I think they
called “Intermediate Language”. I don’t
know whether Microsoft did try and
actually trademark the phrase
“Intermediate Language” but it’s part of
the same mindset. It’s not just them. [It’s also]
Apple and Steve Jobs
always had this attitude: “It may have
existed [before] but it was done by a bunch of
no-hopers. And until we discovered it,
packaged it, marketed it and put it up for you,
you might as well think that it never existed!”
And that was Jobs through and through.
But maybe all big computer companies
have a little bit of this inside them:
” … it didn’t really exist, in a usable way,
until *we* discovered it”.