The token XXX is frequently used in source code comments as a way of marking some code as needing attention. (Similar to a FIXME or TODO, though at least to me XXX signals something far to the hacky end of the spectrum, and perhaps even outright broken).
It's a bit of an odd and non-obvious string though, unlike FIXME and TODO. Where did this convention come from? I did a little bit of light software archaeology to try to find out. To start with, my guesses in order were:
- MIT (since it sometimes feels like that's the source of 90% of ancient hacker shibboleths)
- Early Unix (probably the most influential codebase that's ever existed)
- Some kind of DEC thing (because really, all the world was a PDP)
Other uses of XXX
It turns out that XXX and xxx are incredibly annoying things to search for in old code. I'd bet it's the most common sequence of 3+ identical letters in source code. That means there are a ton of false positives to sift through. Here are a few examples of the kind of stuff that turns up.
By far the most common use of XXX in old code is as some kind of template placeholder. This makes some sense; x for an unknown value has an obvious long history that predates computing. These templates might be used to describe the exact data layout of something, like in the following bits from the Apollo guidance computer:
    # 17 ASTRONAUT TOTAL ATTITUDE 3COMP XXX.XX DEG FOR EACH
    # 18 AUTO MANEUVER BALL ANGLES 3COMP XXX.XX DEG FOR EACH
    # 19 BYPASS ATTITUDE TRIM MANEUVER 3COMP XXX.XX DEG FOR EACH
    # 20 ICDU ANGLES 3COMP XXX.XX DEG FOR EACH
    # 21 PIPAS 3COMP XXXXX. PULSES FOR EACH
    # 22 NEW ICDU ANGLES 3COMP XXX.XX DEG FOR EACH
    # 23 SPARE
    # 24 DELTA TIME FOR AGC CLOCK 3COMP 00XXX. HRS. DEC ONLY
Or as just a wildcard for a bunch of related names, like in this Lisp Machine source code:
    ;Q-FASL-xxxx refers to functions which load into the cold load, and
    ; return a "Q", i.e. a list of data-type and address-expression.
    ;M-FASL-xxxx refers to functions which load into Maclisp, and
    ; return a Lisp object.
Or as actual templates-as-program, where parts of an input remain while others (those marked with XXX) are programmatically replaced. For example, temporary file generation in UNIXv5:
    f = ranname("/usr/lpd/dfxxx");
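The template idea can be illustrated with a small Python sketch. This is not a reconstruction of what ranname actually did in UNIXv5; the function name fill_template, the choice of random decimal digits, and the rule of replacing only the trailing run of x characters are all assumptions made for illustration.

```python
import random


def fill_template(template, rng, placeholder="x"):
    """Replace the trailing run of placeholder characters with random digits.

    Hypothetical helper: only illustrates the "xxx as template" idea,
    it does not mirror the behaviour of the historical ranname().
    """
    stem = template.rstrip(placeholder)      # fixed part of the name
    n = len(template) - len(stem)            # number of placeholders to fill
    suffix = "".join(rng.choice("0123456789") for _ in range(n))
    return stem + suffix


# Example: produce a name such as /usr/lpd/df followed by three digits.
print(fill_template("/usr/lpd/dfxxx", random.Random()))
```

The fixed prefix survives untouched, and only the placeholder positions vary between calls.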
And finally, it could denote parts of persistent data structures that were reserved for future use (or no longer used), for example in CP/M:
    /* THE FILE CONTROL BLOCK FORMAT IS SH0WN BELOW:
    --------------------------------------------------------
    / 1 BY / 8 BY / 3 BY / 1 BY /2BY/1 BY/ 16 BY /
    /F1LETYPE/ NAME / EXT / REEL NO/XXX/RCNT/DM0 DM15/
    --------------------------------------------------------
    FILETYPE : 0E5H IF AVAILABLE (OTHERWISE UNDEFINED NOW)
    ...
    XXX : UNUSED FOR NOW
    RCNT : RECORD COUNT IN FILE (0 TO , 127)
A less savoury use of XXX is as an identifier for something that didn't even qualify to have a real name. Most commonly it'd be the name of a branch target, like in a very early version of the C compiler:
    xxx:
        if (o==KEYW) {
            if (cval==EXTERN) {
                o = symbol();
                goto xxx;
            }
It could also be used to name variables. The following is from the FORTRAN II compiler for the IBM 704 from 1958. (I don't read 704 assembler, so maybe I'm misinterpreting what's going on in that program. It seems funny enough that I wanted to include it here anyway).
    XXXXXX SYN 0 THE APPEARANCE OF THIS SYMBOL IN F4400370
    REM THE LISTING INDICATES THAT ITS F4400380
    REM VALUE IS SET BY THE PROGRAM. F4400390
Some DEC code seems to have gone really overboard with this, with single source files having half a dozen different XXXYYY identifiers. (Sorry, had to use YYY as the placeholder there for obvious reasons).
Finally, there are all kinds of bizarre one-off uses. TENEX seems to have used XXX for implementing rubout. That is, when you'd press backspace to delete something you'd typed, it'd print out XXX on the teletype to mark the deletion. (Rather than try to move the cursor back). Some kind of proto-instant-messaging program from 1976 written in Interlisp that I found would just print XXX as the error message for invalid user input.
Now, sorry if the above parts were kind of tedious. But there is actually a point here. Turns out that XXX is a really stupid marker to use for a FIXME. Looking at the Panda TOPS-20 distribution, there are 3083 instances of XXX, none of which are FIXMEs. Just about anything else would be easier to find. This makes its use as one of the three main FIXME-markers all the more puzzling.
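One way to cut down on the noise when searching is to count only the occurrences that stand alone inside a comment. The following Python sketch shows that kind of filter. It is not the method used to get the numbers above: the name count_xxx and the regular expressions are made up for illustration, and it only understands C-style /* */ comments, so assembler, Lisp, or troff sources would need their own comment patterns.

```python
import re

# C-style block comments, possibly spanning several lines.
C_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
# XXX as a standalone token: not glued to letters, digits, underscores or dots,
# which filters out things like dfxxx, XXXYYY, or XXX.XX.
STANDALONE_XXX = re.compile(r"(?<![\w.])XXX(?![\w.])")
# Any run of three x characters, upper or lower case (the naive search).
ANY_XXX = re.compile(r"xxx", re.IGNORECASE)


def count_xxx(source):
    """Return (raw hits, hits that look like a marker inside a comment)."""
    raw = len(ANY_XXX.findall(source))
    candidates = sum(
        len(STANDALONE_XXX.findall(comment.group(0)))
        for comment in C_COMMENT.finditer(source)
    )
    return raw, candidates


sample = (
    'f = ranname("/usr/lpd/dfxxx");\n'
    "xxx: if (o==KEYW) {\n"
    "storeword(i,width(i)); /* XXX */\n"
)
print(count_xxx(sample))
```

Even a filter like this still leaves candidates to check by hand, since a standalone XXX in a comment can be a tag with some other meaning.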
XXX as a FIXME
To get the negative results out of the way, there is absolutely no sign of this being an MIT or DEC thing. XXX as FIXME doesn't appear on ITS or TOPS-20 disks, nor does it appear in any of the mountains of really old Lisp code that I happened to have around; I don't think it makes it to Lisp-land until the mid-'80s. It's also absent in smaller collections of old code from other sources.
No, this seems to definitely be a Unix thing. There are a couple of interesting possibilities in early BSD. First, there's the following lines in a package of troff macros that first appeared in 2BSD, with a copyright date of 1978:
    ..
    .de (t \" XXX temp ref to (z
    .(z \\$1 \\$2
    ..
    .de )t \" XXX temp ref to )t
    .)z \\$1 \\$2
I'm pretty sure these are not actually a FIXME. It looks like the convention in this code was to mark .de commands with three-character tags depending on their type, as explained in the beginning of the file:
    +.\" Code on .de commands:
    +.\" *** a user interface macro.
    +.\" &&& a user interface macro which is redefined
    +.\" when used to be the real thing.
    +.\" $$$ a macro which may be redefined by the user
    +.\" to provide variant functions.
    +.\" --- an internal macro.
These lines seem to have been commands that didn't fit into those existing categories, and needed a new tag.
Next up, there's a bunch of very promising looking changes to the troff C source in the summer of 1980. Stuff like:
    if(j == ' '){
        storeword(i,width(i)); /* XXX */
        continue;
    }
That certainly looks like a classic FIXME. But I think this is another dead end. It turns out that after this change there are 37 /* XXX */ comments in code that didn't use to have any. And when comparing to Unix v7 source code, it looks like basically every single line that was changed got marked with one. So it's unlikely that these are actual FIXMEs. I think this was just the author making sure they could identify their changes, in case they wanted to reintegrate with "upstream".
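That kind of check can be sketched roughly as follows. This is an illustrative Python sketch rather than the procedure actually used for the comparison: it assumes both versions of a file are available as lists of lines, and marked_fraction is a hypothetical helper name.

```python
import difflib

MARKER = "/* XXX */"


def marked_fraction(old_lines, new_lines):
    """Return (marked, changed): how many changed or added lines carry MARKER.

    Illustrative only; a real comparison would also have to cope with
    whitespace-only changes and moved code.
    """
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    changed = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "replace" and "insert" are the opcodes that contribute new lines.
        if tag in ("replace", "insert"):
            changed.extend(new_lines[j1:j2])
    marked = sum(1 for line in changed if MARKER in line)
    return marked, len(changed)


old = ["a;", "b;", "c;"]
new = ["a;", "b2; /* XXX */", "c;", "d; /* XXX */"]
print(marked_fraction(old, new))
```

If nearly every changed line carries the marker, it is acting as a change tag rather than as a note that something needs fixing.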
Soon after that BSD moves to SCCS, and we start getting fine-grained changes rather than huge code-dumps. From there, it's easy to find the first /* XXX */ commit from Nov 9, 1981. This one is interesting in a few ways:
- This is definitely a FIXME; just a few very special parts of the code got tagged, and many of them got rewritten soon after.
- After this commit, the use of /* XXX */ starts spreading quickly through the BSD codebase and eventually to other authors.
- A closer reading of the commit shows something interesting: a bunch of /* ### */ comments. Going through the earlier history, it seems that Bill Joy had been marking his FIXMEs with ###, and halfway through this commit changed to using XXX. I don't know why, or whether these two markers were intended to have slightly different semantics (like ### was code that needed to be fixed, XXX was code that was commented out and needed to be fixed and re-enabled). But XXX quickly became the preferred form.
     if (rcv_empty(tp)) { /* 16 */
    - tcp_close(tp, UCLOSED);
    + sowakeup(tp->t_socket); /* ### */
    +/* XXX */ /* tcp_close(tp, UCLOSED); */
     nstate = CLOSED;
     } else
(On a personal note, as someone who goes out of their way to read through any published TCP stacks, I'm kind of amused that a search for random historical trivia led me to a damn TCP stack).
Leaving it at that seems like a good story. And I'd already checked basically all of the Bell Labs code that I could find. It's not in Unix v2-v7 and not in the Programmer's Workbench. But then I decided to check Unix v1 just for completeness' sake, and got very confused. Because...
    / XXX fix me, I dont quite understand what to do here or
    / what is done in the similar code below
    e407:
    / cmp r5, u.count / see if theres enough room
    / bgt 1f
    mov r5,u.count / read text+data into core
WTF? It doesn't get any clearer than that. But where did it come from? And if this convention was used at Bell Labs in 1971, where did XXX disappear to for a decade?
Turns out this was a false alarm. The only reason we have the Unix v1 source code in the first place is that a team of people transcribed the source from PDF scans to text. Then they went on to make it possible to compile the code and run it in an emulator. As part of this latter work, a block of code was added to the source. And a bit unfortunately it was this patched version rather than the "original" that made it to the Unix History Repo. This comment was actually from 2008, not 1971.
There's actually an interesting story behind that extra block of code, as told by Toomey. After finally getting the v1 kernel transcribed, compiled, and running, they hit the problem of only having two userland programs available: init and sh. Everything else was using a more recent executable header. To be able to do anything at all with the system, they needed to add support for "0407 binaries" as opposed to the "0405" ones the kernel supported natively.
What about C code outside of Unix distributions? It's actually kind of hard to find any of that from before 1982. There might be an earlier instance in Gosling Emacs, though it differs from the modern form by going for a full 9 Xs:
    #ifdef HalfBaked
    /* sigset (SIGINT, InterruptKey); *//*XXXXXXXXX*/
    sigset (SIGINT, InterruptKey);/*XXXXXXXXX*/
    #endif
And there's a Changelog entry from July 1981, which seems to match up perfectly with both the functionality of the code, and the surrounding ifdef:
Tue Jul 7 12:51:44 1981 James Gosling (jag at VLSI-Vax) ... I also installed Dave Dyer's hack to allow ^G's to interrupt execution immediatly. This has a rather major bug, and is the reason that I didn't implement it a long time ago: if you type ^G while Emacs is doing output, then all queued-but-not-printed characters get lost and Emacs no longer has any idea of what the screen looks like. It is pretty much impossible for Emacs to tell whether or not this has happened. You end up having to type ^L now and then. The "HalfBaked" switch in config.h controls the compilation of this facility, ...
But thankfully this code has RCS history starting from 1986, and somebody did in fact edit this code in 1986 with no functional changes, but adding the commented-out copy and the XXXXXXXXX:
     #ifdef HalfBaked
    - sigset (SIGINT, InterruptKey);
    +/* sigset (SIGINT, InterruptKey); *//*XXXXXXXXX*/
    + sigset (SIGINT, InterruptKey);/*XXXXXXXXX*/
     #endif
And those are the only signs of XXX in applications that could predate the BSD usage. Both were red herrings, caused by how difficult it is to actually find pristine copies of source code that old. It was very lucky that the Gosling Emacs comment was added after the code was put into RCS, and not in the five-year interval between the original commit and the project starting to use RCS.
So it seems likely that this convention was invented by Bill Joy in BSD. If he wasn't the first one, he was certainly the one that popularized it. Why he chose to switch to the rather inconvenient XXX from ### is unclear.
If you can find an earlier occurrence (or know of good collections of pre-1981 C source code), please let me know and I'll update the post.