Quantcast
Channel: Hacker News
Viewing all articles
Browse latest Browse all 25817

The origins of XXX as FIXME

$
0
0

The token XXX is frequently used in source code comments as a way of marking some code as needing attention. (Similar to a FIXME or TODO, though at least to me XXX signals something far to the hacky end of the spectrum, and perhaps even outright broken).

It's a bit of an odd and non-obvious string though, unlikeFIXME and TODO. Where did this convention come from? I did a little bit of light software archaeology to try to find out. To start with, my guesses in order were:

  • MIT (since it sometimes feels like that's the source of 90% of ancient hacker shibboleths)
  • Early Unix (probably the most influential codebase that's ever existed)
  • Some kind of DEC thing (because really, all the world was a PDP)

Other uses of XXX

It turns out that XXX and xxx are incredibly annoying things to search for in old code. I'd bet it's the most common sequence of 3+ identical letters in source code. That means there's a ton of false positives to sift through. Here's a few examples of the kind of stuff that will be found.

By far the most common use of XXX in old is for it to be some kind of a template placeholder. This makes some sense;x for an unknown value has an obvious long history that predates computing. These templates might be used to describe the exact data layout of something, like in the following bits from the Apollo guidance computer:

# 17    ASTRONAUT TOTAL ATTITUDE      3COMP   XXX.XX DEG FOR EACH
# 18    AUTO MANEUVER BALL ANGLES     3COMP   XXX.XX DEG FOR EACH
# 19    BYPASS ATTITUDE TRIM MANEUVER 3COMP   XXX.XX DEG FOR EACH
# 20    ICDU ANGLES                   3COMP   XXX.XX DEG FOR EACH
# 21    PIPAS                         3COMP   XXXXX. PULSES FOR EACH
# 22    NEW ICDU ANGLES               3COMP   XXX.XX DEG FOR EACH
# 23    SPARE
# 24    DELTA TIME FOR AGC CLOCK      3COMP   00XXX. HRS. DEC ONLY

Or as just a wildcard for a bunch of related names, like the in this Lisp Machine source code:

;Q-FASL-xxxx refers to functions which load into the cold load, and
; return a "Q", i.e. a list of data-type and address-expression.
;M-FASL-xxxx refers to functions which load into Maclisp, and
; return a Lisp object.

Or as actual templates-as-program, with parts of an input remains while others (those marked with XXX) are programatically replaced. For example temporary file generation in in UNIXv5:

                f = ranname("/usr/lpd/dfxxx");

And finally, it could denote parts of persistent data structures that were reserved for future use (or no longer used), for example in CPM:

/* THE FILE CONTROL BLOCK FORMAT IS SH0WN BELOW:
   --------------------------------------------------------
   /    1 BY / 8 BY / 3 BY / 1 BY /2BY/1 BY/ 16 BY /
   /F1LETYPE/   NAME / EXT / REEL NO/XXX/RCNT/DM0 DM15/
   --------------------------------------------------------

   FILETYPE     :       0E5H IF AVAILABLE (OTHERWISE UNDEFINED NOW)
...
   XXX          :       UNUSED FOR NOW
   RCNT         :       RECORD COUNT IN FILE (0 TO , 127)

A less savoury use of XXX is as an identifier for something that didn't even qualify to have a real name. Most commonly it'd be the name of a branch target, like in a very early version of the C compiler:

    xxx:
        if (o==KEYW) {
                if (cval==EXTERN) {
                        o = symbol();
                        goto xxx;
                }

It could also be used to name variables. The following is from the FORTRAN II compiler for the IBM 704 from 1958. (I don't read 704 assembler, so maybe I'm misinterpreting what's going on in that program. It seems funny enough that I wanted to include it here anyway).

XXXXXX SYN 0  THE APPEARANCE OF THIS SYMBOL IN   F4400370
       REM       THE LISTING INDICATES THAT ITS  F4400380
       REM       VALUE IS SET BY THE PROGRAM.    F4400390

Some DEC code seems to have gone really overboard with this, with single source files having half a dozen different XXXYYY identifiers. (Sorry, had to use YYY as the placeholder there for obvious reasons).

Finally, there are all kinds of bizarre one-off uses. TENEX seems to have used XXX for implementing rubout. That is, when you'd press backspace to delete something you've typed, it'd print out XXX on the teletype to mark the deletion. (Rather than try to move the cursor back). Some kind proto-instant messaging program from 1976 written in Interlisp that I found would just printXXX as the error message for invalid user input.

Now, sorry if the above parts were kind of tedious. But there is actually a point here. Turns out that XXX is a really stupid marker to use for a FIXME. Looking at the Panda TOPS-20 distribution, there are 3083 instances of XXX, none of which are FIXMEs. Just about anything else would be easier to find. This makes its use as one of the three mainFIXME-markers all the more puzzling.

XXX as a FIXME

To get the negative results out of the way, there is absolutely no sign of this being an MIT or DEC thing. XXX as FIXME doesn't appear on ITS or TOPS-20 disks, nor does it appear in any of the mountains of really old Lisp code that I happened to have around; I don't think it makes it to Lisp-land until the mid-'80s. It's also absent in smaller collections of old code from other sources.

No, this seems to definitely be a Unix thing. There are a couple of interesting possibilities in early BSD. First, there's the following lines in a package of troff macros that first appeared in 2BSD, with a copyright date of 1978:

..
.de (t                 \" XXX temp ref to (z
.(z \\$1 \\$2
..
.de )t                 \" XXX temp ref to )t
.)z \\$1 \\$2

I'm pretty sure these are not actually a FIXME. It looks like the convention in this code was to mark.de commands with three character tags depending on their type, as explained in the beginning of the file:

+.\"	Code on .de commands:
+.\"		***	a user interface macro.
+.\"		&&&	a user interface macro which is redefined
+.\"			when used to be the real thing.
+.\"		$$$	a macro which may be redefined by the user
+.\"			to provide variant functions.
+.\"		---	an internal macro.

These lines seem to have been commands that didn't fit into those existing categories, and needed a new tag.

Next up, there's a bunch of very promising looking changes to the troff C source in the summer of 1980. Stuff like:

if(j == ' '){
        storeword(i,width(i));  /* XXX */
        continue;
}

That certainly looks like a classic FIXME. But I think this is another dead end. It turns out that after this change there are 37 /* XXX */ comments in code that didn't use to have any. And when comparing to Unix v7 source code, it looks like basically every single line that was changed got marked with one. So it's unlikely that these are actual FIXMEs. I think this was just the author making sure they could identify their changes, in case they wanted to reintegrate with "upstream".

Soon after that BSD moves to SCCS, and we start getting fine-grained changes rather than huge code-dumps. From there, it's easy to findthe first /* XXX */ commit from Nov 9, 1981. This one is interesting in a few ways:

  • This is definitely a FIXME; just a few very special parts of the code got tagged, and many of them got rewritten soon after.
  • After this commit, the use of /* XXX */ starts spreading quickly through the BSD codebase and eventually to other authors.
  • A closer reading of the commit shows something interesting: a bunch of /* ### */ comments. Going through the earlier history, it seems that Bill Joy had been marking his FIXMEs with ###, and halfway through this commit changed to usingXXX. I don't know why, or whether these two markers were intended to have slightly different semantics (like ### was code that needed to be fixed, XXX was code that was commented out and needed to be fixed and re-enabled). But XXX quickly became the preferred form.
                        if (rcv_empty(tp)) {                    /* 16 */
-                               tcp_close(tp, UCLOSED);
+                               sowakeup(tp->t_socket); /* ### */
+/* XXX */                      /* tcp_close(tp, UCLOSED); */
                                nstate = CLOSED;
                        } else

(On a personal note, as someone who goes out of their way to read through any published TCP stacks, I'm kind of amused that a search for a random historical trivia leads me to a damn TCP stack).

Leaving it at that seems like a good story. And I'd already checked basically all of Bell Labs code that I could find. It's not in Unix v2-v7 and not in the Programmer's Workbench. But then I decided to check Unix v1 just for completeness sake, and got very confused. Because...

/ XXX fix me, I dont quite understand what to do here or
/ what is done in the similar code below e407:
/ cmp   r5, u.count / see if theres enough room
/ bgt   1f
mov     r5,u.count / read text+data into core

WTF? It doesn't get any clearer than that. But where did it come from? And if this convention was used at Bell Labs in 1970, where did XXX disappear for a decade?

Turns out this was a false alarm. The only reason we have the Unix v1 source code in the first place is that a team of people transcribed the source from PDF scans to text. Then they went on to make it possible to compile the code and run it in an emulator. As part of this latter work, a block of code was added to the source. And a bit unfortunately it was this patched version rather than the "original" that made it to the Unix History Repo. This comment was actually from 2008, not 1971.

There's actually an interesting story behind that extra block of code, as told by Toomey. After finally getting the v1 kernel transcribed, compiled, and running, they hit the problem of the only having two userland programs available:init and sh. Everything else was using a more recent executable header. To be able to do anything at all with the system, they needed to add support for "0407 binaries" as opposed to the "0405" ones the kernel supported natively.

What about C code outside of Unix distributions? It's actually kind of hard to find any of that from before 1982. There might be an earlier instance in Gosling Emacs, though it differs from the modern form by going for a full 9 Xs:

#ifdef HalfBaked
/*    sigset (SIGINT, InterruptKey); *//*XXXXXXXXX*/
    sigset (SIGINT, InterruptKey);/*XXXXXXXXX*/
#endif

And there's a Changelog entry from July 1981, which seems to match up perfectly with both the functionality of the code, and the surroundingifdef:

Tue Jul  7 12:51:44 1981  James Gosling  (jag at VLSI-Vax)
        ... I also installed Dave
        Dyer's hack to allow ^G's to interrupt execution immediatly.  This
        has a rather major bug, and is the reason that I didn't implement
        it a long time ago: if you type ^G while Emacs is doing output,
        then all queued-but-not-printed characters get lost and Emacs no
        longer has any idea of what the screen looks like. It is pretty
        much impossible for Emacs to tell whether or not this has
        happened. You end up having to type ^L now and then.  The
        "HalfBaked" switch in config.h controls the compilation of this
        facility, ...

But thankfully this code has RCS history starting from 1986, and somebody did in fact edit this code in 1986 with no functional changes, but adding the commented out copy and theXXXXXXXXX:

 #ifdef HalfBaked
-    sigset (SIGINT, InterruptKey);
+/*    sigset (SIGINT, InterruptKey); *//*XXXXXXXXX*/
+    sigset (SIGINT, InterruptKey);/*XXXXXXXXX*/
 #endif

And those are the only signs of XXX in applications that could predate the BSD usage. Both were red herrings, caused by how difficult it's to actually find pristine copies of source code that old. It was very lucky that the Gosling Emacs comment was added after the code was put to RCS, and made not in the five year interval between the original commit and the project starting to use RCS.

So it seems likely that this convention was invented by Bill Joy in BSD. If he wasn't the first one, he was certainly the one that popularized it. Why he chose to switch to the rather inconvenientXXX from ### is unclear.

If you can find an earlier occurence (or know of good collections of pre-1981 C source code), please let me know and I'll update the post.


Viewing all articles
Browse latest Browse all 25817

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>