Channel: Hacker News

Having fun abusing the C language


I was looking through the book Expert C Programming again when I came upon its “light relief” section on the International Obfuscated C Code Contest (IOCCC). It’s a contest to write the most obscure code. That C has a competition for writing confusing code probably says something about the language. I wanted to see how one of the entries to this contest works. Not finding any explanation in Internet searches, I decided to investigate for myself.

The IOCCC was inspired by Steve Bourne, who (ab)used the C preprocessor so he could write his Unix shell in a C dialect that looks more like Algol-68, with explicit end-of-statement keywords, where you have code like

if
  ...
fi    

He achieved this with code similar to

#define IF if(
#define THEN ){
#define ELSE } else {
#define FI ;}

This let him write code like

IF *s2++ == 0
THEN return(0);
FI

As Expert C says about this kind of code:

Shun any use of the C preprocessor that modifies the underlying language

One of the early winners, in 1987, was a one-liner from David Korn, writer of the Korn shell (what’s up with these shell writers?):

main(){printf(&unix["\021%six\012\0"], (unix)["have"]+"fun"-0x60);}

That’s it. Go ahead and compile it. What does it print?

It won’t work with Microsoft’s compiler (hint!), but I found it already on the online compiler ideone, where you can try it. (A few things were added to get it to work there, but otherwise it’s the same.)

It just prints

unix

Where does that come from? There’s what looks like an array named unix, but it’s not declared here. Is unix a keyword? Is it somehow printing the variable name?

I blindly tried to test it by adding a

printf(unix);

at the end, only to have the compiler tell me that printf takes a char * and not an int.

Printing it out as an int tells me that its value is 1. This makes me think it’s a #define, something like a flag saying the code is being compiled on a Unix system. Searching the gcc source code, I find that it’s a run-time target specification. This explains why it won’t work on Windows.

unix is just 1. Rewritten we have

main(){printf(&1["\021%six\012\0"], (1)["have"]+"fun"-0x60);}

So unix wasn’t the name of an array variable, but how does 1[] work? I’ve seen this before and it’s one of my favorite facts about C.

C has origins in the language BCPL. From Dr. Martin Richards, creator of BCPL:

The monadic indirection operator ! takes a pointer as its argument and returns the contents of the word pointed to. If v is a pointer !(v+I) will access the word pointed to by v+I. … The dyadic version of ! is defined so that v!i = !(v+I). v!i behaves like a subscripted expression with v being a one dimensional array and I being an integer subscript. Note that, in BCPL v!5 = !(v+5) = !(5+v) = 5!v. The same happens in C, v[5] = 5[v].

In other words, a subscript is just an addition to a pointer, and since addition is commutative, so is the subscript operator. Go ahead and try that too.

int x[] = {1, 2, 3};
printf("%d\n%d\n", x[1], 1[x]);

Then what’s 1["\021%six\012\0"]? Written the normal way we access array elements with the subscript operator, it’s "\021%six\012\0"[1]. That’s still not typical, but you can see it’s array[index], even though a string literal isn’t usually what gets indexed. That works too; try it as well:

printf("%c\n", "hello, world"[1]);

Let’s rewrite just that first array while we’re figuring it out.

main() {
  char str[] = "\021%six\012\0";
  printf(&str[1], (1)["have"]+"fun"-0x60);
}

That still works the same. Looking at str, I wonder about the \0, which is the null character (or NUL character?). I thought C string literals have a null character by default. Let’s see what happens when we remove it:

printf("%s", "\021%six\012");

prints

█%six

I use the format string "%s" because the string I’m trying to print contains the format character %. (C programming tip: don’t just print strings like printf(myStr) in case the string you’re trying to print has a format character. Print through %s as shown above.)

It still seems to work without the \0. Maybe you had to add your own null characters to string literals in some pre-ANSI C? I guess not since other strings in the program don’t have it. Or that was more obfuscation? Regardless, let’s leave out that \0.

While we’re at it, let’s look at the rest of that string. \xxx is how to give a single character in octal. \021 is some control character, and \012 is line feed, or \n as you normally see at the end of strings you want to print.

Knowing \021 is just a single character, str[1] is %. &str[1] then is the string starting at %. So the string can actually just be %six\n, leaving out that control character that I don’t even know what it means.

main() {
  char str[] = "%six\n";
  printf(str, (1)["have"]+"fun"-0x60);
}

The first string passed to printf is the format string, and %s means put the next string in its place. With this string ending in ix, we can guess that the next string passed to printf must somehow be un. That’s simple enough for us to get rid of that character array we used to pull it out.

main() {
  printf("%six\n", (1)["have"]+"fun"-0x60);
}

For the next string, we have (1)["have"]+"fun"-0x60. There’s a un in fun, now we have to get to it.

We have that indexing trick again with (1)["have"]. The parentheses around 1 aren’t needed. Again, were they required in old C, or are they more misdirection? "have"[1] gives us a. The hex value for the character a is 0x61 in ASCII, and we subtract 0x60, which leaves 1. This is then just 1+"fun".

Similar to before, "fun" basically resolves to a char *. Adding 1 to it gives us the string starting at the second character, or un. This becomes

main() {
  printf("%six\n", "un");
}

There’s the unobfuscated code.

I like how some of the obfuscation is more semantic, like using the defined word unix, perhaps to throw you off into thinking it’s somehow printing that #define itself. The character \021, a kind of reversed \012, may be there to make you think it’s significant when it’s actually not used. There’s also the format string %six, which contains what looks like the word “six”, perhaps so you won’t notice that the ‘s’ is a format character.

A lot to unpack from such little code, and plenty to learn.

Discuss on Hacker News or Reddit.


