
A convenient untruth: Array notation in C is a lie


Array notation in C is a lie!

Sorry, dear reader*, but I cannot participate in this conspiracy any longer.  You have been lied to, manipulated and coerced into thinking arrays are a construct of the C language.  I feel it is my solemn duty to blow the whistle on this charade and expose the dirty secrets of C’s so-called arrays.

(* It is statistically possible that more than one person might read this, of course)

Here’s an array declaration in C:

int main(void)
{
  int arr[5];
}

(Fatuous side note:  when demonstrating arrays, always call them ‘arr’ – it allows you to talk like a pirate  🙂 )

Our compiler has allocated a contiguous sequence of integers.  We can tell by looking at the size of arr:

#include <stdio.h>

int main(void)
{
  int arr[5];

  printf("%zu", sizeof(arr));  // => 20 (with 4-byte ints)
}

That’s consistent with sizeof for scalar types, but perhaps not as useful as it could be.  Wouldn’t you rather know how many elements were in the array?  Preprocessor to the rescue!

#include <stdio.h>

#define ARRAY_SIZEOF(a) (sizeof(a) / sizeof(a[0]))

int main(void)
{
  int arr[5];

  printf("%zu", ARRAY_SIZEOF(arr));  // => 5 (*much* more useful!)
}
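The macro is not tied to int. As a small sketch (the helper names count_doubles and count_chars are made up for illustration and are not part of the original post), the same division gives the element count whatever the element size happens to be:

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZEOF(a) (sizeof(a) / sizeof(a[0]))

/* Hypothetical helpers, for illustration only. */
size_t count_doubles(void)
{
  double readings[3] = {0.0};

  /* The byte size depends on sizeof(double)... */
  assert(sizeof(readings) == 3 * sizeof(double));

  /* ...but the element count does not. */
  return ARRAY_SIZEOF(readings);
}

size_t count_chars(void)
{
  char buffer[64] = {0};

  return ARRAY_SIZEOF(buffer);
}
```

Note that sizeof yields a size_t, so the matching printf conversion is %zu rather than %d.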

So, a dumb question:  what’s the type of arr?  If you said int [5] you’d be right; but wait:

int main(void)
{
  int arr[5];

  int another_arr[5] = arr;  // Nope.
  another_arr = arr;         // Nope.
}

Even though arr and another_arr are declared as the same type I can’t use one to initialise the other; nor can I assign one array to another.

Why is this failing?  Because the array’s name is a lie!  Using a variable as an expression normally yields its value, but an array name (anywhere other than as the operand of sizeof or the address-of operator) yields a pointer to the first element, which is at least reasonable:

int main(void)
{
  int arr[5];

  int *ptr = arr;  // ptr holds the address of first element
}
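One way to watch the conversion happen is to compare sizeof on the bare array name with sizeof on an expression that forces the name to be treated as a pointer. This is a sketch only; array_name_decays is a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 when the array name keeps its array
   type under sizeof but becomes an int * once it is used in arithmetic. */
int array_name_decays(void)
{
  int arr[5] = {0};

  size_t whole   = sizeof(arr);      /* no conversion: all 5 ints   */
  size_t decayed = sizeof(arr + 0);  /* converted: size of an int * */

  assert(whole == 5 * sizeof(int));
  return decayed == sizeof(int *);
}
```

Neither sizeof evaluates its operand here; only the types matter.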

The array’s name yields a non-modifiable l-value expression.  This (in effect) means the pointer is constant, which is why you can’t assign arrays to each other.
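If you really do need a copy, the usual workaround is to copy the bytes yourself with memcpy from <string.h>. A minimal sketch, assuming both arrays have the same type and length (copy_array_demo is a hypothetical name):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copies arr into another_arr and returns
   the last element of the copy. */
int copy_array_demo(void)
{
  int arr[5] = {1, 2, 3, 4, 5};
  int another_arr[5];

  /* Both names still have array type here, so sizeof sees all 5 ints. */
  assert(sizeof(another_arr) == sizeof(arr));
  memcpy(another_arr, arr, sizeof(arr));

  return another_arr[4];
}
```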

It might be tempting to believe at this point that arrays and pointers are pretty much the same thing.  That would be crazy thinking though – arrays are arrays; pointers are pointers.

Being a reasonable human being you should want to give the readers of your code a bit more of a clue as to what’s going on:

int main(void)
{
  int arr[5];

  int *int_ptr = &arr[0];  // The same as before; but now explicit
}

Just to mess with you some more:  taking the address of an array yields a pointer-to-array; and not, as many believe, a pointer to the first element:

int main(void)
{
  int arr[5];
  int *int_ptr;
  
  int_ptr = arr;       // OK.
  int_ptr = &arr;      // Warning - actually the wrong type.


  int (*arr_ptr)[5];

  arr_ptr = arr;       // Nope.
  arr_ptr = &arr;      // Yup.
  int_ptr = *arr_ptr;  // OK.  Confused now?...
}
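The two pointers hold the same address; what differs is how far one step takes you. The sketch below (array_pointer_stride is a hypothetical helper) measures the step of the pointer-to-array in bytes:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte distance covered by adding 1
   to a pointer-to-array-of-5-int. */
size_t array_pointer_stride(void)
{
  int arr[5] = {0};
  int *int_ptr      = arr;    /* pointer to the first element */
  int (*arr_ptr)[5] = &arr;   /* pointer to the whole array   */

  /* Same address, different types. */
  assert((void *)int_ptr == (void *)arr_ptr);

  /* arr_ptr + 1 points one whole array further on. */
  return (size_t)((char *)(arr_ptr + 1) - (char *)arr_ptr);
}
```

The result is 5 * sizeof(int), where int_ptr + 1 would only have moved by sizeof(int).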

At this point we have to conclude the following about arrays:

  • Arrays are a contiguous sequence of objects
  • Arrays don’t behave the same as the types in the array.

Accessing array elements is done with the index operator ([]).  If only it were that simple.  The index operator is merely a smokescreen hiding the insidious truth:  Array access is pointer arithmetic.

Pointer arithmetic, as you will no doubt know, modifies the address stored in the pointer object by multiples of the size of the type being pointed to; that is:

int main(void)
{
  int a;
  int *int_ptr = &a;

  ++int_ptr;  // address advances by sizeof(int) bytes
}
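The scaling can be checked by viewing the addresses as char pointers, and it works in reverse too: subtracting two pointers into the same array gives a count of elements, not bytes. A sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many bytes does ++ move an int pointer? */
size_t increment_in_bytes(void)
{
  int values[2] = {0};
  int *int_ptr = &values[0];
  char *before = (char *)int_ptr;

  ++int_ptr;

  return (size_t)((char *)int_ptr - before);
}

/* Hypothetical helper: pointer subtraction counts elements. */
ptrdiff_t elements_between(void)
{
  double samples[8] = {0.0};
  double *first = &samples[1];
  double *last  = &samples[6];

  /* In bytes the gap is 5 * sizeof(double)... */
  assert((char *)last - (char *)first == (ptrdiff_t)(5 * sizeof(double)));

  /* ...but the subtraction itself is scaled back down to 5. */
  return last - first;
}
```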

Ever wondered why pointer arithmetic is the way it is?  The answer is that it’s all part of the Great Array Conspiracy. (Which is not a real thing.  Yet)

Using pointer arithmetic, I could access array members like this:

int main(void)
{
  int arr[5];
  int *int_ptr = &arr[0];

  *(int_ptr + 3) = 100;  // Modify the 4th array element 
                         // via the pointer.
}

This is sneaky, underhand and just plain difficult-to-read code, as I hope you would agree.  It’s the sort of code written by programmers who believe Code Obscurity == Job Security.

But this is exactly what the index operator is doing.  When you write (for example)

arr[0]

The compiler is re-writing your code as:

*(arr + 0)

We saw previously that the array name, as an expression, yields a pointer; so this is exactly the pointer arithmetic code we just mentioned.

The index operator is not a particularly fussy operator: any pointer of the correct type will do:

int main(void)
{
  int arr[5];
  int *int_ptr = &arr[0];  // Oooh, look – pointer arithmetic!


  int_ptr[3] = 100;        // Same as *(int_ptr + 3)  = 100;
                           // Also the same as arr[3] = 100;
}

Once again, many naïve programmers are left to reason that arrays and pointers must be the same, since they appear to work in the same way.

In C (as in mathematics) addition is commutative; so a + b is the same as b + a.  Rather surprisingly this symmetry is also true for pointer arithmetic; and this can lead to some truly bizarre-looking code:

int main(void)
{
  int arr[5];
  int *int_ptr = &arr[0];

 
  *(int_ptr + 3) = 100;  // Same as int_ptr[3] = 100
  *(3 + int_ptr) = 100;  // Same as above.
  3[int_ptr]     = 100;  // Exactly the same as the others!
}

If you’re ever tempted to write code like the last line above, you really need to sit down and have a good stiff word with yourself.
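For the sceptical, here is a sketch that checks all three spellings land on the same element (fourth_element_backwards is a hypothetical helper, and this is strictly for demonstration purposes):

```c
#include <assert.h>

/* Hypothetical helper: reads the 4th element using the
   back-to-front spelling and returns it. */
int fourth_element_backwards(void)
{
  int arr[5] = {10, 20, 30, 40, 50};
  int *int_ptr = &arr[0];

  /* All three forms name the same object. */
  assert(&int_ptr[3] == int_ptr + 3);
  assert(&3[int_ptr] == &int_ptr[3]);
  assert(*(3 + int_ptr) == arr[3]);

  return 3[int_ptr];
}
```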

If there were multi-dimensional arrays in C I’d be able to write something like:

int main(void)
{
  int 1D_arr[10];
  int 2D_arr[10, 10];
  int 3D_arr[10, 10, 10];

  ...
}

Of course, I can’t.  C only allows one-dimensional arrays.  However, it is pretty lackadaisical about the type of object in the array.  It has no objections to having arrays as array elements:

int main(void)
{
  int arr_arr[4][2];  // Array of arrays.  But which is which?
}

Although this looks like a two-dimensional array, it isn’t; it’s a contiguous sequence of eight integers.  The compiler, though, sees it as a contiguous sequence of four elements, where each element is a contiguous sequence of two integers.
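Both views can be recovered with sizeof. In the sketch below, struct shape and arr_arr_shape are hypothetical names introduced purely for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of the sizes the compiler sees. */
struct shape {
  size_t rows;          /* number of int[2] elements */
  size_t ints_per_row;  /* ints in each element      */
  size_t total_ints;    /* ints in the whole object  */
};

struct shape arr_arr_shape(void)
{
  int arr_arr[4][2] = {{0}};
  struct shape s;

  s.rows         = sizeof(arr_arr)    / sizeof(arr_arr[0]);
  s.ints_per_row = sizeof(arr_arr[0]) / sizeof(arr_arr[0][0]);
  s.total_ints   = sizeof(arr_arr)    / sizeof(arr_arr[0][0]);

  /* Four lots of two, with nothing in between. */
  assert(s.total_ints == s.rows * s.ints_per_row);
  return s;
}
```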

(A special prize of a year’s supply of Brownie Points if you can work out the type of the expression arr_arr; (and no, it’s not int**))

Accessing array-of-array elements has simple syntax (once you’ve worked out which index represents which axis: the right-most index represents the ‘minor’ array; the left-most is the ‘major’ array):

int main(void)
{
  int arr_arr[4][2]; 

  arr_arr[3][1] = 100;  // Looks easy, but what’s really going on?
}

We know that the index operator is merely syntactic sugar to fool us (but we’re wise to that, now).  However, navigating your way through the collusion and misdirection to work out how array-of-array elements are accessed requires a degree of mental intrepidity.

We know how to deal with arr_arr[3].  It is pointer arithmetic that yields (in this case) an array of two integers (as an l-value expression).  We know that an array used as an l-value expression yields the address of the first element so we apply the pointer arithmetic again to get an integer (again as an l-value expression).
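Written out by hand, the two steps might look like the sketch below. The names store_by_hand, row and cell are hypothetical, and the 7 in the final check is simply 3 * 2 + 1:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: performs arr_arr[3][1] = 100 one step at a time. */
int store_by_hand(void)
{
  int arr_arr[4][2] = {{0}};

  int (*row)[2] = arr_arr + 3;  /* step 1: pointer to the 4th int[2]     */
  int *cell     = *row + 1;     /* step 2: *row becomes an int *, then +1 */
  *cell = 100;                  /* same effect as arr_arr[3][1] = 100    */

  assert(cell == &arr_arr[3][1]);

  /* The element sits 7 ints from the start of the object. */
  assert((char *)cell - (char *)arr_arr == (ptrdiff_t)(7 * sizeof(int)));

  return arr_arr[3][1];
}
```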

Of course, if you really want to send people off the edge into insanity you could always write the above code as:

int main(void)
{
  int arr_arr[4][2];

  1[3[arr_arr]] = 100;  // <= How to lose friends, fast.
}

The code below appears to refute this lie:

void process_array(int arr_param[10])
{
  ...
}


int main(void)
{
  int arr[10];

  process_array(arr);
}

Everything points to this being a pass-by-value call; a copy of arr is made on the call to process_array().

Let’s chip away at this façade.  Our ARRAY_SIZEOF macro from earlier should still work:

#define ARRAY_SIZEOF(a) (sizeof(a) / sizeof(a[0]))


void process_array(int arr_param[10])
{
  int sz = ARRAY_SIZEOF(arr_param);  // sz => 1. Or 2. Ummm?...
}


int main(void)
{
  int arr[10];

  int sz = ARRAY_SIZEOF(arr);       // sz => 10, as expected.
  process_array(arr);
}

The call should give us a clue why this is a lie:  an array name is being used as an l-value expression.  We know this really means ‘give a pointer to the first element’.  We also know a pointer is not the same as an array of integers.  So what’s happening?

The signature of the process_array() function is a lie:  You cannot pass an array to a function by value.

The parameter signature degenerates to a pointer.  To demonstrate this I could have written the function signature as

void process_array(int arr_param[])  // The index is ignored,
                                     // because it has no meaning

or perhaps even more accurately

void process_array(int *arr_param) // Exactly the same as the above.

The ARRAY_SIZEOF is simply compounding all the lies we covered previously to give us an answer we weren’t expecting.  Expanding the macro shows us why.

void process_array(int *arr_param)
{
  int sz = ARRAY_SIZEOF(arr_param);
  //     => sizeof(arr_param) / sizeof(arr_param[0])
  //     => sizeof(arr_param) / sizeof(*(arr_param + 0))
  //     => sizeof(int*)      / sizeof(int)
  //     => (4 or 8 bytes)    / (4 bytes)
}

Inside the function we are relying on the syntactic sugar of the index operator.  Insidious!

The actual value you get depends on your architecture.  On a 32-bit machine you’ll typically get 1; on a 64-bit machine, typically 2.
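Two common ways of living with this are sketched below; neither comes from the original post, and the function names are hypothetical. The first passes the element count alongside the pointer. The second passes a pointer to the whole array, which keeps the size in the type at the cost of only accepting arrays of exactly ten ints:

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZEOF(a) (sizeof(a) / sizeof(a[0]))

/* Option 1: the caller says how many elements there are. */
int sum_elements(const int *values, size_t count)
{
  int total = 0;

  for (size_t i = 0; i < count; ++i) {
    total += values[i];
  }
  return total;
}

/* Option 2: a pointer-to-array parameter; *arr_ptr has type int[10],
   so the macro works again. */
size_t count_via_array_pointer(int (*arr_ptr)[10])
{
  return ARRAY_SIZEOF(*arr_ptr);
}

/* Hypothetical caller: arr is still a real array here. */
int sum_ten(void)
{
  int arr[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

  assert(count_via_array_pointer(&arr) == 10);
  return sum_elements(arr, ARRAY_SIZEOF(arr));
}
```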

Literals are fixed values.  You can’t change the value of 26.7, or 136.  They are constants.  C supports string literals, defined as arrays of chars.

#include <stdio.h>

int main(void)
{
  puts("An array of 20 chars");
}

C lets you initialise arrays of characters with literals; and helpfully adds a NUL character terminator to the array.

int main(void)
{
  char string[] = "Hello world";
}
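The terminator is counted in the size of the array but not in the length of the string, which the following sketch checks (hello_storage is a hypothetical helper):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns the number of chars the array occupies. */
size_t hello_storage(void)
{
  char string[] = "Hello world";

  assert(strlen(string) == 11);   /* visible characters           */
  assert(string[11] == '\0');     /* the NUL the compiler added   */

  return sizeof(string);          /* 11 characters plus the NUL   */
}
```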

It’s worth paying close attention to the type of the array:  char.  Not const char.  This means I can perform some subtle abuse:

#include <stdio.h>

void mod_string(char string[])  // Remember, it’s a lie!
{
  string[0] = 'h';              // Seems legit...
  puts(string);
}


int main(void)
{
  mod_string("Hello world");
}

What’s going to happen here?  We know the string literal won’t be copied (that’s a lie).  We also know the index operator will just operate on the address of the string literal.  So where’s the string literal being stored?  Most likely in a read-only section alongside the code.  The C standard makes modifying a string literal undefined behaviour; in practice the write will typically either do nothing (if that section is in Flash / ROM) or cause a segmentation fault.  Joy.

The conclusion to all this?  Be careful with string literals.  The compiler may not stop you doing dumb and dangerous things.  Make life a little safer by only using const char with string literals in C.
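A sketch of what that advice might look like in practice (the helper names are hypothetical): point at literals through const char *, and initialise a char array when you need something you can scribble on.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: a read-only view is all it needs, so an
   accidental write inside becomes a compile-time error. */
size_t safe_length(const char *text)
{
  return strlen(text);
}

/* Hypothetical helper: modifies a private copy, never the literal. */
char lower_first_of_copy(void)
{
  const char *greeting = "Hello world";  /* points at the literal    */
  char copy[] = "Hello world";           /* array: a writable copy   */

  copy[0] = 'h';                         /* fine, copy is our own    */

  assert(greeting[0] == 'H');            /* the literal is untouched */
  return copy[0];
}
```

With the const in place, an attempt such as greeting[0] = 'h' is rejected by the compiler instead of failing at run time.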

Of course, if you’ve read this far you’ll (hopefully) realise that this post should have been taken in jest.  Arrays aren’t really a lie (any more than any of C’s constructs are).  Despite all the ‘trickery’ C’s arrays work well for many, many programming tasks.  They are – as the title of this article suggests – a very convenient set of untruths.

However, by exploring the C semantics we’ve highlighted some traps that can befuddle the unwary C neophyte.


Glennan Carnie

Glennan is an embedded systems and software engineer with over 20 years experience, mostly in high-integrity systems for the defence and aerospace industry.

He specialises in C++, UML, software modelling, Systems Engineering and process development.
