Arrays

An array is a homogeneous collection of data. By homogeneous, we mean that all values in the collection have the same type.

Declaration

Array

Syntax

Uninitialized Declaration
<data type> <array name> [ <size> ]

Initialized Declaration
<data type> <array name> [ <size> ] = { <expr>, <expr>, ... }

Notes

- The <size> must be a constant unsigned integer (i.e., a constant value or a name defined with the preprocessor directive #define).
- The number of initial values in { <expr>, <expr>, ... } should not exceed the <size>; excess values cause the compiler to issue a warning (or an error). Fewer values are allowed, in which case the remaining elements are set to 0.
- Each initial value in { <expr>, <expr>, ... } should have the same data type as <data type>.
- The initialisation can only be done at the time of declaration.

It is guaranteed that the elements of the array occupy contiguous memory locations. This leads to very simple retrieval and update instructions later on. Visually, we typically represent the memory as a sequence of boxes, where each box is a memory location the size of the data type and is subscripted with its index (e.g., C[0], C[1], ...).

Integer Array of Size 30

Array

Initialisation

ArrInit.c
// a[0]=54, a[1]=9, a[2]=10
int a[3] = {54, 9, 10};

// size of b is 3 with b[0]=1, b[1]=2, b[2]=3
int b[] = {1, 2, 3};

// c[0]=17, c[1]=3, c[2]=10, c[3]=0, c[4]=0
int c[5] = {17, 3, 10}; 

// size of d is 2 with d[0]=1, d[1]=2
int d[2] = {1, 2, 3};  // warning issued: excess elements

Under Initialisation

Under initialisation happens when you initialise the array with fewer elements than its size can contain. The remaining elements are then assigned the value 0. This is guaranteed by the C standard, so it is independent of the compiler used (GCC or Clang).

After Initialisation

AfterInit.c
int e[5];
e[5] = {8, 23, 12, -3, 6}; // too late to do this;
                           // compilation error

Retrieval and Update

Once we have declared an array, we can retrieve the element as well as update¹ the element at a certain index.

Retrieval and Update

Syntax

Retrieval
<array name> [ <index> ];

Update
<array name> [ <index> ] = <expr>;

Notes

- <index> is an unsigned integer.
  - As such, it is greater than or equal to 0.
  - We call this 0-indexed.
- The sub-expression <array name> [ <index> ] retrieves the box at the given index (i.e., the (<index>+1)-th element due to 0-indexing).
- We cannot assign to the array (i.e., <array name> = <expr>;)!

Array Summation

ArraySumV1.c
#include <stdio.h>
#define MAX 5

int main(void) {
  int numbers[MAX];
  int i, sum = 0;

  printf("Enter %d integers: ", MAX);
  for (i=0; i<MAX; i++) {
    scanf("%d", &numbers[i]);
  }

  for (i=0; i<MAX; i++) {
    sum += numbers[i];
  }

  printf("Sum = %d\n", sum);
  return 0;
}

ArraySumV2.c
#include <stdio.h>
#define MAX 5

int main(void) {
  int numbers[MAX] = {4,12,-3,7,6};
  int i, sum = 0;

  for (i=0; i<MAX; i++) {
    sum += numbers[i];
  }

  printf("Sum = %d\n", sum);
  return 0;
}

Array Assignment

ArrayAssignment.c
#define N 10
int source[N] = { 10, 20, 30, 40, 50 };
int dest[N];
dest = source;  // illegal!
                // We cannot assign to the array (i.e., `<array name> = <expr>;`)!

Arrays and Pointers

So far, the syntax for array retrieval and update is the same as in other programming languages. What sets C arrays apart from those of other programming languages is that there is a correspondence between arrays and pointers. The correspondence can be summarised as follows:

Array-Pointer Correspondence

The name of the array corresponds to the address of the first element.

To put it as a code, we can write it as:

(arr == &arr[0]) == true;

Of course, we cannot simply state such correspondence without proof.

Proof

ArrPtrCorrespondence.c
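int a[3] = {1, 2, 3};
// %p expects a void pointer, hence the casts
printf("a = %p\n", (void *)a);
printf("&a[0] = %p\n", (void *)&a[0]);
printf("&a[1] = %p\n", (void *)&a[1]);
printf("&a[2] = %p\n", (void *)&a[2]);
printf("(a == &a[0]) = %d\n", a == &a[0]); // Print Boolean as int
// a and &a hold the same address but have different types
// (int * vs int (*)[3]), so we compare them as void pointers
printf("(a == &a) = %d\n", (void *)a == (void *)&a);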

This correspondence hides several important implications. Some of these implications are rather weird, but they follow logically. It is best if you go through the following parts slowly.

Array Decay

This behaviour of an array being treated like a pointer is called array decay: the array decays into a pointer. The actual type of an integer array of size 5 is int[5]. However, this information is lost (hence the decay) in most places where we use the array name. The operator sizeof is not a function, so when an array is used as its operand, the array does not decay. This results in a difference in the behaviour of sizeof(arr) inside the scope where the array is declared versus outside of it (e.g., when the array is passed as an argument to a function).

Retrieval ≈ Dereferencing

Since "The name of the array corresponds to the address of the first element", as a logical conclusion, the following code:

*arr

retrieves the first element of the array arr. We also know that the first element is at index 0. We can incorporate this information as a shift from the first address, since we know that "the elements of the array occupy contiguous memory locations".

*(arr + 0)

Now, we can see how a retrieval of any index can be mapped into a dereferencing operation with shifts. Given an index idx, we can also retrieve the element via:

*(arr + idx)

As such, we can summarise this as:

Retrieval ≈ Dereferencing

arr[idx] == *(arr + idx)

Commutativity

Due to the commutativity of the + operator², we can have the following weird operation:

Commutativity
   arr[idx]
=> *(arr + idx)  // by dereferencing
=> *(idx + arr)  // by commutativity
=> idx[arr]

But remember, idx is an unsigned int and arr is an array. Basically, what we want to say is that the following code weirdly works.

CommutativeArr.c
int arr[3] = {1, 2, 3};
printf("%d\n", 1[arr]); // equivalent to arr[1]

This simple pointer arithmetic is the reason why array access is so fast. Coupled with the fact that C does not check whether the index is out of bounds, this operation is very fast but also unsafe. The job of checking whether the index is out of bounds is delegated to you as the programmer.

Array Variable ≉ Variable

Recall that each variable has four attributes: name, type, address and value. Array variable clearly has name and type. However, because "The name of the array corresponds to the address of the first element" and that the name is how we retrieve a value for an array, we have to accept the conclusion that an array variable is not a usual variable.

In particular, the address and the value of an array variable are always the same. Looking at it through the box-and-arrow diagram below, the correct visual representation of an array should be the one on the top instead of the one on the bottom.

Array Box-and-Arrow

In the correct visualisation, there is no memory allocated for the array variable. This is the reason why there is no address corresponding to the name. In the incorrect visualisation, the box for a assumes that it has a separate address from a[0]. As such, the name a is treated by the compiler to be a placeholder for the address of its first element. However, this only happens in the function where the array is declared. As you will see later when we pass an array into a function, this does not happen for function parameters.

Although this may seem weird, it actually explains why you cannot assign to an array variable. How can you, when there is no variable to store such a value? What you can do instead is to update the elements of the array one by one, as exemplified below:

ArrayCopy.c
#define N 10
int source[N] = { 10, 20, 30, 40, 50 };
int dest[N];
int i;
for (i = 0; i < N; i++) {
  dest[i] = source[i];
}

memcpy()

There is another function available in the string library (i.e., #include <string.h>) called memcpy() that allows us to copy a sequence of memory locations into another sequence of memory locations. Since an array is a sequence of memory locations, we can also use this to copy an array. However, this function is outside the scope of the module.

Arrays and Functions

For a function to accept an array, we simply have to specify it in the function prototype. Let's consider the array summation code we have from before. The prototype should be one of the following:

Sum Array Prototype
int sumArray(int[], int);
int sumArray(int arr[], int size);

We can now write the function as in the example below:

ArraySumFunction.c
#include <stdio.h>

int sumArray(int [], int);

int main(void) {
  int val[6] = {44, 9, 17, -4, 22};
  printf("Sum = %d\n", sumArray(val, 6));
  return 0;
}

int sumArray(int arr[], int size) {
  int i, sum=0;

  for (i=0; i<size; i++) {
    sum += arr[i];
  }
  return sum;
}

Sum Array

When you run the code, the box-and-arrow diagram would look something like the image on the right. Do note that since the variable arr is a parameter in the function sumArray, we have an actual variable storing the value! The value here is still the starting address of the array. Hence, you see that it now truly behaves like a pointer.

Array Size

Since a C array does not carry information about its size (partly due to the arrays and pointers correspondence), most functions working with arrays have to also accept the size of the array separately.

Of particular interest is whether we can actually specify the size as part of the array parameter itself. To be more precise, can we specify the function definition as follows:

Array Size
int sumArray(int arr[8], int size) { ... }

Unfortunately, in this case, the compiler will ignore this size. The actual number of elements to be processed should depend on the variable size.

By the arrays and pointers correspondence above, we can have an alternate definition to the function using pointers. In fact, you will see that the function body will remain the same while the function prototype will change.

ArraySumPointer.c
#include <stdio.h>

int sumArray(int*, int);

int main(void) {
  int val[6] = {44, 9, 17, -4, 22};
  printf("Sum = %d\n", sumArray(val, 6));
  return 0;
}

int sumArray(int *arr, int size) {
  int i, sum=0;

  for (i=0; i<size; i++) {
    sum += arr[i];
  }
  return sum;
}

Quick Quiz

Question

The sum of squares of an array \(a\) of size \(n\) is defined as the following mathematical summation:

\(\sum_{i=0}^{n-1} a[i]^2\)

Write the function sum_of_square that accepts a double array and returns the sum of squares of the array.

SumOfSquare.c
double sum_of_square(double[], int); // prototype
:
double sum_of_square(double arr[], int size) {
  int i;
  double sum = 0.0;
  for(i=0; i<size; i++) {
    sum += (arr[i] * arr[i]);
  }
  return sum;
}

We have learnt that for a function to modify a variable (e.g., v) outside the function, the caller has to pass the address of the variable (e.g., &v) into the function. What about an array? By the arrays and pointers correspondence, the value of the array name is already the address of its first element. As such, there is no need to pass the address explicitly into the function.

Side-Effect

Since passing an array to a function passes the address, whether intended or not, a function can modify the content of the array it received. These changes can be seen by the caller and hence constitute a side-effect of the function (the main effect is the return value). You as the programmer will have to ensure that no changes are made when no changes are supposed to be made.

ArrayModify.c
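#include <stdio.h>

void modifyArray(float [], int);
void printArray(float [], int);

int main(void) {
  float num[4] = {3.1, 5.9, -2.1, 8.8};
  modifyArray(num, 4);
  printArray(num, 4);
  return 0;
}

// Doubles every element of arr in-place
void modifyArray(float arr[], int size) {
  int i;
  for (i=0; i<size; i++) {
    arr[i] *= 2;
  }
}

// Minimal printing helper (an assumed implementation):
// prints the elements on one line
void printArray(float arr[], int size) {
  int i;
  for (i=0; i<size; i++) {
    printf("%.1f ", arr[i]);
  }
  printf("\n");
}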

No Return

Note the absence of a return statement in the function modifyArray. The return statement is unnecessary because the modification is done in-place.

Quick Quiz

Question

Write a function abs_array to modify an array such that each element in the array is changed to its absolute value. You may assume that the array is an array of int.

AbsArray.c
void abs_array(int[], int); // prototype
:
void abs_array(int arr[], int size) {
  int i;
  for(i=0; i<size; i++) {
    if(arr[i] < 0) {
      arr[i] = -arr[i]; // or arr[i] = abs(arr[i]) if you include <stdlib.h>
    }
  }
}

Why Pointers?

Let's look at a possible reason why the name of the array is treated as the pointer to the first element. Here, we will use a numerical argument. Do not worry too much if you cannot follow the computation closely.

Consider an array of int with 1 billion elements. Since each int has a size of 4 bytes, the total size is:

4 × 1,000,000,000 = 4,000,000,000 bytes ≈ 4GB

This array has an enormous size of 4GB! That's roughly the size of the RAM of a common laptop. So the array already takes up the entire RAM.

Now consider what happens if calling a function actually copies the array instead of just the pointer. It means we now have to be able to store twice the size of the array. That equates to 8GB! Clearly, that's larger than what most common laptops would have.

Even worse, imagine if the function then calls another function (or even recursion). The amount of memory needed is going to be unreasonably large. To remedy this, the convention is to simply pass the pointer to the first element. As a problematic side-effect, the function may modify the content of the array. This means that it is now the responsibility of the programmers to avoid this.

¹ This is often called the array assignment operation, and in fact we will often call it that. However, it is good to mentally separate the two operations and keep assignment simple by assuming <var> = <expr> where <var> is a variable name. On the other hand, update requires the left-hand side to be an indexed array element (i.e., <array name> [ <index> ]).
² We did a bit of hand-waving here, because it is not obvious that + is commutative when the operands are of different types. In arr + idx, the left operand is an address and the right operand is an integer. On the other hand, for idx + arr, the left operand is an integer and the right operand is an address.