Pointers and Arrays

Definitions

A pointer is an expression that contains an object’s address in memory and a scale factor equal to sizeof object.  A pointer’s dual properties of address and scale make it akin to a vector.

The expression sizeof obj returns an integer equal to the number of bytes spanned by object obj, including internal slack bytes, plus the number of trailing slack bytes reserved to force alignment of a potential consecutive object of the same type.
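
As a small illustration of trailing slack bytes, the sketch below compares the byte distance between two consecutive array elements with the value of sizeof.  The type struct rec and the function rec_stride are hypothetical names chosen for this example only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: an int followed by a char usually leaves
   trailing slack bytes so that the next int stays aligned.        */
struct rec {
    int  i;
    char c;
};

/* Return the distance in bytes between two consecutive elements
   of an array of struct rec.                                      */
size_t rec_stride(void)
{
    struct rec pair[2];
    size_t stride = (size_t)((char *)&pair[1] - (char *)&pair[0]);

    /* Consecutive elements are always exactly sizeof apart.       */
    assert(stride == sizeof(struct rec));
    return stride;
}
```

On many implementations rec_stride() returns more than sizeof(int) + sizeof(char), but the exact amount of slack is implementation-dependent.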

An lvalue is an expression which refers to an object, and therefore, it may occupy the left side of an assignment statement in order to modify the object’s value.

The & operator is called the address operator.  The expression &b returns a pointer to type of b, where b is an lvalue.  The resulting pointer is never an lvalue, as the address of an object cannot be changed.

The * operator is called the indirection or the dereferencing operator.  When the address of variable b has been assigned to pointer variable pb, the expression *pb provides indirect access to b.

The operators * and & cancel each other.  The expression *&b returns an object of the type and value of b.
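
The address and indirection operators can be seen working together in the following minimal sketch.  The function name add_through_pointer is an assumption made for illustration.

```c
#include <assert.h>

/* Modify a local variable indirectly through a pointer to it.     */
int add_through_pointer(int start, int amount)
{
    int  b  = start;
    int *pb = &b;          /* pb holds the address of b            */

    *pb += amount;         /* *pb provides indirect access to b    */
    assert(*&b == b);      /* * and & cancel each other            */
    return b;
}
```

For example, add_through_pointer(3, 4) returns 7, although b is never named on the left side of an assignment after its definition.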


Pointer Arithmetic

An expression consisting of the subtraction of two pointers to elements of the same array returns a signed integer, of type ptrdiff_t defined in stddef.h, equal to the difference of the subscript numbers of the elements.

For a pointer ptr and an integer i, the expression ptr + i or ptr - i returns a pointer to type of *ptr, containing the value in ptr incremented or decremented by an amount equal to the value of i multiplied by sizeof *ptr.  The prefix and postfix increment and decrement operators may be applied to pointers.

There is no direct means of adjusting a pointer’s value by an arbitrary amount.  Only multiples of the size of the dereferenced pointer may be added or subtracted.

Pointer arithmetic is limited to these two operations.
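
The scaling can be observed by converting both pointers to pointer to char, whose scale is 1, before subtracting.  The following is a sketch only; the function name bytes_moved and the array size of 16 are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Distance in bytes covered by moving a pointer to long by i
   elements.                                                        */
size_t bytes_moved(size_t i)
{
    long  b[16];
    long *p = b;
    long *q;

    assert(i <= 16);                  /* stay within b or just past */
    q = p + i;                        /* advances i * sizeof *p     */
    assert(q - p == (ptrdiff_t)i);    /* difference is in elements  */
    return (size_t)((char *)q - (char *)p);
}
```

Here bytes_moved(3) equals 3 * sizeof(long), whatever the size of long is on the implementation.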

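The [] operator is called the array operator.  The expression a[i] returns the element of array a with subscript number i.  An array name with no attached subscript, except as the operand of sizeof or &, returns a pointer to the first element of the array.  This pointer is a constant rather than an lvalue, so it may not be assigned to: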

    a == &a[0],  *a == a[0]

Since addition of an integer to a pointer scales the amount added to the address by the size of the dereferenced pointer:

    a[i] == *(&a[0] + i) == *(a + i)

The array operator is a dereferencing operator.  For pointer variable ptr, the expression ptr[i] returns the object of type of *ptr located at address returned by ptr + i.  Therefore, the application of the indirection and array operators to either an array name or a pointer variable is equivalent.  In general:

    ptr == &ptr[0],  *ptr == ptr[0]

The operators & and [0] at opposite sides of the same expression cancel each other.

The expression a[i] is compiled as *(a + i).  The expression *(ptr + i) is compiled as such, but may be coded ptr[i].

C guarantees that an expression that returns a pointer to the memory location just beyond the last element of an array is a valid term in pointer arithmetic and in a comparison expression.  Since that location lies outside the array, the pointer may not be dereferenced.

Note that [] has higher precedence than the prefix operators ++, --, *, and &, which have equal precedence and associate right-to-left.  The postfix forms of ++ and -- also bind more tightly than * and &, so *pb++ means *(pb++).

    char a[10], *pa, *pa1 = &a[6], *pa2 = &a[4];
    long b[12], *pb;
    int diff;

    diff = &a[10] - a;                                   /* diff == 10   */
    diff = pa1 - a;                                      /* diff == 6    */
    diff = &a[6] - &a[4];                                /* diff == 2    */
    diff = pa1 - pa2;                                    /* diff == 2    */
    pa = a;                                              /* pa == &a[0]  */
    pa = a + 3;                                          /* pa == &a[3]  */
    --pa;                                                /* pa == &a[2]  */
    pa += 7;                                             /* pa == &a[9]  */
    pa = &a[6] - 2;                                      /* pa == &a[4]  */
    pa = pa1 - 2;                                        /* pa == &a[4]  */
    pa = pa1 - diff;                                     /* pa == &a[4]  */

    /* Synchronize pb to pa by element number                            */

    pb = b + (pa - a);                                   /* pb == &b[4]  */
    *pb = 6;                                             /* b[4] == 6    */

    /* Combine dereferencing and increment operators                     */

    *++pb = 7;                                           /* b[5] == 7    */
    ++*pb;                                               /* b[5] == 8    */
    (*pb)++;                                             /* b[5] == 9    */
    *pb++ = 10;                                          /* b[5] == 10   */

    *pb = 11;                                            /* b[6] == 11   */
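
The pointer just beyond the last element is typically used as a loop limit.  The sketch below compares against such a pointer but never dereferences it; the function name count_char is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Count occurrences of ch in the first n elements of s.
   end points just beyond the last element and is only compared.   */
size_t count_char(const char *s, size_t n, char ch)
{
    const char *end = s + n;
    size_t count = 0;

    assert(s != NULL);
    for (; s < end; s++)
        if (*s == ch)
            count++;
    return count;
}
```

For example, count_char("banana", 6, 'a') returns 3.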

To traverse an array, it can be more efficient to use a dereferenced pointer than a subscript for referencing the elements, particularly with a compiler that does not optimize; an optimizing compiler usually generates equivalent code for both forms.  The first for statement below uses a subscript to initialize all elements of array a.  The second for statement represents the way the compiler processes the first.

The second for statement requires three operations to address each element of a.  The expression i++ increments i by 1, and the expression a + i multiplies i by sizeof(int), and then adds the product to &a[0].

The third for statement requires just one operation to address each element of a.  The expression ptr++ increases the value in ptr by sizeof *ptr, in this case sizeof(int).  This saves one addition and one multiplication.

    int i, a[10], *ptr, *end = &a[10];

    for (i=0; i<10; i++)
        a[i] = 0;

    for (i=0; i<10; i++)
        *(a + i) = 0;

    for (ptr = a; ptr < end; ptr++)
        *ptr = 0;

Multi-dimensional Arrays

C has no true multi-dimensional arrays.  The array int a[2][3][4] is really a one-dimensional array whose elements are themselves arrays, rather than a three-dimensional array.

Since the array operator [] associates left to right, a is an array of 2 elements a[i], each of which is an array of 3 elements a[i][j], each of which is an array of 4 elements a[i][j][k], each of which is an int.

C stores the elements of an array in row major order, meaning that all columns in the same row are placed beside each other in memory.  To traverse all elements of an array of multiple subscripts in order of memory address, the rightmost subscript must vary through its entire range once for each time that the next-rightmost subscript changes value, etc.

An array of arrays such as type a[rows][cols][dim3] is so arranged in memory that the address of an arbitrary element may be expressed as:

    &a[i][j][k] == &a[0][0][0] + sizeof(type) x ((((i x cols) + j) x dim3) + k)

Within the array int IntArr[2][4][3] shown below, the internal displacement of element IntArr[1][2][2] is: sizeof(int) x ((((1 x 4) + 2) x 3) + 2) = sizeof(int) x 20.
 

[Figure: Multi-dimensional Array]

The elements of IntArr are arranged in memory as follows:
 

[Figure: Multi-dimensional Array In Memory]
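
The displacement formula above can be checked against the addresses the compiler actually assigns.  This is a minimal sketch using the dimensions of IntArr; the macro and function names are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 2
#define COLS 4
#define DIM3 3

/* Displacement, counted in elements, computed from the formula.   */
size_t flat_index(size_t i, size_t j, size_t k)
{
    return ((i * COLS) + j) * DIM3 + k;
}

/* Displacement, counted in elements, measured from the addresses. */
size_t measured_index(size_t i, size_t j, size_t k)
{
    static int IntArr[ROWS][COLS][DIM3];
    char *base = (char *)&IntArr[0][0][0];
    char *elem;

    assert(i < ROWS && j < COLS && k < DIM3);
    elem = (char *)&IntArr[i][j][k];
    return (size_t)(elem - base) / sizeof(int);
}
```

Both flat_index(1, 2, 2) and measured_index(1, 2, 2) return 20, matching the displacement worked out for IntArr[1][2][2].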

Array a can be expressed in terms of pointers rather than subscripts.  Each of the expressions a, a[i], and a[i][j] is an array name that returns a pointer.  Note that the expressions a, a[0], and a[0][0] all return pointers containing the address of the first int in the composite array, but the scale of each pointer is different.

    a       == &a[0]         pointer to array of array of int
    a[i]    == &a[i][0]      pointer to array of int
    a[i][j] == &a[i][j][0]   pointer to int

    a[i] == *(&a[0] + i) == *(a + i)

    a[i][j] == *(&a[i][0] + j) == *(a[i] + j)
            == *(*(a + i) + j)

    a[i][j][k] == *(&a[i][j][0] + k) == *(a[i][j] + k)
               == *(*(*(a + i) + j) + k)

The last relation is derived directly, though less conceptually, by noting that a, a[i], and a[i][j] are pointers; and that the [] operator associates left to right:

    a[i][j][k] == ((a[i])[j])[k]
               == ((*(a + i))[j])[k]
               == (*(*(a + i) + j))[k]
               == *(*(*(a + i) + j) + k)
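
The equivalence of the subscript and pointer forms can be exercised directly.  In the sketch below, the parameter type int (*)[3][4] is the pointer type that the name of an int [2][3][4] array returns; the function names fetch and fetch_demo are hypothetical.

```c
#include <assert.h>

/* Fetch a[i][j][k] using only pointer addition and indirection.   */
int fetch(int (*a)[3][4], int i, int j, int k)
{
    int value = *(*(*(a + i) + j) + k);

    assert(value == a[i][j][k]);      /* both forms agree           */
    return value;
}

/* Fill a 2 x 3 x 4 array so that each element encodes its own
   subscripts, then fetch one element back.                         */
int fetch_demo(int i, int j, int k)
{
    int a[2][3][4];
    int x, y, z;

    for (x = 0; x < 2; x++)
        for (y = 0; y < 3; y++)
            for (z = 0; z < 4; z++)
                a[x][y][z] = 100 * x + 10 * y + z;
    return fetch(a, i, j, k);
}
```

For example, fetch_demo(1, 2, 3) returns 123.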

Array Literals

An array literal is used to initialize an array.  Elements are initialized in the order specified in the literal.  If the literal contains fewer entries than the array’s dimension, the remaining elements are initialized to zero.

A string literal is compiled as an array of char with an extra final element equal to '\0'.  A string literal as well as an array literal of char may be the initializer of an array of char.  However, only a string literal may be the initializer of a pointer to char.

In the declaration of an array with an initializer, the first dimension is implied and it may be omitted by coding empty brackets [].  This does not apply to higher dimensions.

    int a[4] = {1, 2, 3, 4};

    int aa[3][4] = { {11, 12, 13, 14}, {21, 22, 23, 24}, {31, 32, 33, 34} };

    int aaa[2][3][4] = {
        { {111, 112, 113, 114}, {121, 122, 123, 124}, {131, 132, 133, 134} },
        { {211, 212, 213, 214}, {221, 222, 223, 224}, {231, 232, 233, 234} }
    };

    int b[4] = {1, 2};                 /* Last 2 elements set to 0       */

    char ca[7] = "string";             /* Implied final \0               */

    char *pc = "string";

 /* char *pc = {'s', 't', 'r', 'i', 'n', 'g', '\0'};                     */
 /*                                                                      */
 /* Note: This doesn't work.  Only a true string literal does.           */

 /* Implied first dimension                                              */

    char aD1[] = "arr1";
    char aD2[][5] = {"arr1", "arr2", "arr3"};
    char aD3[][3][5] = { {"arr1", "arr2", "arr3"}, {"arr4", "arr5", "arr6"} };
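
Two of these rules can be verified with sizeof and a simple loop.  The sketch below assumes the hypothetical function names partial_sum and implied_rows.

```c
#include <assert.h>
#include <stddef.h>

/* Sum an array whose initializer names only the first 2 of its 4
   elements; the elements left out are set to zero.                 */
int partial_sum(void)
{
    int b[4] = {1, 2};
    int sum = 0;
    size_t i;

    assert(b[2] == 0 && b[3] == 0);
    for (i = 0; i < sizeof b / sizeof b[0]; i++)
        sum += b[i];
    return sum;
}

/* Number of rows the compiler infers for the omitted first
   dimension.                                                       */
size_t implied_rows(void)
{
    char aD2[][5] = {"arr1", "arr2", "arr3"};

    return sizeof aD2 / sizeof aD2[0];
}
```

Here partial_sum() returns 3 and implied_rows() returns 3.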

Array of Pointers

The declaration char *a[5] defines an array of pointers to char.  Each element, such as a[0], may hold the address of a char, and the array name a returns a pointer to the first element, that is, a pointer to pointer to char.

    char *a[5];              /* array of 5 pointers to char              */
    char c = 'b';

    a[2] = &c;               /* 3rd element holds address of variable c  */

Pointer to Pointer

The declaration char **ppc defines a single variable ppc that is a pointer to pointer to char.  The object that ppc points to, *ppc, is a pointer to char.  The object that *ppc points to, *(*ppc) or **ppc, is a char.

Access to memory through a single pointer is called a single level of indirection or single indirection.  Access to memory through multiple pointers is called multiple levels of indirection or multiple indirection.

    char c = 'a';             /* char                                    */
    char *pc = &c;            /* pointer to char                         */
    char **ppc = &pc;         /* pointer to pointer to char              */
                              /* *ppc == pc the object ppc points to     */
                              /* **ppc == *pc == c                       */
 /* char **ppc = &&c                                                     */
 /*                                                                      */
 /* Note: This doesn't work.  The & operator applies only to an lvalue.  */
 /*       &&c == &(&c) and &c is not an lvalue.                          */

    char c2 = 'b';            /* another char                            */
    char *pc2 = &c2;          /* another pointer to char                 */

    ppc = &pc2;               /* *ppc == pc2 the object ppc points to    */
    *ppc = pc;                /* *ppc == pc2 = pc                        */
    **ppc = c2;               /* **ppc == *pc2 == *pc == c = c2          */

Such a pointer is often used for traversing an array of pointers and referencing the objects that they address, which are usually strings of varying length.

    char **ppc;               /* pointer to pointer to char              */
    char c;

    char *s1 = "one", *s2 = "two", *s3 = "three";

    char *ap[10];             /* array of 10 pointers to char            */

    ap[0] = s1;
    ap[1] = s2;
    ap[2] = s3;

    ppc = ap;                 /* ppc = &ap[0]                            */

    c = **ppc;                /* c == 'o'                                */
    c = *ppc[0];              /*   All reference the 1st char            */
    c = ppc[0][0];            /*   of the 1st string                     */

    c = *(*(ppc + 2) + 1);    /* c == 'h'                                */
    c = *(ppc[2] + 1);        /*   All reference the 2nd char            */
    c = (*(ppc + 2))[1];      /*   of the 3rd string                     */
    c = ppc[2][1];            /*                                         */
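
The traversal described above might be coded as follows.  This is a sketch under the assumption that the number of pointers in use is passed separately; the function names total_length and total_length_demo are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Total length of the n strings addressed by an array of
   pointers to char.                                                */
size_t total_length(char **ppc, size_t n)
{
    char **end = ppc + n;         /* just beyond the last pointer   */
    size_t total = 0;

    assert(ppc != NULL);
    while (ppc < end)
        total += strlen(*ppc++);  /* use *ppc, then step to next    */
    return total;
}

size_t total_length_demo(void)
{
    char *ap[10];                 /* only the first 3 are in use    */

    ap[0] = "one";
    ap[1] = "two";
    ap[2] = "three";
    return total_length(ap, 3);
}
```

Here total_length_demo() returns 11, the sum of the lengths 3, 3, and 5.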

Pointer to Array

The declaration char (*pa)[5] defines a single variable pa that is a pointer to array of 5 char.  The object that pa points to, *pa, is an array of 5 char whose third element is (*pa)[2].  Pointer pa is scaled by sizeof array of 5 char.

    char (*pa)[5];         /* pointer to array[5] of char                */

    char a[5];             /* array[5] of char                           */

    char aaa[3][4][5];     /* array[3] of array[4] of array[5] of char   */
                           /* 12 consecutive array[5] of char in memory  */

    pa = &a;               /* pa = address of whole array a              */

    a[1] = 't';            /* change 2nd char in a to 't'                */
    (*pa)[1] = 'p';        /* change the same char to 'p'                */

    pa = aaa[2];           /* pa = &aaa[2][0]  assign pointer to         */
                           /* 9th consecutive array[5] of char in memory */

    aaa[2][0][1] = 't';    /* change 2nd char in 9th array[5] to 't'     */
    (*pa)[1] = 'p';        /* change the same char to 'p'                */

/* Change 2nd char in 5th thru 8th consecutive array[5] to 'Z'           */

    for (pa = aaa[1]; pa < &aaa[1][4]; pa++)
        (*pa)[1] = 'Z';     
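
The same loop can be packaged as a function whose parameter is a pointer to array of 5 char.  The following is a minimal sketch; the names set_column and set_column_demo are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Store ch in column col of each of the n rows that rows points
   to.  Each increment of the pointer skips one whole char[5].      */
void set_column(char (*rows)[5], size_t n, size_t col, char ch)
{
    char (*end)[5] = rows + n;

    assert(col < 5);
    for (; rows < end; rows++)
        (*rows)[col] = ch;
}

/* Mark column 1 of a 3 x 5 grid, then count the marks.             */
int set_column_demo(void)
{
    char grid[3][5] = { "aaaa", "bbbb", "cccc" };
    int i, j, count = 0;

    set_column(grid, 3, 1, 'Z');
    for (i = 0; i < 3; i++)
        for (j = 0; j < 5; j++)
            if (grid[i][j] == 'Z')
                count++;
    return count;
}
```

Here set_column_demo() returns 3, one mark per row.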

Here is a comprehensive example:

    char **ppc;
                       /* Array of pointers to strings of unequal length */
    char *apc[8] = { "hello, " };                  /* Initialize apc[0]  */
    char *pc = "world\n";

    char aD1f[9] = "charlie\n";                      /* Implied final \0 */
    char aD1s[6] = {'b', 'i', 'n', 'g', ' ', '\0'};
    char aD2[2][6] = { "bang ", {'b', 'o', 'o', 'm', '\n', '\0'} };
    char aD3[2][2][4] = {
        { {' ', 'e', 'n', 'd'}, {' ', 'o',   'f', ' '} },
        { {'s', 't', 'o', 'r'}, {'y', '\n', '\0', ' '} }
    };

    apc[1] = pc;                           /* Print:                     */
    apc[2] = "goodbye, ";                  /*   hello, world             */
    apc[3] = aD1f;                         /*   goodbye, charlie         */
    apc[4] = aD1s;                         /*   bing bang boom           */
    apc[5] = aD2[0];                       /*   end of story             */
    apc[6] = aD2[1];

    apc[7] = aD3[0][0] + 1;                /* apc[7] == &aD3[0][0][1]    */

    ppc = apc;                             /* Use the char** in loop     */ 

    while (ppc < &apc[8])
        printf("%s", *ppc++);              /* Dereference ppc only once  */
                                           /* to get char* for printf    */

Generic Pointer

An explicit cast is required to assign or compare a pointer of one type to a pointer of another type.  A pointer is preserved unchanged when: first, it is assigned to a pointer to a different object type whose requirements for storage alignment are less or equally strict; then, it is assigned back.  Storage alignment is implementation-dependent, but the type char has the least strict requirement for alignment.  Under pre-ANSI C, the pointer to char was used for the generic pointer in type conversions.

ANSI C introduced the pointer to void, which requires no explicit cast for assignment or comparison to any other object pointer; these conversions occur with no loss of information.  Neither dereferencing nor pointer arithmetic applies to a pointer to void, as it has no scale.  When a function defined with a parameter that is a pointer to any object type is called with a corresponding argument that is a pointer to void, the type conversion is performed as in an assignment.

Continuing the above example:

    void *pv;
    void printout(char*);
    char bye[] = "goodbye, ";              /* Modifiable char array      */

    apc[7] = (char *) aD3;                 /* apc[7] == &aD3[0][0][0]    */
    printf("%s", ++apc[7]);                /* Print: end of story        */

    pv = bye;                              /* Assign char* to void*      */
    apc[7] = pv;                           /* and back                   */
    apc[7][7] = '\n';                      /* Modify bye, not a literal  */
    apc[7][8] = '\0';
    printf("%s", apc[7]);                  /* Print: goodbye             */

    pv = "so long\n";                      /* Assign char* to void*      */
    printout(pv);                          /* Print: so long             */

void printout(char *a) { printf("%s", a); }
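
Because a pointer to void has no scale, a generic routine usually converts it to a pointer to unsigned char in order to step through an object byte by byte.  The sketch below shows one such routine; the names swap_bytes and swap_demo are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Exchange two objects of the same size, whatever their type.      */
void swap_bytes(void *a, void *b, size_t size)
{
    unsigned char *pa = a;        /* no cast needed from void*      */
    unsigned char *pb = b;
    unsigned char tmp;

    assert(a != NULL && b != NULL);
    while (size-- > 0) {
        tmp = *pa;
        *pa++ = *pb;
        *pb++ = tmp;
    }
}

/* Swap two longs and report whether they changed places.           */
int swap_demo(void)
{
    long x = 10, y = 20;

    swap_bytes(&x, &y, sizeof x);
    return x == 20 && y == 10;
}
```

The arguments &x and &y are pointers to long, yet they are accepted by the void * parameters with no cast.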

Null Pointer

Each pointer type has a single value designated the null pointer that is distinct from any of its valid values.  The internal representation of the null pointer may differ among pointer types, but the null pointer of one type converts upon casting to the null pointer of any other type.  The null pointer contributes a zero to a boolean expression, and it compares equal to the null pointer constant, which is an integer constant expression that evaluates to zero, optionally cast to void *.  The null pointer constant may be coded as the integer constant 0 or as NULL, which is a macro defined in the header file stddef.h.

    #include <stddef.h>

    int  i = 1,   *pi = 0;
    char c = 'a', *pc = (char *) pi;      /* Converts the null pointer   */

    if (pi == NULL && pc == (void *) 0)   /* Compares equal to NP const  */
        pi = &i;                          /*  This is executed           */ 

    if (!pc && c == 'a')                  /* (1 && 1) == 1               */
        pc = &c;                          /* This is executed            */ 
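
A common use of the null pointer is to signal that no object is available.  The following sketch tests for it before dereferencing; the function name safe_length is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Length of a string, treating a null pointer as an empty string.  */
size_t safe_length(const char *s)
{
    size_t n = 0;

    if (!s)                  /* same test as s == NULL              */
        return 0;
    assert(s != NULL);       /* safe to dereference from here on    */
    while (*s++)
        n++;
    return n;
}
```

For example, safe_length(NULL) returns 0 and safe_length("abc") returns 3.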

Const and Pointers

The const type qualifier applies to pointer variables as well as object variables.  A pointer variable may be so defined that the constant property applies to: only its dereferenced value, only its own value, or both values.

A pointer whose dereferenced value is constant may not be assigned to a non-constant pointer.  This guards against an accidental modification of a constant object.  However, the const qualifier may be overridden by a cast.  Modifying an object that was itself defined const through such a pointer has undefined behavior.

    int i = 10;
    int *pi = &i;
    const int ci = 25;                    /* no direct assignment to ci  */

    const int *pci = &ci;                 /* *pci is const,  pci is not  */
    int *const cpi = &i;                  /*  cpi is const, *cpi is not  */
    const int *const cpci = &ci;          /* cpci and *cpci are const    */

    pci = &i;                             /* *pci == i                   */
 /* *pci = 5;                                error - const object        */

    *cpi = 9;                             /* *cpi == i == 9              */
 /* cpi = pci;                               error - const pointer       */

 /* *cpci = 5;                               error - const object        */

 /* cpci = &i;                               error - const pointer       */
 /* cpci = pci;                                                          */

    pci = &ci;                            /* *pci == ci                  */
    pci = pi;                             /* *pci == *pi == i            */

 /* pi = &ci;                                error - assign const        */
 /* pi = pci;                                        to non-const        */

    pi = (int *)&ci;                      /* *pi == ci     dangerous     */
    pi = (int *)pci;                      /* *pi == *pci   dangerous     */
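
The const qualifier is most often seen in function parameters, where it promises the caller that the function will not modify the objects addressed.  The sketch below uses both placements of const; the names sum_readonly and sum_readonly_demo are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Add up n ints without being able to modify them.                 */
long sum_readonly(const int *p, size_t n)
{
    const int *const end = p + n;   /* end and *end are both const  */
    long sum = 0;

    assert(p != NULL);
    while (p < end)
        sum += *p++;                /* p may move; *p is read-only  */
    return sum;
}

long sum_readonly_demo(void)
{
    const int ci[3] = {5, 10, 25};

    return sum_readonly(ci, 3);     /* const array, no cast needed  */
}
```

Here sum_readonly_demo() returns 40.  An array of non-const int may be passed as well, since a pointer to int converts to a pointer to const int without a cast.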

Passing Arrays to Functions

When an array is passed to a function, what is passed is not the entire array, but only a pointer to the array’s first element.  Therefore, the argument for an array in a function call is the array name alone.  The number of elements must be passed in a separate argument.  In fact, the argument for an array may be a pointer to any of the array’s elements, indicating that the function may receive a shorter array.  Since the function receives a pointer, it may modify the elements indirectly.

The parameter for an array in a function’s definition or prototype may be coded as a pointer, for accuracy, or an array, for clarity.  Since multi-dimensional arrays are really one-dimensional arrays of sub-arrays, when the parameter for an array of n dimensions is a pointer, it must be a pointer to an array of the remaining n-1 dimensions, in order to specify the correct scale.

When the parameter is an array, the first dimension is implied and it may be indicated by empty brackets [].  However, all higher dimensions must be specified.

    void func1Da(char [5]);
    void func1Db(char []);
    void func1Dc(char *);
    void func3Da(char [3][2][4]);
    void func3Db(char [][2][4]);
    void func3Dc(char (*)[2][4]);

    char aD1[5] = "char";

    char aD3[3][2][4] = {
        {"big", "bag"},
        {"wig", "wag"},
        {"zig", "zag"},
    };

    func1Da(aD1);                                       /* pass "char"   */
    func1Db(&aD1[1]);                                   /* pass "har"    */
    func1Dc(&aD1[3]);                                   /* pass "r"      */

    func3Da(aD3);                                       /* pass &aD3[0]  */
    func3Db(&aD3[1]);                                   /* pass &aD3[1]  */
    func3Dc(&aD3[2]);                                   /* pass &aD3[2]  */

void func1Da(char a[5]) {printf("%s\n", a);}              /* print: char */
void func1Db(char a[])  {printf("%s\n", a);}              /* print: har  */
void func1Dc(char *a)   {printf("%s\n", a);}              /* print: r    */

void func3Da(char a[3][2][4]) {printf("%s\n", a[0][0]);}   /* print: big */
void func3Db(char a[][2][4])  {printf("%s\n", a[0][1]);}   /* print: wag */
void func3Dc(char (*a)[2][4]) {printf("%s\n", a[0][1]);}   /* print: zag */
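
Passing the number of elements separately, and passing a shorter array, might look like the following sketch.  The names sum_rows and sum_rows_demo are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Inside the function the parameter is only a pointer, so the
   number of rows must arrive in a separate argument.               */
long sum_rows(int (*m)[3], size_t rows)
{
    long sum = 0;
    size_t i, j;

    assert(m != NULL);
    for (i = 0; i < rows; i++)
        for (j = 0; j < 3; j++)
            sum += m[i][j];
    return sum;
}

/* The caller still sees the whole array and can compute the
   number of rows with sizeof.                                      */
long sum_rows_demo(void)
{
    int m[4][3] = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9}, {10, 11, 12} };
    size_t rows = sizeof m / sizeof m[0];

    /* Passing &m[2] hands the function a shorter, 2-row array.     */
    return sum_rows(m, rows) - sum_rows(&m[2], rows - 2);
}
```

Here sum_rows_demo() returns 21, the sum of the first two rows.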

Pointer to Function

The declaration char (*pfc)(char*, int) defines variable pfc that is a pointer to a function which takes parameters of char* and int, and returns a char.  Note that, as in a function prototype, it is not necessary to declare parameter names.  When a pointer to a particular function is assigned to a function pointer variable, that function may be called indirectly through the variable.  The function and the variable must agree in number and type of parameters, and in return type.  The expression for the pointer to a given function func(params) is func.  The expression for an indirect function call through function pointer variable pfc is (*pfc)(params).  This syntax is similar to the array syntax, but with () replacing [] as the dereferencing operator.

Standard C introduced alternative constructions of these expressions.  The alternative expression for the pointer to function func(params) is &func.  The alternative expression for an indirect function call through function pointer variable pfc is pfc(params).

Unlike object pointers, there is no generic function pointer.  Under some compilers, the pointer to void may be large enough to hold a function pointer with no loss of information, but the standard does not guarantee this.

The elements of an array may consist of pointers to function.

    char  func(char*, int);            /* Function of specific params    */

    char (*pfc)(char*, int);           /* Function pointer, same params  */

    int i = 4;                         /* Variables for params and call  */
    char c, a[] = "array";

    pfc = func;                        /* Assign pointer to variable     */
    c = (*pfc)(a, i);                  /* Indirect call                  */

    pfc = &func;                       /* Alternative syntax             */
    c = pfc(a, i);
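
An array whose elements are pointers to function can serve as a dispatch table.  The following is a minimal sketch; the functions add, sub, and mul, the table ops, and the function apply are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

static int add(int a, int b) { return a + b; }
static int sub(int a, int b) { return a - b; }
static int mul(int a, int b) { return a * b; }

/* Array of 3 const pointers to function taking (int, int) and
   returning int.                                                    */
static int (*const ops[3])(int, int) = { add, sub, mul };

/* Call the function selected by index through the table.           */
int apply(size_t index, int a, int b)
{
    assert(index < sizeof ops / sizeof ops[0]);
    return ops[index](a, b);        /* same as (*ops[index])(a, b)  */
}
```

For example, apply(0, 6, 3) returns 9, apply(1, 6, 3) returns 3, and apply(2, 6, 3) returns 18.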

In its most common use, a function pointer variable passes a type-specific function to another function that is coded as a generic routine to handle objects of any type. When the generic function is called, it takes the function pointer variable as an argument, and it also takes as arguments pointers to the actual objects it handles, defined as pointer to void.  The function pointer variable’s parameters are also defined as pointer to void.

Before the generic function is called, a pointer to a function that handles objects of a specific type is assigned to the function pointer argument.  The generic function manipulates the objects through the object pointer parameters and it performs type-specific actions by calling the type-specific function indirectly through the function pointer variable parameter.  For example, the generic function would be a sort, and the function pointer variable would convey the routine to compare the objects to be sorted.

    int *maxint(int *, int *);
    void *genfunc(void *, void *, void* (*)(void*, void*));
  
    int num1 = 1, num2 = 2, *pi;

    void *arg1, *arg2;            /* Object pointer variables            */
    void* (*arg3)(void*, void*);  /* Function pointer variable           */

    arg1 = (void*)&num1;          /* Cast object pointers to void*       */  
    arg2 = (void*)&num2;

                                  /* Cast pointer to maxint to type      */
                                  /* of function pointer variable arg3   */
    arg3 = (void* (*)(void*, void*))maxint;

    pi = (int*)genfunc(arg1, arg2, arg3);

                                  /* Cast all args in place              */
    pi = (int*)genfunc((void*)&num1, (void*)&num2,
                              (void* (*)(void*, void*))maxint);

/* Return pointer to int having max value  */
int *maxint(int *int1, int *int2)
{
    if (*int1 > *int2)
        return int1;
    else
        return int2;
}

/* Return pointer chosen by indirect call to function pointer variable  */
void *genfunc(void *obj1, void *obj2, void* (*max)(void*, void*))
{
    if (obj1 == NULL)
        if (obj2 == NULL)
            return NULL;
        else 
            return obj2;
    else
        if (obj2 == NULL)
            return obj1;
        else 
            return (*max)(obj1, obj2);
}
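
The sort mentioned above exists in the standard library as qsort, declared in stdlib.h, which takes a pointer to a comparison function whose parameters are pointers to const void.  The sketch below writes the type-specific function with those parameter types directly, so that no cast of the function pointer is needed; the C standard does not define the result of calling a function through a pointer to an incompatible function type.  The names compare_int, sort_ints, and sort_demo are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Type-specific comparison for int.  The void pointers are
   converted back to pointers to int inside the function.           */
static int compare_int(const void *p1, const void *p2)
{
    int a = *(const int *)p1;
    int b = *(const int *)p2;

    return (a > b) - (a < b);     /* negative, zero, or positive    */
}

/* Sort n ints in ascending order with the generic qsort.           */
void sort_ints(int *a, size_t n)
{
    assert(a != NULL);
    qsort(a, n, sizeof a[0], compare_int);
}

/* Sort a small array and return its largest element.               */
int sort_demo(void)
{
    int a[5] = {3, 1, 4, 1, 5};

    sort_ints(a, 5);
    assert(a[0] == 1 && a[1] == 1 && a[2] == 3 && a[3] == 4);
    return a[4];
}
```

Here sort_demo() returns 5.  The generic routine never learns the element type; it receives only the element size and the comparison function.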

Copyright © 2012 The Stevens Computing Services Company, Inc.  All rights reserved.