Introduction

This lecture deals with arrays. Arrays are simple in themselves, but in C they are closely tied to pointers, and this takes some adjustment of concepts. Using pointers to work with arrays is an element of C style. Strings are arrays of chars, terminated by a null character.

Arrays

Definition

Arrays always have a lower bound of zero (there is a reason, to do with pointers). So you only have to state the number of elements:

int numbers[20];
char *ch_ptr[30];

declares an array of 20 integers and an array of 30 pointers to characters. The indices of an array are 0 .. (size-1). A typical array walk is

for (n = 0; n < 20; n++)
    numbers[n] = n;

Initialisation

Arrays can be initialised element by element in a loop, as above. If you know all the values, this can also be done at definition time:

int numbers[4] = {0, 1, 2, 3};

sets numbers[0] = 0, etc. The size can be deduced from the initialiser list:

int numbers[] = {0, 1, 2, 3};

Sizeof

The size in bytes of anything may be found by using the ``sizeof'' operator. So if ``ch'' is of type char, ``sizeof(ch)'' is one (byte). The sizeof an array is the total size in bytes of the whole array, and the size of an element is the amount of space occupied by one element. So if you forget how many elements the array ``a'' has,

num = sizeof(a)/sizeof(a[0]);
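The element-count idiom can be wrapped in a macro. The sketch below uses a hypothetical macro name, ARRAY_LEN, which is not part of the standard library. It assumes the array definition itself is visible: once an array has been passed to a function the parameter is a pointer, and sizeof then gives the size of the pointer, not of the array.

```c
#include <stddef.h>

/* ARRAY_LEN is a hypothetical helper name, not a standard macro. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Number of elements in a local array of 20 ints. */
size_t local_count(void)
{
    int numbers[20];

    (void) numbers;             /* only its size is used */
    return ARRAY_LEN(numbers);
}

/* Here b is a pointer, so sizeof(b) is the size of a pointer
 * and says nothing about the array the caller passed in.
 */
size_t pointer_size(const int *b)
{
    return sizeof(b);
}
```

This is why functions that take an array normally take its element count as a separate parameter.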

Function parameters

When arrays are used as function parameters, the size is omitted:

int main(int argc, char *argv[])

makes argv an array (of some size) of pointers to chars.

/* read a set of numbers and
 * find the range between
 * largest and smallest
 */
#include <stdio.h>

#define MAX 100

int spread(int b[], int size);

int main(int argc, char *argv[])
{
    int n;
    int count = 0;
    int a[MAX];

    /* stop at end of input, or at the
     * first item that is not a number
     */
    while (count < MAX && scanf("%d", &n) == 1) {
        a[count++] = n;
    }
    printf("spread was %d\n", spread(a, count));
    return 0;
}

int spread(int b[], int size)
{
    int lo, hi, i;

    if (size == 0)
        return 0;
    lo = hi = b[0];
    for (i = 0; i < size; i++) {
        if (b[i] > hi)
            hi = b[i];
        if (b[i] < lo)
            lo = b[i];
    }
    return (hi - lo);
}

Arrays and pointers

The name of an array - just by itself - is the address of the base of the array:

a == &a[0]
*a == a[0]

In general,

a + n == &a[n]
*(a + n) == a[n]

The ``spread'' function could have been written

int spread(int b[], int size)
{
    int lo, hi, i;

    if (size == 0)
        return 0;
    lo = hi = b[0];
    for (i = 0; i < size; i++) {
        if (*(b+i) > hi)
            hi = *(b+i);
        if (*(b+i) < lo)
            lo = *(b+i);
    }
    return (hi - lo);
}

When an array is passed as a parameter to a function, its address is passed. This is a pointer value. The function, instead of declaring the parameter as an array, could have declared it as a pointer.

This is a very common practice. Many library functions declare their arguments as pointers, but you have to pass in an array.
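The identities between array and pointer notation can be checked directly. This is a small sketch; the function names ``same_element'' and ``total'' are made up for illustration.

```c
#include <stddef.h>

/* Return 1 if the array form and the pointer form agree for element n. */
int same_element(int *a, size_t n)
{
    return (a + n == &a[n]) && (*(a + n) == a[n]);
}

/* Sum an array through a pointer parameter. The caller passes the
 * array name, which is converted to the address of its first element.
 */
int total(const int *p, size_t size)
{
    int sum = 0;
    size_t i;

    for (i = 0; i < size; i++)
        sum += *(p + i);
    return sum;
}
```

Given ``int a[] = {1, 2, 3};'' the call ``total(a, 3)'' passes the address of a[0], exactly as ``total(&a[0], 3)'' would.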

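Once a parameter is a pointer, you can step the pointer through the array instead of indexing it. (With older compilers this was also cheaper than indexing; modern compilers usually generate much the same code for both.) Here is another ``spread'':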

int spread(int *b, int size)
{
    int hi, lo, i;

    if (size == 0)
        return 0;
    lo = hi = *b;
    for (i = 0; i < size; i++) {
        if (*b > hi)
            hi = *b;
        if (*b < lo)
            lo = *b;
        b++;
    }
    return (hi - lo);
}

This style of coding is very common. Compare these two program fragments

int a[] = {1, 2, 3};

void printit(int *a)
{
    int n;

    for (n = 0; n < 3; n++) {
        printf("%d\n", *a);
        a++;
    }
}

int main(int argc, char *argv[])
{
    printit(a);
}

versus

int a[] = {1, 2, 3};

int main(int argc, char *argv[])
{
    int n;

    for (n = 0; n < 3; n++) {
        printf("%d\n", *a);
        a++;    /* error: a is an array, not a pointer variable */
    }
}

The second is wrong: the array name ``a'' is not a variable, so ``a++'' will not compile. The first is good C style, because inside printit the parameter ``a'' is an ordinary pointer variable that can be changed.

Strings

A string is an array of chars, terminated by a null character. These are equivalent:

char str[] = "hello";
char str[] = {'h', 'e', 'l', 'l', 'o', '\0'};

You can nearly always count on the null char being at the end of a string, but there are exceptions (strncpy, for example, does not add one when the source is at least as long as the count). If you create a string you should ensure that you null-terminate it. Otherwise, functions that walk the string run off the end of the array. Because strings end in null, a string walk looks like

while (*str != '\0')
    str++;

Here is strlen

int strlen(char *str)
{
    int length = 0;

    while (*str != '\0') {
        str++;
        length++;
    }
    return length;
}

Now this is where you get to see some of the special lurks that C has. You can combine the increment into the loop test:

int strlen(char *str)
{
    int length = 0;

    while (*str++ != '\0')
        length++;
    return length;
}

Indeed, since '\0' has the value 0, which counts as false,

int strlen(char *str)
{
    int length = 0;

    while (*str++)
        length++;
    return length;
}

Here is the compact form of strcpy (note that this version takes the source first, whereas the library strcpy takes the destination first)

void strcpy(char *from, char *to)
{
    while (*to++ = *from++)
        ; /* empty body */
}

You do eventually get used to this. However, you could always use the more readable versions!
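When you build a string yourself, the destination array has a fixed size and the null has to fit inside it. The sketch below shows one way to copy into a fixed-size buffer so that the result is always null-terminated. The function ``copy_string'' is a hypothetical helper, not a library function.

```c
#include <stddef.h>

/* copy_string is a hypothetical helper, not a library function.
 * Copies at most size-1 chars from src into dst and always
 * null-terminates dst. Returns the number of chars copied.
 */
size_t copy_string(char *dst, size_t size, const char *src)
{
    size_t n = 0;

    if (size == 0)
        return 0;               /* no room even for the null */
    while (n < size - 1 && src[n] != '\0') {
        dst[n] = src[n];
        n++;
    }
    dst[n] = '\0';
    return n;
}
```

With ``char buf[4];'' the call ``copy_string(buf, sizeof(buf), "hello")'' stores "hel" followed by the null and returns 3.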

Library functions

Here are some of the library functions that declare their parameters as pointers but actually expect an array:

FILE *fopen(char *filename, char *mode)
int puts(char *s)
char *strcpy(char *s1, char *s2)
int atoi(char *nptr)
void *bsearch(void *key, void *base, ...

Here filename, mode, s, s1, s2, nptr are all strings. base is the address of the array to be binary searched. key, however, is a pointer to an object, not necessarily an array. Note that the functions sometimes return pointers. These may or may not be arrays. fopen returns a pointer to a structure (record) of type FILE, bsearch returns the address of the element found, whereas strcpy returns the address of the array s1.

Command line arguments

When a C program is compiled and run as a command, the command line parameters are available inside the program through the arguments to the main function

int main(int argc, char *argv[])

argc is the number of command-line arguments (including the command). For example, if your C program was called ``ask'', and you called it by

ask anybody there

then argc would be 3. The argv array contains 3 pointers to chars, each pointing to the start of a string. The values are:

argv[0] == "ask"
argv[1] == "anybody"
argv[2] == "there"

A program to print out the command line arguments is

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int i;

    printf("Args to command:\n");
    for (i = 0; i < argc; i++) {
        printf("arg %d is %s\n", i, argv[i]);
    }
    exit(0);
}

An equivalent program, using pointers instead of arrays, is

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int i;

    printf("Args to command:\n");
    for (i = 0; i < argc; i++, argv++) {
        printf("arg %d is %s\n", i, *argv);
    }
    exit(0);
}
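Since each argv[i] is itself a string, its characters can be indexed too. As a sketch, the hypothetical helper ``count_options'' below counts the arguments after the command name that begin with '-'.

```c
/* count_options is a hypothetical helper: it counts the arguments
 * after the command name (argv[0]) whose first char is '-'.
 */
int count_options(int argc, char *argv[])
{
    int i;
    int count = 0;

    for (i = 1; i < argc; i++) {
        if (argv[i][0] == '-')
            count++;
    }
    return count;
}
```

It would be called from main as ``count_options(argc, argv)''.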

Dynamic memory allocation

Programs cannot rely on just local and global data structures. It is often necessary to create dynamic chunks of memory. The functions to manipulate dynamic memory are malloc and free.

#include <stdlib.h>

void *malloc(size_t size);
void free(void *ptr);

The malloc function takes one argument, which is the number of bytes of dynamic storage to allocate. It returns a pointer to the base of this memory, or NULL if the memory could not be allocated. Note that the return type is void *. This means that it is not specified what type the memory is pointing to - you have to specify this.

For example, to create a block of 20 characters

char *pc;
pc = (char *) malloc(20);

To create a block of 20 integers

int *pn;
pn = (int *) malloc(20 * sizeof(int));

To free these when no longer needed,

free((void *) pc);
free((void *) pn);

You usually use malloc to create dynamic structures or dynamic arrays. In the second case, you are returned a pointer to a block of memory. You can treat the pointer as the base of an array (but the pointer can be changed), or just as a pointer.

int *pn;
int n;

pn = (int *) malloc(20 * sizeof(int));

/* treat the block as an array */
for (n = 0; n < 20; n++)
    pn[n] = 0;

/* this loses the base of the block */
for (n = 0; n < 20; n++)
    *pn++ = 0;
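To walk a dynamic block with a pointer without losing its base, step a copy of the pointer and keep the original for the later call to free. A minimal sketch, which also checks that malloc succeeded (the name ``make_zeroed'' is made up for illustration):

```c
#include <stdlib.h>

/* make_zeroed is a hypothetical helper. It allocates n ints,
 * sets them all to zero, and returns the base of the block,
 * or NULL if the allocation failed.
 */
int *make_zeroed(size_t n)
{
    int *base;
    int *p;
    size_t i;

    base = malloc(n * sizeof(int));
    if (base == NULL)
        return NULL;
    /* walk with a copy so that base still points at the block */
    p = base;
    for (i = 0; i < n; i++)
        *p++ = 0;
    return base;
}
```

The caller owns the block and passes the returned pointer to free when it is no longer needed.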

Example

Here is an array version of strcat (unlike the library strcat, it returns a newly allocated string rather than appending to s1)

char *strcat(char *s1, char *s2)
{
    char *p;
    int l1 = strlen(s1);
    int l2 = strlen(s2);
    int n;

    p = (char *) malloc(l1 + l2 + 1);
    for (n = 0; n < l1; n++)
        p[n] = s1[n];
    /* add s2 after s1 */
    for (n = 0; n < l2; n++)
        p[n + l1] = s2[n];
    /* null terminate string */
    p[l1 + l2] = '\0';
    return p;
}

Here is a pointer version of strcat,

char *strcat(char *s1, char *s2)
{
    char *base, *p;
    int l1 = strlen(s1);
    int l2 = strlen(s2);

    base = p = (char *) malloc(l1 + l2 + 1);
    while (*p++ = *s1++)
        ; /* empty */
    /* step back over the null just copied */
    p--;
    /* continue adding s2 to p */
    while (*p++ = *s2++)
        ; /* empty */
    return base;
}

Conclusion

An array is a block of elements of fixed size at a fixed address; its name, used by itself, gives that address. Arrays can be manipulated using common array notation. Pointer variables (such as formal function parameters) can be set to array addresses, allowing C pointer mechanisms to be used to manipulate arrays. This is confusing, but it is the key to C programming style and to understanding the C libraries. Strings are arrays of char, null terminated.

Structures

Structures are the C equivalent of records. A structure type is defined by

struct struct-name {
    type field-name;
    type field-name;
    ...
};

e.g.

struct student_type {
    char name[20];
    int ID;
};

Elements of that type are defined by

struct student_type fred, bill, all_students[100];

Because it is tedious to have to remember to use the word ``struct'' in these, the structure is often ``typedef''-ed to avoid this:

typedef struct student_type {
    char name[20];
    int ID;
} student_type;

student_type fred, bill;

You access fields of a structure with the ``.'' notation:

fred.ID = 891234;
strcpy(fred.name, "fred");

It is common to have pointers to structures. The straightforward notation is clumsy, so a shorthand is available

(*student_ptr).ID = ...
student_ptr->ID = ...

(Note that the student_ptr must be pointing to a valid record!)

Example:

Some functions to manipulate student structures.

void print(student_type *s)
{
    printf("Name: %s\n", s->name);
    printf("ID: %d\n", s->ID);
}

student_type *
read(student_type *s)
{
    int ID;
    char name[20];

    /* give up at end of input or on a badly formed record */
    if (scanf("%d %19s", &ID, name) != 2)
        return NULL;
    s->ID = ID;
    strcpy(s->name, name);
    return s;
}

A sample program is

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SIZE 100

/* a program to read and print at
 * most 100 student records
 */
typedef struct student_type {
    char name[20];
    int ID;
} student_type;

void print_student(student_type *s)
{
    printf("Name: %s\n", s->name);
    printf("ID: %d\n", s->ID);
}

student_type *
read_student(student_type *s)
{
    int ID;
    char name[20];

    printf("Enter ID and name\n");
    /* give up at end of input or on a badly formed record */
    if (scanf("%d %19s", &ID, name) != 2)
        return NULL;
    s->ID = ID;
    strcpy(s->name, name);
    return s;
}

int main(int argc, char *argv[])
{
    student_type students[SIZE];
    int count = 0;
    int n;

    while (count < SIZE) {
        if (read_student(students + count) == NULL)
            break;
        count++;
    }
    for (n = 0; n < count; n++)
        print_student(students + n);
    exit(0);
}

Example:

Printing the current date. The standard library has a number of time-related functions. The first is

#include <time.h>
time_t time(time_t *timer)

This returns the current time, in some unspecified format. This can be changed into a known format by functions such as

#include <time.h>
struct tm *localtime(const time_t *timer)

The structure tm has fields

struct tm {
    int tm_sec;  /* 0..61 */
    int tm_min;  /* 0..59 */
    int tm_hour; /* 0..23 */
    int tm_wday; /* 0..6 */
    int tm_mon;  /* 0..11 */
    ...
}

This allows you access to the local time. For example, this program prints the current day of the week as a number (0 is Sunday):

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int current_day(void)
{
    struct tm *local;
    time_t t;

    t = time(NULL);
    local = localtime(&t);
    return local->tm_wday;
}

int main(int argc, char *argv[])
{
    printf("Today: %d\n", current_day());
    exit(0);
}
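The tm_wday field is a number counting days since Sunday, so it can be used as an index into an array of strings to get a printable name. A small sketch; ``day_name'' is a hypothetical helper, not a library function.

```c
#include <stddef.h>

/* day_name is a hypothetical helper. wday is a tm_wday value,
 * i.e. days since Sunday (0..6). Returns NULL if out of range.
 */
const char *day_name(int wday)
{
    static const char *names[] = {
        "Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"
    };

    if (wday < 0 || wday > 6)
        return NULL;
    return names[wday];
}
```

The array ``names'' is an array of pointers to chars, the same shape as argv.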

Example:

Structures can be recursive, as in lists or trees.

You need to use pointers inside the data structure. Some dynamic list functions:

#include <stdio.h>
#include <stdlib.h>

typedef struct list {
    int elmt;
    struct list *next;
} list_elmt, *list_ptr;

list_ptr new_elmt(int n)
{
    list_ptr p;

    p = (list_ptr) malloc(sizeof(list_elmt));
    if (p != NULL) {
        p->elmt = n;
        p->next = NULL;
    }
    return p;
}

void print_list(list_ptr p)
{
    while (p != NULL) {
        printf(" %d", p->elmt);
        p = p->next;
    }
}

list_ptr make_list(void)
{
    /* create a list storing 0..9 (or as
     * much of it as possible).
     */
    list_ptr start_list, p;
    int n;

    start_list = p = new_elmt(0);
    if (p == NULL)
        return NULL;
    for (n = 1; n < 10; n++) {
        p->next = new_elmt(n);
        if (p->next == NULL)
            break;
        p = p->next;
    }
    return start_list;
}

int main(int argc, char *argv[])
{
    list_ptr p;

    p = make_list();
    print_list(p);
    exit(0);
}
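Every element of the list came from malloc, so a longer-running program would want to give the elements back with free. The sketch below adds two hypothetical functions, ``list_length'' and ``free_list'', written in the same style as the list functions; the typedef and new_elmt are repeated only so that the sketch stands alone.

```c
#include <stdlib.h>

typedef struct list {
    int elmt;
    struct list *next;
} list_elmt, *list_ptr;

/* repeated from the notes so this sketch is self-contained */
list_ptr new_elmt(int n)
{
    list_ptr p;

    p = (list_ptr) malloc(sizeof(list_elmt));
    if (p != NULL) {
        p->elmt = n;
        p->next = NULL;
    }
    return p;
}

/* count the elements by walking the next pointers */
int list_length(list_ptr p)
{
    int length = 0;

    while (p != NULL) {
        length++;
        p = p->next;
    }
    return length;
}

/* free every element of the list */
void free_list(list_ptr p)
{
    while (p != NULL) {
        list_ptr next = p->next;  /* save it before p is freed */
        free(p);
        p = next;
    }
}
```

The next pointer has to be saved first because the contents of an element must not be used after it has been freed.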

Preprocessor

The first stage of compilation is to pass the source through the preprocessor. This expands out certain symbols and produces another C source file (that you do not normally see).

Include files

The statement

#include file

reads in the contents of the file at that point. These should be specification files, giving details of data-types, function declarations, etc. The filename can either be enclosed in double quotes "..." or in angle brackets <...>

#include "myspec.h"
#include <stdio.h>

Names in quotes normally refer to header files in your current directory; names in angle brackets refer to files located in a standard place (usually /usr/include on Unix).

Defines

If a piece of text is #define'd, then whenever that piece of text is encountered, the remainder of the line following is substituted for it

#define MAX_SIZE 10
#define WARNING \
    printf("Warning!!!\n");

if (x == 0)
    WARNING

If the thing being defined has parameters then it acts as a macro and parameter substitution is performed

#define SUM(x, y) x + y

a = SUM(b, c);

Macros are useful, but they can be a source of obscure errors:

a = SUM(b, c) * d;

becomes

a = b + c * d;

Prevent this (and similar things) by enclosing everything in brackets

#define SUM(x, y) ((x) + (y))

Macros that use their arguments more than once can go wrong when used in situations with side-effects:

#define islower(x) \
    ((x) >= 'a' && \
     (x) <= 'z')

if (islower(getchar())) ...

Here getchar() can be called twice, so two characters may be read from the input instead of one.
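Both macro problems can be seen by running them. In this sketch BAD_SUM is the unbracketed version of SUM, and SQUARE is a macro that uses its argument twice; BAD_SUM, SQUARE and the function names are made up for illustration.

```c
/* BAD_SUM and SQUARE are hypothetical names used for illustration. */
#define BAD_SUM(x, y) x + y
#define SUM(x, y) ((x) + (y))
#define SQUARE(x) ((x) * (x))

/* expands to b + c * d, so the multiplication happens first */
int bad_product(int b, int c, int d)
{
    return BAD_SUM(b, c) * d;
}

/* expands to ((b) + (c)) * d, as intended */
int good_product(int b, int c, int d)
{
    return SUM(b, c) * d;
}

/* a function with a side-effect: it counts its own calls */
static int calls = 0;

static int next_value(void)
{
    calls++;
    return 3;
}

/* SQUARE(next_value()) expands to two calls of next_value */
int calls_made_by_square(void)
{
    calls = 0;
    (void) SQUARE(next_value());
    return calls;
}
```

With b = 1, c = 2, d = 3, bad_product gives 1 + 2 * 3 = 7 while good_product gives (1 + 2) * 3 = 9, and calls_made_by_square returns 2.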

Conditional compilation

The ifdef construct allows the preprocessor to keep or omit pieces of code. I often have this:

#define DEBUG

#ifdef DEBUG
printf("Reached this bit\n");
#endif
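One common variation, sketched here under the assumption that you are happy to hide the test inside a macro, is to define a tracing macro once and let ifdef decide whether it expands to anything. TRACE is a hypothetical macro name. With gcc, DEBUG can also be defined on the command line with -DDEBUG instead of editing the source.

```c
#include <stdio.h>

/* TRACE is a hypothetical macro name. When DEBUG is defined it
 * prints a message on stderr; otherwise it expands to nothing useful.
 */
#ifdef DEBUG
#define TRACE(msg) fprintf(stderr, "debug: %s\n", (msg))
#else
#define TRACE(msg) ((void) 0)
#endif

int halve(int n)
{
    TRACE("entering halve");
    return n / 2;
}
```

The function behaves the same either way; only the tracing output comes and goes.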

Multiple files

A C program can be spread across many files.

When a variable or function is declared static, it is not visible outside of its own file. This allows functions to be grouped together as a ``package''. Here is a simple stack package:

#define SIZE 10

static int TOS = 0;
static int stack[SIZE];

int push(int n)
{
    if (TOS == SIZE)
        /* full up, return false */
        return 0;
    stack[TOS++] = n;
    return 1;
}

int pop(int *n)
{
    if (TOS == 0)
        return 0;
    *n = stack[--TOS];
    return 1;
}

For completeness, this should have a specification file ``stack.h'' containing

extern int push(int n);
extern int pop(int *n);

Multiple files can be compiled all at once by placing them all on the command line:

gcc -o prog src1.c src2.c ...

Make

There are smarter methods to avoid unnecessary compilations, which avoid having to compile all the source files at once. By hand, a smarter method of compiling three files to make one executable is

gcc -c src1.c
gcc -c src2.c
gcc -c src3.c
gcc -o prog src1.o src2.o src3.o

When any one of the files changes, only one of the three ``gcc -c'' compiles has to be repeated, plus the final link step.

This can be automated using the ``make'' command. This expects a file ``Makefile'' which contains dependency instructions. These are of the form

file : files it depends upon
<tab> instructions to bring it up to date

For example (each instruction line must begin with a tab character)

OBJS = src1.o src2.o src3.o
CFLAGS = -g

prog : $(OBJS)
	gcc -o prog $(CFLAGS) $(OBJS)

src1.o : src1.c
	gcc -c $(CFLAGS) src1.c

src2.o : src2.c
	gcc -c $(CFLAGS) src2.c

src3.o : src3.c
	gcc -c $(CFLAGS) src3.c

make builds the first target in the file by default, so prog is listed first. Then whenever you make a change, running ``make'' automatically figures out which commands to run.

make has inbuilt rules about many things, including how to compile C files. The above can be abbreviated to

OBJS = src1.o src2.o src3.o
CFLAGS = -g

prog : $(OBJS)
	gcc -o prog $(CFLAGS) $(OBJS)

System doco

UNAME(2V)  SYSTEM CALLS    UNAME(2V)

NAME
uname - get information about current system

SYNOPSIS
     #include <sys/utsname.h>

     int uname (name)
     struct utsname *name;

DESCRIPTION
uname() stores information identifying
the current operating system in the 
structure pointed to by name.

uname() uses the structure defined in 
<sys/utsname.h>,  the members of which 
are:

          struct utsname {
               char sysname[9];
               char nodename[9];
               char nodeext[65-9];
               char release[9];
               char version[9];
               char machine[9];
          }
uname() places a null-terminated character
string naming the current  operating 
system  in  the character array sysname;
this string is SunOS on Sun systems. 

nodename is set to the name  that  the 
system is known by on a communications 
network; this is the same value  as  is 
returned  by  gethostname(2).  release 
and version are set to values that further
identify the operating system.  

machine is set to a standard name  that
identifies the hardware on which the 
SunOS system is running. This is the same 
as  the  value  displayed  by arch(1).

RETURN VALUES
uname() returns:

     0    on success.
     -1   on failure.

SEE ALSO
arch(1), uname(1), gethostname(2)

This doco defines the header file to use and the calling syntax of the function (note that it uses ``old style'' C syntax in which the parameter types are listed after the function header). The description shows what the structure is. If you aren't told it, then you probably don't need to know it. The return values are shown, generally indicating success or failure. The See Also section points you to other relevant functions. From this we can write

#include <stdio.h>
#include <stdlib.h>
#include <sys/utsname.h>

int main(int argc, char *argv[])
{
    struct utsname info;

    if (uname(&info) == -1) {
        fprintf(stderr, "no name??\n");
        exit(1);
    }
    printf("sys name: %s\n", info.sysname);
    exit(0);
}

Advanced stuff - function pointers

A function starts at an address in memory. When it is called, execution jumps to that address and executes code from that point. So in C, a function is an address, and calling the function is a jump to that address.

You can store the address of a function in a function pointer variable, and call it by dereferencing that address


int f();
int (*fp)();

fp = f;     /* assign address of f to fp */
(*fp) ();   /* call the function pointed to */
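Function pointers are most often met as parameters. The library function qsort, for example, takes a pointer to a comparison function so that it can sort an array of any type. The sketch below uses qsort to sort ints and also shows a function that calls whatever function it is handed; ``compare_ints'', ``sort_ints'', ``apply'' and ``twice'' are names made up for illustration.

```c
#include <stdlib.h>

/* comparison function in the form that qsort expects:
 * negative, zero or positive as *a is less than, equal to
 * or greater than *b
 */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

/* sort an array of ints into ascending order */
void sort_ints(int *a, size_t size)
{
    qsort(a, size, sizeof(int), compare_ints);
}

/* call the function pointed to by fp with argument n */
int apply(int (*fp)(int), int n)
{
    return (*fp)(n);
}

int twice(int n)
{
    return 2 * n;
}
```

A call such as ``apply(twice, 21)'' passes the address of twice, just as an array name passes the address of the array.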

C and objects

C is not an O/O language. It doesn't have classes, instances of classes (objects) or methods. But you can fake them by using structures for classes and function pointers for methods


typedef struct person person;

struct person {
    int age;
    int (*getAge) (person *);
    void (*setAge) (person *, int);
};

int getAge(person *p) {
    return p->age;
}

void setAge(person *p, int age) {
    p->age = age;
}

person p;

/* these statements belong inside a function */
p.setAge = setAge;
p.getAge = getAge;

(*(p.setAge)) (&p, 20);

Faking inheritance

Inheritance can be faked by building up structures that contain parts for each bit of the inheritance chain. For example, to build a chain of employee inheriting from person, the following could be done


typedef struct personPart {
    int age;
} personPart;

typedef struct person {
    personPart p;
} person;

typedef struct employeePart {
    char *job;
} employeePart;

typedef struct employee {
   personPart p;
   employeePart emp;
} employee;

Then you can access new fields and inherited fields of employee by


employee em;
em.p.age = 20;
em.emp.job = "clerk";
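The point of sharing the personPart is that a function written for it works on every structure in the chain. A minimal sketch, with the structure definitions written out in full (with ``struct'' and semicolons) so that it compiles; ``age_of'' and ``have_birthday'' are hypothetical function names.

```c
typedef struct personPart {
    int age;
} personPart;

typedef struct person {
    personPart p;
} person;

typedef struct employeePart {
    char *job;
} employeePart;

typedef struct employee {
    personPart p;
    employeePart emp;
} employee;

/* works on the person part of anything that contains one */
int age_of(const personPart *pp)
{
    return pp->age;
}

void have_birthday(personPart *pp)
{
    pp->age++;
}
```

For an employee ``em'' the call is ``have_birthday(&em.p)''. Because the personPart is the first member, a pointer to the employee converted with ``(personPart *) &em'' points at the same place, which is how the ``inherited'' functions can be handed the whole object.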

email: jan@newmarch.name
Web: http://jan.newmarch.name/

Last modified: 9 April, 2001