Introduction

This lecture looks at the overall structure of a C program, and then takes a ``bottom up'' approach to the language. It looks at the basic data types and the common operations on them. It is important to note both the features C shares with other languages and the differences.

The C language

The Language List (sometimes distributed on the Internet) shows that over 5000 programming languages have been invented. Why look at this one?

The early versions of Unix were written in assembler (first for the PDP-7, then for the PDP-11). The C language was designed to make it easier to manage Unix source code. It was designed as a language to write Operating Systems. IBM's OS/2 was originally written largely in Assembler, but is now mainly in C. Windows NT from Microsoft is mainly written in C. The language is very successful for writing Operating Systems.

So why would application programmers use it? The Application Programmers Interfaces (API) to Unix, OS/2 and Windows NT define a C language interface to the OS. If you want to open a file for reading, there is a C routine to do it. If you want to create a new process, there is a C routine to do it. So if you want to write a program that makes use of OS services, it is easiest to access these services through C.

There are a large number of free and commercial libraries written in C, such as Microsoft Windows, the X Window system, maths packages, database routines, communications systems etc.

Despite its popularity, C has many unpleasant features. It is very easy to write programs that do not work (called ``shooting yourself in the foot'').

Structure of C programs

A simple C program consists of one file. To create an executable from this, it goes through a number of steps, some involving other files.

More complex programs may

Example: A program to count the number of characters read from standard input.

    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        int ch;
        int count;

        count = 0;
        while ((ch = getchar()) != EOF) {
            count++;
        }
        printf("Chars read %d\n", count);
        exit(0);
    }

The components of this are:

Scalar data types

integers

Integers may be short, ordinary length (default) or long; signed (default) or unsigned

    int i, j, k;
    unsigned short l;

The only constraint on their relative sizes is that

    sizeof(short int) <= sizeof(int) <= sizeof(long int)

    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        /* sizeof yields a size_t, so cast to int to match %d */
        printf("char size (bytes): %d\n", (int) sizeof(char));
        printf("short int size: %d\n", (int) sizeof(short int));
        printf("int size: %d\n", (int) sizeof(int));
        printf("long int size: %d\n", (int) sizeof(long int));
        printf("int pointer size: %d\n", (int) sizeof(int *));
        exit(0);
    }

Ordinary decimal notation may be used, or octal (prefixed by 0), or hex (prefixed by 0x).

    i = 16;    /* decimal */
    j = 020;   /* octal */
    k = 0x10;  /* hex */

Floating point

Floating point numbers come in two commonly used sizes, float and double. ANSI C also has long double.

Characters

Type char is a subset of int. It is unspecified as to what subset. However, char + EOF is a subset of int.
This often causes confusion about whether to declare variables as char or int.

Characters are generally enclosed in single quotes. Special characters such as newline have an ``escape'' representation

    ch1 = 'A';
    ch2 = 'a';
    ch3 = '\n';  /* newline */
    ch4 = '\t';  /* tab */

Pointers

Given any type (integer, char, structure, etc), you can have a pointer to that type. A pointer is the address of data.

    short int *short_ptr;
    char *char_ptr;

The address of a variable is found by using the `&' operator. To dereference a pointer (find the value at the address) use the `*' operator

    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        char ch, *cptr;

        ch = 'a';
        cptr = &ch;      /* cptr holds address of ch */
        putchar(*cptr);  /* print 'a' */
        *cptr = 'b';
        putchar(ch);     /* print 'b' */
        exit(0);
    }

You do sometimes get indirect addressing

    int n;
    int *int_ptr;
    int **int_ptr_ptr;

    int_ptr = &n;
    int_ptr_ptr = &int_ptr;
    **int_ptr_ptr = 100;

which sets n to 100.

No-one seems to be able to reasonably handle further levels of indirection than that.

Enumerated types

Define enumerated types by the `enum' keyword

    enum colour {red, green, amber};

and variables of these types by

    enum colour traffic_light;

Use them as normal

    traffic_light = red;
    if (traffic_light == red) ...

    #include <stdio.h>

    enum colour {red, green, amber};

    int main(int argc, char *argv[])
    {
        enum colour traffic_light;

        traffic_light = red;
        if (traffic_light == red)
            printf("light was red\n");
        return 0;
    }

Synonyms for types

Create new names for types by `typedef'

    typedef long int * big_ptr_t;

and then declare variables by

    big_ptr_t p;

Boolean

There is no type Boolean. The integer 0 is Boolean False, and any other integer is True. (NB: Not the same way round as the shells.)

This infinite loop program prints lots of "true", but when the char wraps around to zero, prints "false":

    #include <stdio.h>
    #include <unistd.h>   /* for sleep() */

    int main(int argc, char *argv[])
    {
        unsigned char n = 0;

        while (1) {
            if (n)
                printf("true: %d\n", n);
            else
                printf("this one is false: %d\n", n);
            n += 16;   /* increment n by 16 */
            sleep(1);
        }
    }

void

void is a type with no values. A pointer to void (void *) is used as a generic pointer type when you need one, and void is the return type of functions that return no value.
A generic pointer is needed, for example, to sort an array of something (integers, chars, strings, etc) without specifying exactly what is being sorted.

Expressions

Almost everything in C is an expression.

    x + y
    x = y

The value of x+y is the sum. The value of x=y is the value assigned to x.

Expressions may be nested

    x + y + z
    x = y = z

You may need to know the associativity rules for such expressions. Is 8 / 4 / 2 (8/4)/2 or 8/(4/2)? Division associates left to right, so it is (8/4)/2.

Assignment associates right to left:

    x = y = z   is the same as   x = (y = z)

There are the usual arithmetic and relational operators. NB: the Boolean equality test is `==':

    if (x == y) ...

Be suspicious of

    if (x = y) ...

It is legal, and may be correct (the value of x=y is the value assigned to x, and if this is integral then it has a Boolean value), but is probably a semantic error.

The inequality operator is `!='.

A huge amount of code combines an assignment statement (typically through a function call) with a Boolean test:

    if ((ch = getchar()) != EOF) ...

Note the brackets to ensure the assignment is done before the inequality.

    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        int n = 2;

        if ((n = 2) != 0)
            printf("n not 0\n");
        /* without brackets this is n = (2 != 0), so n becomes 1 */
        if (n = 2 != 0)
            printf("n is %d\n", n);
        exit(0);
    }

The logical operators are
&& - cand (conditional and)
|| - cor (conditional or)
! - not
The && and || operators are ``short circuit'' ones: the right hand side is only evaluated if it is needed, so the following executes correctly

if ( x != 0 && y/x > 0) ...

Special operators

There are special increment and decrement operators:

    n++   value is n; n is incremented by one afterwards
    n--   value is n; n is decremented by one afterwards
    ++n   n is incremented by one first; value is the new n
    --n   n is decremented by one first; value is the new n

And compact forms of assignment

    n += 1   /* n = n + 1 */
    n -= 3   /* n = n - 3 */

The relation between different types of operators is very important. *p++ is the same as *(p++), and means ``value of p++ is p, then increment (pointer) p by one. value of *p++ is value pointed to by p, then p moves to next address.'' (*p)++ is less common, but means ``value of what p is pointing to is incremented by one.''

Example

Read in a list of characters until end of file, and sum all the digit characters.

    #include <stdio.h>
    #include <stdlib.h>

    int ch;
    int sum = 0;
    int chars_read = 0;
    int digits_read = 0;

    int main(int argc, char *argv[])
    {
        while ((ch = getchar()) != EOF) {
            chars_read++;
            if (ch >= '0' && ch <= '9') {
                digits_read++;
                sum += ch - '0';
            }
        }
        printf("sum %d\n"
               "chars read %d\n"
               "digits read %d\n",
               sum, chars_read, digits_read);
        exit(0);
    }

Summary

This lecture has looked at the overall structure of a C program. It examined the basic data types of C, and looked at common operations on them. Several things are common with other languages such as Java, but there are differences of both a syntactic and semantic nature. You need to pay attention to these.

Confusion about int or char

If you want to declare a variable that should hold chars, should it be of type char or int? The function getchar() returns chars, so why is it declared to return int?

If the value is char+EOF then it should be declared of type int, because that is a superset of char+EOF.

If the value can never be EOF, then it should be of type char.

The function getchar() can return EOF. So it must be of type int. If a variable is assigned the value from getchar() then it can take EOF, so it must be of type int also. However, if it is not EOF, then type char is ok:

    int ich;

    if ((ich = getchar()) != EOF) {
        /* can't be EOF now */
        char ch = (char) ich;
        ...
    }

Introduction

This part deals with the control flow constructs of if..., while... and for... . It looks at the standard library functions that you can use at any time. It then deals with how to define your own functions, and looks at parameter passing mechanisms.

Control flow

C uses the semicolon to terminate statements (like Java, unlike Pascal). Statements may be grouped in {...}, just like BEGIN...END in Pascal.

if

The if statement has the form

    if (expression)
        statement

Note the brackets around the expression. The expression is anything that can be evaluated to a Boolean value of 0 (false) or other number (true). In particular, it may contain executable functions, assignments, etc.

    if ((ch = getchar()) != EOF) ...

The if..then..else form is

    if (expr)
        statement
    else
        statement

as in

    if (x == 1)
        x++;
    else
        x--;

while

The while loop has syntax

    while (expression)
        statement

Two keywords inside a loop are ``break'' and ``continue''. break terminates the loop. continue ceases execution of the current pass through the loop and returns to the loop condition.

    /* count the number of even chars and odd chars */
    while (1) {
        if ((ch = getchar()) == EOF)
            break;
        if (ch % 2 == 0) {
            evens++;
            continue;
        }
        odds++;
    }

Any statement may be empty. This can lead to syntactically correct but erroneous code. This is in fact correct, to copy stdin to stdout:

    while ((ch = getchar()) != EOF && putchar(ch) != EOF)
        ; /* empty body */

    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        int ch;

        while ((ch = getchar()) != EOF && putchar(ch) != EOF)
            ; /* empty body */
        exit(0);
    }

for

The for loop is the most general loop construct

    for (initial; continue; increment)
        statement

where initial, continue and increment can be any expressions (including empty).

    for (i = 0; i < 20; i++) ...

    for (ch = getchar(); ch != EOF; ch = getchar()) ...

    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        int ch;

        /* copy stdin to stdout */
        for (ch = getchar(); ch != EOF; ch = getchar())
            putchar(ch);
        exit(0);
    }

    /* forever */
    for (;;) ...

case

The case statement (``switch'' in C) is of the form

    switch (expression) {
        case const: ...
        case const: ...
        default: ...
    }

The constants can be any integer values (including enumerated values). NB: each branch should be terminated with ``break'', or it will ``fall'' into the next branch.

    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        int ch;
        int vowel_count = 0, other_char_count = 0;

        while ((ch = getchar()) != EOF)
            switch (ch) {
                case 'a':
                case 'e':
                case 'i':
                case 'o':
                case 'u':
                    vowel_count++;
                    break;
                default:
                    other_char_count++;
            }
        printf("Vowels: %d; Other: %d\n", vowel_count, other_char_count);
        exit(0);
    }

Standard library functions

C has a large standard library of functions covering areas of:

putchar

    #include <stdio.h>
    int putchar(int)

putchar prints the character argument to standard output. The value of the function is the character printed if it was successful, or EOF if it was not.

    /* no error check */
    putchar('X');

    /* with error check */
    if (putchar('X') == EOF)
        /* problem with output device? */

getchar

    #include <stdio.h>
    int getchar(void)

Read a character from standard input. Return a char if successful, EOF if not. Note that the return must in fact be an int, because EOF is not a char.

    int ch;

    ch = getchar();
    if (ch == EOF) ...

printf

    #include <stdio.h>
    int printf(const char *format, ...)

Formatted print statement to standard output. ``format'' is a string that is printed after substitutions using the argument list are made. Special codes are used, such as ``%d'' for a decimal integer and ``%o'' for an octal one. The actual values come from the list:

    n = 20;
    printf("%d in octal is %o\n", n, n);

    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        int n;

        puts("Enter a number in decimal:");
        /* scanf returns the number of values stored: 1 on success */
        while (scanf("%d", &n) == 1) {
            printf("%d in octal is %o\n", n, n);
            puts("Enter a number in decimal:");
        }
        exit(0);
    }

Escaped characters can be used in these strings such as
\n new line
\t tab

isalpha

    #include <ctype.h>
    int isalpha(int)

Gives a Boolean value saying whether the character is alphabetic

    if (isalpha(c))
        alpha_count++;

Functions

C has no procedures, only functions. By returning no value, a function acts like a procedure. By ignoring the function return value you treat a function as though it was a procedure.

Inside a function the ``return'' statement immediately terminates the function. The value (if any) is the value following the ``return'' keyword.

    int sum(int m, int n)
    {
        return m+n;
    }

There can be any number of return statements

    int isalpha(int ch)
    {
        if ('a' <= ch && ch <= 'z')
            return 1;
        if ('A' <= ch && ch <= 'Z')
            return 1;
        return 0;
    }

Function definitions may not be nested. Functions with no return value are declared as type ``void''. Functions with no arguments have the argument list declared as ``void''. (Note: this use of void is not the same as the data type void.)

    void hello_message(void)
    {
        printf("Hello there\n");
    }

Function parameters are all value parameters (Java parameters, Ada IN parameters, Pascal value parameters). There are no OUT (Ada) or VAR (Pascal) parameters. The normal way of changing a value is by using the function value

    ch = tolower(ch);

To change the value of a variable in the caller you have to do your own ``call by reference'' and pass the address of the variable as the parameter. Example: The sum of two ints as a procedure rather than a function:

    void sumof(int x, int y, int *z)
    {
        *z = x + y;
    }

and use it by

    int n;
    sumof(1, 2, &n);

    #include <stdio.h>
    #include <stdlib.h>

    void sumof(int x, int y, int *z)
    {
        *z = x + y;
    }

    int main(int argc, char *argv[])
    {
        int n;

        sumof(1, 2, &n);
        printf("1+2 = %d\n", n);
        exit(0);
    }

There is a very high potential for error in reference parameters. In the program, change int n to int *n, &n to n and printf(.., n) to printf(.., *n) and see if you get strange results. If not, you were lucky!

Explanation

In order to change the value of a variable, you must pass the address of that variable (a pointer to it). So the function declaration needs a pointer parameter (here int *).

When you call the function, you must pass an address. The assumption made by the function is that the address is valid and of the right type. If it isn't, then garbage happens. Here, int *n is a variable with unassigned contents. It probably holds a garbage address. Maybe you can write to that address, maybe not.

Make it really fail by

    int *n = NULL;

since you can't write to the NULL address.

Where a function declares a pointer parameter, check (from the doco) to see if the parameter must be the address of existing memory of that type. Often the doco is vague on this.


Example

From the standard library, the function ``frexp'' splits a real number into a fraction and an integer power of two, as in
24 = 0.75 x 2^5

The specification for this function is

    #include <math.h>
    double frexp(double value, int *exp)

The description says that the function return is the value of the fraction and the power of two is returned in the int pointed to by ``exp''. That means you have to supply an integer, and pass its address to the function, as in

    int power;
    double fraction, number;

    number = 24.0;
    fraction = frexp(number, &power);

Example:

The ``scanf'' function is used for formatted input just like ``printf'' is used for formatted output. It uses the same substitution sequences ``%d'' for an integer, ``%c'' for a character, etc. However, when it reads a value, it stores it in a variable. The call by value semantics means that it must be given the address of a variable. To read a character (note that ``%c'' stores into a char, not an int):

    char ch;
    scanf("%c", &ch);

To read an integer:

    int n;
    scanf("%d", &n);

Note that this is wrong, just like before:

    int *n;
    scanf("%d", n);

Example:

The following program reads sets of 3 integers, stopping on end-of-file or on input that is not an integer. For each set it prints them out in descending order.

    #include <stdio.h>
    #include <stdlib.h>

    void swap(int *m, int *n)
    {
        int k;

        k = *m;
        *m = *n;
        *n = k;
    }

    int main(int argc, char *argv[])
    {
        int a, b, c;

        /* scanf returns the number of values stored: 3 for a full set */
        while (scanf("%d %d %d", &a, &b, &c) == 3) {
            if (b < c) swap(&b, &c);
            if (a < b) swap(&a, &b);
            if (b < c) swap(&b, &c);
            printf("%d %d %d\n", a, b, c);
        }
        exit(0);
    }

Summary

This lecture dealt with the control flow constructs of if..., while... and for... . It looked at the standard library functions of C. It then dealt with how to define your own functions, and looked at parameter passing mechanisms. Some very common problems with addressing were considered.
jan@newmarch.name
Web: http://jan.newmarch.name/
Last modified: 3 April, 2001