Introduction
This lecture looks at the overall structure of a C program, and then does
a ``bottom up'' approach to the language. It looks at the basic data types
and the common operations on them. It is important to note both the
common features with other languages and the differences.
The C language
The Language List (sometimes distributed on the Internet) shows that over 5000
programming languages have been invented. Why look at this one?
The first versions of Unix were written in assembler (for the PDP-7 and then the
PDP-11). The C language was created so that Unix could be rewritten in a
higher-level language whose source code was easier to manage. It was designed as
a language for writing operating systems. IBM's OS/2 was originally written largely
in assembler, but is now mainly in C. Windows NT from Microsoft is mainly written
in C. The language is very successful for writing operating systems.
So why would application programmers use it? The Application Programming Interfaces
(APIs) to Unix, OS/2 and Windows NT are defined as C language interfaces to the OS.
If you want to open a file for reading, there is a C routine to do it. If you
want to create a new process, there is a C routine to do it. So if you want
to write a program that makes use of OS services, it is easiest to access these
services through C.
There are a large number of free and commercial libraries written in C, such
as Microsoft Windows, the X Window system, maths packages, database routines,
communications systems etc.
Despite its popularity, C has many unpleasant features. It is very easy to
write programs that do not work (called
``shooting yourself in the foot'').
Structure of C programs
A simple C program consists of one file. To create an executable from it, the
source goes through a number of steps (preprocessing, compiling and linking),
some involving other files.
More complex programs may
- use more source files
- use more specification headers
- use other libraries.
Example:
A program to count the number of characters read from standard input.
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int ch;      /* int, not char, so that it can hold EOF */
    int count;
    count = 0;
    while ((ch = getchar()) != EOF)
    {
        count++;
    }
    printf("Chars read %d\n", count);
    exit(0);
}
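The components of this are the function main, where execution starts; the
declarations of the variables ch and count; the statements in the body (an
assignment and a while loop); and calls to the library functions getchar(),
printf() and exit().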
Scalar data types
integers
Integers may be short, ordinary length (default) or long; signed (default)
or unsigned
int i, j, k;
unsigned short l;
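The standard guarantees that
sizeof(short int) <= sizeof(int) <= sizeof(long int)
and also sets minimum ranges: short and int are at least 16 bits, and long is
at least 32 bits. Beyond that, the sizes vary between machines, as this program
shows: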
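#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    /* sizeof yields a size_t; cast to int to match %d
       (C99 and later can use %zu instead) */
    printf("char size (bytes): %d\n", (int) sizeof(char));
    printf("short int size: %d\n", (int) sizeof(short int));
    printf("int size: %d\n", (int) sizeof(int));
    printf("long int size: %d\n", (int) sizeof(long int));
    printf("int pointer size: %d\n", (int) sizeof(int *));
    exit(0);
}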
Ordinary decimal notation may be used, or octal (prefixed by 0),
or hex (prefixed by 0x).
i = 16; /* decimal */
j = 020; /* octal */
k = 0x10; /* hex */
Floating point
Floating point numbers come in two common sizes, float and double; standard C
also provides long double. Unsuffixed constants such as 3.14 have type double.
Characters
The range of type char is a subset of that of int; whether plain char is signed
or unsigned is left to the implementation. However, the set char + EOF (every
character value plus EOF) is a subset of int. This often causes confusion about
whether to declare variables as char or int.
Character constants are enclosed in single quotes. Special characters such
as newline have an ``escape'' representation:
ch1 = 'A';
ch2 = 'a';
ch3 = '\n'; /* newline */
ch4 = '\t'; /* tab */
Pointers
Given any type (integer, char, structure, etc), you can have a pointer to that
type. A pointer is the address of data.
short int *short_ptr;
char *char_ptr;
The address of a variable is found by using the `&' operator. To dereference
a pointer (find the value at the address) use the `*' operator:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    char ch, *cptr;
    ch = 'a';
    cptr = &ch;      /* cptr holds the address of ch */
    putchar(*cptr);  /* prints 'a' */
    *cptr = 'b';     /* changes ch through the pointer */
    putchar(ch);     /* prints 'b' */
    putchar('\n');
    exit(0);
}
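You do sometimes get double indirection (a pointer to a pointer):
int n;
int *int_ptr;
int **int_ptr_ptr;
int_ptr = &n;
int_ptr_ptr = &int_ptr;
**int_ptr_ptr = 100;   /* sets n to 100 */
Here int_ptr_ptr holds the address of int_ptr, which holds the address of n,
so **int_ptr_ptr is n itself.
No-one seems to be able to reasonably handle more levels of indirection
than that.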
Enumerated types
Define enumerated types by the `enum' keyword
enum colour {red, green, amber};
and variables of these types by
enum colour traffic_light;
Use them as normal
traffic_light = red;
if (traffic_light == red) ...
#include <stdio.h>

enum colour {red, green, amber};

int main(int argc, char *argv[])
{
    enum colour traffic_light;
    traffic_light = red;
    if (traffic_light == red)
        printf("light was red\n");
    return 0;
}
Synonyms for types
Create new names for types by `typedef'
typedef long int * big_ptr_t;
and then declare variables by
big_ptr_t p;
Boolean
There is no Boolean type in C89 (C99 later added _Bool and <stdbool.h>).
The integer 0 is Boolean False, and any other integer
is True. (NB: this is the opposite way round from shell exit statuses,
where 0 means success.)
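This infinite loop program prints lots of "true",
but each time the unsigned char wraps around to zero (every 16 iterations
on machines with 8-bit chars), it prints "false":
#include <stdio.h>
#include <unistd.h>   /* sleep() is POSIX, not standard C */

int main(int argc, char *argv[])
{
    unsigned char n = 0;
    while (1)
    {
        if (n)
            printf("true: %d\n", n);
        else
            printf("this one is false: %d\n", n);
        n += 16;   /* increment n by 16; wraps from 240 back to 0 */
        sleep(1);
    }
}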
void
void means ``no type''. It is used for functions that return no value, and
void * is used as a generic pointer type when you need one: for example, to
sort an array of something (integers, chars, strings, etc.) without
specifying exactly what is being sorted.
Expressions
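Almost everything in C is an expression, including assignment:
x + y
x = y
The value of x+y is the sum. The value of x=y is the new value of x.
Expressions may be nested
x + y + z
x = y = z
You may need to know the associativity rules for such expressions. Is
8 / 4 / 2
(8/4)/2 or 8/(4/2)? The arithmetic operators associate left to right, so it
is (8/4)/2, which is 1.
Assignment associates right to left, so
x = y = z
means
x = (y = z)
that is, z is assigned to y, and then that value is assigned to x.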
There are the usual arithmetic and relational operators.
NB: the Boolean equality
test is `==':
if (x == y) ...
Be suspicious of
if (x = y) ...
It is legal, and may be correct (the value of x=y is x, and if this is
integral then it has a Boolean value), but is probably a semantic error.
The inequality operator is `!='.
A huge amount of code combines an assignment statement (typically
through a function call) with a Boolean test:
if ((ch = getchar()) != EOF) ...
Note the parentheses: != binds more tightly than =, so without them ch would be
assigned the result of the comparison rather than the character.
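#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int n = 2;
    if ((n = 2) != 0)   /* assigns 2 to n, then compares with 0 */
        printf("n not 0\n");
    if (n = 2 != 0)     /* parsed as n = (2 != 0), so n becomes 1 */
        printf("n is %d\n", n);   /* prints "n is 1" */
    exit(0);
}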
The logical operators are
- && : and (``cand'', conditional and)
- || : or (``cor'', conditional or)
- ! : not
&& and || are ``short circuit'' operators: the right operand is evaluated only if the left one does not already decide the result, so the following never divides by zero
if ( x != 0 && y/x > 0) ...
Special operators
There are special increment and decrement operators:
- n++ : value is n; n is incremented by one afterwards.
- n-- : value is n; n is decremented by one afterwards.
- ++n : n is incremented by one first; value is the new n.
- --n : n is decremented by one first; value is the new n.
And compact forms of assignment
n += 1 /*n = n + 1*/
n -= 3 /*n = n - 3*/
The relative precedence of different operators is very important:
*p++ == *(p++)
means ``the value of p++ is p, and then the pointer p is incremented by one
element; so the value of *p++ is the value pointed to by p, and afterwards p
moves to the next element.''
(*p)++
is less common, but means ``the value of what p is pointing to is incremented by
one; p itself does not change.''
Example
Read characters until end of file, and sum the values of all the digit characters.
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int ch;
    int sum = 0;
    int chars_read = 0;
    int digits_read = 0;

    while ((ch = getchar()) != EOF)
    {
        chars_read++;
        if (ch >= '0' && ch <= '9')   /* digits are contiguous in every C character set */
        {
            digits_read++;
            sum += ch - '0';          /* convert '7' to 7, etc. */
        }
    }
    /* adjacent string literals are joined into one format string */
    printf("sum %d\n"
           "chars read %d\n"
           "digits read %d\n",
           sum, chars_read, digits_read);
    exit(0);
}
Summary
This lecture has looked at the overall structure of a C program.
It examined the basic data types of C, and looked at common operations
on them. Several things are common with other languages such as Java,
but there are differences of both a syntactic and semantic nature.
You need to pay attention to these.
If you want to declare a variable that should hold chars, should it be of type
char or int? The function getchar() returns chars, so why is it declared to
return int?
If the value can be any char or EOF (char+EOF), then it should be declared of
type int, because int is a superset of char+EOF. If the value can never be EOF,
then it should be of type char.
The function getchar() can return EOF, so it must be declared to return int.
If a variable is assigned the value from getchar(), then it can hold EOF, so it
must be of type int also.
However, once you know the value is not EOF, type char is ok:
int ich;
if ((ich = getchar()) != EOF)
{   /* can't be EOF now */
    char ch = (char) ich;
    ...
}
Introduction
This part deals with the control flow constructs of if..., while...
and for... . It looks at the standard library functions that you can
use at any time. It then deals with how to define your own functions,
and looks at parameter passing mechanisms.
Control flow
C uses the semicolon to terminate statements (like Java, unlike Pascal, which
uses it to separate them). Statements may be grouped in {...}, just like
Pascal's BEGIN...END.
if
The if statement has the form
if (expression) statement
Note the parentheses around the expression, which are required. The expression
is anything that evaluates to a number: 0 is false and any other value is true.
In particular, it may contain function calls, assignments, etc.
if ((ch = getchar()) != EOF) ...
The if..then..else form is
if (expr)
statement
else
statement
as in
if (x == 1)
x++;
else
x--;
while
The while loop has syntax
while (expression) statement
Two keywords inside a loop are ``break'' and ``continue''. break
terminates the loop. continue ceases execution of the
current pass through the loop and returns to the loop condition.
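#include <stdio.h>

/* count the number of chars with even
   and odd character codes
*/
int main(void)
{
    int ch, evens = 0, odds = 0;
    while (1)
    {
        if ((ch = getchar()) == EOF)
            break;        /* leave the loop at end of input */
        if (ch % 2 == 0) {
            evens++;
            continue;     /* skip the rest of this pass */
        }
        odds++;
    }
    printf("evens %d, odds %d\n", evens, odds);
    return 0;
}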
Any statement may be empty. This can lead to syntactically correct but erroneous
code. The following is in fact correct; it copies stdin to stdout:
while ((ch = getchar()) != EOF &&
putchar(ch) != EOF)
; /* empty body */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int ch;
    while ((ch = getchar()) != EOF &&
           putchar(ch) != EOF)
        ; /* empty body: all the work is done in the condition */
    exit(0);
}
for
The for loop is the most general loop construct
for (initial; continue; increment)
statement
where initial, continue and increment can be any expressions (including empty).
for (i = 0; i < 20; i++) ...
for (ch = getchar(); ch != EOF;
ch = getchar())
...
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int ch;
    /* copy stdin to stdout */
    for (ch = getchar(); ch != EOF;
         ch = getchar())
        putchar(ch);
    exit(0);
}
/* forever */
for (;;) ...
case
The switch (case) statement is of the form
switch (expression) {
case const: ...
case const: ...
default: ...
}
The constants must be integer constant expressions (including character
constants and enumerated values). NB: each branch should normally be terminated
with ``break'', or execution will ``fall through'' into the next branch.
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int ch;
    int vowel_count = 0,
        other_char_count = 0;
    while ((ch = getchar()) != EOF)
        switch (ch)
        {
        case 'a':
        case 'e':
        case 'i':
        case 'o':
        case 'u': vowel_count++;   /* the five labels share this code */
                  break;
        default:  other_char_count++;
        }
    printf("Vowels: %d; Other: %d\n",
           vowel_count, other_char_count);
    exit(0);
}
Standard library functions
C has a large standard library of functions covering areas such as:
- I/O
  - File opening and closing: fopen(), fclose(), freopen(), fflush().
  - Input a char: getchar(), getc(), fgetc(), ungetc().
  - Input a string: fgets() (avoid gets(), which cannot check the buffer
    size and was removed in C11).
  - Formatted input: scanf(), fscanf(), sscanf().
  - Output a char: putchar(), putc(), fputc().
  - Output a string: puts(), fputs().
  - Formatted print: printf(), fprintf(), sprintf().
  - Block I/O: fread(), fwrite().
- Character types
  - isalpha(), isalnum(), isdigit().
  - iscntrl() (isascii() is a common Unix extension, not ISO C).
  - isspace(), ispunct().
  - islower(), isupper().
  - toupper(), tolower().
- Strings
  - Copy: strcpy(), strncpy().
  - Compare: strcmp(), strncmp().
  - Length: strlen().
  - Concatenate: strcat(), strncat().
  - Convert to number: strtod() (double), strtol() (long),
    strtoul() (unsigned long).
  - Locate char: strchr(), strrchr().
  - Locate chars: strspn(), strcspn(), strpbrk().
  - Substrings: strstr(), strtok().
- Date and time
  - Processor time used: clock().
  - Date and time: time(), gmtime(), localtime().
  - ASCII time: asctime(), ctime().
- Maths
  - sqrt(), pow(), sin(), cos(), exp(), log(), etc. (declared in <math.h>).
- Memory functions
  - Allocate dynamic memory: malloc(), calloc().
  - Free dynamic memory: free().
  - Reallocate memory: realloc().
- Miscellaneous functions
  - Get environment: getenv().
  - Get login info: getlogin() (POSIX).
  - Get current directory: getcwd() (POSIX).
  - Binary search: bsearch().
  - Quick sort: qsort().
putchar
#include <stdio.h>
int putchar(int)
putchar prints the character argument to standard output. The value of the
function is the character printed if it was successful, or EOF if it was not.
/* no error check */
putchar('X');
/* with error check */
if (putchar('X') == EOF) {
    /* problem with output device? */
}
getchar
#include <stdio.h>
int getchar(void)
Reads a character from standard input. Returns the character (as an unsigned
char converted to int) if successful, or EOF at end of file or on error.
Note that the return type must be int, because EOF is not a char value.
int ch;
ch = getchar();
if (ch == EOF) ...
printf
int printf(const char *format, ...)
Formatted print to standard output. ``format'' is a string that is printed
after substitutions are made using the values in the argument list (the
``...''). Special codes are used:
-
%c single char
-
%s string of chars
-
%d decimal integer
-
%o octal integer
-
%x hexadecimal integer
The actual values come from the list:
n = 20;
printf("%d in octal is %o\n", n, n);
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int n;
    puts("Enter a number in decimal:");
    /* scanf returns the number of items converted: stop at EOF or bad input */
    while (scanf("%d", &n) == 1)
    {
        printf("%d in octal is %o\n", n, n);
        puts("Enter a number in decimal:");
    }
    exit(0);
}
Escaped characters can be used in these strings such as
\n new line
\t tab
isalpha
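#include <ctype.h>
int isalpha(int)
Returns a nonzero (true) value if the character is alphabetic, and 0 (false)
otherwise. The argument must be EOF or a value representable as unsigned char.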
if (isalpha(c))
alpha_count++;
Functions
C has no procedures, only functions. By returning no value, a function acts
like a procedure. By ignoring the function return value you treat a function
as though it was a procedure.
Inside a function the ``return'' statement immediately
terminates the function.
The value (if any) is the value following the ``return''
keyword.
int sum(int m, int n)
{
return m+n;
}
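There can be any number of return statements. (The function is called is_letter
here, since defining your own isalpha would clash with the standard library.)
int is_letter(int ch)
{
    /* assumes the letters are contiguous, as in ASCII (not true in EBCDIC) */
    if ('a' <= ch && ch <= 'z')
        return 1;
    if ('A' <= ch && ch <= 'Z')
        return 1;
    return 0;
}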
Function definitions may not be nested. Functions with no return value are
declared as type ``void''. Functions with no arguments have the argument list
declared as ``void''. (Note: this use of void is not the same as the
data type void.)
void hello_message(void)
{
printf("Hello there\n");
}
Function parameters are all value parameters (Java parameters,
Ada IN parameters, Pascal value
parameters). There are no OUT (Ada) or VAR (Pascal) parameters. The normal
way of changing a value is to assign it the function's return value:
ch = tolower(ch);
To change the value of a parameter you have to do your own
``call by reference''
and pass the address of the variable as the parameter.
Example:
The sum of two ints as a procedure rather than a function:
void sumof(int x, int y, int *z)
{
*z = x + y;
}
and use it by
int n;
sumof(1, 2, &n);
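#include <stdio.h>
#include <stdlib.h>

void sumof(int x, int y, int *z)
{
    *z = x + y;   /* store the result where z points */
}

int main(int argc, char *argv[])
{
    int n;
    sumof(1, 2, &n);   /* pass the address of n */
    printf("1+2 = %d\n", n);
    exit(0);
}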
There is a very high potential for error with reference parameters.
In the program, change int n to int *n, &n to n,
and printf(.., n) to printf(.., *n),
and see whether you get strange results. If not, you were lucky!
Explanation
In order to change the value of a variable, you must pass the address of that
variable. So the function declaration needs a pointer parameter (here int *).
When you call the function, you must pass an address. The function assumes that
the address is that of valid memory of the right type. If it isn't, then garbage
happens. Here, int *n is an uninitialised variable, so it probably points to a
random, garbage address. Maybe you can write to that address, maybe not.
Make it really fail by
int *n = NULL;
since writing through a null pointer is undefined behaviour and on most systems
crashes the program.
Where a function declares a pointer parameter, check the documentation to see
whether the parameter must be the address of existing memory of that type.
Often the documentation is vague on this.
Example
From the standard library, the function ``frexp'' splits a real number into
a fraction and an integer power of two, as in
24 = 0.75 x 2^5
The specification for this function is
double frexp(double value, int *exp)
The description says that the function return is the value of the fraction
and the power of two is returned in the int pointed to by ``exp''. That means
you have to supply an integer, and pass its address to the function, as in
#include <math.h>

int power;
double fraction, number;
number = 24.0;
fraction = frexp(number, &power);   /* fraction is 0.75, power is 5 */
Example:
The ``scanf'' function is used for formatted input just like ``printf'' is
used for formatted output. It uses the same substitution sequences ``%d'' for
an integer, ``%c'' for a character, etc. However, when it reads a value, it
stores it in a variable. The call by value semantics means that it must be
given the address of a variable. To read a character:
char ch;   /* %c expects a char *, so ch must be a char, not an int */
scanf("%c", &ch);
To read an integer:
int n;
scanf("%d", &n);
Note that this is wrong, just like before:
int *n;            /* uninitialised: points nowhere useful */
scanf("%d", n);    /* writes to a garbage address */
Example:
The following program reads sets of 3 integers, stopping at end of file (or on
invalid input). For each set it prints them out in descending order.
#include <stdio.h>
#include <stdlib.h>

/* exchange the ints that m and n point to */
void swap(int *m, int *n)
{
    int k;
    k = *m;
    *m = *n;
    *n = k;
}

int main(int argc, char *argv[])
{
    int a, b, c;
    while (scanf("%d %d %d", &a, &b, &c) == 3)
    {
        if (b < c)
            swap(&b, &c);
        if (a < b)
            swap(&a, &b);
        if (b < c)
            swap(&b, &c);
        printf("%d %d %d\n", a, b, c);
    }
    exit(0);
}
Summary
This lecture dealt with the control flow constructs of if..., while...
and for... . It looked at the standard library functions of C.
It then dealt with how to define your own functions,
and looked at parameter passing mechanisms. Some very common problems
with addressing were considered.
jan@newmarch.name
Web: http://jan.newmarch.name/
Last modified: 3 April, 2001