Introduction
This lecture looks at process management from the O/S point of view.
It covers process state, process creation and process scheduling.
There is also an introduction to inter-process communication (IPC).
Reference: Tanenbaum Chapter 2
Process states
When a process is using the CPU, it is actually running and doing some work.
However, it will not always stay in that state.
If a process does I/O and the device is not ready or just slow, the process
will become blocked and be unable to do any more work until the I/O is complete.
In a multitasking environment, a process is not allowed to use the CPU all
the time. At intervals (maybe signalled by a timer interrupt) it will be stopped
by the O/S so that another process can run. In this state it is not running,
but is runnable.
The process states and transitions are
Process table
The O/S maintains information about each process in a process table. Entries
in this table are often called process control blocks and must contain information
about
- process state
- memory state
- resource state
for each process.
Process state
The process state must contain all the information needed so that the process
can be loaded into memory and run. This includes
- the value of each register
- the program counter
- the stack pointer
Information about the state that the process is in (runnning, runnable, blocked)
and why will need to be stored.
In addition it may contain extra fields such as:
- process ID of itself, its parent, etc.
- elapsed time.
- pending signals that have not yet been dealt with (eg they arrived while
the process was asleep).
Memory state
Pointers to the various memory areas used by the program need to be kept, so
that they can be relocated back as needed.
Resource state
The process will generally have files open, be in particular directory, have
a certain user ID, etc. The information about these will need to be stored
also.
For example, in Unix each process has a file table. The first entry in this
(file descriptor zero) is for the processes' standard input, the second entry
is for standard output, the third for standard error. Additional entries are
made when the process opens more files.
Linux process state
In Linux, the information stored per process is defined by the structure
struct task_struct {
/* these are hardcoded - don't touch */
volatile long state; /* -1 unrunnable, 0 runnable, >0 stopped */
long counter;
long priority;
unsigned long signal;
unsigned long blocked; /* bitmap of masked signals */
unsigned long flags; /* per process flags, defined below */
int errno;
int debugreg[8]; /* Hardware debugging registers */
struct exec_domain *exec_domain;
/* various fields */
struct linux_binfmt *binfmt;
struct task_struct *next_task, *prev_task;
struct sigaction sigaction[32];
unsigned long saved_kernel_stack;
unsigned long kernel_stack_page;
int exit_code, exit_signal;
unsigned long personality;
int dumpable:1;
int did_exec:1;
int pid,pgrp,session,leader;
int groups[NGROUPS];
/*
* pointers to (original) parent process, youngest child, younger siblin
g,
* older sibling, respectively. (p->father can be replaced with
* p->p_pptr->pid)
*/
struct task_struct *p_opptr, *p_pptr, *p_cptr, *p_ysptr, *p_osptr;
struct wait_queue *wait_chldexit; /* for wait4() */
unsigned short uid,euid,suid,fsuid;
unsigned short gid,egid,sgid,fsgid;
unsigned long timeout;
unsigned long it_real_value, it_prof_value, it_virt_value;
unsigned long it_real_incr, it_prof_incr, it_virt_incr;
long utime, stime, cutime, cstime, start_time;
struct rlimit rlim[RLIM_NLIMITS];
unsigned short used_math;
char comm[16];
/* virtual 86 mode stuff */
struct vm86_struct * vm86_info;
unsigned long screen_bitmap;
unsigned long v86flags, v86mask, v86mode;
/* file system info */
int link_count;
struct tty_struct *tty; /* NULL if no tty */
/* shm stuff */
struct shm_desc *shm;
struct sem_undo *semun;
/* ldt for this task - used by Wine. If NULL, default_ldt is used */
struct desc_struct *ldt;
/* tss for this task */
struct tss_struct tss;
/* filesystem information */
struct fs_struct fs[1];
/* open file information */
struct files_struct files[1];
/* memory management info */
struct mm_struct mm[1];
};
Process creation
When a process starts, the O/S has to build an entry for it in the process
table. The process state will be marked as ``runnable''.
Here is the Linux fork()
routine:
/*
* Ok, this is the main fork-routine. It copies the system process
* information (task[nr]) and sets up the necessary registers. It
* also copies the data segment in its entirety.
*/
int do_fork(unsigned long clone_flags, unsigned long usp, struct pt_regs *regs)
{
int nr;
unsigned long new_stack;
struct task_struct *p;
if(!(p = (struct task_struct*)__get_free_page(GFP_KERNEL)))
goto bad_fork;
new_stack = get_free_page(GFP_KERNEL);
if (!new_stack)
goto bad_fork_free;
nr = find_empty_process();
if (nr < 0)
goto bad_fork_free;
*p = *current;
if (p->exec_domain && p->exec_domain->use_count)
(*p->exec_domain->use_count)++;
if (p->binfmt && p->binfmt->use_count)
(*p->binfmt->use_count)++;
p->did_exec = 0;
p->kernel_stack_page = new_stack;
*(unsigned long *) p->kernel_stack_page = STACK_MAGIC;
p->state = TASK_UNINTERRUPTIBLE;
p->flags &= ~(PF_PTRACED|PF_TRACESYS);
p->pid = last_pid;
p->p_pptr = p->p_opptr = current;
p->p_cptr = NULL;
p->signal = 0;
p->it_real_value = p->it_virt_value = p->it_prof_value = 0;
p->it_real_incr = p->it_virt_incr = p->it_prof_incr = 0;
p->leader = 0; /* process leadership doesn't inherit */
p->tty_old_pgrp = 0;
p->utime = p->stime = 0;
p->cutime = p->cstime = 0;
p->start_time = jiffies;
p->mm->swappable = 0; /* don't try to swap it out before it's set up *
/
task[nr] = p;
SET_LINKS(p);
nr_tasks++;
/* copy all the process information */
copy_thread(nr, clone_flags, usp, p, regs);
if (copy_mm(clone_flags, p))
goto bad_fork_cleanup;
p->semundo = NULL;
copy_files(clone_flags, p);
copy_fs(clone_flags, p);
/* ok, now we should be set up.. */
p->mm->swappable = 1;
p->exit_signal = clone_flags & CSIGNAL;
p->counter = current->counter >> 1;
p->state = TASK_RUNNING; /* do this last, just in case */
return p->pid;
bad_fork_cleanup:
task[nr] = NULL;
REMOVE_LINKS(p);
nr_tasks--;
bad_fork_free:
free_page(new_stack);
free_page((long) p);
bad_fork:
return -EAGAIN;
}
Process scheduling
The process scheduler is responsible for changing the state of processes.
A scheduler must compromise between certain desirable things:
- make sure each process gets a fair share of the CPU
- keep the CPU busy 100% of the time
- minimize response delays
- be consistent so that behaviour is expected
Stopping processes
There are two ways in which processes can stop: voluntarily stop, or be forced
to stop. If there is a mechanism to stop processes, then the scheduling is
preemptive, otherwise it is non-preemptive
or co-operative.
OS2 and Windows use non-preemptive schedulers. Windows NT and Unix use preemptive
schedulers.
To keep the CPU busy as much as possible, whenever a process blocks, it should
be replaced.
To only allow a process to run for a maximum amount of time, timer interrupts
can be set to stop the process.
When a process stops or is stopped, the registers have to be copied into the
process control block. Any values in the PCB that have changed will need to
be set. (Because of the register copy, the O/S code to do that would have to
be in Assembler.)
Round robin scheduling
This is one of the simplest scheduling algorithms. The O/S maintains a queue
of runnable processes. When a running process is stopped, or a blocked process
becomes runnable, it is placed on the end of the queue.
When the CPU becomes free, the process at the head of the queue is loaded and
made the running process.
Each process is allowed to run for a fixed maximum time before it is stopped.
This fixed time must be carefully tuned. If it is too long, then response time
for processes currently not running is too slow. If the time is too short,
then the system will spend proportionally more time swapping processes and
will just appear to be running slowly.
Priority scheduling
In round robin, all processes have the same priority. In priority scheduling
they may have different ones. The O/S will maintain lists of processes, one
for each priority. A runnable process from the highest priority list will be
chosen.
Priorities may be static (set at creation time) or dynamic. In Unix, the longer
a process runs, the lower its priority becomes.
Within a queue, a process may be chosen say by round robin.
Shortest job first
This is often used on batch systems. In these it is quite common to know in
advance about how long a job will take. Then the list of jobs can be ordered
on time required. The one needing the shortest time can be scheduled first.
This gives the best average time.
This cannot easily be used in interactive systems because you don't know how
long a command will take.
Two level scheduling
The above schemes assume either that all processes are in memory, or that only
one process (the currently running one) is in memory. Then the cost of context
switching is about equal.
The cost of context switching to a process that is in memory is that the registers
have to be loaded, and a jump made to the program counter.
The extra cost of context switching to a process that is not in memory is that
it first has to be loaded from disk. This will be significantly slower.
A two-level scheduler will maintain two lists, one for processes in main memory,
the other for those on disk. One scheduler will work on those in memory, switching
between them.
Periodically another scheduler will move processes between the lists so that
those on disk can have a chance to run.
Linux Scheduling
Here is part of the Linux scheduling function (the wakeup stuff belonging
to alalrm calls is omitted)
asmlinkage void schedule(void)
{
int c;
struct task_struct * p;
struct task_struct * next;
unsigned long ticks;
/* check alarm, wake up any interruptible tasks that have got a signal */
/* code omitted here - Jan */
/* this is the scheduler proper: */
c = -1000;
next = p = &init_task;
for (;;) {
if ((p = p->next_task) == &init_task)
goto confuse_gcc2;
if (p->state == TASK_RUNNING) {
nr_running++;
if (p->counter > c)
c = p->counter, next = p;
}
}
confuse_gcc2:
if (!c) {
for_each_task(p)
p->counter = (p->counter >> 1) + p->priority;
}
if (current == next)
return;
kstat.context_swtch++;
switch_to(next);
}
The full scheduling code is available as
sched.c
and
sched.h
where switch_to
is implemented as a macro for speed:
/*
* switch_to(n) should switch tasks to task nr n, first
* checking that n isn't the current task, in which case it does nothing.
* This also clears the TS-flag if the task we switched to has used
* tha math co-processor latest.
*/
#define switch_to(tsk) \
__asm__("cli\n\t" \
"xchgl %%ecx,_current\n\t" \
"ljmp %0\n\t" \
"sti\n\t" \
"cmpl %%ecx,_last_task_used_math\n\t" \
"jne 1f\n\t" \
"clts\n" \
"1:" \
: /* no output */ \
:"m" (*(((char *)&tsk->tss.tr)-4)), \
"c" (tsk) \
:"cx")
Windows 95 Scheduling
When Windows95 runs an MSDOS application it does so using a ``virtual 8086''.
This is a process that has its own 1M address space. The process does not
have access to any other resources. So an MSDOS application cannot crash
the system any more.
Since MSDOS is not multitasking, the only way it can
create new processes is as sequential ones. Scheduling is not an issue
here, since each process has to run to completion before a parent can
resume. Such multiple processes created within MSDOS are not visible
to Windows95.
All Windows 3.1 applications run within a shared address space. It is not
possible to change the scheduling algorithm without breaking many Windows
3.1 applications, so the cooperative scheduling mechanism must remain
for these.
It is possible for one Windows 3.1 application to lock out others, just
as it happens under Windows 3.1. There is no change in this. However,
whena Windows 3.1 application crashes, it can only bring down the other
Windows 3.1 applications that are running.
Windows 32-bit applications each run within their own protected memory
space. No Windows 32-bit application can crash another by corrupting its
memory because it does not have access (unless deliberately shared).
Each 32-bit application may use threads internally, so that a
32-bit application may be one process with multiple threads.
An MSDOS application is one process and one thread.
Each Windows 3.1 application is one process and one thread.
The Windows 95 scheduler manages threads rather than processes.
The scheduler is pre-emptive so that one process cannot lock out others
(see caveats later). The scheduler uses priority scheduling where a thread
can have a priority between 0 and 31. If multiple processes have the same
priority, round robin is used.
The time slice defaults to 20ms, but the scheduler has the ability to change
this. It can also temporarily bump up priorities if needed (e.g. in
response to a mouse click). I don't know the full details of the algorithm.
The caveat is as follows: Windows 3.1 16-bit system code is not ``re-entrant''.
That is, it is not possible to interrupt a Windows 3.1 system call and
restart it, later continuing with the interrupted call. This is why
Windows 3.1 scheduling must be cooperative rather than preemptive
because the system calls cannot be interrupted.
To ensure that two processes do not attempt to run Windows 3.1 system
calls simultaneously, Windows 95 uses a ``semaphore'' (see next
lecture) to control access.
Some Windows 95 32-bit code calls the old Windows 3.1 16-bit code. Not
much, but some. So if a Windows 3.1 application won't terminate and
release the semaphore, then the 32-bit code can't call the 16-bit code.
So a 32-bit application will block at that point.
If all 32-bit applications hit 16-bit code, then they will all block
and the system will. Killing the offending 16-bit application will
allow the rest to proceed.
Unix process startup
When a system boots Unix, it creates a single process called ``init''. This
init acts as the root of the process tree.
init forks a set of copies of itself, one for each terminal line that is connected
to it. Each one of these type the login message and then blocks waiting for
terminal input.
When the user name and password are typed, init checks that the user is valid,
and if so changes to the user's home directory, resets the user ID of the process
to the user, and ``exec''s a shell.
At this point the init for that terminal line has been replaced by the login
shell for that user. The user does whatever they want to do, and eventually
logs out.
Logging out terminates the login shell. In the meantime, the init at the top
of the process tree has done a wait, waiting on any of its children. The login
shell is in fact a direct child of this toplevel init because it came from
exec-ing a child. So when the shell terminates, the toplevel init wakes up.
The toplevel init then forks a new init for that terminal line and starts a
wait again, for another child to terminate.
Intro to InterProcess Communication
Processes do not run in isolation from each other. Generally they need to communicate
with each other.
Example:
Any two processes in a pipeline are communicating. One sends a stream of bytes
to the other.
Example:
Access to the lineprinter is controlled by a single process called ``lpd''
(the lineprinter daemon). Each time a user runs ``lpr'' this has to communicate
with ``lpd'' and send it the file to print.
Example:
Your home directories are stored on the machine ``willow''. Each time you access
a file the O/S has to make a connection to willow, request the file from a
suitable process on willow and accept responses from it.
There are many models of IPC, with different advantages and problems.
Shared memory
If two processes share the same piece of memory then they can use this to communicate.
For example, one may write information in this shared area and the other may
read it.
This can be a very fast method of information transfer because RAM can be used.
Synchronisation is a major problem - if the first process keeps writing data,
how can it ensure that the second one reads it?
Pipeline
A pipe acts like a channel betwen two processes. When one process writes into
the pipe the other process can read from it. A pipe can usually buffer information
so that the writer can place a lot of information in the pipe before the child
has to read it. When the pipe becomes full the writer has to suspend.
Pipes can be un-named. This is the norm in Unix where a process creates a pipe
and then forks, so that the two processes share the pipe between them.
If the processes do not come from a common ancestor then they can only share
a pipe if they can both name it (otherwise they could not find it). Named pipes
usually appear as though they were files in the file system.
Streams
Pipes carry unstructured data - you put bytes in one end and get the same bytes
out the other. Streams are designed to carry record information - you put records
in at one end and get the same records out the other. Each record must contain
a field saying how large it is.
Sockets
Sockets are more like ports that you can send data to. A process will be ``listening''
at a port and will accept data sent to it.
Signals
Signals are a fairly crude method if IPC. A process may send a signal to another
such as ``wake up'' or ``die''. The other process can respond to these signals
in various ways.
This page is
http://pandonia.canberra.edu.au/OS/l8_2.html,
copyright Jan Newmarch.
It is maintained by Jan Newmarch.
email:
jan@ise.canberra.edu.au
Web:
http://pandonia.canberra.edu.au/
Last modified: 29 August, 1995