Introduction

This lecture looks at more shell programming. It covers more advanced features which may be needed on occasion.

Debugging

A shell script can be debugged by turning tracing on. When a script is being traced, each command, complete with its arguments, is printed before it is executed. The loop

for i in 1 2 3
do
    echo $i
done

is traced as

+ echo 1
1
+ echo 2
2
+ echo 3
3

Tracing is turned on by

set -x

and off by

set +x

or a shell script, say ``script1'', may be run in debug mode by

bash -x script1
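
Tracing need not cover the whole script. As a minimal sketch, a suspect region can be bracketed by set -x and set +x:

set -x              # commands from here on are traced
ls -l /tmp
set +x              # tracing off again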

More on sequencing

The sequential mechanism is to have commands on successive lines, or separated by `;'. There are additional ways. To run a command asynchronously

command &

The command then runs in parallel with whatever else is done, timesharing the computer.
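
For example (a sketch; ``bigfile'' is only an illustrative name), a long-running sort can be started in the background while the script carries on with other work:

sort bigfile > bigfile.sorted &     # runs in the background
echo "sort started, carrying on"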

The sequence

command1 && command2

first executes command1. If its exit code is zero (it succeeds), then command2 is executed. If the exit code is non-zero, command2 is not executed and execution continues with the next command. It is equivalent to

if command1
then
    command2
fi

It is typically used as a shorthand for things like

test -f file && rm file

which attempts to remove the file only if it exists.

The sequence

command1 || command2

executes command1, and if it fails executes command2.

test -f file || \
    (echo "can't find file" && exit 1)

Enclosing a command in (...) executes the command in a subshell. This is useful if you want to make changes to the environment for some commands without affecting others. For example, to store the parent directory in a variable

ParentDir=`(cd ..; pwd)`

This is easier than using a sed command which looks for a `/' followed by non-`/' characters at the end of the line:

ParentDir=`pwd | sed 's/\/[^/]*$//'`

Builtin shell variables

There are a large number of variables built into the shell. These are global variables that can be used anywhere. To see the full list of variables set in your environment, run

set
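
A few of the commonly used ones can be shown directly (a small sketch; the exact set varies between shells):

echo "home directory:   $HOME"
echo "search path:      $PATH"
echo "this shell's pid: $$"
echo "last exit code:   $?"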

Builtin shell commands

The shell recognises these commands internally.

alias

alias name=value

When an alias is set up, typing the alias name as the first word on a line expands to the value before execution.

alias ll="ls -l"

break

Break out of an enclosing for, while or until loop. This is used to terminate execution of a loop from a point within its body.
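
As a sketch, a loop over files might give up entirely when it meets one it cannot handle:

for file in *
do
    if [ ! -w $file ]
    then
        echo "$file is not writable, giving up"
        break
    fi
    # process writable file
done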

continue

Continue the for loop back at the top.

for file in *
do
    if [ ! -r $file ]
    then
        echo "can't read $file"
        continue
    fi
    # process readable file
done

export

Variables that you create are by default local to that shell. To make a variable global, so that it is visible in subshells, it must be exported.

amt=20
export amt
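
A sketch of the difference, using a subshell started explicitly with bash -c:

amt=20
bash -c 'echo amt is $amt'     # prints "amt is": amt not exported yet
export amt
bash -c 'echo amt is $amt'     # prints "amt is 20"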

read

Read a line of input. Assign the first word to the first variable, the second to the second, and so on. If there are more words on the input line than there are variables, assign the rest of the words, as a single string, to the last variable.

read word1 word2 rest
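
For example (a sketch), if the line typed in is ``one two three four'':

read word1 word2 rest
echo $word1          # one
echo $word2          # two
echo $rest           # three four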

shift

Shifts all positional arguments to the left, i.e. loses the old $1, renames $2 as $1, $3 as $2, and so on. As a result, $* now contains as $1, $2, ... what used to be $2, $3, ...

echo $1
shift
echo 'args $2 upwards are' $*
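
A common sketch is to use shift to walk through all of the arguments one at a time:

while [ $# -gt 0 ]
do
    echo "processing $1"
    shift
done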

trap

Trap a signal. Signals are sent to a process, usually to tell it to give up. Some signals are SIGHUP (the serial line has hung up), SIGINT (control-C) and SIGFPE (floating point exception). Signals are also known by their numbers; SIGINT is 2. If a signal is trapped, the given code is executed when the signal occurs.

trap 'echo "cleaning up"
      rm tmp*
      exit 2' 2

true

This command always succeeds.

Functions

Functions may be defined within the shell using the keyword function.

function ll {
    ls -l $*
}

The positional parameters $1, $2, ..., $*, $# now refer to the arguments with which the function is called.
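
For instance (a sketch with a made-up function name), a function can inspect its own arguments:

function show_args {
    echo "called with $# arguments, first is $1"
}
show_args one two three      # prints: called with 3 arguments, first is one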

Miscellaneous examples

A simple interpreter

A loop that prompts for a command line and executes it:

my_prompt="mysh>"
while true
do
    echo $my_prompt
    read line
    bash -c "$line"
done

Restricted shell

A restricted shell that only accepts a small number of commands:

while true
do
    echo "Commands are: edit file list quit"
    read comm arg
    case $comm in
    edit) ted $arg;;
    list) ls -l;;
    quit) exit 0;;
    esac
done

Sum file sizes

Sum the size in bytes of all non-directory files in the current directory:

function fourth_arg {
    echo $4
}

sum=0
for file in *
do
    if [ ! -d $file ]
    then
        file_ls=`ls -l $file`
        size=`fourth_arg $file_ls`
        let sum=$sum+$size
    fi
done
echo $sum
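
The size is the fourth field of ls -l output on the system this was written for; on many modern systems it is the fifth field. A more portable sketch avoids parsing ls altogether and uses wc -c instead:

sum=0
for file in *
do
    if [ ! -d "$file" ]
    then
        size=`wc -c < "$file"`      # byte count of the file
        let sum=$sum+$size
    fi
done
echo $sum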

Automated testing

Automated testing of programs may be done easily. Suppose that you have a set of files called testdata1, testdata2, ... that are the input files for a program called ``program''. The correct output results are in files result1, result2, ... To test the program, you need to run it with each test file as input and compare the output to the corresponding result file. The file testall contains:

i=1
while [ -f testdata$i ]
do
    test1 $i
    let i=$i+1
done

The file test1 contains:

program < testdata$1 > results
if [ ! -f results ]
then
    echo "Test $1 had no result"
    exit 1
fi
if cmp results result$1
then
    echo "Result of test $1 ok"
    rm -f results
    exit 0
else
    echo "Test $1 failed"
    echo "Differences are:"
    diff results result$1
    rm -f results
    exit 2
fi

Some (more) shell scripts

Exit codes

Your program should exit with 0 if successful, and some other number otherwise. The statement exit without a code exits with the status of the last command executed, which may not be what you intend. If you use an explicit exit code in one place, you should give one for all exit statements.
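
For example, a small script (a sketch) that reports through its exit code whether a pattern occurs in a file:

if grep "$1" "$2" > /dev/null
then
    exit 0      # pattern found
else
    exit 1      # pattern not found
fi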

Checking arg counts

if [ $# -ne 3 ]
then
    echo "Usage: $0 arg1 arg2 arg3
    exit 1
fi
or
test $# -ne 3 && echo "Usage: $0 a1 a2 a3" && exit 1

Checking arg type

if [ ! -r $1 ]
then
    echo $1 is unreadable
    exit 2
fi

Strings and pipes

echo "..." | ...
cat file | ...
The first sends an explicit string down a pipeline; the second sends the contents of a file down a pipeline.
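
For instance (a sketch; /etc/passwd is just a convenient file to read):

echo "hello world" | wc -w      # counts the words in the string
cat /etc/passwd | wc -l         # counts the lines of the file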

sed

sed is often used to delete text from lines, or to keep some text in lines.

To delete text, you match the text and replace it with nothing.

sed 's/text//'

To keep text, you match it and save it as \1, \2, etc. The rest of the text is matched as well. The replacement text is only the saved stuff.

sed 's/stuff\(text\)stuff/\1/'
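
A small sketch of both forms on a fixed string:

echo "26/Jul/1994" | sed 's/\/.*//'              # deletes from the first / on: prints 26
echo "26/Jul/1994" | sed 's/..\/\(...\).*/\1/'   # keeps only the month: prints Jul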

Loops

Loops can be for or while loops.

Write a shell script that lists all the executable files in the current directory:

for file in `ls`  # or for file in *
do
    if [ -x $file ]
    then
	echo $file
    fi
done
or
ls |
while read file
do
    if [ -x $file ]
    then
	echo $file
    fi
done

Problem statement

Whenever I go for promotion interviews, one question I get is: "Is there any value in placing lectures, software, etc., on the Internet?" One way of answering this is by student surveys. Another is by analysis of the access_log of my Web server.

The access_log contains entries such as

hickory.canberra.edu.au - - [26/Jul/1994:12:50:03 +1000] 
                   "GET /OS/l3_1.html HTTP/1.0" 200 6402
hickory.canberra.edu.au - - [26/Jul/1994:12:50:03 +1000]
                   "GET /OS/l3_1.html HTTP/1.0" 200 6402
vine.canberra.edu.au - - [26/Jul/1994:13:12:00 +1000]
                   "GET /OS/l1_1.html HTTP/1.0" 200 8041
ironwood.canberra.edu.au - - [26/Jul/1994:15:56:31 +1000]
                   "GET /OS/l1_1.html HTTP/1.0" 200 8041

(each entry is on one line only). Each entry contains the machine, the date and the document accessed. It does not contain the identity of the user or how long they spent with the document. The file is written in timestamp order, by date and time.

I have written a collection of shell and Perl scripts to attempt to analyse this data. They are fairly typical scripts.

Daily access

This script takes access_log and writes to standard output a list of dates and lecture identifiers, organised for each lecture by date. The first lines of output are
22/Jul/1994 l1_1
26/Jul/1994 l1_1
26/Jul/1994 l1_1
26/Jul/1994 l1_1
26/Jul/1994 l1_1
27/Jul/1994 l1_1
27/Jul/1994 l1_1
27/Jul/1994 l1_1
27/Jul/1994 l1_1
27/Jul/1994 l1_1
27/Jul/1994 l1_1
27/Jul/1994 l1_1
27/Jul/1994 l1_1
27/Jul/1994 l1_1
27/Jul/1994 l1_1
27/Jul/1994 l1_1
27/Jul/1994 l1_1
27/Jul/1994 l1_1
28/Jul/1994 l1_1
  .
  .
  .
21/Jul/1994 l1_2
27/Jul/1994 l1_2
27/Jul/1994 l1_2
27/Jul/1994 l1_2
27/Jul/1994 l1_2

The script (daily.sh) is
for year in 1994 1995 1996
do
  for lecture in l1_1 l1_2 l2_1 l2_2 l3_1 l3_2 \
        l4_1 l4_2 l5_1 l5_2 l6_1 l6_2 \
        l7_1 l7_2 l8_1 l8_2 l9_1 l9_2 l12_1 l12_2 \
        l13_1 l14_1 l14_2 l15_1
  do
    grep "$year.*$lecture.html" < access_log
  done | 
  sed 's/.*\[\(...........\).*OS\/\(.*\).html.*/\1 \2/'
done
Explanation: The grep pattern is based on the year plus intervening text plus the lecture (document name). The sed pattern looks for the '[' then captures the date as \1. It then skips to OS/ (as OS\/) and captures the document name as \2. The remainder of the line is matched. In the replacement pattern, only the date and document name are retained.
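
The sed pattern can be checked on one of the sample log entries above (a sketch; the comment shows the intended output):

echo 'hickory.canberra.edu.au - - [26/Jul/1994:12:50:03 +1000] "GET /OS/l3_1.html HTTP/1.0" 200 6402' |
sed 's/.*\[\(...........\).*OS\/\(.*\).html.*/\1 \2/'
# prints: 26/Jul/1994 l3_1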

The output is organised by year as the primary sort key (outer for loop), then by document name (inner for loop). Within each document the entries appear in date order, because that is the original order of access_log.

Access by week

This script uses some more obscure commands. The intent is to take the output from the last script and produce a count of accesses by week and year for each lecture (weekly.sh).

The output is of the form

      1 Week: 29 Year: 1994 l1_1
     22 Week: 30 Year: 1994 l1_1
     10 Week: 31 Year: 1994 l1_1
      1 Week: 32 Year: 1994 l1_1
      5 Week: 33 Year: 1994 l1_1
     10 Week: 34 Year: 1994 l1_1
      7 Week: 35 Year: 1994 l1_1
      3 Week: 36 Year: 1994 l1_1
      7 Week: 37 Year: 1994 l1_1
      3 Week: 38 Year: 1994 l1_1
      4 Week: 39 Year: 1994 l1_1
      4 Week: 40 Year: 1994 l1_1
      3 Week: 42 Year: 1994 l1_1
The first column is the count of accesses, produced by uniq -c. The rest of each line is the result of the echo statement. The input to this script is the output from the last script.
while read date lecture
do
    day=`echo $date | cut -d/ -f1`
    month=`echo $date | cut -d/ -f2`
    year=`echo $date | cut -d/ -f3`

    echo Week: `date -d "$month $day, $year" +%U` \
Year: $year $lecture
done < daily.sh.out |
uniq -c
Explanation: The cut command extracts fields from strings, and is sometimes simpler than sed. The -d/ option uses '/' as the delimiter, since the dates are of the form 29/Jun/1995. The -f1 option extracts the first field, and so on.

The date command handles date formatting. The +%U option prints the week for that date.

The command uniq manages runs of identical adjacent lines. Usually it collapses the duplicates; the option -c counts the number of duplicates instead.
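
The cut and date steps can be tried on their own (a sketch assuming GNU date, which accepts the -d option):

echo 26/Jul/1994 | cut -d/ -f2        # prints: Jul
date -d "Jul 26, 1994" +%U            # prints the week number: 30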

Machine+date+url

The next script (machine+date+url.sh) is a straight sed command, fed into sort:
# "anything" - - [ "11 date chars" chars OS/ "URL" chars
sed 's/\(.*\)- - \[\(...........\).*OS\/\(.*\.html\).*/\2 \1 \3/' < access_log |
sort
The output is
01/Apr/1996 cherry.canberra.edu.au  OS.html
01/Apr/1996 cherry.canberra.edu.au  assign1.94.html
01/Apr/1996 cherry.canberra.edu.au  assignments.94.html
01/Apr/1996 cherry.canberra.edu.au  assignments.94.html
01/Apr/1996 cherry.canberra.edu.au  assignments.html
01/Apr/1996 cherry.canberra.edu.au  assignments.html
01/Apr/1996 chiron.ringworld.com.au  OS.html
01/Apr/1996 chiron.ringworld.com.au  OS.html
01/Apr/1996 coho.stanford.edu  aut_index.html
01/Apr/1996 dialup.bellatlantic.com  l7_2.html
01/Apr/1996 dialup19.x25.infoweb.or.jp  OS.html

Count day accesses

This script takes the output from the last script and produces a table of how many machine/day combinations made a given number of accesses, eliminating the UC (canberra.edu.au) machines. This is a straight pipeline (count_day_access.sh).
sed 's/ [^ ]*$//' | grep -v canberra.edu.au |
uniq -c | sed 's/..\/.*//' | 
sort | uniq -c
The first sed eliminates the URL at the end of each line. The grep discards local accesses. The uniq -c counts repeats of each date+machine combination. The next sed deletes two characters followed by '/' followed by anything, leaving only the count. The final sort | uniq -c then counts how many machine/day combinations made each number of accesses. Output is
   3794       1
   1548       2
    941       3
    584       4
    410       5
    277       6
    204       7
    155       8
    120       9
     .        .
     .        .