Some shell scripts

Exit codes

Your program should exit with 0 if successful and some other number otherwise. An exit statement without a code exits with the status of the last command executed, which is rarely what you intend. If you give an explicit exit code in one place, give one for every exit statement.

Checking arg counts

if [ $# -ne 3 ]
then
    echo "Usage: $0 arg1 arg2 arg3"
    exit 1
fi
or
test $# -ne 3 && echo "Usage: $0 a1 a2 a3" && exit 1

Checking arg type

if [ ! -r "$1" ]
then
    echo "$1 is unreadable"
    exit 2
fi

Strings and pipes

echo "..." | ...
cat file | ...
The first sends an explicit string down a pipeline; the second sends the contents of a file down a pipeline.
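A quick check that both forms feed the same kind of stream (the file name /tmp/pipe_demo.txt is made up for the example):

```shell
# A string down a pipeline: wc -w counts the words it receives.
echo "one two three" | wc -w          # prints 3

# A file's contents down a pipeline: make a two-line file, then count its lines.
printf 'one\ntwo\n' > /tmp/pipe_demo.txt
cat /tmp/pipe_demo.txt | wc -l        # prints 2
```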

sed

sed is often used to delete text from lines, or to keep some text in lines.

To delete text, you match the text and replace it with nothing.

sed 's/text//'

To keep text, you match it and save it as \1, \2, etc. The rest of the line must be matched as well, so that it is discarded; the replacement text is only the saved groups.

sed 's/stuff\(text\)stuff/\1/'
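Both idioms can be tried on the spot with echo (the sample strings are made up):

```shell
# Delete: match "world" and replace it with nothing.
echo "hello world" | sed 's/world//'                # prints "hello " (trailing space)

# Keep: save the digits as \1, match the rest too, replace with only \1.
echo "id=42;rest" | sed 's/id=\([0-9]*\);.*/\1/'    # prints 42
```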

Loops

Loops can be for or while loops.

Write a shell script that lists all the executable files in the current directory

for file in `ls`  # or for file in *
do
    if [ -x "$file" ]
    then
	echo "$file"
    fi
done
or
ls |
while read file
do
    if [ -x "$file" ]
    then
	echo "$file"
    fi
done

Problem statement

Whenever I go for promotions interviews, one question I get is: "Is there any value in placing lectures, software, etc, on the Internet?" One way of answering this is by student surveys. Another is by analysis of the access_log to my Web server.

The access_log contains entries such as

hickory.canberra.edu.au - - [26/Jul/1994:12:50:03 +1000] 
                   "GET /OS/l3_1.html HTTP/1.0" 200 6402
hickory.canberra.edu.au - - [26/Jul/1994:12:50:03 +1000]
                   "GET /OS/l3_1.html HTTP/1.0" 200 6402
vine.canberra.edu.au - - [26/Jul/1994:13:12:00 +1000]
                   "GET /OS/l1_1.html HTTP/1.0" 200 8041
ironwood.canberra.edu.au - - [26/Jul/1994:15:56:31 +1000]
                   "GET /OS/l1_1.html HTTP/1.0" 200 8041

(each entry on one line only). Each entry records the machine, the date and the document accessed. It does not record the identity of the user or how long they spent with the document. The file is written in timestamp order, by date and time.

I have written a collection of shell and Perl scripts to attempt to analyse this data. They are fairly typical scripts.

Daily access

This script takes access_log and writes to standard output a list of dates and lecture identifiers, organised for each lecture by date. The first lines of output are
22/Jul/1994 l1_1
26/Jul/1994 l1_1
26/Jul/1994 l1_1
26/Jul/1994 l1_1
26/Jul/1994 l1_1
27/Jul/1994 l1_1
27/Jul/1994 l1_1
27/Jul/1994 l1_1
27/Jul/1994 l1_1
27/Jul/1994 l1_1
27/Jul/1994 l1_1
27/Jul/1994 l1_1
27/Jul/1994 l1_1
27/Jul/1994 l1_1
27/Jul/1994 l1_1
27/Jul/1994 l1_1
27/Jul/1994 l1_1
27/Jul/1994 l1_1
28/Jul/1994 l1_1
  .
  .
  .
21/Jul/1994 l1_2
27/Jul/1994 l1_2
27/Jul/1994 l1_2
27/Jul/1994 l1_2
27/Jul/1994 l1_2

The script (daily.sh) is
for year in 1994 1995 1996
do
  for lecture in l1_1 l1_2 l2_1 l2_2 l3_1 l3_2 \
        l4_1 l4_2 l5_1 l5_2 l6_1 l6_2 \
        l7_1 l7_2 l8_1 l8_2 l9_1 l9_2 l12_1 l12_2 \
        l13_1 l14_1 l14_2 l15_1
  do
    grep "$year.*$lecture.html" < access_log
  done | 
  sed 's/.*\[\(...........\).*OS\/\(.*\).html.*/\1 \2/'
done
Explanation: The grep pattern is based on the year plus intervening text plus the lecture (document name). The sed pattern looks for the '[' then captures the date as \1. It then skips to OS/ (as OS\/) and captures the document name as \2. The remainder of the line is matched. In the replacement pattern, only the date and document name are retained.
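Running the same sed pattern over one sample entry from access_log shows the capture at work:

```shell
# \1 saves the 11 date characters after '['; \2 saves the document name after OS/.
echo 'hickory.canberra.edu.au - - [26/Jul/1994:12:50:03 +1000] "GET /OS/l3_1.html HTTP/1.0" 200 6402' |
sed 's/.*\[\(...........\).*OS\/\(.*\).html.*/\1 \2/'
# prints: 26/Jul/1994 l3_1
```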

The output is organised with the year as the primary sort key (the outer for loop). Within each year it is ordered by document name (the inner for loop), and within each document by date, because that is the original order of access_log.

Access by week

This contains more obscure commands. The intent is to take the output from the last command and produce a count of accesses by week and year for each lecture (weekly.sh).

The output is of the form

      1 Week: 29 Year: 1994 l1_1
     22 Week: 30 Year: 1994 l1_1
     10 Week: 31 Year: 1994 l1_1
      1 Week: 32 Year: 1994 l1_1
      5 Week: 33 Year: 1994 l1_1
     10 Week: 34 Year: 1994 l1_1
      7 Week: 35 Year: 1994 l1_1
      3 Week: 36 Year: 1994 l1_1
      7 Week: 37 Year: 1994 l1_1
      3 Week: 38 Year: 1994 l1_1
      4 Week: 39 Year: 1994 l1_1
      4 Week: 40 Year: 1994 l1_1
      3 Week: 42 Year: 1994 l1_1
The first entry is the count of accesses, given by uniq -c. The rest is the result of the echo statement. The input to this script is the output from the last script.
while read date lecture
do
    day=`echo $date | cut -d/ -f1`
    month=`echo $date | cut -d/ -f2`
    year=`echo $date | cut -d/ -f3`

    echo Week: `date -d "$month $day, $year" +%U` \
Year: $year $lecture
done < daily.sh.out |
uniq -c
Explanation: The cut command extracts fields from strings, and is sometimes simpler than sed. cut -d/ uses '/' as the field delimiter, since the dates are of the form 29/Jun/1995. The -f1 option extracts the first field, and so on.

The date command handles date formatting. The -d option (a GNU extension) parses the given date, and +%U prints the week number of the year for that date.

The command uniq manages runs of identical adjacent lines. Usually it collapses each run to a single line. The -c option also prefixes each line with a count of how many times it occurred.
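Each of the three commands can be exercised on its own (date -d is a GNU extension; the sample values are made up):

```shell
echo 29/Jun/1995 | cut -d/ -f2        # second '/'-delimited field: prints Jun
date -d "Jun 29, 1995" +%U            # week number of the year for that date
printf 'a\na\nb\n' | uniq -c          # collapses the two a's: counts 2 a, 1 b
```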

Machine+date+url

The next script (machine+date+url.sh) is a straight sed command, fed into sort
# "anything" - - [ "11 date chars" chars OS/ "URL" chars
sed 's/\(.*\)- - \[\(...........\).*OS\/\(.*\.html\).*/\2 \1 \3/' < access_log |
sort
The output is
01/Apr/1996 cherry.canberra.edu.au  OS.html
01/Apr/1996 cherry.canberra.edu.au  assign1.94.html
01/Apr/1996 cherry.canberra.edu.au  assignments.94.html
01/Apr/1996 cherry.canberra.edu.au  assignments.94.html
01/Apr/1996 cherry.canberra.edu.au  assignments.html
01/Apr/1996 cherry.canberra.edu.au  assignments.html
01/Apr/1996 chiron.ringworld.com.au  OS.html
01/Apr/1996 chiron.ringworld.com.au  OS.html
01/Apr/1996 coho.stanford.edu  aut_index.html
01/Apr/1996 dialup.bellatlantic.com  l7_2.html
01/Apr/1996 dialup19.x25.infoweb.or.jp  OS.html

Count day accesses

This script takes the output from the last script and produces a table of accesses per day per machine, eliminating the UC machines. This is a straight pipeline (count_day_access.sh).
sed 's/ [^ ]*$//' | grep -v canberra.edu.au |
uniq -c | sed 's/..\/.*//' | 
sort | uniq -c
The first sed eliminates the URL at the end of each line. The grep discards local (canberra.edu.au) accesses. The first uniq -c counts consecutive repeats of the same date+machine pair. The next sed deletes everything from the date onward (two characters, a '/', and the rest of the line), leaving only the count. The final sort | uniq -c then counts how many date+machine pairs had each access count, so the output below means 3794 machines made exactly 1 access on a given day, 1548 made 2, and so on. Output is
   3794       1
   1548       2
    941       3
    584       4
    410       5
    277       6
    204       7
    155       8
    120       9
     .        .
     .        .
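The whole pipeline can be tried on a few made-up lines shaped like the output of machine+date+url.sh; two share the same day and machine, one stands alone:

```shell
printf '%s\n' \
  '01/Apr/1996 dialup.bellatlantic.com  l7_2.html' \
  '01/Apr/1996 dialup.bellatlantic.com  OS.html' \
  '02/Apr/1996 coho.stanford.edu  l1_1.html' |
sed 's/ [^ ]*$//' | grep -v canberra.edu.au |
uniq -c | sed 's/..\/.*//' |
sort | uniq -c
# One machine made 1 access on its day and one made 2, so two lines come out:
# one pair with count 1, one pair with count 2.
```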