exit
without a code exits with the status of the last command executed, so the
script's exit code is effectively undefined. If you use an exit code in one place, you
should give one for every exit statement.
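The caller reads the code through $?. A minimal sketch (the subshells here stand in for any script that calls exit):

```shell
# A subshell stands in for a script that exits with an explicit code.
( exit 2 )
echo $?      # the caller sees 2
( exit 0 )
echo $?      # 0 is the conventional success code
```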
    if [ $# -ne 3 ]
    then
        echo "Usage: $0 arg1 arg2 arg3"
        exit 1
    fi

or

    test $# -ne 3 && echo "Usage: $0 a1 a2 a3" && exit 1

    if [ ! -r $1 ]
    then
        echo "$1 is unreadable"
        exit 2
    fi
    echo "..." | ...

    cat file | ...

The first sends an explicit string down a pipeline; the second sends the contents of a file down a pipeline.
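A small sketch of both forms feeding the same pipeline (the file name here is made up for the example):

```shell
# Make a file for the cat form of the demonstration.
printf 'hello\n' > /tmp/pipe_demo.txt

echo "hello" | tr 'a-z' 'A-Z'            # string down a pipeline: HELLO
cat /tmp/pipe_demo.txt | tr 'a-z' 'A-Z'  # file contents down a pipeline: HELLO
```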
sed
is often used to delete text from lines, or to keep
some text in lines.
To delete text, you match the text and replace it with nothing.
sed 's/text//'
To keep text, you match the part you want and save it as \1, \2, etc. The rest of the line is matched as well, but the replacement text consists only of the saved parts.
sed 's/stuff\(text\)stuff/\1/'
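A small sketch of both idioms, using made-up sample text:

```shell
# Delete: match the unwanted prefix and replace it with nothing.
echo "ERROR: disk full" | sed 's/ERROR: //'          # prints: disk full

# Keep: capture the wanted part as \1; the replacement is only \1.
echo "id=42;rest" | sed 's/.*id=\([0-9]*\);.*/\1/'   # prints: 42
```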
for
or while
loops.
Write a shell script that lists all the executable files in the current directory
    for file in `ls`        # or: for file in *
    do
        if [ -x $file ]
        then
            echo $file
        fi
    done

or

    ls | while read file
    do
        if [ -x $file ]
        then
            echo $file
        fi
    done
The following examples analyse the access_log
to my Web server.
The access_log
contains entries such as
    hickory.canberra.edu.au - - [26/Jul/1994:12:50:03 +1000] "GET /OS/l3_1.html HTTP/1.0" 200 6402
    hickory.canberra.edu.au - - [26/Jul/1994:12:50:03 +1000] "GET /OS/l3_1.html HTTP/1.0" 200 6402
    vine.canberra.edu.au - - [26/Jul/1994:13:12:00 +1000] "GET /OS/l1_1.html HTTP/1.0" 200 8041
    ironwood.canberra.edu.au - - [26/Jul/1994:15:56:31 +1000] "GET /OS/l1_1.html HTTP/1.0" 200 8041

(each entry on one line only). This contains information about the machine, date and document accessed. It does not contain information about the identity of the user or how long they spent with the document. The file is created in time-stamp order, by date+time.
I have written a collection of shell and Perl scripts to attempt to analyse this data. They are fairly typical scripts.
The first script reads access_log
and writes to standard
output a list of dates and lecture identifiers, organised for each lecture
by date.
The first lines of output are
    22/Jul/1994 l1_1
    26/Jul/1994 l1_1
    26/Jul/1994 l1_1
    26/Jul/1994 l1_1
    26/Jul/1994 l1_1
    27/Jul/1994 l1_1
    27/Jul/1994 l1_1
    27/Jul/1994 l1_1
    27/Jul/1994 l1_1
    27/Jul/1994 l1_1
    27/Jul/1994 l1_1
    27/Jul/1994 l1_1
    27/Jul/1994 l1_1
    27/Jul/1994 l1_1
    27/Jul/1994 l1_1
    27/Jul/1994 l1_1
    27/Jul/1994 l1_1
    27/Jul/1994 l1_1
    28/Jul/1994 l1_1
    . . .
    21/Jul/1994 l1_2
    27/Jul/1994 l1_2
    27/Jul/1994 l1_2
    27/Jul/1994 l1_2
    27/Jul/1994 l1_2

The script (daily.sh) is
    for year in 1994 1995 1996
    do
        for lecture in l1_1 l1_2 l2_1 l2_2 l3_1 l3_2 \
                       l4_1 l4_2 l5_1 l5_2 l6_1 l6_2 \
                       l7_1 l7_2 l8_1 l8_2 l9_1 l9_2 l12_1 l12_2 \
                       l13_1 l14_1 l14_2 l15_1
        do
            grep "$year.*$lecture.html" < access_log
        done | sed 's/.*\[\(...........\).*OS\/\(.*\).html.*/\1 \2/'
    done

Explanation: The grep pattern is the year, plus intervening text, plus the lecture (document name). The sed pattern looks for the '[' and captures the next 11 characters (the date) as \1. It then skips to OS/ (written OS\/) and captures the document name as \2. The remainder of the line is matched. In the replacement pattern, only the date and document name are retained.
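To see the substitution in isolation, one of the sample log entries shown earlier can be pushed through the same sed command:

```shell
# One sample access_log entry through the sed command from daily.sh.
echo 'hickory.canberra.edu.au - - [26/Jul/1994:12:50:03 +1000] "GET /OS/l3_1.html HTTP/1.0" 200 6402' |
sed 's/.*\[\(...........\).*OS\/\(.*\).html.*/\1 \2/'
# prints: 26/Jul/1994 l3_1
```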
The output is organised with the year as the primary sort key (outer for loop).
It is then ordered by document name (inner for loop).
Finally it is ordered by date, because that is the original order
of access_log
.
The output is of the form
       1 Week: 29 Year: 1994 l1_1
      22 Week: 30 Year: 1994 l1_1
      10 Week: 31 Year: 1994 l1_1
       1 Week: 32 Year: 1994 l1_1
       5 Week: 33 Year: 1994 l1_1
      10 Week: 34 Year: 1994 l1_1
       7 Week: 35 Year: 1994 l1_1
       3 Week: 36 Year: 1994 l1_1
       7 Week: 37 Year: 1994 l1_1
       3 Week: 38 Year: 1994 l1_1
       4 Week: 39 Year: 1994 l1_1
       4 Week: 40 Year: 1994 l1_1
       3 Week: 42 Year: 1994 l1_1

The first field on each line is the count of accesses, given by
uniq -c
.
The rest is the result of the echo statement.
The input to this script is the output from the last script.
    while read date lecture
    do
        day=`echo $date | cut -d/ -f1`
        month=`echo $date | cut -d/ -f2`
        year=`echo $date | cut -d/ -f3`
        echo Week: `date -d "$month $day, $year" +%U` \
             Year: $year $lecture
    done < daily.sh.out | uniq -c

Explanation: The
cut
command extracts fields from strings, and is
sometimes simpler than sed. cut -d/
uses '/' as the delimiter,
since the dates are of the form 29/Jun/1995. The -f1
option
extracts the first field, and so on.
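For example, on a date in the log's format:

```shell
# Split 29/Jun/1995 on '/' and pick out individual fields.
echo 29/Jun/1995 | cut -d/ -f1   # prints: 29
echo 29/Jun/1995 | cut -d/ -f2   # prints: Jun
```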
The date
command handles date formatting. The
+%U
option prints the week for that date.
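For example (note that the -d option for parsing an arbitrary date string is a GNU date extension):

```shell
# GNU date: -d parses a date string, +%U prints the Sunday-based week number.
date -d "Jun 29, 1995" +%U   # prints: 26
```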
The command uniq
manages runs of identical adjacent lines.
Usually it collapses each run to a single copy. The option -c
additionally prefixes each line with the number of duplicates.
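For example:

```shell
# Three lines, two of them duplicates; uniq -c collapses and counts.
printf 'a\na\nb\n' | uniq -c
# prints "2 a" and "1 b" (the counts are left-padded with spaces)
```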
This script is a sed
command,
fed into sort:

    # "anything" - - [ "11 date chars" chars OS/ "URL" chars
    sed 's/\(.*\)- - \[\(...........\).*OS\/\(.*\.html\).*/\2 \1 \3/' < access_log |
        sort

The output is
    01/Apr/1996 cherry.canberra.edu.au OS.html
    01/Apr/1996 cherry.canberra.edu.au assign1.94.html
    01/Apr/1996 cherry.canberra.edu.au assignments.94.html
    01/Apr/1996 cherry.canberra.edu.au assignments.94.html
    01/Apr/1996 cherry.canberra.edu.au assignments.html
    01/Apr/1996 cherry.canberra.edu.au assignments.html
    01/Apr/1996 chiron.ringworld.com.au OS.html
    01/Apr/1996 chiron.ringworld.com.au OS.html
    01/Apr/1996 coho.stanford.edu aut_index.html
    01/Apr/1996 dialup.bellatlantic.com l7_2.html
    01/Apr/1996 dialup19.x25.infoweb.or.jp OS.html
    sed 's/ [^ ]*$//' |
        grep -v canberra.edu.au |
        uniq -c |
        sed 's/..\/.*//' |
        sort |
        uniq -c

The first sed eliminates the URL at the end of each line. The grep discards local accesses. The uniq counts repeats of the date+machine pair. The next sed deletes two characters followed by '/' followed by everything after, leaving only the count. Output is
    3794 1
    1548 2
     941 3
     584 4
     410 5
     277 6
     204 7
     155 8
     120 9
    . . . .