Regular Expressions

Utilities using regular expressions

grep

Search for lines in a file containing a pattern.


grep pattern files...

It prints each line in the file that matches the pattern. You won't believe how useful this is till you have a lot of files...
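As a minimal sketch of the basic form, the following builds a small throwaway file and prints the lines that contain tcp. The file name services.txt and its contents are made up for illustration.

```shell
# Create a scratch directory and a small sample file (hypothetical data).
tmpdir=$(mktemp -d)
printf '%s\n' 'ftp 21/tcp' 'telnet 23/tcp' 'tftp 69/udp' 'smtp 25/tcp' > "$tmpdir/services.txt"

# Print each line of the file that contains the pattern "tcp".
tcp_lines=$(grep 'tcp' "$tmpdir/services.txt")
echo "$tcp_lines"

# Clean up the scratch directory.
rm -r "$tmpdir"
```

Only the three tcp lines are printed; the udp line is skipped.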

Options include

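-q

don't print matching lines, just return an exit code (zero if a match was found). Used in boolean statements. Some older versions of grep used -s for this; in POSIX grep, -s only suppresses error messages about missing or unreadable files.


    if grep -q "$service" /etc/services
    then echo $service is a standard TCP service
    fi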

-l

don't print matching lines, just print the names of the files containing matching lines. Used when you can't remember the name of the file containing something


    grep -l "important text" *

-c

print a count of matching lines, e.g. how many times you are logged on


    who | grep -c $USER

The expressions used by grep can be any regular expressions
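The -l and -c options can be tried out on a few throwaway files. This is only a sketch: the file names and their contents are invented for the example.

```shell
# Three sample files in a scratch directory (hypothetical data).
tmpdir=$(mktemp -d)
printf '%s\n' 'nothing to see' 'some important text here' > "$tmpdir/notes.txt"
printf '%s\n' 'shopping list' 'milk' > "$tmpdir/list.txt"
printf '%s\n' 'important text' 'more important text' 'the end' > "$tmpdir/memo.txt"

# -l: names of the files that contain at least one matching line.
files_found=$(cd "$tmpdir" && grep -l "important text" * | sort)
echo "$files_found"

# -c: number of matching lines in a single file.
match_count=$(grep -c "important text" "$tmpdir/memo.txt")
echo "$match_count"

rm -r "$tmpdir"
```

Here -l names memo.txt and notes.txt but not list.txt, and -c reports 2 for memo.txt.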

expr

Arithmetic operations and string comparison

expr x arith-op y

Perform the operation on x and y and write the result to standard output. In the Bourne shell, this is used for all arithmetic. Operators that are special to the shell, such as *, must be escaped, e.g.


n=`expr $n + 1`
x=`expr 2 \* 3`
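A typical use is a counting loop. The sketch below adds up the numbers 1 to 5; the variable names n and total are chosen only for this example.

```shell
n=0
total=0
while [ "$n" -lt 5 ]
do
    # Increment the counter, then add it to the running total.
    n=`expr $n + 1`
    total=`expr $total + $n`
done
echo "n=$n total=$total"

# The * must be escaped so the shell does not expand it as a file pattern.
product=`expr 2 \* 3`
echo "product=$product"
```

This prints n=5 total=15 and then product=6.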

expr x rel-op y

Compare x and y. Write 1 to standard output if the comparison is true, else write 0. Return an exit code of 0 if true, else non-zero


   if expr $x \< $y > /dev/null
   then ...
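To see both the output and the exit code, the comparison can be run with the result captured. This is a small sketch with made-up values for x and y.

```shell
x=3
y=10

# Both operands are integers, so the comparison is numeric.
less=`expr $x \< $y`
less_status=$?

greater=`expr $x \> $y`
greater_status=$?

echo "x<y: output=$less exit=$less_status"
echo "x>y: output=$greater exit=$greater_status"
```

The first line reports output=1 exit=0 and the second output=0 exit=1. The < and > are escaped so that the shell does not treat them as redirections.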

expr string : regexp

Perform a pattern match of the string against the regular expression. The regular expression is "anchored" to the beginning of the string. If the match succeeds, it writes the number of characters matched to standard output (or, if the pattern contains \( and \), the text they saved) and has an exit code of zero; otherwise it writes 0 and has an exit code of 1


   if expr "$USER" : root > /dev/null
   then  echo user is root
   else echo user is an ordinary user
   fi
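The following sketch shows what a POSIX expr writes for a pattern match, and why the anchoring matters. The strings used are examples only.

```shell
# Number of characters matched from the start of the string.
count=`expr "John.Smith" : '[^.]*'`
echo "$count"

# With \( \) in the pattern, the saved text is written instead of a count.
first=`expr "John.Smith" : '\([^.]*\)\.'`
echo "$first"

# The match is anchored only at the start, so "rootkit" matches "root" ...
if expr "rootkit" : 'root' > /dev/null
then loose=matched
else loose=unmatched
fi

# ... unless the pattern is also anchored at the end with $.
if expr "rootkit" : 'root$' > /dev/null
then strict=matched
else strict=unmatched
fi
echo "$loose $strict"
```

This prints 4, then John, then matched unmatched. To test for an exact string, end the pattern with $.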

Other operations are also available; these are the major ones.

sed

Stream editor. This is useful for on the fly editing, typically of small strings.


sed 1,10d file

prints file with lines 1-10 deleted


sed 20q file

prints first 20 lines and then quits


sed -n 20,30p file

-n turns off default printing, so only prints lines 20 to 30
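The three line-address commands above can be checked against a generated file. This is a sketch only: the 40-line file and the variable names are invented for the example.

```shell
# Build a 40-line sample file: "line 1" ... "line 40".
tmpfile=$(mktemp)
i=1
while [ "$i" -le 40 ]
do
    echo "line $i"
    i=`expr $i + 1`
done > "$tmpfile"

# Count the lines each command prints (tr strips padding from wc output).
without_head=$(sed 1,10d "$tmpfile" | wc -l | tr -d ' ')
first_twenty=$(sed 20q "$tmpfile" | wc -l | tr -d ' ')
middle=$(sed -n 20,30p "$tmpfile" | wc -l | tr -d ' ')

# First line that survives the deletion of lines 1-10.
first_kept=$(sed 1,10d "$tmpfile" | sed 1q)

echo "$without_head $first_twenty $middle"
echo "$first_kept"
rm "$tmpfile"
```

The counts are 30, 20 and 11 (lines 20 to 30 inclusive), and the first surviving line is line 11.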


sed 's/old/new/' file

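prints file with the first occurrence of ``old'' on each line changed to ``new''. Here old can be a regular expression; new is replacement text. Add a g after the last slash (s/old/new/g) to change every occurrence on the line.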

If no file is given, sed reads from standard input, e.g. to remove the first header line from ps output


ps | sed 1d
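The substitute command is the one used most often in shell scripts. The sketch below, using made-up input strings, shows the effect of the g flag and of & in the replacement, which stands for the text that was matched.

```shell
# Without g, only the first match on each line is replaced.
first_only=$(echo "old hat, old coat" | sed 's/old/new/')
echo "$first_only"

# With g, every match on the line is replaced.
every=$(echo "old hat, old coat" | sed 's/old/new/g')
echo "$every"

# & in the replacement stands for the matched text.
bracketed=$(echo "old hat" | sed 's/o[a-z]*/[&]/')
echo "$bracketed"
```

The three results are new hat, old coat, then new hat, new coat, then [old] hat.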

Regular expressions

Many editors allow you to search for a string in a file. Usually this is just an ordinary piece of text. Sometimes you want something more complex, e.g. to search for either ``the'' or ``The''.

The Turbo Pascal editor allows Ctrl-A to stand for any single character.

The Unix utilities grep, ed, vi, sed, awk, emacs etc., all support a particular type of pattern.

This is also available from C using the regexp or regex libraries, and is available in some other languages such as Tcl. Perl is commonly used for CGI programs and makes heavy use of regular expressions.

Because the utilities grep and sed are often used in Unix shell programs it is worth looking at their pattern mechanism.

The simplest patterns are

^ matches the start of a line
$ matches the end of a line
. matches any single character
* after a character matches zero or more occurrences of that character
[chars] matches any single one of the characters.

Examples:

^The$ matches ``The'' on a line by itself.
[Tt]he matches ``The'' or ``the''
[0-9]* matches any string of digits, including the empty string
990[34][0-9]* matches Monash phone numbers
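These patterns can be tried with grep -c on a small sample file. This is a sketch: the five lines of sample text are invented for the example.

```shell
tmpfile=$(mktemp)
printf '%s\n' 'The' 'the end' 'Then again' 'A cat' '99031234' > "$tmpfile"

# Lines consisting of exactly "The".
alone=$(grep -c '^The$' "$tmpfile")

# Lines containing "The" or "the" anywhere.
either=$(grep -c '[Tt]he' "$tmpfile")

# [0-9]* can match zero digits, so it matches every line.
any_digits=$(grep -c '[0-9]*' "$tmpfile")

# Requiring at least one digit needs the character class twice.
some_digits=$(grep -c '[0-9][0-9]*' "$tmpfile")

# 990, then a 3 or a 4, then any further digits.
phone=$(grep -c '990[34][0-9]*' "$tmpfile")

echo "$alone $either $any_digits $some_digits $phone"
rm "$tmpfile"
```

The counts are 1 3 5 1 1. Note that [0-9]* on its own matches all five lines, because zero occurrences also counts as a match.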

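If a pattern can match in more than one way from the same starting point, the longest match is used. This matters most in sed substitutions, where the whole matched text is replaced.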

Example:

The long listing for a file may be

-rw-r--r-- 1 jan 2048 Jul 4 file1

To just extract the permissions part, use

ls -l | sed 's/ .*//'
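The pattern matches the first space and then, because the longest match is taken, everything up to the end of the line. The sketch below runs the same substitution on the sample listing line, plus a second made-up string that shows the longest-match rule more directly.

```shell
listing='-rw-r--r-- 1 jan 2048 Jul 4 file1'

# Delete from the first space to the end of the line.
perms=$(printf '%s\n' "$listing" | sed 's/ .*//')
echo "$perms"

# a.*X runs to the last X on the line, not the first.
greedy=$(echo "aXbXc" | sed 's/a.*X/-/')
echo "$greedy"
```

The first result is -rw-r--r-- and the second is -c, since a.*X matched aXbX rather than just aX.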

More complicated patterns are

\c for any character c, matches c (allows you to "escape" special characters).
[^chars] matches any single character except those in chars.
\(pattern\) matches whatever the pattern matches, i.e. the \( and \) are ignored. However, it also makes a "copy" of the matched text and saves it.
\n for integer n, matches the text saved by the nth \( \) pair above.
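A few quick checks of these patterns with grep, as a sketch using invented sample words:

```shell
# An escaped dot matches only a literal full stop.
literal_dot=$(printf '%s\n' 'a.b' 'axb' | grep -c 'a\.b')
any_char=$(printf '%s\n' 'a.b' 'axb' | grep -c 'a.b')

# [^aeiou] matches any single character that is not a lower-case vowel.
non_vowel=$(printf '%s\n' 'a' 'b' 'e' 'z' | grep -c '^[^aeiou]$')

# \( \) saves a letter and \1 requires the same letter again: doubled letters.
doubled=$(printf '%s\n' 'book' 'cat' 'letter' 'dog' | grep '\([a-z]\)\1')

echo "$literal_dot $any_char $non_vowel"
echo "$doubled"
```

The counts are 1 2 2, and the doubled-letter search prints book and letter.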

Example

To match everything from the beginning of the line up to the first full stop, then the full stop, and then everything from there to the end of the line, saving both parts (but not the full stop):

^\([^.]*\)\.\(.*\)$

Then the two saved parts in reverse order are

\2\1

For example, to change ``John.Smith'' to ``Smith, John'':

echo "John.Smith" |
  sed 's/^\([^.]*\)\.\(.*\)$/\2, \1/'
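The same substitution can be wrapped in a shell function and tried on other inputs. This is only a sketch: swap_name is a hypothetical helper name, and the extra input strings are invented.

```shell
# swap_name (hypothetical helper): reads lines of the form first.rest
# and writes them as "rest, first".
swap_name() {
    sed 's/^\([^.]*\)\.\(.*\)$/\2, \1/'
}

swapped=$(echo "John.Smith" | swap_name)
echo "$swapped"

# [^.]* stops at the first full stop, so only the first part is moved.
multi=$(echo "Mary.Jane.Watson" | swap_name)
echo "$multi"

# A line with no full stop does not match and is printed unchanged.
plain=$(echo "Plato" | swap_name)
echo "$plain"
```

The results are Smith, John, then Jane.Watson, Mary, then Plato.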

Jan Newmarch (http://jan.newmarch.name)