Regular Expressions

Many editors allow you to search for a string in a file. Usually this is just an ordinary piece of text, but sometimes you want something more complex, e.g. to search for either ``the'' or ``The''.

The Turbo Pascal editor allows Ctrl-A to stand for any single character.

The Unix utilities grep, ed, vi, sed, awk, emacs etc. all support a particular type of pattern, called a regular expression.

This type of pattern is also available from C using the regexp or regex libraries, and in some other languages such as Tcl. Perl is commonly used for CGI programs and makes heavy use of regular expressions.

Because the utilities grep and sed are often used in Unix shell programs, it is worth looking at their pattern mechanism.

The simplest patterns are

  1. ^ matches the start of a line
  2. $ matches the end of a line
  3. . matches any single character
  4. * after a character matches zero or more occurrences of that character
  5. [chars] matches any single one of the characters; a range such as 0-9 stands for every character from 0 to 9.

Examples:

  1. ^The$ matches ``The'' on a line by itself.
  2. [Tt]he matches ``The'' or ``the''
  3. [0-9]* matches any string of digits (including the empty string)
  4. ss[89][0-9]* matches student ids of the form ``ss'', then 8 or 9, then any further digits
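
The patterns above can be tried out with grep, which prints every input line that contains a match. The following is only a minimal sketch: the sample lines and the helper function sample are made up for illustration and are not part of any real data.

```shell
# Hypothetical sample input, one item per line.
sample() {
  printf '%s\n' 'The' 'the end' 'Then' 'ss8123' 'ss7123' 'abc'
}

# ^The$ matches only a line consisting of exactly "The".
sample | grep '^The$'

# [Tt]he matches "The" or "the" anywhere in a line,
# so "Then" is selected as well.
sample | grep '[Tt]he'

# ss[89][0-9]* needs "ss", then 8 or 9, then any further digits,
# so "ss7123" is not selected.
sample | grep 'ss[89][0-9]*'
```

Note that a pattern such as [0-9]* on its own also matches the empty string, so grep '[0-9]*' selects every line, even one with no digits in it.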

Example:

The long listing for a file may be
-rw-r--r-- 1 jan 2048 Jul 4 file1
To extract just the permissions part, delete everything from the first space onwards:
ls -l | sed 's/ .*//'
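
The same substitution can be checked without running ls by feeding the sample listing line to sed directly. This is a small sketch; the variable names line and perms are chosen here for illustration.

```shell
# The sample listing line from the text, used instead of real ls output.
line='-rw-r--r-- 1 jan 2048 Jul 4 file1'

# ' .*' matches from the first space to the end of the line;
# replacing that match with nothing leaves only the first field.
perms=$(printf '%s\n' "$line" | sed 's/ .*//')
echo "$perms"
```

The match starts at the first space because sed takes the leftmost possible match, and .* then extends it as far as it can, which is to the end of the line.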

More complicated patterns are

  1. \c for a special character c such as . or *, matches c itself (this allows you to "escape" special characters).
  2. [^chars] matches any single character except those in chars.
  3. \(pattern\) matches whatever the pattern matches, i.e. the \( and \) are ignored for matching. However, it also saves a "copy" of the text that was matched.
  4. \n for a digit n, matches the text saved by the nth \(pattern\) above.
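
Each of these can be seen in isolation with grep. The sketch below uses made-up input lines, with one command per rule.

```shell
# \. matches a literal full stop; an unescaped . would also match "axb".
printf '%s\n' 'a.b' 'axb' | grep 'a\.b'

# [^0-9] matches any single character that is not a digit, so only
# lines containing at least one non-digit are printed.
printf '%s\n' '12345' '12a45' | grep '[^0-9]'

# \(.\) saves one character and \1 matches that same character again,
# so only lines with a doubled character are printed.
printf '%s\n' 'book' 'bank' | grep '\(.\)\1'
```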

Example

To match everything from the beginning of the line up to the first full stop, then the full stop itself, then everything from there to the end of the line, saving both parts (but not the full stop):
^\([^.]*\)\.\(.*\)$
The two saved parts, in reverse order, are then
\2\1

For example, to change ``John.Smith'' to ``Smith, John'':

echo "John.Smith" |
  sed 's/^\([^.]*\)\.\(.*\)$/\2, \1/'
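
To see how the pattern behaves on other input, the sed command can be wrapped in a function and run over several lines. This is a sketch only: the function name swap_name and the extra names are hypothetical.

```shell
# Hypothetical helper wrapping the sed command from the text.
swap_name() {
  sed 's/^\([^.]*\)\.\(.*\)$/\2, \1/'
}

# [^.]* stops at the first full stop, so a name with two full stops
# is split after its first component. A line with no full stop does
# not match the pattern and is passed through unchanged.
printf '%s\n' 'John.Smith' 'Mary.Ann.Jones' 'NoDot' | swap_name
```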

Jan Newmarch (http://jan.newmarch.name)
jan@newmarch.name
Last modified: Wed Nov 19 17:37:40 EST 1997
Copyright ©Jan Newmarch