CGI Scripts

A CGI script is a program that the HTTP server runs when a request arrives for a URL it recognises as representing a program rather than a static file. The program is executed in an environment that holds information about the request and about the server.

For lots of goodies on CGI, see http://cgi.resourceindex.com

Environment Variables

A number of environment variables are defined in the CGI/1.1 specification. Not all servers set all of them.


    AUTH_TYPE
    CONTENT_LENGTH
    CONTENT_TYPE
    GATEWAY_INTERFACE
    PATH_INFO
    PATH_TRANSLATED
    QUERY_STRING
    REMOTE_ADDR
    REMOTE_HOST
    REMOTE_IDENT
    REMOTE_USER
    REQUEST_METHOD
    SCRIPT_NAME
    SERVER_NAME
    SERVER_PORT
    SERVER_PROTOCOL
    SERVER_SOFTWARE

In addition, a number of variables beginning with HTTP_ are defined. These are derived from the header information of the HTTP request.
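The naming rule for these variables can be sketched in shell: the header name is upper-cased, each "-" becomes "_", and HTTP_ is put in front. The function name header_to_env is made up for this illustration. Note that the Content-Type and Content-Length headers are passed as CONTENT_TYPE and CONTENT_LENGTH, without the prefix.

```shell
#!/bin/sh
# header_to_env is a hypothetical helper, not part of any CGI library.
# It prints the CGI variable name that corresponds to an HTTP header name.
header_to_env() {
    # Upper-case the name, turn "-" into "_", and prefix with HTTP_.
    printf 'HTTP_%s\n' "$(printf '%s' "$1" | tr '[:lower:]-' '[:upper:]_')"
}

header_to_env "User-Agent"        # HTTP_USER_AGENT
header_to_env "Accept-Language"   # HTTP_ACCEPT_LANGUAGE
```

This matches the dumps in these notes, where the browser's User-Agent header shows up as HTTP_USER_AGENT.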

Some standard environment variables will also be set.

Netscape browser to Apache server on Linux


BASH=/bin/sh
BASH_VERSION=1.14.7(1)
DOCUMENT_ROOT=/home/httpd/html
EUID=99
GATEWAY_INTERFACE=CGI/1.1
HOSTTYPE=i386
HTTP_ACCEPT=image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*
HTTP_ACCEPT_CHARSET=iso-8859-1,*,utf-8
HTTP_ACCEPT_ENCODING=gzip
HTTP_ACCEPT_LANGUAGE=en
HTTP_CONNECTION=Keep-Alive
HTTP_HOST=127.0.0.1
HTTP_PRAGMA=no-cache
HTTP_USER_AGENT=Mozilla/4.7 [en] (X11; I; Linux 2.2.5-22 i586)
IFS=    

OPTERR=1
OPTIND=1
OSTYPE=Linux
PATH=/sbin:/usr/sbin:/bin:/usr/bin:/usr/X11R6/bin
PPID=1044
PS4=+ 
PWD=/home/httpd/cgi-bin
QUERY_STRING=
REMOTE_ADDR=127.0.0.1
REMOTE_PORT=2533
REQUEST_METHOD=GET
REQUEST_URI=/cgi-bin/test-cgi-all
SCRIPT_FILENAME=/home/httpd/cgi-bin/test-cgi-all
SCRIPT_NAME=/cgi-bin/test-cgi-all
SERVER_ADMIN=root@localhost
SERVER_NAME=pandonia.canberra.edu.au
SERVER_PORT=80
SERVER_PROTOCOL=HTTP/1.0
SERVER_SIGNATURE=
SERVER_SOFTWARE=Apache/1.3.6 (Unix)  (Red Hat/Linux) ApacheJServ/1.1
SHELL=/bin/sh
SHLVL=1
TERM=dumb
UID=99
_=

Lynx browser to Apache server on Linux


BASH=/bin/sh
BASH_VERSION=1.14.7(1)
DOCUMENT_ROOT=/home/httpd/html
EUID=99
GATEWAY_INTERFACE=CGI/1.1
HOSTTYPE=i386
HTTP_ACCEPT=text/html, text/plain, application/vnd.ms-excel, application/msword, application/pdf, application/x-gzip, application/x-compressed, application/x-zip-compressed, application/postscript, application/applefile, application/x-metamail-patch, sun-deskset-message, mail-file, default, postscript-file, audio-file, x-sun-attachment, text/enriched, text/richtext, application/andrew-inset, x-be2, application/postscript, message/external-body, message/partial, application/pgp, application/pgp, video/mpeg, video/*, image/*, audio/mod, text/sgml, video/mpeg, image/jpeg, image/tiff, image/x-rgb, image/png, image/x-xbitmap, image/x-xbm, image/gif, application/postscript, */*;q=0.01
HTTP_ACCEPT_ENCODING=gzip, compress
HTTP_ACCEPT_LANGUAGE=en
HTTP_HOST=pandonia.canberra.edu.au
HTTP_NEGOTIATE=trans
HTTP_USER_AGENT=Lynx/2.8.1rel.2 libwww-FM/2.14
IFS=

OPTERR=1
OPTIND=1
OSTYPE=Linux
PATH=/sbin:/usr/sbin:/bin:/usr/bin:/usr/X11R6/bin
PPID=494
PS4=+
PWD=/home/httpd/cgi-bin
QUERY_STRING=
REMOTE_ADDR=137.92.11.13
REMOTE_PORT=2575
REQUEST_METHOD=GET
REQUEST_URI=/cgi-bin/test-cgi-all
SCRIPT_FILENAME=/home/httpd/cgi-bin/test-cgi-all
SCRIPT_NAME=/cgi-bin/test-cgi-all
SERVER_ADMIN=root@localhost
SERVER_NAME=pandonia.canberra.edu.au
SERVER_PORT=80                                          
SERVER_PROTOCOL=HTTP/1.0
SERVER_SIGNATURE=
SERVER_SOFTWARE=Apache/1.3.6 (Unix)  (Red Hat/Linux) ApacheJServ/1.1
SHELL=/bin/sh
SHLVL=1
TERM=dumb
UID=99
_=                    

Script URI

A CGI script is typically called from a form in a browser. The form information is sent in an encoded form. The URI that calls the script has the structure


script-uri = scheme://host:port/path?query
script-uri = protocol "://" SERVER_NAME ":" SERVER_PORT
             enc-script enc-path-info "?" QUERY_STRING

where enc-script is a URL-encoded version of SCRIPT_NAME and enc-path-info is a URL-encoded version of PATH_INFO.

e.g.


http://pandonia:80/cgi-bin/cgi-script?name=Jan&phone=6201+2422
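A script can rebuild its own URI from the environment by following the rule above. The sketch below is only an illustration: script_uri is a made-up function name, the scheme is passed in by the caller, and SCRIPT_NAME and PATH_INFO are assumed to contain no characters that still need URL-encoding.

```shell
#!/bin/sh
# script_uri is a hypothetical helper. It assumes SCRIPT_NAME and PATH_INFO
# need no further URL-encoding, and takes the scheme as its argument.
script_uri() {
    scheme=${1:-http}
    uri="$scheme://$SERVER_NAME:$SERVER_PORT$SCRIPT_NAME$PATH_INFO"
    # Append the query part only when there is one.
    if [ -n "$QUERY_STRING" ]; then
        uri="$uri?$QUERY_STRING"
    fi
    printf '%s\n' "$uri"
}

# Values matching the example URL in these notes.
SERVER_NAME=pandonia SERVER_PORT=80
SCRIPT_NAME=/cgi-bin/cgi-script PATH_INFO=
QUERY_STRING='name=Jan&phone=6201+2422'
script_uri http
# http://pandonia:80/cgi-bin/cgi-script?name=Jan&phone=6201+2422
```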

QUERY_STRING

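The string is of the form


name=value&name=value&...

The names and values may be URL-encoded, which is:

    spaces are replaced by "+"
    other reserved or non-printing characters are replaced by "%" followed by two hexadecimal digits, e.g. "&" becomes "%26"

To process the Form data, you have to

    split the string into name=value pairs at each "&"
    split each pair into a name and a value at the first "="
    URL-decode each name and each value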
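Processing QUERY_STRING comes down to splitting at "&" and "=" and then URL-decoding each piece. The following is a minimal sketch in shell and awk, not a complete parser: urldecode and parse_query are made-up names, and only ASCII character codes are decoded reliably.

```shell
#!/bin/sh
# urldecode and parse_query are hypothetical helper names.

# Decode one URL-encoded string: "+" becomes a space, %XX becomes the
# character with hexadecimal code XX.
urldecode() {
    printf '%s\n' "$1" | awk '{
        digits = "0123456789ABCDEF"
        gsub(/\+/, " ")
        out = ""
        while (match($0, /%[0-9A-Fa-f][0-9A-Fa-f]/)) {
            hex  = toupper(substr($0, RSTART + 1, 2))
            code = (index(digits, substr(hex, 1, 1)) - 1) * 16
            code = code + index(digits, substr(hex, 2, 1)) - 1
            out  = out substr($0, 1, RSTART - 1) sprintf("%c", code)
            $0   = substr($0, RSTART + 3)
        }
        print out $0
    }'
}

# Print every name/value pair of a query string, one per line.
parse_query() {
    printf '%s\n' "$1" | tr '&' '\n' | while IFS= read -r pair; do
        [ -n "$pair" ] || continue
        case $pair in
            *=*) name=${pair%%=*}; value=${pair#*=} ;;
            *)   name=$pair; value= ;;
        esac
        printf '%s = %s\n' "$(urldecode "$name")" "$(urldecode "$value")"
    done
}

parse_query 'name=Jan&phone=6201+2422'
# name = Jan
# phone = 6201 2422
```

The "+" characters are turned into spaces before the %XX sequences are decoded, so an encoded plus sign (%2B) comes out as a literal "+".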

GET and POST

In an HTTP GET request, the form data is appended to the URL after a "?", and the server passes it to the script in QUERY_STRING. A POST request from a form normally leaves QUERY_STRING empty: the form data is sent in the body of the request, and the script must read it from standard input, with CONTENT_LENGTH giving the number of bytes to read. The data has the same form as in QUERY_STRING.

Both methods are needed. The GET method is traditional. However, on some operating systems a long QUERY_STRING could overflow the environment space, leading to a loss of form data. A POST request can carry form data of any length.
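A script that accepts both methods has to look at REQUEST_METHOD to decide where the form data is. A minimal sketch, assuming the made-up function name read_form_data and using dd to read exactly CONTENT_LENGTH bytes:

```shell
#!/bin/sh
# read_form_data is a hypothetical helper. It prints the raw (still
# URL-encoded) form data, whichever way the browser sent it.
read_form_data() {
    case $REQUEST_METHOD in
        POST)
            # Refuse a missing or non-numeric length.
            case ${CONTENT_LENGTH:-} in
                ''|*[!0-9]*) return 1 ;;
            esac
            # Read exactly CONTENT_LENGTH bytes from standard input.
            dd bs=1 count="$CONTENT_LENGTH" 2>/dev/null
            ;;
        *)
            printf '%s' "$QUERY_STRING"
            ;;
    esac
}

# Simulate a POST request from the command line.
REQUEST_METHOD=POST
CONTENT_LENGTH=24
printf 'name=Jan&phone=6201+2422' | read_form_data
echo
# name=Jan&phone=6201+2422
```

Reading only CONTENT_LENGTH bytes matters because the server is not required to close standard input at the end of the data.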

Security

See http://www.csclub.uwaterloo.ca/u/mlvanbie/cgisec/

Anything can send a request to your server - the "correct" Form, or a malicious attacker. Never trust the input to be "correct", or to correspond to what you expect.
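One common defence is to accept only input that matches what you expect and reject everything else. The sketch below is an illustration only: is_safe_name is a made-up function, and the rule (letters, digits and underscore) is an assumption that suits values such as program or class names, not every form field.

```shell
#!/bin/sh
# is_safe_name is a hypothetical helper. It succeeds only for a non-empty
# string made up entirely of letters, digits and underscores.
is_safe_name() {
    case $1 in
        ''|*[!A-Za-z0-9_]*) return 1 ;;
        *) return 0 ;;
    esac
}

for candidate in 'CGI_Test' '../../etc/passwd' 'x; rm -rf /'; do
    if is_safe_name "$candidate"; then
        echo "accept: $candidate"
    else
        echo "reject: $candidate"
    fi
done
# accept: CGI_Test
# reject: ../../etc/passwd
# reject: x; rm -rf /
```

Checking against a list of allowed characters is safer than trying to list the dangerous ones, since anything not foreseen is rejected by default.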

Shell Programming

A minimal CGI script using the Bourne shell will just print the current environment:


#!/bin/sh

echo "Status: 200 OK"
echo "Content-Type: text/html"
echo

echo "<html><head> </head> <body>"
echo "<pre>"
set
echo "</pre>"
echo "</body> </html>"

A tool to parse form data using the shell only is at http://www.informatik.uni-frankfurt.de/~fp/Tools/ProcCGIInput.html. Its author does not recommend using it, since it is slow.

C Programming

A typical library to parse CGI form data is at http://www.geocities.com/SiliconValley/Vista/6493/projects/cgi-lib.html

A sample program is


#include <stdio.h>   /* for printf */
#include "cgi-lib.h" /* include the cgi-lib.h header file */
#include "html-lib.h" /* include the html-lib.h header file */

int main()
{
        /* need to declare a pointer variable of
           type LIST to keep track of our list */
        LIST *head;

        /* need to call this function at the beginning
           to initialise and set up our list */
        head = cgi_input_parse();

        /* send the mime header to the server
           using our function in html-lib */
        mime_header("text/html");

        /* send the top of our html page to the server */
        html_begin("CGI Sample Application",NULL);

        /* send the text enclosed in the heading tags */
        h1("CGI Sample Application");

        /* send break lines */
        printf("\n");

        /* send the text enclosed in the heading tags */
        h2("Parsed Values");

        /*
        we need to check if head is NULL, this will tell us if we have
        any data in our linked list, if it is NULL then we don't need  to
        call any other CGI-LIB functions because there is nothing to do.
        */
        if(head == NULL)
                h3("List Empty");
        else
                list_print(head);

        printf("\n");

        /* send text enclosed in heading tags */
        h2("Environment Variables");

        /* print out all the environment variables */
        cgi_env();

        /* send the html closing tags */
        html_end();

        return 0;
}

Java Programming

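Java has a URL encoding class (java.net.URLEncoder). A matching decoder class, java.net.URLDecoder, was only added in Java 1.2, so on older runtimes you need to write one or get it from somewhere. Running Java programs does not fit the CGI model directly, because a Java program has to be started through the Java runtime engine: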


    java Class

That is, the class name has to be given as the first command-line argument, but the server runs the CGI program directly and does not supply it. So the CGI script has to be a shell script that calls the correct Java program.

The site http://www.orbits.com/software/Java_CGI.html supplies a CGI script and Java classes. The script is


CLASSPATH=JAVACGI
export CLASSPATH

#  Extract the name of the CGI program from the server-provided information.
Program=`echo "$PATH_INFO" | cut -f2 -d"/"`

#  Dump all of the environment data, particularly that provided by the HTTP
#    server, into a temporary file.
set > /tmp/JavaCGI.$$

#  Run the java interpreter, supplying the name of the file with the
#    environment data and the name of the CGI program.
/usr/local/java/bin/java -DDataID=/tmp/JavaCGI.$$ "$Program"

#  Clean up.
rm -f /tmp/JavaCGI.$$

An example program using the package is


//  Copyright (c) 1996, David H. Silber  (dhs@orbits.com)
import java.io.*;
import java.util.*;

import Orbits.net.*;

//  This class will get all of the CGI information and return it to the
//    browser.
public class CGI_Test {

        public static void main( String argv[] ) {

                //  Produce output that the browser will understand.
                System.out.println( "Content-type: text/plain" );
                System.out.println( "" );

                //  Get the CGI information.
                CGI info = new CGI();

                //  Put the list of variable names into a list.
                Enumeration e = info.getNames();

                while ( e.hasMoreElements() ) {
                        //  For each name in the list.
                        String name = (String) e.nextElement();
                        System.out.print( name );
                        System.out.print( " evaluates to " );

                        //  Get the associated value.
                        String value = info.getValue( name );
                        System.out.println( value );
                }
        }
}

Debugging

Basically, a disaster area. If your program crashes, then there is no output, and you just get an "internal server error" message. Your program must work well enough to avoid crashing, or you cannot debug it.

Even if it is working, you need lots of print statements. There is no easy way of running a debugger on it.

You can test the program in standalone mode, but you will need to set all of the environment variables beforehand.
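A small wrapper saves retyping the variables for every test run. This is a sketch under assumptions: run_cgi is a made-up name, the values are placeholders for a simple GET request, and a real script may need more of the variables listed earlier.

```shell
#!/bin/sh
# run_cgi is a hypothetical helper: run a CGI program from the command
# line with a minimal CGI-like environment for a GET request.
# Usage: run_cgi query-string command [arguments...]
run_cgi() {
    query=$1
    shift
    env GATEWAY_INTERFACE=CGI/1.1 \
        REQUEST_METHOD=GET \
        QUERY_STRING="$query" \
        SCRIPT_NAME="/cgi-bin/$(basename "$1")" \
        SERVER_NAME=localhost \
        SERVER_PORT=80 \
        SERVER_PROTOCOL=HTTP/1.0 \
        REMOTE_ADDR=127.0.0.1 \
        "$@"
}
```

For example, run_cgi 'name=Jan&phone=6201+2422' ./test-cgi-all prints the headers and page on the terminal, where error messages and debugger output are also visible.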

Modelling CGI

Source: http://www.conallen.com. This gives extensions to UML suitable for describing Web applications.

A server page ultimately builds the resulting client page. This is a unidirectional relationship, since a completed HTML page has little access to the object interface of the building server page. The stereotype «builds» is applied to associations and is always drawn in the model as a unidirectional association from a server page to a client page. It indicates which server page is responsible for building a given client page.

The stereotype «links» is defined for associations between client pages and other pages (server or client).

Links to server pages may include parameters.

A form identifies a specific web page (almost always one with a server page stereotype) to accept and process data submitted with the form. A «submits» association stereotype represents the relationship between a form and the web page that processes it.

Exercises


Jan Newmarch (http://pandonia.canberra.edu.au)
jan@ise.canberra.edu.au
Last modified: Tue Aug 1 15:10:28 EST 2000
Copyright ©Jan Newmarch