HTTP

Introduction

The World Wide Web is a major distributed system, with millions of users. A site may become a Web host by running an HTTP server. While Web clients are typically users with a browser, there are many other "user agents" such as web spiders, web application clients and so on.

The Web is built on top of HTTP (the Hypertext Transfer Protocol), which is layered on top of TCP. HTTP has been through three publicly available versions, and the latest - version 1.1 - is now the most commonly used.

In this chapter we give an overview of HTTP, followed by the Go APIs to manage HTTP connections.

Overview of HTTP

URLs and resources

URLs specify the location of a resource. A resource is often a static file, such as an HTML document, an image, or a sound file. But increasingly, it may be a dynamically generated object, perhaps based on information stored in a database.

When a user agent requests a resource, what is returned is not the resource itself, but some representation of that resource. For example, if the resource is a static file, then what is sent to the user agent is a copy of the file.

Multiple URLs may point to the same resource, and an HTTP server will return appropriate representations of the resource for each URL. For example, a company might make product information available both internally and externally, using different URLs for the same product. The internal representation of the product might include information such as internal contact officers for the product, while the external representation might include the location of stores selling the product.

This view of resources means that the HTTP protocol can be fairly simple and straightforward, while an HTTP server can be arbitrarily complex. HTTP has to deliver requests from user agents to servers and return a byte stream, while a server might have to do any amount of processing of the request.

HTTP characteristics

HTTP is a stateless, connectionless, reliable protocol. In the simplest form, each request from a user agent is handled reliably and then the connection is broken. Each request involves a separate TCP connection, so if many resources are required (such as images embedded in an HTML page) then many TCP connections have to be set up and torn down in a short space of time.

There are many optimisations in HTTP which add complexity to the simple structure, in order to create a more efficient and reliable protocol.

Versions

There are three versions of HTTP: 0.9, 1.0 and 1.1.

Each version must understand requests and responses of earlier versions.

HTTP 0.9

Request format

Request = Simple-Request

Simple-Request = "GET" SP Request-URI CRLF
  

Response format

A response is of the form

Response = Simple-Response

Simple-Response = [Entity-Body]
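
For example (a hypothetical exchange - the file name is purely illustrative), a 0.9 request and its response might look like

GET /index.html

<html>
...
</html>

The request is a single line, and the response is just the raw entity body, with no status line and no headers.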
  

HTTP 1.0

This version added much more information to the requests and responses. Rather than "growing" the 0.9 format, the old format was simply left alongside the new one.

Request format

The format of requests from client to server is

Request = Simple-Request | Full-Request

Simple-Request = "GET" SP Request-URI CRLF

Full-Request = Request-Line
		*(General-Header
		| Request-Header
		| Entity-Header)
		CRLF
		[Entity-Body]
A Simple-Request is an HTTP/0.9 request and must be replied to by a Simple-Response.

A Request-Line has format

Request-Line = Method SP Request-URI SP HTTP-Version CRLF
where
Method = "GET" | "HEAD" | POST |
	 extension-method
e.g.
GET http://jan.newmarch.name/index.html HTTP/1.0

Response format

A response is of the form

Response = Simple-Response | Full-Response

Simple-Response = [Entity-Body]

Full-Response = Status-Line
		*(General-Header 
		| Response-Header
		| Entity-Header)
		CRLF
		[Entity-Body]

The Status-Line gives information about the fate of the request:

Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
e.g.
HTTP/1.0 200 OK
The codes are
Status-Code =	  "200" ; OK
		| "201" ; Created
		| "202" ; Accepted
		| "204" ; No Content
		| "301" ; Moved permanently
		| "302" ; Moved temporarily
		| "304" ; Not modified
		| "400" ; Bad request
		| "401" ; Unauthorised
		| "403" ; Forbidden
		| "404" ; Not found
		| "500" ; Internal server error
		| "501" ; Not implemented
		| "502" ; Bad gateway
		| "503" | Service unavailable
		| extension-code

The Entity-Header contains useful information about the Entity-Body to follow:

Entity-Header =	Allow
		| Content-Encoding
		| Content-Length
		| Content-Type
		| Expires
		| Last-Modified
		| extension-header
For example
HTTP/1.1 200 OK
Date: Fri, 29 Aug 2003 00:59:56 GMT
Server: Apache/2.0.40 (Unix)
Accept-Ranges: bytes
Content-Length: 1595
Connection: close
Content-Type: text/html; charset=ISO-8859-1

HTTP 1.1

HTTP 1.1 fixes many problems with HTTP 1.0, but is more complex because of it. This version was produced by extending or refining the options available to HTTP 1.0. The changes include support for persistent connections (so that several requests can be sent over one TCP connection), chunked transfer encoding, a mandatory Host header in requests and more extensive cache control; a minimal example of the new request format is shown below.
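
For example (an illustrative request - the host name is one used elsewhere in this chapter), a minimal HTTP 1.1 request must carry a Host header:

GET /index.html HTTP/1.1
Host: jan.newmarch.name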

To give an idea of the growth in complexity: the 0.9 protocol took one page to describe; the 1.0 protocol was described in about 20 pages; 1.1 takes 120 pages.

Simple user-agents

User agents such as browsers make requests and get responses. The response type is

type Response struct {
    Status     string // e.g. "200 OK"
    StatusCode int    // e.g. 200
    Proto      string // e.g. "HTTP/1.0"
    ProtoMajor int    // e.g. 1
    ProtoMinor int    // e.g. 0

    RequestMethod string // e.g. "HEAD", "CONNECT", "GET", etc.

    Header map[string]string

    Body io.ReadCloser

    ContentLength int64

    TransferEncoding []string

    Close bool

    Trailer map[string]string
}
    

We shall examine this data structure through examples. The simplest request from a user agent is "HEAD", which asks for information about a resource and its HTTP server. The function

func Head(url string) (r *Response, err os.Error)
    
can be used to make this query.

The status of the response is in the response field Status, while the field Header is a map of the header fields in the HTTP response. A program to make this request and display the results is


/* Head
*/

package main

import ("fmt"; "http"; "os")

func main() {
	if len(os.Args) != 2 {
                fmt.Println("Usage: ", os.Args[0], "host:port")
                os.Exit(1)
        }
        url := os.Args[1]

	response, err := http.Head(url)
	if err != nil {
		fmt.Println(err.String())
		os.Exit(2)
	}
	
	fmt.Println(response.Status)
	for k, v := range(response.Header) {
		fmt.Println(k + ":", v)
	}

	os.Exit(0)
}

When run against a resource, as in "Head http://www.golang.com/", it prints something like

200 OK
Content-Type: text/html; charset=utf-8
Date: Tue, 14 Sep 2010 05:34:29 GMT
Cache-Control: public, max-age=3600
Expires: Tue, 14 Sep 2010 06:34:29 GMT
Server: Google Frontend
    

Usually, we want to retrieve a resource rather than just get information about it. The "GET" request will do this, and it can be done using

func Get(url string) (r *Response, finalURL string, err os.Error)
    

The content of the response is in the response field Body which is of type io.ReadCloser. We can print the content to the screen with the following program


/* Get
*/

package main

import ("fmt"; "http"; "os"; "strings")

func main() {
	if len(os.Args) != 2 {
                fmt.Println("Usage: ", os.Args[0], "host:port")
                os.Exit(1)
        }
        url := os.Args[1]

	response, final, err := http.Get(url)
	if err != nil {
		fmt.Println(err.String())
		os.Exit(2)
	}
	fmt.Println("Final url", final)

	if response.Status != "200 OK" {
		fmt.Println(response.Status)
		os.Exit(2)
	}

	b, _ := http.DumpResponse(response, false)
	fmt.Print(string(b))

	charset := response.Header["Content-Type"]
	if !acceptableCharset(charset) {
		fmt.Println("Cannot handle", charset)
		os.Exit(4)
	}

	var buf [512]byte
	reader := response.Body
	for {
		n, err := reader.Read(buf[0:])
		if err != nil {
			os.Exit(0)
		}
		fmt.Print(string(buf[0: n]))
	}
	os.Exit(0)
}

func acceptableCharset(charset string) bool {
	// the header value may use either case, e.g. "utf-8" or "UTF-8"
	if strings.HasSuffix(strings.ToUpper(charset), "UTF-8") {
		return true
	}
	return false
}

Note that there are important character set issues of the type discussed in the previous chapter. The server will deliver the content using some character set encoding, and possibly some transfer encoding. Usually this is a matter of negotiation between user agent and server, but the simple Get command that we are using does not include the user agent component of the negotiation. So the server can send whatever character encoding it wishes.

At the time of writing, I am in China. When I try this program on www.google.com, Google's server tries to be helpful by guessing my location and sending me the text in the Chinese character set Big5! How to tell the server what character encoding is okay for me is discussed later.

Lower-level user-agents

Go also supplies a lower-level interface for user agents to communicate with HTTP servers. As you might expect, it not only gives you more control over the client requests, but also requires you to spend more effort in building them.

The data type used to build requests is the type Request. This is a complex type, and is given in the Go documentation as

type Request struct {
    Method     string // GET, POST, PUT, etc.
    RawURL     string // The raw URL given in the request.
    URL        *URL   // Parsed URL.
    Proto      string // "HTTP/1.0"
    ProtoMajor int    // 1
    ProtoMinor int    // 0


    // A header maps request lines to their values.
    // If the header says
    //
    //	accept-encoding: gzip, deflate
    //	Accept-Language: en-us
    //	Connection: keep-alive
    //
    // then
    //
    //	Header = map[string]string{
    //		"Accept-Encoding": "gzip, deflate",
    //		"Accept-Language": "en-us",
    //		"Connection": "keep-alive",
    //	}
    //
    // HTTP defines that header names are case-insensitive.
    // The request parser implements this by canonicalizing the
    // name, making the first character and any characters
    // following a hyphen uppercase and the rest lowercase.
    Header map[string]string

    // The message body.
    Body io.ReadCloser

    // ContentLength records the length of the associated content.
    // The value -1 indicates that the length is unknown.
    // Values >= 0 indicate that the given number of bytes may be read from Body.
    ContentLength int64

    // TransferEncoding lists the transfer encodings from outermost to innermost.
    // An empty list denotes the "identity" encoding.
    TransferEncoding []string

    // Whether to close the connection after replying to this request.
    Close bool

    // The host on which the URL is sought.
    // Per RFC 2616, this is either the value of the Host: header
    // or the host name given in the URL itself.
    Host string

    // The referring URL, if sent in the request.
    //
    // Referer is misspelled as in the request itself,
    // a mistake from the earliest days of HTTP.
    // This value can also be fetched from the Header map
    // as Header["Referer"]; the benefit of making it
    // available as a structure field is that the compiler
    // can diagnose programs that use the alternate
    // (correct English) spelling req.Referrer but cannot
    // diagnose programs that use Header["Referrer"].
    Referer string

    // The User-Agent: header string, if sent in the request.
    UserAgent string

    // The parsed form. Only available after ParseForm is called.
    Form map[string][]string

    // Trailer maps trailer keys to values.  Like for Header, if the
    // response has multiple trailer lines with the same key, they will be
    // concatenated, delimited by commas.
    Trailer map[string]string
}
    

There is a lot of information that can be stored in a request. You do not need to fill in all the fields, only those of interest. For example, to specify that you only wish to receive particular character sets, you can add an entry for Accept-Charset to the Header map. This is illustrated in the following program


/* LowLevelGet
*/

package main

import ("fmt"; "http"; "os"; "strings"; "net")

func main() {
	if len(os.Args) != 2 {
                fmt.Println("Usage: ", os.Args[0], "http://host:port/page")
                os.Exit(1)
        }
        url, err := http.ParseURL(os.Args[1])
	checkError(err)

	// build a TCP connection first
	host := url.Host
	conn, err := net.Dial("tcp", "", host)
        checkError(err)

	// then wrap an HTTP client connection around it
	clientConn := http.NewClientConn(conn, nil)
	if clientConn == nil {
		fmt.Println("Can't build connection")
		os.Exit(1)
	}

	// define the additional HTTP header fields
	header := map[string] string {"Accept-Charset": "UTF-8;q=1, ISO-8859-1;q=0.1"}
	// and build the request
	request := http.Request{Method: "GET", URL: url, Header: header}
	dump, _ := http.DumpRequest(&request, false)
	fmt.Println(string(dump))

	// send the request
	err = clientConn.Write(&request)
	checkError(err)

	// and get the response
	response, err := clientConn.Read()
	checkError(err)

	if response.Status != "200 OK" {
		fmt.Println(response.Status)
		os.Exit(2)
	}

	charset := response.Header["Content-Type"]
	if !acceptableCharset(charset) {
		fmt.Println("Cannot handle", charset)
		os.Exit(4)
	}

	var buf [512]byte
	reader := response.Body
	for {
		n, err := reader.Read(buf[0:])
		if err != nil {
			os.Exit(0)
		}
		fmt.Print(string(buf[0: n]))
	}

	os.Exit(0)
}

func acceptableCharset(charset string) bool {
	// the header value may use either case, e.g. "utf-8" or "UTF-8"
	if strings.HasSuffix(strings.ToUpper(charset), "UTF-8") {
		return true
	}
	return false
}

func checkError(err os.Error) {
        if err != nil {
                fmt.Println("Fatal error ", err.String())
                os.Exit(1)
        }
}
Note, however, that this does not seem to work against www.google.com.hk.

Simple servers

The other side to building a client is a Web server handling HTTP requests. The simplest - and earliest - servers just returned copies of files. However, any URL can now trigger an arbitrary computation in current servers.

File server

We start with a basic file server. Go supplies a multiplexer, that is, an object that will read and interpret requests. It hands out requests to handlers, each of which runs in its own goroutine. Thus much of the work of reading HTTP requests, decoding them and branching to suitable functions running in their own goroutines is done for us.

For a file server, Go also gives a FileServer object which knows how to deliver files from the local file system. It takes a "root" directory which is the top of a file tree in the local system, and a pattern to match URLs against. The simplest pattern is "/" which is the top of any URL. This will match all URLs.

An HTTP server delivering files from the local file system is almost embarrassingly trivial given these objects. It is


/* File Server
*/

package main

import ("fmt"; "http"; "os")

func main() {
	// deliver files from the directory /var/www 
	fileServer := http.FileServer("/var/www", "/")

	// register the handler and deliver requests to it
	err := http.ListenAndServe(":8000", fileServer)
	checkError(err)
	// That's it!
}

func checkError(err os.Error) {
        if err != nil {
                fmt.Println("Fatal error ", err.String())
                os.Exit(1)
        }
}
This server even delivers "404 not found" messages for requests for file resources that don't exist!

Handler functions

In this last program, the handler was given in the second argument to ListenAndServe. Any number of handlers can be registered first by calls to Handle or HandleFunc, with signatures

func Handle(pattern string, handler Handler)
func HandleFunc(pattern string, handler func(ResponseWriter, *Request))
    

The second argument to ListenAndServe can be nil, in which case calls are dispatched to all registered handlers. Each handler should have a different URL pattern. For example, the file handler might have URL pattern "/" while a function handler might have URL pattern "/cgi-bin". A more specific pattern takes precedence over a more general pattern.

Common CGI programs are test-cgi (written in the shell) or printenv (written in Perl) which print the values of the environment variables. A handler can be written to work in a similar manner.


/* Print Env
*/

package main

import ("fmt"; "http"; "os")

func main() {
	// file handler for most files
	fileServer := http.FileServer("/var/www", "/")
	http.Handle("/", fileServer)

	// function handler for /cgi-bin/printenv
	http.HandleFunc("/cgi-bin/printenv", printEnv)

	// deliver requests to the handlers
	err := http.ListenAndServe(":8000", nil)
	checkError(err)
	// That's it!
}

func printEnv(writer http.ResponseWriter, req *http.Request) {
	env := os.Environ()
	writer.Write([]byte("<h1>Environment</h1>\n<pre>"))
	for _, v := range(env) {
		writer.Write([]byte(v+"\n"))
	}
	writer.Write([]byte("</pre>"))
}

func checkError(err os.Error) {
        if err != nil {
                fmt.Println("Fatal error ", err.String())
                os.Exit(1)
        }
}
Note: for simplicity this program does not deliver well-formed HTML. It is missing html, head and body tags.
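
As a sketch of how the output could be made well-formed (a hypothetical variant of printEnv, not part of the program above), the handler could simply write the surrounding tags before and after the loop:

func printEnvHTML(writer http.ResponseWriter, req *http.Request) {
	// wrap the output in minimal html, head and body tags
	writer.Write([]byte("<html><head><title>Environment</title></head><body>\n"))
	writer.Write([]byte("<h1>Environment</h1>\n<pre>"))
	for _, v := range os.Environ() {
		writer.Write([]byte(v + "\n"))
	}
	writer.Write([]byte("</pre></body></html>"))
}

Registering this with http.HandleFunc("/cgi-bin/printenv", printEnvHTML) in place of printEnv would otherwise leave the program unchanged.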

Using the cgi-bin directory in this program is a bit cheeky: it doesn't call an external program like CGI scripts do. It just calls a Go function. Go does have the ability to call external programs using os.ForkExec, but it does not yet have support for dynamically loadable modules like Apache's mod_perl.

Low-level servers

Go also supplies a lower-level interface for servers. Again, this means that as the programmer you have to do more work. You first make a TCP server, and then wrap a ServerConn around it. Then you read Requests and write Responses.

A basic server is


/* LowLevel Server
*/

package main

import ("fmt"; "http"; "os"; "net")

func main() {

	// create a TCP server first
	listener, err := net.Listen("tcp", ":8000")
        checkError(err)

	// then wrap an HTTP server connection around it

	for {
		conn, err := listener.Accept()
		if err != nil {
			continue
		}
		fmt.Println("Accepted")
		serverConn := http.NewServerConn(conn, nil)
		if serverConn == nil {
			fmt.Println("Can't build connection")
			continue
		}

		req, err := serverConn.Read()
		if err != nil {
			fmt.Println(err.String())
			continue
		}
		dump, _ := http.DumpRequest(req, false)
		fmt.Println(string(dump))
	}
	os.Exit(0)
}

func checkError(err os.Error) {
        if err != nil {
                fmt.Println("Fatal error ", err.String())
                os.Exit(1)
        }
}
Note: this program does not work properly yet - it closes the connection after reading the request, and there are no tests for it in the Go distribution.

Proxy handling

Simple proxy

HTTP 1.1 laid out how HTTP should work through a proxy. A "GET" request is made to the proxy, but the URL requested is the full URL of the destination. As long as the proxy is configured to pass such requests through, that is all that needs to be done.
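
For example (an illustrative request - the host and file names are made up), the request sent through a proxy might begin

GET http://www.example.com/index.html HTTP/1.1
Host: www.example.com

Note that the request line carries the full URL, not just the path.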

To do this in Go requires use of the low-level HTTP API. You first need to create a TCP connection to the proxy server. Then you prepare a request that contains the destination URL. In the Go code you need to set two fields in the Request: RawURL, which is the URL as a string, and the parsed URL, URL.

The following program illustrates this:


/* ProxyGet
*/

package main

import ("fmt"; "http"; "os"; "net")

func main() {
	if len(os.Args) != 3 {
                fmt.Println("Usage: ", os.Args[0], "proxy-host:port http://host:port/page")
                os.Exit(1)
        }
	proxy := os.Args[1]
	rawURL := os.Args[2]
	url, err := http.ParseURL(rawURL)
	checkError(err)

	// build a TCP connection first
	conn, err := net.Dial("tcp", "", proxy)
        checkError(err)

	// then wrap an HTTP client connection around it
	clientConn := http.NewClientConn(conn, nil)
	if clientConn == nil {
		fmt.Println("Can't build connection")
		os.Exit(1)
	}

	request := http.Request{Method: "GET", RawURL: rawURL, URL: url}
	dump, _ := http.DumpRequest(&request, false)
	fmt.Println(string(dump))

	// send the request
	err = clientConn.Write(&request)
	checkError(err)

	fmt.Println("Write ok")
	// and get the response
	response, err := clientConn.Read()
	checkError(err)
	fmt.Println("Read ok")

	if response.Status != "200 OK" {
		fmt.Println(response.Status)
		os.Exit(2)
	}
	fmt.Println("Reponse ok")

	var buf [512]byte
	reader := response.Body
	for {
		n, err := reader.Read(buf[0:])
		if err != nil {
			os.Exit(0)
		}
		fmt.Print(string(buf[0: n]))
	}

	os.Exit(0)
}

func checkError(err os.Error) {
        if err != nil {
		if err == os.EOF {return}
                fmt.Println("Fatal error ", err.String())
                //os.Exit(1)
        }
}

If you don't have a suitable proxy to test this, then download and install the Squid proxy on your own computer.

Authenticating proxy

Some proxies require authentication by a user name and password in order to pass requests on. A common scheme is "basic authentication", in which the user name and password are concatenated into a string "user:password" and then Base64 encoded. This is then given to the proxy in the HTTP request header "Proxy-Authorization", along with a flag saying that basic authentication is being used.
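
For example, the hypothetical credentials "user:password" encode in Base64 to "dXNlcjpwYXNzd29yZA==", so the request would carry the header field

Proxy-Authorization: Basic dXNlcjpwYXNzd29yZA==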

The following program illustrates this.


/* ProxyAuthGet
*/

package main

import ("fmt"; "http"; "os"; "net"; "encoding/base64")

const auth = "jannewmarch:mypassword"

func main() {
	if len(os.Args) != 3 {
                fmt.Println("Usage: ", os.Args[0], "proxy-host:port http://host:port/page")
                os.Exit(1)
        }
	proxy := os.Args[1]
	rawURL := os.Args[2]
	url, err := http.ParseURL(rawURL)
	checkError(err)

	// build a TCP connection first
	conn, err := net.Dial("tcp", "", proxy)
        checkError(err)

	// then wrap an HTTP client connection around it
	clientConn := http.NewClientConn(conn, nil)
	if clientConn == nil {
		fmt.Println("Can't build connection")
		os.Exit(1)
	}

	// encode the authentication credentials in Base64
	encBytes := make([]byte, base64.StdEncoding.EncodedLen(len(auth)))
	base64.StdEncoding.Encode(encBytes, []byte(auth))
	basic := "Basic " + string(encBytes)
	header := map[string] string {"Proxy-Authorization": basic}

	request := http.Request{Method: "GET", 
	                        RawURL: rawURL, 
	                        URL: url, 
                                Header: header}
	dump, _ := http.DumpRequest(&request, false)
	fmt.Println(string(dump))

	// send the request
	err = clientConn.Write(&request)
	checkError(err)

	fmt.Println("Write ok")
	// and get the response
	response, err := clientConn.Read()
	checkError(err)
	fmt.Println("Read ok")

	if response.Status != "200 OK" {
		fmt.Println(response.Status)
		os.Exit(2)
	}
	fmt.Println("Reponse ok")

	var buf [512]byte
	reader := response.Body
	for {
		n, err := reader.Read(buf[0:])
		if err != nil {
			os.Exit(0)
		}
		fmt.Print(string(buf[0: n]))
	}

	os.Exit(0)
}

func checkError(err os.Error) {
        if err != nil {
		if err == os.EOF {return}
                fmt.Println("Fatal error ", err.String())
                //os.Exit(1)
        }
}

[NOT YET TESTED]

HTTPS

For secure, encrypted connections, HTTP uses TLS, which is described in the chapter on security. The combination of HTTP and TLS is called HTTPS, and it uses https:// URLs instead of http:// URLs.

For a server to use HTTPS, it needs an X.509 certificate and a private key file for that certificate. Go at present requires that these be PEM-encoded. Then the HTTP function ListenAndServe is replaced by the HTTPS (HTTP+TLS) function ListenAndServeTLS.

The file server program given earlier can be written as an HTTPS server as


/* TLS File Server
*/

package main

import ("fmt"; "http"; "os")

func main() {
	// deliver files from the directory /var/www 
	fileServer := http.FileServer("/var/www", "/")

	// register the handler and deliver requests to it
	err := http.ListenAndServeTLS(":8000", "jan.newmarch.name.pem",
		"private.pem", fileServer)
	checkError(err)
	// That's it!
}

func checkError(err os.Error) {
        if err != nil {
                fmt.Println("Fatal error ", err.String())
                os.Exit(1)
        }
}
This server is accessed by e.g. https://localhost:8000/index.html

If you want a server that supports both HTTP and HTTPS, run each listener in its own goroutine, as sketched below.
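
A minimal sketch of such a server (reusing the certificate and key file names from above; the port numbers 8000 and 8443 are arbitrary choices) might look like

/* Dual Server sketch
*/

package main

import ("fmt"; "http"; "os")

func main() {
	// deliver files from the directory /var/www
	fileServer := http.FileServer("/var/www", "/")

	// run the plain HTTP listener in its own goroutine
	go func() {
		err := http.ListenAndServe(":8000", fileServer)
		checkError(err)
	}()

	// and the HTTPS listener in the main goroutine
	err := http.ListenAndServeTLS(":8443", "jan.newmarch.name.pem",
		"private.pem", fileServer)
	checkError(err)
}

func checkError(err os.Error) {
        if err != nil {
                fmt.Println("Fatal error ", err.String())
                os.Exit(1)
        }
}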