HTTP

Introduction

The World Wide Web is a major client-server system, with millions of users. A site may become a Web host by running an HTTP server. A user becomes a Web client by running a browser such as Netscape. A client can use any of the servers, and often uses a series of them.

Servers

There are a number of servers available. They all use the same protocol for communication with clients, and they differ in capabilities such as speed and reliability. The original ones were the CERN server and the NCSA server. These have given way to servers from Apache, Netscape, Microsoft, O'Reilly, Silicon Graphics and others.

The primary purpose of a Web server is to deliver a document on request to a client. The document may be text, an image file, or other type of file. The document is identified by a name called a URL (Uniform Resource Locator). If the server stores that particular URL (or can generate content for that URL), then it returns the document as the message reply.

Browsers

The purpose of a browser is to allow the user to request documents to be delivered to it, and to display them in some meaningful way. Browsers differ in the version of HTML they support, in extra features such as non-standard extensions and email support, and in the amount of customisation, speed, caching capabilities, etc. Browsers include Netscape, IE, Mozilla, Konqueror, Opera, Lynx, Amaya and others.

URLs

URLs specify a document access method (a client-server protocol), a server machine and the location of a document on that machine, e.g.
http://pandonia/OS.html
ftp://services.canberra.edu.au/bin/ls
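
The three parts of a URL can be pulled apart with Python's standard urllib.parse module. This is only a sketch: the function name describe_url and the dictionary keys are made up for illustration.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into access method, server machine and document location."""
    parts = urlsplit(url)
    return {
        "method": parts.scheme,     # the client-server protocol, e.g. http
        "server": parts.hostname,   # the machine that holds the document
        "location": parts.path,     # where the document lives on that machine
    }


if __name__ == "__main__":
    for url in ("http://pandonia/OS.html",
                "ftp://services.canberra.edu.au/bin/ls"):
        print(describe_url(url))
```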

HTTP

Design

HTTP is a stateless, connectionless, reliable protocol. Each request from a client is handled reliably and then the connection is broken. The Web is an excellent example of a set of protocols stretched way beyond their original scope, with a huge series of patches at all levels to try to fix problems.
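
The "one request, then the connection is broken" behaviour can be observed with a small experiment. The sketch below assumes Python's standard http.server module, whose handler speaks HTTP/1.0 by default and closes the connection after each reply. The names HelloHandler, fetch_once and demo are hypothetical, and the server listens only on the local machine.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answers every GET with the same small text document."""

    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_once(host, port, path):
    """Open a connection, send one request, read until the server hangs up."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the server has closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def demo():
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    try:
        # Two requests need two separate connections; the server keeps no
        # memory of the first when it handles the second.
        first = fetch_once("127.0.0.1", port, "/index.html")
        second = fetch_once("127.0.0.1", port, "/index.html")
    finally:
        server.shutdown()
        server.server_close()
    return first, second


if __name__ == "__main__":
    for reply in demo():
        print(reply.decode("latin-1"))
```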

Versions

There are 3 versions of HTTP: 0.9, 1.0 and 1.1.

An O/O version was under development to replace HTTP/1.1 but seems to have vanished.

Each version must understand all earlier versions.

HTTP 0.9

Request format

Request = Simple-Request

Simple-Request = "GET" SP Request-URI CRLF

Response format

A response is of the form
Response = Simple-Response

Simple-Response = [Entity-Body]
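
As a rough illustration of how little there is to HTTP 0.9, the following sketch builds a Simple-Request from the grammar above. The function name build_simple_request is hypothetical. The matching Simple-Response needs no parsing at all: whatever bytes arrive before the connection closes are the Entity-Body.

```python
def build_simple_request(request_uri):
    """Return the bytes of an HTTP/0.9 Simple-Request: "GET" SP Request-URI CRLF."""
    if any(ch in request_uri for ch in " \r\n"):
        raise ValueError("Request-URI must not contain spaces or line breaks")
    return f"GET {request_uri}\r\n".encode("ascii")


if __name__ == "__main__":
    print(build_simple_request("/OS.html"))
```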

HTTP 1.0

This version added much more information to the requests and responses. Rather than "grow" the 0.9 format, it was just left alongside the new version.

Request format

The format of requests from client to server is
Request = Simple-Request | Full-Request

Simple-Request = "GET" SP Request-URI CRLF

Full-Request = Request-Line
		*(General-Header
		| Request-Header
		| Entity-Header)
		CRLF
		[Entity-Body]
A Simple-Request is an HTTP/0.9 request and must be replied to by a Simple-Response.

A Request-Line has format

Request-Line = Method SP Request-URI SP HTTP-Version CRLF
where
Method = "GET" | "HEAD" | "POST" |
	 extension-method
e.g.
GET http://jan.newmarch.name/index.html HTTP/1.0
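
A server has to tell the two request forms apart from the first line alone. The sketch below shows one way this might be done: two tokens starting with GET is treated as an HTTP/0.9 Simple-Request, three tokens as the Request-Line of a Full-Request. The function name and the returned dictionary layout are assumptions made for the example.

```python
def parse_request_line(line):
    """Classify and split the first line of a request.

    Returns a dict with the method, the Request-URI, the HTTP version and a
    flag saying whether this was an HTTP/0.9 Simple-Request.
    """
    if not line.endswith("\r\n"):
        raise ValueError("request line must end with CRLF")
    parts = line[:-2].split(" ")
    if len(parts) == 2 and parts[0] == "GET":
        # Simple-Request = "GET" SP Request-URI CRLF
        return {"method": "GET", "uri": parts[1],
                "version": "HTTP/0.9", "simple": True}
    if len(parts) == 3 and parts[2].startswith("HTTP/"):
        # Request-Line = Method SP Request-URI SP HTTP-Version CRLF
        return {"method": parts[0], "uri": parts[1],
                "version": parts[2], "simple": False}
    raise ValueError(f"malformed request line: {line!r}")


if __name__ == "__main__":
    print(parse_request_line("GET http://jan.newmarch.name/index.html HTTP/1.0\r\n"))
    print(parse_request_line("GET /index.html\r\n"))
```

A request classified as simple must then be answered with a Simple-Response, i.e. the body only.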

Response format

A response is of the form
Response = Simple-Response | Full-Response

Simple-Response = [Entity-Body]

Full-Response = Status-Line
		*(General-Header 
		| Response-Header
		| Entity-Header)
		CRLF
		[Entity-Body]

The Status-Line gives information about the fate of the request:

Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
e.g.
HTTP/1.0 200 OK
The codes are
Status-Code =	  "200" ; OK
		| "201" ; Created
		| "202" ; Accepted
		| "204" ; No Content
		| "301" ; Moved permanently
		| "302" ; Moved temporarily
		| "304" ; Not modified
		| "400" ; Bad request
		| "401" ; Unauthorised
		| "403" ; Forbidden
		| "404" ; Not found
		| "500" ; Internal server error
		| "501" ; Not implemented
		| "502" ; Bad gateway
		| "503" ; Service unavailable
		| extension-code
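
A client might take the Status-Line apart as follows. This is a minimal sketch: the function name parse_status_line is hypothetical, and the Reason-Phrase is split off last because it may itself contain spaces.

```python
def parse_status_line(line):
    """Split 'HTTP-Version SP Status-Code SP Reason-Phrase CRLF' into its parts."""
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    if not version.startswith("HTTP/"):
        raise ValueError(f"not a status line: {line!r}")
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"status code must be three digits: {code!r}")
    return version, int(code), reason


if __name__ == "__main__":
    print(parse_status_line("HTTP/1.0 200 OK\r\n"))
    print(parse_status_line("HTTP/1.0 404 Not found\r\n"))
```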

The Entity-Header contains useful information about the Entity-Body to follow

Entity-Header =	Allow
		| Content-Encoding
		| Content-Length
		| Content-Type
		| Expires
		| Last-Modified
		| extension-header
For example
HTTP/1.1 200 OK
Date: Fri, 29 Aug 2003 00:59:56 GMT
Server: Apache/2.0.40 (Unix)
Accept-Ranges: bytes
Content-Length: 1595
Connection: close
Content-Type: text/html; charset=ISO-8859-1
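
The header block of such a response is a list of "name: value" lines ended by an empty line. The sketch below reads the example above into a dictionary. The function name is hypothetical, header names are lower-cased since they are case-insensitive, and repeated or folded headers are ignored to keep the example short.

```python
def parse_response_head(raw):
    """Return (status_line, headers) from the text that precedes the body."""
    lines = raw.split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        if line == "":          # the empty line marks the end of the headers
            break
        # Split at the first ':' only; values such as dates contain colons too.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers


EXAMPLE = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Fri, 29 Aug 2003 00:59:56 GMT\r\n"
    "Server: Apache/2.0.40 (Unix)\r\n"
    "Accept-Ranges: bytes\r\n"
    "Content-Length: 1595\r\n"
    "Connection: close\r\n"
    "Content-Type: text/html; charset=ISO-8859-1\r\n"
    "\r\n"
)

if __name__ == "__main__":
    status, headers = parse_response_head(EXAMPLE)
    print(status)
    print(headers["content-length"], headers["content-type"])
```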

HTTP 1.1

HTTP 1.1 fixes many problems with HTTP 1.0, but is more complex as a result. It works by extending or refining the options available in HTTP 1.0. The 0.9 protocol took one page to describe. The 1.0 protocol was described in about 20 pages. 1.1 takes 120 pages.

Character set

HTTP 1.1 Requests

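The set of requests has been expanded to
Method = "OPTIONS" | "GET" | "HEAD" | "POST" | "PUT"
	 | "DELETE" | "TRACE" | "CONNECT" | extension-method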

Content negotiation

Dates

Authentication

If a server wishes the client to authenticate its request, it does so by first rejecting the request with a "401" message. As part of this rejection, it should indicate in the "WWW-Authenticate" field information about the authorisation "realm" so that the client can determine if it possesses an authorisation for that realm. The client can then try again, but this time it includes a user-id and password.

This is not a very secure scheme. All the HTTP messages are sent in plain text format. The user-id and password are not encrypted in any way.
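
To see why this is weak, here is a sketch of what the client's second attempt carries, assuming the "Basic" authentication scheme. The user-id and password are joined with a colon and base64-encoded, which is an encoding and not encryption: anyone who sees the header can reverse it. The function names and the sample credentials are illustrative only.

```python
import base64


def basic_authorization(user_id, password):
    """Build the Authorization header line a client sends after a 401 reply."""
    token = base64.b64encode(f"{user_id}:{password}".encode("utf-8"))
    return "Authorization: Basic " + token.decode("ascii")


def recover_credentials(header_line):
    """Show that an eavesdropper can undo the encoding without any key."""
    token = header_line.rsplit(" ", 1)[-1]
    user_id, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user_id, password


if __name__ == "__main__":
    line = basic_authorization("Aladdin", "open sesame")
    print(line)
    print(recover_credentials(line))
```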

POST versus GET

"Normal" queries use GET. Strictly, if a request is "idempotent" it should use GET. Idempotent means that the client is not asking for a state change in the server, and would expect a repeat request to return the same result. This is the norm for static document requests.

GET http://localhost/index.html

GET should also be used for idempotent form requests. Again, these are ones that do not cause any (visible) change of state.

GET http://localhost/cgi-bin/test-cgi?name=jan
Parameters are passed after a '?', in the form name=value. Any problematic characters have to be escaped, e.g. a space is written as its ASCII value in hex, '%20' (or as '+'). GET URLs can become very long. They can also be a security leak, since the form data is visible in the URL and is often saved in bookmarks, log files, etc.
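
The escaping rules can be tried out with Python's standard urllib.parse module. In this sketch the helper build_get_url is hypothetical; urlencode, quote and parse_qs are the library's own functions.

```python
from urllib.parse import parse_qs, quote, urlencode


def build_get_url(base, params):
    """Append form parameters to a URL after a '?', escaping as needed."""
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_get_url("http://localhost/cgi-bin/test-cgi", {"name": "jan"}))
    # A space in form data is written as '+' by urlencode ...
    print(urlencode({"name": "jan newmarch"}))
    # ... and as '%20' by quote, which is the form used in paths.
    print(quote("jan newmarch"))
    # The server side reverses the escaping.
    print(parse_qs("name=jan+newmarch"))
```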

Note that a GET request that e.g. increases a count of logins to the server is still regarded as idempotent since it is not visible to the client.

Queries may be intended to result in state changes on the server, e.g. uploading a file or confirming a transaction. These queries should use POST, and include the form data in the content part of the message.
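
For comparison with the GET examples, the sketch below builds a POST request that carries the same form data in the content part of the message. The function name is hypothetical, and reusing the /cgi-bin/test-cgi path for a POST is an assumption made purely for illustration.

```python
from urllib.parse import urlencode


def build_post_request(path, form):
    """Return the bytes of an HTTP/1.0 POST with the form data as Entity-Body."""
    body = urlencode(form).encode("ascii")
    head = (
        f"POST {path} HTTP/1.0\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"   # tells the server where the body ends
        "\r\n"                               # empty line separates headers and body
    ).encode("ascii")
    return head + body


if __name__ == "__main__":
    print(build_post_request("/cgi-bin/test-cgi", {"name": "jan"}).decode("ascii"))
```

Unlike the GET form, the data does not appear in the URL, so it does not end up in bookmarks.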

SOAP (see later) is criticised for forcing use of POST even for idempotent queries.


This page is maintained by Jan Newmarch http://jan.newmarch.name
Copyright © Jan Newmarch, Monash University, 2007
This work is licensed under a Creative Commons License
The moral right of Jan Newmarch to be identified as the author of this page has been asserted.