HTML and HTTP
The World Wide Web is a major client-server system, with probably millions of
users. A site may become a Web server by running an http daemon. A user
becomes a Web client by running a browser such as Netscape. A client can
use any of the servers, and often uses a series of them.
There are a number of servers available. They all use the same protocol for
communication with clients, and they differ in capabilities such as
speed, reliability, etc. The original ones were the CERN server and the NCSA
server. These have given way to servers from Netscape, Microsoft,
O'Reilly, Silicon Graphics and Amaya (WWW Consortium).
The primary purpose of a Web server is to deliver a document on request to
a client. The document may be text, an image file, or other type of file.
The document is identified by a name called a URL (Uniform Resource Locator).
If the server stores that particular URL (or can generate
content for that URL), then it returns the document as
the message reply.
The purpose of a browser is to allow the user to request documents to be
delivered to it, and to display them in some meaningful way.
Browsers differ in the version of HTML they support, and in extra features
such as non-standard extensions, email support, the amount of customisation, speed and caching.
URLs specify a document access method (a client server protocol),
a server machine and the
location of a document on that machine.
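As a rough illustration of these three parts, the sketch below splits a URL using Python's standard urllib.parse module. The helper name url_parts and the dictionary keys are made up for this example; the URL is the one used elsewhere in these notes.

```python
from urllib.parse import urlsplit


def url_parts(url: str) -> dict:
    """Split a URL into access method, server machine and document location."""
    parts = urlsplit(url)
    return {
        "method": parts.scheme,    # the client-server protocol, e.g. "http"
        "server": parts.hostname,  # the machine to contact
        "port": parts.port,        # None when the URL gives no explicit port
        "location": parts.path,    # where the document lives on that machine
    }


if __name__ == "__main__":
    print(url_parts("http://jan.newmarch.name/index.html"))
```

When no port is given, a client would normally fall back to the default port for the access method (80 for http).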
HTML is a markup language defined in SGML (Standard Generalized
Markup Language). HTML defines the structure of a document without
specifying the details of layout. For example, headers of various
levels are defined. The control over layout of headers could not be
A trivial document looks like
<html>
<head> <title> Title of document </title> </head>
<body>
<h1> Header level 1 </h1>
Some text in here
</body> </html>
An HTML document may contain links to other documents.
When a link is selected, the browser is expected to fetch the new
document and display it in place of the current one.
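To make the idea of links concrete, here is a minimal sketch of how a program might find the links in a document, using Python's standard html.parser module. It assumes links are written as anchor tags with an href attribute; the class and function names, and the file name other.html, are invented for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def find_links(html_text: str) -> list:
    """Return the link targets found in html_text, in document order."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


if __name__ == "__main__":
    page = '<h1> Header level 1 </h1> See <a href="other.html">another page</a>.'
    print(find_links(page))  # ['other.html']
```

A browser would resolve each target against the URL of the current document before fetching it.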
HTTP is a stateless, connectionless, reliable protocol. Each request
from a client is handled reliably and then the connection is broken.
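The request, reply, close cycle can be sketched as follows. This is only an illustration: socket.socketpair stands in for a real TCP connection so that the example runs without a network, the reply body is made up, and the function names are hypothetical.

```python
import socket


def serve_one_request(conn: socket.socket) -> None:
    """Read one request, send one reply, then break the connection."""
    request = b""
    while b"\r\n\r\n" not in request:  # a blank line ends the request headers
        chunk = conn.recv(1024)
        if not chunk:
            break
        request += chunk
    body = b"Some text in here\n"
    conn.sendall(
        b"HTTP/1.0 200 OK\r\n"
        b"Content-Type: text/plain\r\n"
        b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n"
        b"\r\n" + body
    )
    conn.close()  # nothing about this client is remembered afterwards


def read_reply(conn: socket.socket) -> bytes:
    """Read until the other side closes the connection."""
    reply = b""
    while True:
        chunk = conn.recv(1024)
        if not chunk:
            return reply
        reply += chunk


def demo() -> bytes:
    """Run one complete exchange over a connected pair of sockets."""
    client, server = socket.socketpair()
    client.sendall(b"GET /index.html HTTP/1.0\r\n\r\n")
    serve_one_request(server)
    reply = read_reply(client)
    client.close()
    return reply


if __name__ == "__main__":
    print(demo().decode("ascii"))
```

The client knows the reply is complete when it sees end-of-file, which is why the server closing the connection matters here.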
The current version of HTTP is version 1.1. The previous versions were 0.9
and 1.0.
The first line of any message should include the version number, as in HTTP/1.0.
If this is not present, version 0.9 is assumed.
HTTP/1.0 servers must handle different versions of requests as follows:
recognise the format of the Request-Line for HTTP/0.9 and HTTP/1.0 requests
understand any valid request in the format of HTTP/0.9 or HTTP/1.0
respond appropriately with a message in the same version as the client used
HTTP/1.0 clients must
recognise the format of the Status-Line for HTTP/1.0 responses.
understand any valid response in the format of HTTP/0.9 and HTTP/1.0
The format of requests from client to server is
Request = Simple-Request | Full-Request
Simple-Request = "GET" SP Request-URI CRLF
Full-Request = Request-Line
               *( General-Header | Request-Header | Entity-Header )
               CRLF
               [ Entity-Body ]
A Simple-Request is an HTTP/0.9 request and must be replied to by a
Simple-Response.
A Request-Line has the format
Request-Line = Method SP Request-URI SP HTTP-Version CRLF
Method = "GET" | "HEAD" | "POST" | extension-method
For example,
GET http://jan.newmarch.name/index.html HTTP/1.0
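A server's first job is to tell the two request forms apart. The following is a minimal sketch of that step, not a complete parser: it only looks at the first line, and the function name parse_request_line is made up for the example.

```python
def parse_request_line(line: str) -> tuple:
    """Return (method, request_uri, version) for the first line of a request.

    A line with no version is treated as an HTTP/0.9 Simple-Request.
    """
    fields = line.rstrip("\r\n").split(" ")
    if len(fields) == 2 and fields[0] == "GET":
        # Simple-Request = "GET" SP Request-URI CRLF
        return fields[0], fields[1], "HTTP/0.9"
    if len(fields) == 3 and fields[2].startswith("HTTP/"):
        # Request-Line = Method SP Request-URI SP HTTP-Version CRLF
        return fields[0], fields[1], fields[2]
    raise ValueError(f"malformed request line: {line!r}")


if __name__ == "__main__":
    print(parse_request_line("GET http://jan.newmarch.name/index.html HTTP/1.0\r\n"))
    print(parse_request_line("GET /index.html\r\n"))
```

The version returned tells the server which form of response to send back.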
A response is of the form
Response = Simple-Response | Full-Response
Simple-Response = [Entity-Body]
Full-Response = Status-Line
                *( General-Header | Response-Header | Entity-Header )
                CRLF
                [ Entity-Body ]
The Status-Line gives information about the fate of the request:
Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
HTTP/1.0 200 OK
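On the client side, a sketch of reading the Status-Line might look like this. It assumes the line has already been read from the connection, and it treats anything that does not begin with "HTTP/" as the start of a Simple-Response. Both function names are invented for the example.

```python
def is_full_response(first_line: str) -> bool:
    """A Full-Response starts with a Status-Line; a Simple-Response is only a body."""
    return first_line.startswith("HTTP/")


def parse_status_line(line: str) -> tuple:
    """Split a Status-Line into (version, status_code, reason_phrase)."""
    # Split on the first two spaces only: the Reason-Phrase may contain spaces.
    # A line with fewer than three fields fails here with ValueError.
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    if not version.startswith("HTTP/"):
        raise ValueError(f"not a Status-Line: {line!r}")
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"bad Status-Code: {code!r}")
    return version, int(code), reason


if __name__ == "__main__":
    print(parse_status_line("HTTP/1.0 200 OK\r\n"))
    print(is_full_response("<h1> Header level 1 </h1>"))
```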
The codes are
Status-Code = "200" ; OK
| "201" ; Created
| "202" ; Accepted
| "204" ; No Content
| "301" ; Moved permanently
| "302" ; Moved temporarily
| "304" ; Not modified
| "400" ; Bad request
| "401" ; Unauthorised
| "403" ; Forbidden
| "404" ; Not found
| "500" ; Internal server error
| "501" ; Not implemented
| "502" ; Bad gateway
| "503" ; Service unavailable
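A client rarely needs to know every code. The first digit gives the general class, and that is often enough to decide what to do. The sketch below uses the codes listed above; the class names follow the HTTP/1.0 specification, while the function name and the output format are assumptions made for the example.

```python
REASON_PHRASES = {
    200: "OK", 201: "Created", 202: "Accepted", 204: "No Content",
    301: "Moved permanently", 302: "Moved temporarily", 304: "Not modified",
    400: "Bad request", 401: "Unauthorised", 403: "Forbidden", 404: "Not found",
    500: "Internal server error", 501: "Not implemented",
    502: "Bad gateway", 503: "Service unavailable",
}

# The first digit of the code identifies the class of response
STATUS_CLASSES = {
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe_status(code: int) -> str:
    """Return a one-line description such as '404 Not found (client error)'."""
    reason = REASON_PHRASES.get(code, "unrecognised code")
    status_class = STATUS_CLASSES.get(code // 100, "unknown class")
    return f"{code} {reason} ({status_class})"


if __name__ == "__main__":
    for code in (200, 304, 404, 503):
        print(describe_status(code))
```

A code that is not in the table can still be handled by its class, which is how the last lookup falls back.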
The Entity-Header contains useful information about the Entity-Body.
Entity-Header = Allow | Content-Encoding | Content-Length | Content-Type
                | Expires | Last-Modified | extension-header
If a server wishes the client to authenticate its request, it does so by
first rejecting the request with a "401" message.
As part of this rejection, it should indicate in the "WWW-Authenticate"
field information about the authorisation "realm" so that the client can
determine if it possesses an authorisation for that realm.
The client can then
try again, but this time it includes a user-id and password.
This is not a very secure scheme. All the HTTP messages are sent in
plain text format. The user-id and password are not encrypted in any way.
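To see how weak this is, consider the "Basic" scheme of HTTP/1.0, in which the client joins the user-id and password with a colon and base64-encodes the result. The sketch below assumes that scheme; the user-id and password are made up, and both function names are hypothetical.

```python
import base64


def basic_authorization(user_id: str, password: str) -> str:
    """Build the Authorization header a client sends on its second attempt."""
    credentials = f"{user_id}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Authorization: Basic {token}"


def recover_credentials(header: str) -> tuple:
    """Show that anyone who sees the header can read the password."""
    token = header.split(" ")[-1]
    decoded = base64.b64decode(token).decode("utf-8")
    # Split at the first colon only, since a password may itself contain colons
    user_id, _, password = decoded.partition(":")
    return user_id, password


if __name__ == "__main__":
    header = basic_authorization("Aladdin", "open sesame")
    print(header)
    print(recover_credentials(header))
```

Base64 is an encoding, not encryption: decoding needs no key, so the header protects nothing from an eavesdropper.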
HTTP 1.1 fixes many problems with HTTP 1.0, but is more complex because
of the features it adds:
hostname identification (allows virtual hosts)
content negotiation (multiple languages)
persistent connections (reduces TCP overheads - this is very messy)
byte ranges (request parts of documents)
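Most of these features show up as extra header lines in the request. As a sketch only, the function below builds an HTTP/1.1 GET request that uses the Host, Accept-Language, Range and Connection headers; the function name and its parameters are assumptions for illustration, not part of any library.

```python
def build_http11_request(host, path, language=None, byte_range=None, persistent=True):
    """Return the text of an HTTP/1.1 GET request.

    byte_range is an optional (first, last) pair of byte offsets.
    """
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",  # hostname identification: lets one machine serve many sites
    ]
    if language:
        lines.append(f"Accept-Language: {language}")  # content negotiation
    if byte_range:
        first, last = byte_range
        lines.append(f"Range: bytes={first}-{last}")  # ask for part of the document
    if not persistent:
        # HTTP/1.1 connections stay open by default; this asks the server to close
        lines.append("Connection: close")
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    print(build_http11_request("jan.newmarch.name", "/index.html",
                               language="en", byte_range=(0, 99)))
```

With a persistent connection the client can no longer rely on end-of-file to find the end of a reply, which is part of what makes this feature messy.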
Fatter clients and servers
CGI scripts run on the server side and provide an indefinite amount of
Helpers handle documents on the client side that the browser cannot.
The browser calls another process and passes the document to it.
There is little communication between the browser and the helper.
Plugins also handle documents that the browser cannot. However, plugins
run within the browser address space as DLLs.
Typically they are used for field validation.
Java applets are run by an interpreter within the browser. They can
ActiveX controls are DLLs that run within the browser address space.
They are built from native code and can do anything.
the presentation layer. Java and ActiveX can carry application as well
as presentation logic.
This page is maintained by Jan Newmarch