HTML and HTTP

Introduction

The World Wide Web is a major client-server system, with millions of users. A site becomes a Web server by running an HTTP daemon. A user becomes a Web client by running a browser such as Netscape. A client can use any of the servers, and often uses a series of them.

Servers

A number of servers are available. They all use the same protocol for communication with clients, and differ in capabilities such as speed and reliability. The original ones were the CERN server and the NCSA server. These have given way to servers from Netscape, Microsoft, O'Reilly, Silicon Graphics, the WWW Consortium (Jigsaw) and others.

The primary purpose of a Web server is to deliver a document on request to a client. The document may be text, an image file, or other type of file. The document is identified by a name called a URL (Uniform Resource Locator). If the server stores that particular URL (or can generate content for that URL), then it returns the document as the message reply.
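The lookup described above can be sketched in a few lines. This is only an illustration, not how any particular server is built: the names STORED, GENERATED and deliver, and the paths used, are hypothetical, and a real server would usually read documents from files rather than from a dictionary.

```python
# Hypothetical in-memory document store, keyed by the path part of the URL.
STORED = {
    "/index.html": b"<html><body>Welcome</body></html>",
}


def generated_hello() -> bytes:
    """Stand-in for content that is generated on request rather than stored."""
    return b"<html><body>Generated on request</body></html>"


# Hypothetical table of paths whose content is generated rather than stored.
GENERATED = {
    "/hello": generated_hello,
}


def deliver(path: str):
    """Return (status_code, body) for the requested path."""
    if path in STORED:
        return 200, STORED[path]
    if path in GENERATED:
        return 200, GENERATED[path]()
    return 404, b"Not found"
```

Calling deliver("/index.html") returns the stored document, deliver("/hello") returns generated content, and any other path falls through to a 404 reply.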

Browsers

The purpose of a browser is to let the user request documents and to display them in some meaningful way. Browsers differ in the version of HTML they support, in extra features such as non-standard extensions and email support, in the amount of customisation, in speed, in caching capabilities, etc.

URLs

URLs specify a document access method (a client-server protocol), a server machine and the location of a document on that machine. For example:
http://pandonia/OS.html
ftp://services.canberra.edu.au/bin/ls
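
As a small illustration, the three parts of a URL can be pulled apart with Python's standard urllib.parse module. The function name describe is just a label chosen for this sketch.

```python
from urllib.parse import urlsplit


def describe(url: str):
    """Return (access method, server machine, document location) for a URL."""
    parts = urlsplit(url)
    return parts.scheme, parts.hostname, parts.path


for url in ("http://pandonia/OS.html", "ftp://services.canberra.edu.au/bin/ls"):
    print(describe(url))
```

For the first URL the access method is http, the server machine is pandonia and the document location is /OS.html.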

HTML

Document structure

HTML is a markup language defined in SGML (Standard Generalized Markup Language). HTML defines the structure of a document without specifying the details of layout. For example, headers of various levels are defined, but originally the author had no control over how those headers were laid out.

A trivial document looks like

<html>
<head>
<title>
Title of document
</title>
</head>

<body>
<h1> Header level 1 </h1>
Some text in here
</body>
</html>

Hypertext links

An HTML document may contain links to other documents. When a link is selected, the browser is expected to fetch the new document and display it in place of the current one.
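
To make this concrete, here is a rough sketch of how a program might find the links in a document, using the standard html.parser module. It assumes links are written as anchor tags with an href attribute. The class name LinkCollector, the function find_links, the sample page and the base URL http://pandonia/index.html are all invented for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the target of every <a href="..."> tag fed to the parser."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Relative links are resolved against the current document.
                self.links.append(urljoin(self.base_url, value))


def find_links(html_text: str, base_url: str):
    """Return the absolute URLs of all links in html_text."""
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    collector.close()
    return collector.links


# Hypothetical page, assumed to have been fetched from http://pandonia/index.html
SAMPLE_PAGE = (
    '<html><body>'
    '<a href="OS.html">Operating systems</a> and '
    '<a href="http://jan.newmarch.name/">home page</a>'
    '</body></html>'
)
```

Selecting the first link would then mean fetching the resolved URL and displaying the result in place of the current page.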

HTTP

Design

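HTTP is a stateless, connectionless, reliable protocol. It is stateless because the server keeps no memory of earlier requests, and connectionless in the sense that, in HTTP/1.0, no connection persists between requests: each request from a client is handled reliably (HTTP normally runs over TCP) and then the connection is closed.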

Versions

The current version of HTTP is version 1.1. The previous versions were 0.9 and 1.0. The first line of any message should include the version number as in
HTTP/1.1
If this is not present, version 0.9 is assumed.
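
The version rule can be sketched for request lines, where the version token comes last. This is an illustration only: the function name request_version is invented here, and the sketch does not handle responses, where the version comes first.

```python
import re

# Matches a version token such as HTTP/1.0 or HTTP/1.1.
_VERSION = re.compile(r"HTTP/(\d+)\.(\d+)")


def request_version(request_line: str):
    """Return (major, minor) for a request line, or (0, 9) if no version is given."""
    tokens = request_line.strip().split()
    if tokens:
        match = _VERSION.fullmatch(tokens[-1])
        if match:
            return int(match.group(1)), int(match.group(2))
    # No version token on the line: assume HTTP/0.9.
    return 0, 9
```

So "GET /index.html HTTP/1.1" is reported as version 1.1, while a bare "GET /index.html" is treated as version 0.9.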

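HTTP/1.0 servers and clients must handle different versions of messages as follows.

HTTP/1.0 servers must

  • recognise the format of the Request-Line for HTTP/0.9 and HTTP/1.0 requests.
  • understand any valid request in the format of HTTP/0.9 or HTTP/1.0.
  • respond appropriately with a message in the same protocol version used by the client.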

HTTP/1.0 clients must

  • recognise the format of the Status-Line for HTTP/1.0 responses.
  • understand any valid response in the format of HTTP/0.9 or HTTP/1.0.

    Request format

    The format of requests from client to server is
    Request = Simple-Request | Full-Request
    
    Simple-Request = "GET" SP Request-URI CRLF
    
    Full-Request = Request-Line
    		*(General-Header
    		| Request-Header
    		| Entity-Header)
    		CRLF
    		[Entity-Body]
    
    A Simple-Request is an HTTP/0.9 request and must be replied to by a Simple-Response.

    A Request-Line has format

    Request-Line = Method SP Request-URI SP HTTP-Version CRLF
    
    where
    Method = "GET" | "HEAD" | "POST" |
    	 extension-method
    
    e.g.
    GET http://jan.newmarch.name/index.html HTTP/1.0
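
The grammar above can be turned into bytes on the wire roughly as follows. This is a sketch rather than a full client: the function names simple_request and build_request are invented for illustration, and no checking of header names or values is attempted.

```python
def simple_request(uri: str) -> bytes:
    """Build an HTTP/0.9 Simple-Request: GET, a space, the URI, then CRLF."""
    return f"GET {uri}\r\n".encode("ascii")


def build_request(method: str, uri: str, headers=None, body: bytes = b"",
                  version: str = "HTTP/1.0") -> bytes:
    """Build a Full-Request: Request-Line, headers, an empty line, optional body."""
    headers = dict(headers or {})
    if body:
        # The entity headers describe the body, so record its length.
        headers.setdefault("Content-Length", str(len(body)))
    lines = [f"{method} {uri} {version}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends in CRLF, and one extra CRLF separates headers from body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body
```

For example, build_request("GET", "http://jan.newmarch.name/index.html") produces the request line shown above followed by the empty line that ends the header section.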
    

    Response format

    A response is of the form
    Response = Simple-Response | Full-Response
    
    Simple-Response = [Entity-Body]
    
    Full-Response = Status-Line
    		*(General-Header 
    		| Response-Header
    		| Entity-Header)
    		CRLF
    		[Entity-Body]
    

    The Status-Line gives information about the fate of the request:

    Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
    
    e.g.
    HTTP/1.0 200 OK
    
    The codes are
    Status-Code =	  "200" ; OK
    		| "201" ; Created
    		| "202" ; Accepted
    		| "204" ; No Content
    		| "301" ; Moved permanently
    		| "302" ; Moved temporarily
    		| "304" ; Not modified
    		| "400" ; Bad request
    		| "401" ; Unauthorised
    		| "403" ; Forbidden
    		| "404" ; Not found
    		| "500" ; Internal server error
    		| "501" ; Not implemented
    		| "502" ; Bad gateway
    		| "503" ; Service unavailable
    		| extension-code
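
A client might take a Status-Line apart along these lines. The sketch relies on the HTTP/1.0 convention that the first digit of the code gives its general class. The names parse_status_line and status_class are invented for illustration.

```python
# General class of a status code, keyed by its first digit.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(line: str):
    """Split a Status-Line into (version, code, reason phrase)."""
    parts = line.rstrip("\r\n").split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("HTTP/"):
        raise ValueError(f"not a Status-Line: {line!r}")
    reason = parts[2] if len(parts) == 3 else ""
    return parts[0], int(parts[1]), reason


def status_class(code: int) -> str:
    """Return the general class of a status code, or 'unknown'."""
    return STATUS_CLASSES.get(code // 100, "unknown")
```

Treating codes by class lets a client do something sensible with an extension-code it has never seen, for instance handling an unrecognised 4xx code as a client error.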
    

    The Entity-Header contains useful information about the Entity-Body to follow:

    Entity-Header =	Allow
    		| Content-Encoding
    		| Content-Length
    		| Content-Type
    		| Expires
    		| Last-Modified
    		| extension-header
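
As an illustration of how the entity headers are used, the sketch below splits a Full-Response into status line, headers and body, and uses Content-Length to decide how much of the remaining data belongs to the body. The function name split_response and the sample response are invented here, and the sketch assumes one header per line with no folding.

```python
def split_response(raw: bytes):
    """Split a Full-Response into (status_line, headers, body)."""
    # The header section ends at the first empty line (CRLF CRLF).
    head, _, rest = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line, header_lines = lines[0], lines[1:]

    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()

    length = headers.get("content-length")
    body = rest[:int(length)] if length is not None else rest
    return status_line, headers, body


# Hypothetical response used only to exercise the function.
SAMPLE_RESPONSE = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)
```

Content-Type then tells the browser how to display the body, and Last-Modified and Expires can drive its caching.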
    

    Authentication

    If a server wishes the client to authenticate its request, it does so by first rejecting the request with a "401" message. As part of this rejection, it should indicate in the "WWW-Authenticate" field information about the authorisation "realm" so that the client can determine if it possesses an authorisation for that realm. The client can then try again, but this time it includes a user-id and password.

    This is not a very secure scheme. All the HTTP messages are sent in plain text format. The user-id and password are not encrypted in any way.
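
The weakness is easy to demonstrate. The sketch below assumes the Basic scheme from the HTTP/1.0 specification, which the text above does not name explicitly: the user-id and password are joined with a colon and base64 encoded, which is an encoding and not encryption. The function names and the sample credentials are invented for illustration.

```python
import base64


def basic_credentials(user: str, password: str) -> str:
    """Build the value of an Authorization header for the Basic scheme."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def decode_basic(header_value: str):
    """Recover (user, password) from a Basic Authorization header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password
```

Anyone who can read the request can call something like decode_basic on the header and recover the password, which is why the scheme gives no protection on an unencrypted connection.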

    HTTP 1.1

    HTTP 1.1 fixes many problems with HTTP 1.0, but is more complex as a result.

    Fatter clients and servers

    CGI scripts run on the server side and can perform an arbitrary amount of server-side processing.
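
A CGI script is an ordinary program: the server passes request details in environment variables, and the script writes headers, an empty line and then the document to standard output. The sketch below is a minimal example in that style. REQUEST_METHOD and QUERY_STRING are standard CGI variables, while the function name cgi_response and the page it produces are invented for illustration.

```python
import html
import os
import sys


def cgi_response(environ) -> str:
    """Build the text a CGI script would write to standard output."""
    method = environ.get("REQUEST_METHOD", "GET")
    query = environ.get("QUERY_STRING", "")
    body = (
        "<html><body>"
        f"<p>Method: {html.escape(method)}</p>"
        f"<p>Query: {html.escape(query)}</p>"
        "</body></html>"
    )
    # Headers first, then an empty line, then the generated document.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # When run by a Web server as a CGI script, os.environ carries the request.
    sys.stdout.write(cgi_response(os.environ))
```

The server relays whatever the script writes back to the client, so the document is generated afresh on every request.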

    Helpers handle documents on the client side that the browser cannot handle itself. The browser calls another process and passes the document to it. There is little communication between browser and helper.

    Plugins also handle documents that the browser cannot. However, plugins run within the browser address space as DLLs.

    JavaScript and VBScript are run by interpreters within the browser. Typically they are used for field validation.

    Java applets are run by an interpreter within the browser. They can accomplish far more than JavaScript or VBScript.

    ActiveX controls are DLLs that run within the browser address space. They are built from native code and can do anything.

    CGI scripts fatten the server. JavaScript and VBScript mainly add to the presentation layer. Java and ActiveX can carry application logic as well as presentation logic.


    This page is maintained by Jan Newmarch http://jan.newmarch.name