HTTP

General

Introduction

The Web is built on HTTP (Hypertext Transfer Protocol), which is layered on top of a transport protocol such as TCP. HTTP has been through five publicly available versions. Version 2 is now used by over 40% of websites ( Usage statistics of HTTP/2 for websites ). HTTP/3 is currently used by only 6% of websites ( Usage statistics of HTTP/3 for websites ), but it is supported by significant vendors such as Google and Cloudflare, and by the Google Chrome and Mozilla Firefox browsers ( HTTP/3: the past, the present, and the future ). The Web Almanac gives similar figures for 2019 ( Part IV Chapter 20: HTTP/2 ).

In this chapter we give an overview of HTTP; later chapters look at language bindings for clients and servers.

URLs and resources

URLs specify the location of a resource. A resource is often a static file, such as an HTML document, an image, or a sound file. But increasingly, it may be a dynamically generated object, perhaps based on information stored in a database.

When a user agent requests a resource, what is returned is not the resource itself, but some representation of that resource. For example, if the resource is a static file, then what is sent to the user agent is a copy of the file.

Multiple URLs may point to the same resource, and an HTTP server will return appropriate representations of the resource for each URL. For example, a company might make product information available both internally and externally, using different URLs for the same product. The internal representation of the product might include information such as internal contact officers for the product, while the external representation might include the location of stores selling the product.

This view of resources means that the HTTP protocol can be fairly simple and straightforward, while an HTTP server can be arbitrarily complex. HTTP has to deliver requests from user agents to servers and return a byte stream, while a server might have to do any amount of processing of the request.

HTTP characteristics

HTTP is a stateless, connectionless, reliable protocol. In its simplest form, each request from a user agent is handled reliably and then the connection is closed.

In the earliest version of HTTP, each request involved a separate TCP connection, so if many resources were required (such as images embedded in an HTML page) then many TCP connections had to be set up and torn down in a short space of time.

HTTP/1.1 added many optimisations, which added complexity to the simple structure but created a more efficient and reliable protocol. HTTP/2 adopted a binary format for further efficiency gains.

Versions

There are five versions of HTTP:

Version 0.9 - totally obsolete
Version 1.0 - almost obsolete
Version 1.1 - was the most popular version, declining now
Version 2 - roughly on par with version 1.1, and gaining
Version 3 - the newest version, standardised in 2022 as RFC 9114, and already getting traction

Each version must understand requests and responses of earlier versions.

HTTP/0.9

Request format

	Request = Simple-Request

	Simple-Request = "GET" SP Request-URI CRLF

Response format

A response is of the form

	Response = Simple-Response

	Simple-Response = [Entity-Body]
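The 0.9 exchange above can be sketched in a few lines of Python. Because a Simple-Response has no status line and no headers, the client can only treat everything it receives until the server closes the connection as the body. The helper names `simple_request` and `read_simple_response` are hypothetical, and a local `socket.socketpair()` stands in for a real client/server TCP connection so the sketch runs without a network.

```python
import socket

def simple_request(path: str) -> bytes:
    # Simple-Request = "GET" SP Request-URI CRLF
    return b"GET " + path.encode("ascii") + b"\r\n"

def read_simple_response(sock: socket.socket) -> bytes:
    # Simple-Response = [Entity-Body]: there is no status line and no headers,
    # so the body is everything received until the server closes the connection.
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks)

# A connected socket pair stands in for a real client/server connection.
client, server = socket.socketpair()
client.sendall(simple_request("/index.html"))

# "Server" side: read the request line, send a bare body, then close.
request = b""
while not request.endswith(b"\r\n"):
    request += server.recv(4096)
server.sendall(b"<html>hello</html>")
server.close()

body = read_simple_response(client)
client.close()
print(request)  # b'GET /index.html\r\n'
print(body)     # b'<html>hello</html>'
```

Closing the connection is the only way the server can mark the end of the body, which is one reason later versions added headers such as Content-Length.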

HTTP/1.0

This version added much more information to requests and responses. Rather than "grow" the 0.9 format, the designers simply left it in place alongside the new format.

Request format

The format of requests from client to server is

	Request = Simple-Request | Full-Request

	Simple-Request = "GET" SP Request-URI CRLF

	Full-Request = Request-Line
        *(General-Header
        | Request-Header
        | Entity-Header)
        CRLF
        [Entity-Body]

A Simple-Request is an HTTP/0.9 request and must be replied to by a Simple-Response.

A Request-Line has format

	Request-Line = Method SP Request-URI SP HTTP-Version CRLF

where

	Method = "GET" | "HEAD" | "POST" |
        extension-method

e.g.

	GET http://jan.newmarch.name/index.html HTTP/1.0
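A server can tell the two request forms apart by whether the first line carries an HTTP-Version. The sketch below parses a first line according to the grammar above; the `RequestLine` type and `parse_request_line` function are hypothetical names, and treating a line without a version as HTTP/0.9 follows from the Simple-Request rule.

```python
from typing import NamedTuple

class RequestLine(NamedTuple):
    method: str
    uri: str
    version: str

def parse_request_line(line: bytes) -> RequestLine:
    """Parse the first line of a request, following the HTTP/1.0 grammar."""
    if not line.endswith(b"\r\n"):
        raise ValueError("request line must end with CRLF")
    parts = line[:-2].decode("ascii").split(" ")
    if len(parts) == 2 and parts[0] == "GET":
        # Simple-Request = "GET" SP Request-URI CRLF  (no version: HTTP/0.9)
        return RequestLine(parts[0], parts[1], "HTTP/0.9")
    if len(parts) == 3 and parts[2].startswith("HTTP/"):
        # Request-Line = Method SP Request-URI SP HTTP-Version CRLF
        return RequestLine(parts[0], parts[1], parts[2])
    raise ValueError(f"malformed request line: {line!r}")

print(parse_request_line(b"GET http://jan.newmarch.name/index.html HTTP/1.0\r\n"))
# RequestLine(method='GET', uri='http://jan.newmarch.name/index.html', version='HTTP/1.0')
```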

Response format

A response is of the form

	Response = Simple-Response | Full-Response

	Simple-Response = [Entity-Body]

	Full-Response = Status-Line
        *(General-Header 
        | Response-Header
        | Entity-Header)
        CRLF
        [Entity-Body]

The Status-Line gives information about the fate of the request:

	Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF

e.g.

	HTTP/1.0 200 OK

The codes are

	Status-Code =     "200" ; OK
        | "201" ; Created
        | "202" ; Accepted
        | "204" ; No Content
        | "301" ; Moved permanently
        | "302" ; Moved temporarily
        | "304" ; Not modified
        | "400" ; Bad request
        | "401" ; Unauthorised
        | "403" ; Forbidden
        | "404" ; Not found
        | "500" ; Internal server error
        | "501" ; Not implemented
        | "502" ; Bad gateway
        | "503" ; Service unavailable
        | extension-code
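The first digit of a status code gives its class (2xx success, 3xx redirection, 4xx client error, 5xx server error), and the HTTP/1.0 specification asks clients that do not recognise an extension-code to treat it like the x00 code of its class. The sketch below shows that rule; the `describe` function and its output format are hypothetical, and the reason phrases are the HTTP/1.0 ones listed above.

```python
# Reason phrases for the HTTP/1.0 status codes listed above.
REASONS = {
    200: "OK", 201: "Created", 202: "Accepted", 204: "No Content",
    301: "Moved Permanently", 302: "Moved Temporarily", 304: "Not Modified",
    400: "Bad Request", 401: "Unauthorized", 403: "Forbidden", 404: "Not Found",
    500: "Internal Server Error", 501: "Not Implemented",
    502: "Bad Gateway", 503: "Service Unavailable",
}

# The class of a code is determined by its first digit.
CLASSES = {1: "Informational", 2: "Success", 3: "Redirection",
           4: "Client Error", 5: "Server Error"}

def describe(code: int) -> str:
    cls = CLASSES.get(code // 100)
    if cls is None:
        raise ValueError(f"invalid status code {code}")
    # An unrecognised extension-code is still understood by its class.
    return f"{code} {REASONS.get(code, 'unrecognised')} ({cls})"

print(describe(404))  # 404 Not Found (Client Error)
print(describe(418))  # 418 unrecognised (Client Error)
```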

The Entity-Header contains useful information about the Entity-Body to follow

	Entity-Header = Allow
        | Content-Encoding
        | Content-Length
        | Content-Type
        | Expires
        | Last-Modified
        | extension-header

For example

	HTTP/1.1 200 OK
	Date: Fri, 29 Aug 2003 00:59:56 GMT
	Server: Apache/2.0.40 (Unix)
	Accept-Ranges: bytes
	Content-Length: 1595
	Connection: close
	Content-Type: text/html; charset=ISO-8859-1
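A response like this can be split into its Status-Line, header fields and Entity-Body with a small parser. The sketch below is a simplification under stated assumptions: `parse_response` is a hypothetical name, repeated and continued header lines are not handled, and the short body and its Content-Length of 13 are invented for the demo (the headers otherwise follow the example above).

```python
def parse_response(raw: bytes):
    """Split a Full-Response into status fields, headers and body."""
    head, sep, rest = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("headers are not terminated by an empty line")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
    version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Field names are case-insensitive, so store them in lower case.
        # (Simplification: a repeated field overwrites the earlier one.)
        headers[name.strip().lower()] = value.strip()
    # Content-Length, if present, says how many body bytes belong to this message.
    length = int(headers.get("content-length", len(rest)))
    return version, int(code), reason, headers, rest[:length]

example = (b"HTTP/1.1 200 OK\r\n"
           b"Date: Fri, 29 Aug 2003 00:59:56 GMT\r\n"
           b"Server: Apache/2.0.40 (Unix)\r\n"
           b"Content-Length: 13\r\n"
           b"Content-Type: text/html; charset=ISO-8859-1\r\n"
           b"\r\n"
           b"<p>Hello</p>\n")

version, code, reason, headers, body = parse_response(example)
print(code, headers["content-type"])  # 200 text/html; charset=ISO-8859-1
```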

HTTP/1.1

HTTP/1.1 fixes many problems with HTTP/1.0, but is more complex because of it. It does this by extending or refining the options available in HTTP/1.0. For example:

there are more methods, such as TRACE and CONNECT
HTTP/1.1 tightened up the rules for request URLs to allow proxy handling. If the request is directed through a proxy, the URL should be an absolute URL, as in

	GET http://www.w3.org/index.html HTTP/1.1

Otherwise an absolute path should be used, together with a Host header field, as in

	GET /index.html HTTP/1.1
	Host: www.w3.org

there are more header fields, such as Cache-Control and If-None-Match, many of them for use by proxies
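The two request-URL forms can be sketched as follows. The `request_head` function is a hypothetical helper, and it assumes simple URLs without user information. HTTP/1.1 requires a Host header in every request, so the sketch sends one in both cases; only the request target changes.

```python
from urllib.parse import urlsplit

def request_head(url: str, via_proxy: bool) -> str:
    parts = urlsplit(url)
    if via_proxy:
        # Through a proxy: the absolute URL goes in the request line.
        target = url
    else:
        # Direct to the origin server: only the absolute path (and query).
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
    return (f"GET {target} HTTP/1.1\r\n"
            f"Host: {parts.netloc}\r\n"  # assumes no user:password@ in the URL
            "\r\n")

print(request_head("http://www.w3.org/index.html", via_proxy=False))
print(request_head("http://www.w3.org/index.html", via_proxy=True))
```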

The changes include

hostname identification (allows virtual hosts)
content negotiation (multiple languages)
persistent connections (reduces TCP overheads - this is very complex)
chunked transfers
byte ranges (request parts of documents)
proxy support
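Chunked transfers matter for persistent connections: a server that does not know the body length in advance can still mark where the message ends, so the connection can be reused. Each chunk is a hexadecimal size line, the data, and a CRLF; a chunk of size zero ends the body. The decoder below is a minimal sketch (`decode_chunked` is a hypothetical name) that ignores chunk extensions and any trailer fields.

```python
def decode_chunked(data: bytes) -> bytes:
    """Decode a body sent with Transfer-Encoding: chunked."""
    body = bytearray()
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        # The size is hexadecimal and may be followed by ";extension" data.
        size = int(data[pos:eol].split(b";")[0], 16)
        pos = eol + 2
        if size == 0:
            # Last chunk: any trailer fields and the final CRLF are ignored.
            return bytes(body)
        body += data[pos:pos + size]
        pos += size
        if data[pos:pos + 2] != b"\r\n":
            raise ValueError("chunk data not followed by CRLF")
        pos += 2

encoded = b"7\r\nMozilla\r\n9\r\nDeveloper\r\n7\r\nNetwork\r\n0\r\n\r\n"
print(decode_chunked(encoded))  # b'MozillaDeveloperNetwork'
```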

HTTP/2

All the earlier versions of HTTP are text-based. The most significant departure in HTTP/2 is that it uses a binary format. To ensure backwards compatibility, this can't be managed by sending a binary message to an older server to see what it does. Instead, an HTTP/1.1 request is sent with extra header fields, essentially asking whether the server wants to switch to HTTP/2. If the server doesn't understand the extra fields, it replies with a normal HTTP/1.1 response and the session continues with HTTP/1.1.

Otherwise the server can respond that it is willing to change, and the session can continue with HTTP/2.
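The upgrade request described above can be sketched as below, following the cleartext ("h2c") mechanism of the original HTTP/2 specification (RFC 7540, section 3.2). The HTTP2-Settings field carries the base64url-encoded payload of an HTTP/2 SETTINGS frame; the single setting used here (maximum concurrent streams = 100) is an arbitrary choice, and `h2c_upgrade_request` is a hypothetical name. A willing server answers `HTTP/1.1 101 Switching Protocols` with `Upgrade: h2c` and then switches to binary framing. Over TLS, browsers instead negotiate HTTP/2 during the TLS handshake using ALPN, and later revisions of the HTTP/2 specification deprecate this cleartext upgrade.

```python
import base64
import struct

SETTINGS_MAX_CONCURRENT_STREAMS = 0x3  # setting identifier from the HTTP/2 spec

def h2c_upgrade_request(host: str, path: str = "/") -> str:
    # A SETTINGS payload is a sequence of (16-bit identifier, 32-bit value)
    # pairs in network byte order; HTTP2-Settings is its base64url encoding
    # with any trailing '=' padding removed.
    payload = struct.pack("!HI", SETTINGS_MAX_CONCURRENT_STREAMS, 100)
    settings = base64.urlsafe_b64encode(payload).rstrip(b"=").decode("ascii")
    return (f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: Upgrade, HTTP2-Settings\r\n"
            "Upgrade: h2c\r\n"
            f"HTTP2-Settings: {settings}\r\n"
            "\r\n")

print(h2c_upgrade_request("www.example.com"))
```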

HTTP/3

HTTP/2 uses a binary format and can also carry multiple streams within a single TCP connection. While this usually speeds up the Web, the use of a single TCP connection has one significant problem, known as head-of-line blocking: TCP delivers bytes strictly in order, so if one packet is delayed or lost, all streams stall until it has been retransmitted. This has been solved by protocols such as SCTP, but they haven't gained wide acceptance on the Internet: with so many uncontrolled hosts, both servers and clients, there is no realistic way a new layer 4 protocol is going to be widely accepted.
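Head-of-line blocking can be illustrated with a toy model. Each packet belongs to a stream, and a packet can be handed to the application only once every earlier packet it must follow in order has arrived. Over TCP all HTTP/2 streams share one ordering; QUIC orders packets only within each stream. The function name, packet numbers and arrival times (arbitrary units) are all invented for the illustration, which ignores retransmission timers, congestion control and everything else a real transport does.

```python
def delivery_times(packets, arrival, per_stream):
    """When is each packet handed to the application?

    packets:    list of (seq, stream) in the order they were sent.
    arrival:    dict seq -> time the packet (or its retransmission) arrived.
    per_stream: False models TCP (one ordering shared by all streams),
                True models QUIC (ordering only within each stream).
    """
    times = {}
    for seq, stream in packets:
        # Every earlier packet this one must follow in order, plus itself.
        earlier = [s for s, st in packets
                   if s <= seq and (not per_stream or st == stream)]
        times[seq] = max(arrival[s] for s in earlier)
    return times

# Streams A and B are interleaved; packet 2 (stream A) is lost, and only
# its retransmission arrives, at time 10.
packets = [(1, "A"), (2, "A"), (3, "B"), (4, "B")]
arrival = {1: 1, 2: 10, 3: 3, 4: 4}
print(delivery_times(packets, arrival, per_stream=False))  # {1: 1, 2: 10, 3: 10, 4: 10}
print(delivery_times(packets, arrival, per_stream=True))   # {1: 1, 2: 10, 3: 3, 4: 4}
```

In the TCP model, stream B's packets arrive early but wait behind the lost packet from stream A; in the QUIC model only stream A waits.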

So, in practice, we are stuck with two transport protocols that are several decades old: TCP and UDP. QUIC is a user-space protocol built on top of UDP, and HTTP/3 uses QUIC as its transport protocol. Since QUIC runs in user space, it is easy to embed in applications and libraries without having to add it to kernels everywhere.

QUIC 'connections', which run over UDP, are not compatible with TCP. If a QUIC connection fails, the client needs to fall back to using TCP, so HTTP/2 will still be needed.
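The fallback logic can be sketched independently of any particular QUIC library. Here `try_quic` and `try_tcp` are hypothetical callables standing in for real connection code, and connection failures are assumed to surface as `OSError` (which includes `TimeoutError`). Real clients may be more elaborate than this sequential attempt, for example by trying both transports in parallel.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

def connect_with_fallback(try_quic: Callable[[], T], try_tcp: Callable[[], T]) -> T:
    """Prefer HTTP/3 over QUIC; if that fails, fall back to HTTP over TCP."""
    try:
        return try_quic()
    except OSError:
        # UDP may be blocked, or the server may not speak QUIC:
        # fall back to a TCP connection carrying HTTP/2 or HTTP/1.1.
        return try_tcp()

# Hypothetical stand-ins for real connection attempts.
def quic_blocked():
    raise TimeoutError("no response to the QUIC handshake")

def tcp_ok():
    return "HTTP/2 over TCP"

print(connect_with_fallback(quic_blocked, tcp_ok))  # HTTP/2 over TCP
```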

The 0.9 protocol took one page. The 1.0 protocol was described in about 20 pages. 1.1 takes 120 pages, while HTTP/2 takes about 96 pages and HTTP/3 takes a further 60 pages.

Resources

Copyright © Jan Newmarch, jan@newmarch.name

" Network Programming using Java, Go, Python, Rust, JavaScript and Julia" by Jan Newmarch is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License .
Based on a work at https://jan.newmarch.name/NetworkProgramming/ .
