The Web is built on top of HTTP (HyperText Transfer Protocol), which is layered on top of a transport protocol such as TCP. HTTP has been through five publicly available versions. Version 2 is now used by over 40% of websites (Usage statistics of HTTP/2 for websites). HTTP/3 is currently used by only 6% of websites (Usage statistics of HTTP/3 for websites), but it is supported by significant vendors such as Google and Cloudflare, and by browsers including Google Chrome and Mozilla Firefox (HTTP/3: the past, the present, and the future). The Web Almanac gives similar figures for 2019 (Part IV Chapter 20: HTTP/2).
In this chapter we give an overview of HTTP, and later chapters look at language bindings for clients and servers.
URLs specify the location of a resource. A resource is often a static file, such as an HTML document, an image, or a sound file. But increasingly, it may be a dynamically generated object, perhaps based on information stored in a database.
When a user agent requests a resource, what is returned is not the resource itself, but some representation of that resource. For example, if the resource is a static file, then what is sent to the user agent is a copy of the file.
Multiple URLs may point to the same resource, and an HTTP server will return appropriate representations of the resource for each URL. For example, a company might make product information available both internally and externally using different URLs for the same product. The internal representation of the product might include information such as internal contact officers for the product, while the external representation might include the location of stores selling the product.
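The distinction between a resource and its representations can be sketched in a few lines of Python. Everything named here is hypothetical: the product record, its field names and the two URL paths are invented for illustration and are not taken from any real server.

```python
import json

# Hypothetical product record; the field names are illustrative only.
PRODUCT = {
    "name": "Widget",
    "contact_officer": "A. Example",
    "stores": ["Sydney", "Perth"],
}

# Two hypothetical URL paths that identify the same resource,
# each mapped to the fields its representation should expose.
VIEWS = {
    "/internal/products/42": ("name", "contact_officer"),
    "/products/42": ("name", "stores"),
}

def representation(path: str) -> bytes:
    """Return the byte stream a server might send for the given path."""
    fields = VIEWS[path]
    return json.dumps({key: PRODUCT[key] for key in fields}).encode("utf-8")
```

Both paths read from the same PRODUCT record, but the bytes returned differ: only the internal path exposes the contact officer.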
This view of resources means that the HTTP protocol can be fairly simple and straightforward, while an HTTP server can be arbitrarily complex. HTTP has to deliver requests from user agents to servers and return a byte stream, while a server might have to do any amount of processing of the request.
HTTP is a stateless, connectionless, reliable protocol. In the simplest form, each request from a user agent is handled reliably and then the connection is broken.
In the earliest version of HTTP, each request involved a separate TCP connection, so if many resources were required (such as images embedded in an HTML page) then many TCP connections had to be set up and torn down in a short space of time.
HTTP/1.1 added many optimisations to HTTP, which added complexity to the simple structure but created a more efficient and reliable protocol. HTTP/2 has adopted a binary form for further efficiency gains.
There are five versions of HTTP: 0.9, 1.0, 1.1, 2 and 3.
In HTTP/0.9, a request is of the form
Request = Simple-Request
Simple-Request = "GET" SP Request-URI CRLF
A response is of the form
Response = Simple-Response
Simple-Response = [Entity-Body]
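As a minimal sketch of the HTTP/0.9 grammar above, the following Python builds a Simple-Request and treats a Simple-Response as nothing more than the body bytes. The function names are illustrative, and the validation is only an assumption about what a careful client might check.

```python
CRLF = "\r\n"

def build_simple_request(request_uri: str) -> bytes:
    """Build an HTTP/0.9 Simple-Request: "GET" SP Request-URI CRLF."""
    if any(ch in request_uri for ch in " \r\n"):
        raise ValueError("Request-URI must not contain spaces or line breaks")
    return f"GET {request_uri}{CRLF}".encode("ascii")

def parse_simple_response(data: bytes) -> bytes:
    """A Simple-Response is just the optional Entity-Body, so return it as is."""
    return data
```

There is no status line and there are no headers, so a client cannot tell an error page from the requested document except by reading the body.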
HTTP/1.0 added much more information to the requests and responses. Rather than "grow" the 0.9 format, it was just left alongside the new version.
The format of requests from client to server is
Request = Simple-Request | Full-Request
Simple-Request = "GET" SP Request-URI CRLF
Full-Request = Request-Line
               *(General-Header | Request-Header | Entity-Header)
               CRLF
               [Entity-Body]
A Simple-Request is an HTTP/0.9 request and must be replied to by a Simple-Response.
A Request-Line has format
Request-Line = Method SP Request-URI SP HTTP-Version CRLF
where
Method = "GET" | "HEAD" | "POST" | extension-method
e.g.
GET http://jan.newmarch.name/index.html HTTP/1.0
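A server has to tell a Simple-Request from a Full-Request by looking at the first line. The sketch below shows one way to do that in Python; the RequestLine type and the function name are illustrative, and the parsing is deliberately stricter than a production server would be.

```python
from typing import NamedTuple, Optional

class RequestLine(NamedTuple):
    method: str
    request_uri: str
    http_version: Optional[str]  # None marks an HTTP/0.9 Simple-Request

def parse_request_line(line: str) -> RequestLine:
    """Split a request line on single spaces, as the grammar requires."""
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) == 2 and parts[0] == "GET":
        # Simple-Request: "GET" SP Request-URI
        return RequestLine("GET", parts[1], None)
    if len(parts) == 3 and parts[2].startswith("HTTP/"):
        # Request-Line: Method SP Request-URI SP HTTP-Version
        return RequestLine(parts[0], parts[1], parts[2])
    raise ValueError(f"malformed request line: {line!r}")
```

A result whose http_version is None tells the server to answer with a Simple-Response.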
A response is of the form
Response = Simple-Response | Full-Response
Simple-Response = [Entity-Body]
Full-Response = Status-Line
                *(General-Header | Response-Header | Entity-Header)
                CRLF
                [Entity-Body]
The Status-Line gives information about the fate of the request:
Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
e.g.
HTTP/1.0 200 OK
The codes are
Status-Code = "200" ; OK
            | "201" ; Created
            | "202" ; Accepted
            | "204" ; No Content
            | "301" ; Moved permanently
            | "302" ; Moved temporarily
            | "304" ; Not modified
            | "400" ; Bad request
            | "401" ; Unauthorised
            | "403" ; Forbidden
            | "404" ; Not found
            | "500" ; Internal server error
            | "501" ; Not implemented
            | "502" ; Bad gateway
            | "503" ; Service unavailable
            | extension-code
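The Status-Line can be taken apart with a single split. This is a sketch only: the function names are illustrative, and grouping codes by their first digit is the usual convention rather than something stated in the grammar above.

```python
from typing import Tuple

STATUS_CLASSES = {
    "2": "success",
    "3": "redirection",
    "4": "client error",
    "5": "server error",
}

def parse_status_line(line: str) -> Tuple[str, int, str]:
    """Return (HTTP-Version, Status-Code, Reason-Phrase)."""
    # The Reason-Phrase may contain spaces, so split at most twice.
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    if not version.startswith("HTTP/") or len(code) != 3 or not code.isdigit():
        raise ValueError(f"malformed status line: {line!r}")
    return version, int(code), reason

def status_class(code: int) -> str:
    """Classify a status code by its first digit."""
    return STATUS_CLASSES.get(str(code)[0], "extension")
```

For instance, status_class(404) gives "client error", which is often all a client needs in order to decide what to do next.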
The Entity-Header contains useful information about the Entity-Body to follow
Entity-Header = Allow | Content-Encoding | Content-Length | Content-Type | Expires | Last-Modified | extension-header
For example
HTTP/1.1 200 OK
Date: Fri, 29 Aug 2003 00:59:56 GMT
Server: Apache/2.0.40 (Unix)
Accept-Ranges: bytes
Content-Length: 1595
Connection: close
Content-Type: text/html; charset=ISO-8859-1
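Header lines like those in the example response are name and value pairs separated by the first colon. The following Python sketch collects them into a dictionary; the function name is illustrative, and lower-casing the names is a convenience chosen here because header names are case-insensitive. Repeated headers and folded lines are not handled.

```python
from typing import Dict

def parse_headers(block: str) -> Dict[str, str]:
    """Parse CRLF-separated header lines up to the first empty line."""
    headers: Dict[str, str] = {}
    for line in block.split("\r\n"):
        if not line:
            break  # the empty line ends the header section
        # Split on the first colon only: values such as dates contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers

EXAMPLE = (
    "Date: Fri, 29 Aug 2003 00:59:56 GMT\r\n"
    "Server: Apache/2.0.40 (Unix)\r\n"
    "Content-Length: 1595\r\n"
    "Content-Type: text/html; charset=ISO-8859-1\r\n"
    "\r\n"
)
```

Calling parse_headers(EXAMPLE) gives a dictionary from which int(headers["content-length"]) tells the client how many body bytes to read.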
HTTP/1.1 fixes many problems with HTTP/1.0, but is more complex as a result. It works by extending or refining the options available in HTTP/1.0. For example, a request sent to a proxy gives the absolute URI in the request line:
GET http://www.w3.org/index.html HTTP/1.1
Otherwise an absolute path should be used, and the request must include a Host header field, as in
GET /index.html HTTP/1.1
Host: www.w3.org
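The two request forms can be produced from the same URL. This Python sketch uses urllib.parse.urlsplit from the standard library; the function name and the via_proxy flag are illustrative choices, and only a bare GET with a Host header is generated.

```python
from urllib.parse import urlsplit

def build_get_request(url: str, via_proxy: bool = False) -> bytes:
    """Build a minimal HTTP/1.1 GET request for the given URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # A proxy is sent the absolute URI; an origin server gets the absolute path.
    target = url if via_proxy else path
    lines = [
        f"GET {target} HTTP/1.1",
        f"Host: {parts.netloc}",  # sent in both forms
        "",                       # empty line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")
```

With "http://www.w3.org/index.html" as the URL, the default call reproduces the second example above, and via_proxy=True gives the first with a Host header added.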
All the earlier versions of HTTP are text-based. The most significant departure for HTTP/2 is that it is a binary format. In order to ensure backwards compatibility, this can't be managed by sending a binary message to an older server to see what it does. Instead, an HTTP/1.1 message is sent with extra header fields, essentially asking if the server wants to switch to HTTP/2. If the server doesn't understand the extra fields, it replies with a normal HTTP/1.1 response and the session continues with HTTP/1.1.
Otherwise the server can respond that it is willing to change, and the session can continue with HTTP/2.
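The exchange can be sketched as follows. The header names (Upgrade: h2c, HTTP2-Settings) and the 101 Switching Protocols reply follow the cleartext upgrade described in RFC 7540; the function names and the particular setting sent (a limit of 100 concurrent streams) are arbitrary choices made for illustration.

```python
import base64
import struct

def build_h2c_upgrade_request(host: str, path: str = "/") -> bytes:
    """Build an HTTP/1.1 request that offers to switch to HTTP/2."""
    # One SETTINGS entry: 16-bit identifier 0x3 (max concurrent streams),
    # 32-bit value 100. The value is an illustrative assumption.
    settings_payload = struct.pack("!HI", 0x3, 100)
    # The payload is sent base64url-encoded with the padding removed.
    token = base64.urlsafe_b64encode(settings_payload).rstrip(b"=").decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: Upgrade, HTTP2-Settings",
        "Upgrade: h2c",
        f"HTTP2-Settings: {token}",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def server_accepted_upgrade(status_line: str) -> bool:
    """True if the server replied 101 Switching Protocols."""
    parts = status_line.split(" ", 2)
    return len(parts) >= 2 and parts[1] == "101"
```

A server that ignores the extra fields answers with an ordinary status such as 200, so server_accepted_upgrade returns False and the client simply carries on in HTTP/1.1.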
HTTP/2 uses a binary format and can also carry multiple streams within a single TCP connection. While this usually speeds up the web, the use of a single TCP stream has one significant problem, called the 'head-of-line blocking' problem: if one packet is held up or lost, all streams come to a halt. This has been solved by protocols such as SCTP, but they haven't gained wide acceptance on the internet: with so many uncontrolled hosts, both servers and clients, there is no way a new layer 4 protocol is going to be widely accepted.
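The effect can be seen in a toy model. This is a simplified sketch rather than a simulation of real TCP: each packet is a (stream, arrival time) pair listed in the order it was sent, and the function names and numbers are invented for illustration.

```python
from itertools import accumulate
from typing import List, Tuple

Packet = Tuple[str, int]  # (stream id, arrival time), listed in send order

def delivery_single_stream(packets: List[Packet]) -> List[int]:
    """One ordered byte stream: a packet is delivered only once every
    packet sent before it has arrived."""
    return list(accumulate((t for _, t in packets), max))

def delivery_independent_streams(packets: List[Packet]) -> List[int]:
    """Ordering is enforced per stream only, so a late packet delays
    just its own stream."""
    latest = {}
    delivered = []
    for stream, t in packets:
        latest[stream] = max(latest.get(stream, t), t)
        delivered.append(latest[stream])
    return delivered

# Stream B's packet is held up until time 9; A and C arrive promptly.
PACKETS = [("A", 1), ("B", 9), ("A", 2), ("C", 3)]
```

With PACKETS, the single ordered stream delivers at times [1, 9, 9, 9], while independently ordered streams deliver at [1, 9, 2, 3]: only stream B waits for its own late packet.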
So basically, we are stuck with two 50-year-old protocols: TCP and UDP. QUIC is a user-space protocol built on top of UDP. HTTP/3 uses QUIC as its transport protocol. Since it is in user space, it is easy to embed it in applications and libraries, without having to add it to kernels everywhere.
QUIC 'connections', using UDP, are not compatible with TCP. If a QUIC connection fails, the client needs to drop back to using TCP. So HTTP/2 will still be needed.
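The fallback logic might look like the sketch below. The two connect functions are hypothetical callables supplied by the caller, since Python's standard library has no QUIC client, and treating any OSError as "QUIC unavailable" is an assumption made for the sake of the example.

```python
from typing import Callable, Tuple

def connect_with_fallback(
    connect_quic: Callable[[], object],
    connect_tcp: Callable[[], object],
) -> Tuple[str, object]:
    """Try QUIC (HTTP/3) first and fall back to TCP (HTTP/2) on failure."""
    try:
        return "HTTP/3", connect_quic()
    except OSError:
        # For example, UDP is blocked on the path or the attempt timed out.
        return "HTTP/2", connect_tcp()
```

The caller learns which protocol was actually negotiated from the first element of the returned pair.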
The 0.9 protocol took one page. The 1.0 protocol was described in about 20 pages. 1.1 takes 120 pages, while HTTP/2 takes about 96 pages and HTTP/3 takes a further 60 pages.
Copyright © Jan Newmarch, jan@newmarch.name
"Network Programming using Java, Go, Python, Rust, JavaScript and Julia" by Jan Newmarch is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://jan.newmarch.name/NetworkProgramming/.