HTTP is literally everywhere. While you read this post on Medium, HTTP/HTTPS traffic is flowing in the background. Every website that you visit uses the HTTP protocol to communicate with an HTTP server. Wait, what is a server? What does that even mean? In this article, I will explain how HTTP servers work and what you need to know to create your own.
My next step, after researching the Internet and the web, was to study the HTTP protocol. You may ask why I thought I needed to learn the underlying infrastructure just to build a functional website. That may sound counterintuitive; however, I believe that programmers should learn the ins and outs of the tools they use daily. When you understand what is going on and how the different components interact, you can identify and solve problems quickly. With that motivation, I searched the Internet for intuitive descriptions of the client-server model, HTTP communication, and so on. After gathering useful resources, I decided to rebuild the system (or at least a part of it) to gain deeper insight. Now, enough about the introduction and my journey; let's dive in.
Where to start
First, we need to look at OSI, the Open Systems Interconnection model. OSI is a conceptual model that characterizes and standardizes the communication functions of a telecommunication system without regard to its underlying internal structure and technology. It consists of seven layers:
- Physical Layer
- Data Link Layer
- Network Layer
- Transport Layer
- Session Layer
- Presentation Layer
- Application Layer
The layer we are interested in is the fourth one, the Transport Layer. It provides transparent transfer of data between end users and reliable data transfer services to the upper layers. It can also acknowledge successful transmission and send the next piece of data if no errors occurred. Typical layer 4 protocols are the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP). When building an HTTP server, we’ll use TCP as the transport because it provides a reliable link between two computers: if a packet (a formatted unit of data) gets lost, it is retransmitted. UDP offers no such guarantee, so lost UDP packets are not retransmitted.
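To make the TCP/UDP distinction concrete, here is a minimal sketch of how the two transports are selected with Python's standard `socket` module. The helper name `make_socket` is hypothetical and only serves this illustration.

```python
import socket


def make_socket(reliable: bool) -> socket.socket:
    """Create an IPv4 socket for the chosen transport protocol.

    reliable=True  -> TCP (SOCK_STREAM): connection-oriented byte stream,
                      lost packets are retransmitted by the OS.
    reliable=False -> UDP (SOCK_DGRAM): connectionless datagrams,
                      no delivery guarantee.
    """
    kind = socket.SOCK_STREAM if reliable else socket.SOCK_DGRAM
    return socket.socket(socket.AF_INET, kind)


# An HTTP server would use the TCP variant.
tcp_sock = make_socket(reliable=True)
udp_sock = make_socket(reliable=False)
tcp_sock.close()
udp_sock.close()
```

Nothing is sent here; the sketch only shows where the choice of transport is made.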
RFC-2616 also states that HTTP communication usually takes place over TCP/IP connections:
HTTP communication usually takes place over TCP/IP connections. The
default port is TCP 80, but other ports can be used. This does
not preclude HTTP from being implemented on top of any other protocol
on the Internet, or on other networks. HTTP only presumes a reliable
transport; any protocol that provides such guarantees can be used;
the mapping of the HTTP/1.1 request and response structures onto the
transport data units of the protocol in question is outside the scope
of this specification.
An RFC (Request for Comments) is a memorandum published by the Internet Engineering Task Force (IETF) describing methods, behaviors, research, or innovations applicable to the working of the Internet and Internet-connected systems. In short, it is a document where engineers record current methods. RFC 2616, quoted above, is the RFC that defined HTTP/1.1; it has since been superseded by newer RFCs.
Now, about servers. A server is a piece of computer hardware or software that provides functionality for clients. In the context of HTTP and the web, a server is software that accepts HTTP requests from clients on the World Wide Web and returns an HTTP response for each request.
Hypertext Transfer Protocol, or simply HTTP, functions as a request-response protocol in the client-server model. For example, a web browser or curl may be the client, and Nginx or an application running on the computer that hosts a website may be the server. The client sends an HTTP request message to the server. The server receives the request, locates the requested resource (such as an HTML file), and sends a response back to the client.
Anatomy of HTTP Requests
An HTTP request is a message in the following format:
- Request line
- Zero or more headers + CRLF
- CRLF (Empty line)
- Optional message body
Here, CRLF is \r\n. A request line begins with a method, followed by the request URI, the HTTP version, and CRLF.
GET /hello HTTP/1.1\r\n
GET is the HTTP method, /hello is the request URI, and HTTP/1.1 is the HTTP version.
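As a rough illustration of the request-line format, the following sketch splits a request line into its three parts. The names `RequestLine` and `parse_request_line` are made up for this example, and the parsing is deliberately strict and simplified.

```python
from typing import NamedTuple


class RequestLine(NamedTuple):
    method: str
    uri: str
    version: str


def parse_request_line(line: bytes) -> RequestLine:
    """Split b'GET /hello HTTP/1.1\\r\\n' into method, URI, and version."""
    text = line.decode("ascii")
    if not text.endswith("\r\n"):
        raise ValueError("request line must end with CRLF")
    # Drop the trailing CRLF, then split on single spaces.
    parts = text[:-2].split(" ")
    if len(parts) != 3:
        raise ValueError("expected: METHOD SP URI SP VERSION")
    return RequestLine(*parts)


request_line = parse_request_line(b"GET /hello HTTP/1.1\r\n")
print(request_line.method, request_line.uri, request_line.version)
```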
HTTP defines several other request methods, but we will only deal with GET. A GET request asks for a resource, and the server returns it (in our case, an HTML file) as the response.
After the request line, an HTTP request includes zero or more headers that carry additional information about the request and the client. We are not going to deal with request headers for now.
Then follows an optional message body. GET requests normally have no body, while methods such as POST usually carry one. Since we are going to implement only GET requests, we will not deal with the request body.
An example request line:
GET / HTTP/1.1
Here, the request method is GET, the request URI is “/”, which refers to the index page, and the HTTP version is HTTP/1.1.
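Putting the pieces of the request format together, a whole request message could be taken apart roughly as follows. This is a simplified sketch: the function name `parse_request` and the `Host` and `Accept` headers in the sample are assumptions for illustration, and real-world parsing has more edge cases.

```python
def parse_request(raw: bytes) -> dict:
    """Split a raw HTTP request into request line, headers, and body."""
    # The empty line (CRLF CRLF) separates the head from the optional body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # First line: METHOD SP URI SP VERSION
    method, uri, version = lines[0].split(" ")

    # Remaining lines: "Name: value" headers (names are case-insensitive).
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    return {
        "method": method,
        "uri": uri,
        "version": version,
        "headers": headers,
        "body": body,
    }


sample = b"GET / HTTP/1.1\r\nHost: localhost\r\nAccept: text/html\r\n\r\n"
request = parse_request(sample)
print(request["method"], request["uri"], request["headers"])
```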
Anatomy of HTTP Responses
After receiving an HTTP request, the server prepares and sends an HTTP response in the following format:
- Status line
- Zero or more headers + CRLF
- CRLF (Empty line)
- Message body
A response begins with a status line. The status line consists of an HTTP version, status code, and status message.
HTTP/1.1 200 OK
HTTP/1.1 is the HTTP version, 200 is the status code, and OK is the status message. Many other status codes and messages exist, for example 404 Not Found.
Again, we don’t need to deal with most headers; however, there are a few that we should include in responses, such as Content-Length, which tells the client how many bytes the message body contains.
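The response format can be sketched as a small helper that assembles the status line, a couple of headers, the empty line, and the body. The name `build_response` and the choice of `Content-Type` as a second header are assumptions made for this example.

```python
def build_response(status_code: int, reason: str, body: bytes,
                   content_type: str = "text/html") -> bytes:
    """Assemble a minimal HTTP/1.1 response as bytes."""
    status_line = f"HTTP/1.1 {status_code} {reason}\r\n"
    headers = (
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"  # size of the body in bytes
    )
    # An empty line (CRLF) separates the headers from the message body.
    return (status_line + headers + "\r\n").encode("ascii") + body


response = build_response(200, "OK", b"<h1>Hi</h1>")
print(response.decode("ascii"))
```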
When a client wants to communicate with the server, it performs the following steps:
- Open a TCP connection: The connection is used to send requests to the server and receive responses from it. We will see how to open a TCP connection in part II.
- Send an HTTP request: The client sends an HTTP request through the TCP connection. The request format is discussed above.
- Receive an HTTP response: The server processes the request that the client sent, prepares a response, and sends it back. Again, the response format is discussed above.
- Close the TCP connection
Note: If the client sends a request containing the Connection: Keep-Alive header, the connection does not need to be closed and reopened for every request. Instead, the same TCP connection can be reused for all requests from that client.
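The four client steps can be sketched with Python's `socket` module. So that the example runs without Internet access, it talks to a tiny one-shot stand-in server on the loopback interface; `fetch`, `serve_once`, and `demo` are hypothetical names, the stand-in server is not the server this series builds, and `socket.create_server` assumes Python 3.8 or newer.

```python
import socket
import threading


def fetch(host: str, port: int, path: str = "/") -> bytes:
    """Perform one GET request and return the raw response bytes."""
    # Step 1: open a TCP connection.
    with socket.create_connection((host, port), timeout=5) as sock:
        # Step 2: send an HTTP request. "Connection: close" asks the
        # server to close the connection after this single response.
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n"
            "\r\n"
        )
        sock.sendall(request.encode("ascii"))
        # Step 3: receive the response until the server closes its side.
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    # Step 4: the TCP connection is closed when the with-block exits.
    return b"".join(chunks)


def serve_once(listener: socket.socket) -> None:
    """Stand-in server: answer exactly one request, then close."""
    conn, _ = listener.accept()
    with conn:
        received = b""
        while b"\r\n\r\n" not in received:  # read until the empty line
            chunk = conn.recv(4096)
            if not chunk:
                break
            received += chunk
        body = b"<h1>hello</h1>"
        head = f"HTTP/1.1 200 OK\r\nContent-Length: {len(body)}\r\n\r\n"
        conn.sendall(head.encode("ascii") + body)


def demo() -> bytes:
    """Run the stand-in server on a free local port and fetch from it."""
    listener = socket.create_server(("127.0.0.1", 0))
    port = listener.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(listener,), daemon=True)
    worker.start()
    response = fetch("127.0.0.1", port, "/")
    worker.join(timeout=5)
    listener.close()
    return response


print(demo().decode("ascii"))
```

This sketch opens a fresh connection per request; reusing one connection for several requests, as the note above describes, would require reading each response by its Content-Length instead of waiting for the server to close.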
It is the client’s duty to initiate the connection with a server. With a browser as the client, the connection is initiated when you enter a URL. For example, when we type www.medium.com in the browser, the browser fetches index.html or another HTML file from the web server.
For example, if you enter the URL app.example.com/about, the browser will add http:// in front of it because the connection uses the HTTP protocol. The new URL will be http://app.example.com/about, where:
- http:// is the protocol name
- example.com is the domain of the server
- app is the subdomain of the server
- /about is the path that the browser (client) requests
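The same breakdown can be reproduced with `urllib.parse.urlsplit` from Python's standard library. The helper `describe_url` is a hypothetical name, and treating the last two host labels as the domain is a naive assumption that works for app.example.com but not for every real-world domain.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Break a URL into the parts a browser uses to build a request."""
    # Like a browser, assume HTTP when no scheme was typed.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".")
    return {
        "scheme": parts.scheme,
        "host": host,
        # Naive split: last two labels are the domain, the rest the subdomain.
        "subdomain": ".".join(labels[:-2]),
        "domain": ".".join(labels[-2:]),
        "path": parts.path or "/",
    }


print(describe_url("app.example.com/about"))
```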
The browser constructs an HTTP request, and the request may be:
GET /about HTTP/1.1
It is the server’s duty to map the path /about to a file on its local file system, find the correct HTML file, and return it to the client. We’ll talk more about this in later parts.
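One possible way to do this lookup is sketched below. Everything here is an assumption for illustration rather than the approach taken in later parts: the name `resolve_path`, the idea of a document root directory, serving index.html for "/", and trying a ".html" suffix for paths such as /about.

```python
from pathlib import Path
from typing import Optional


def resolve_path(document_root: Path, request_path: str) -> Optional[Path]:
    """Map a request path to a file inside document_root, or None."""
    root = document_root.resolve()
    # "/" maps to index.html; otherwise drop the leading slash.
    relative = request_path.lstrip("/") or "index.html"
    # Try the path as given, then with an .html suffix (/about -> about.html).
    for name in (relative, relative + ".html"):
        candidate = (root / name).resolve()
        # Refuse anything that escapes the root, e.g. "/../secret".
        if root not in candidate.parents:
            continue
        if candidate.is_file():
            return candidate
    return None
```

A caller would read the returned file and send it as the message body, or answer with a 404 status when the result is `None`.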
The response may look like this:
HTTP/1.1 200 OK
Content-Length: 1687

<!DOCTYPE html>
That’s it! This is all we need to know to implement our own HTTP server from scratch!
If you have any questions, please comment below.