Creating your own HTTP Server — Part I

Photo by Jordan Harrison on Unsplash

HTTP is literally everywhere. While you read this post on Medium, HTTP/HTTPS traffic is flowing in the background. Every website that you visit uses the HTTP protocol to communicate with an HTTP server. Wait, what is a server? What does that even mean? In this article, I will explain the context of HTTP servers and how you can create your own.

Prelude

In today’s world, what I see in common among programmers (beginners especially) is that they tend to start with web development, particularly HTML/CSS and JavaScript. After learning the basics, they start to use fancy frameworks or libraries, e.g., Express.js for JavaScript, Django & Flask for Python, etc. I was once a web developer too, on a Python + Django stack, and with enough study and practice, I got to the point where I could create my own fully functional web apps with web development libraries. By then, I was familiar with creating a model, updating the database, writing views that interact with models, etc. However, the thing that disturbed me was that I had no idea about the underlying system I used daily. What I really mean is that I could create a modern PWA, but I had no idea how my code actually worked!

My next step was to look into and learn the HTTP protocol (after researching the Internet and the web). You may ask why I thought I needed to learn the underlying infrastructure when I could already create a functional website. Yes, that may sound unintuitive; however, I believe that a programmer should learn the ins and outs of the underlying structure they use daily. When you know and understand what is going on and how different components interact, you can quickly identify and solve problems. With that motivation, I searched the Internet for intuitive descriptions of client-server models, HTTP communication, etc. After gathering useful resources, I decided to rebuild the system (or at least a part of it) to get a deeper insight. Now, enough about the introduction and my journey, let’s dive into implementing one.

Photo by Hal Gatewood on Unsplash

Where to start

First, we need to look at OSI, the Open Systems Interconnection model. OSI is a model that characterizes and standardizes the communication functions of a telecommunication system without regard to its underlying internal structure and technology. OSI consists of seven layers:

  1. Physical Layer
  2. Data Link Layer
  3. Network Layer
  4. Transport Layer
  5. Session Layer
  6. Presentation Layer
  7. Application Layer

The layer we are interested in is the 4th layer, the Transport Layer (HTTP itself is an Application Layer protocol that sits on top of it). The transport layer provides transparent transfer of data between end-users, providing data transfer services to the upper layers. A reliable transport protocol also acknowledges successfully received data and sends the next data if no errors occurred. Typical examples of layer 4 protocols are the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP). When building HTTP servers, we’ll use TCP as the transport because TCP provides a reliable link between two computers: if a packet gets lost, it is re-transmitted (a packet is a formatted data unit). UDP is not reliable because lost UDP packets are not re-transmitted.
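To make the TCP/UDP distinction a bit more concrete, here is a minimal sketch using Python's standard socket module. The helper name open_transport_socket is my own, made up for illustration; the only thing it shows is that the choice between a reliable stream (TCP) and unreliable datagrams (UDP) is made when the socket is created.

```python
import socket


def open_transport_socket(reliable: bool = True) -> socket.socket:
    """Create an IPv4 socket: TCP when reliable is True, UDP otherwise.

    Hypothetical helper for illustration only.
    """
    kind = socket.SOCK_STREAM if reliable else socket.SOCK_DGRAM
    return socket.socket(socket.AF_INET, kind)


# The sockets are closed automatically when the with-block ends.
with open_transport_socket(True) as tcp_sock, open_transport_socket(False) as udp_sock:
    tcp_is_stream = tcp_sock.type == socket.SOCK_STREAM
    udp_is_datagram = udp_sock.type == socket.SOCK_DGRAM

print("TCP socket is a stream socket:", tcp_is_stream)
print("UDP socket is a datagram socket:", udp_is_datagram)
```

An HTTP server built on top of this would use the SOCK_STREAM variant.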

RFC 2616 also states that HTTP communication usually takes place over TCP/IP connections:

HTTP communication usually takes place over TCP/IP connections. The
default port is TCP 80 [19], but other ports can be used. This does
not preclude HTTP from being implemented on top of any other protocol
on the Internet, or on other networks. HTTP only presumes a reliable
transport; any protocol that provides such guarantees can be used;
the mapping of the HTTP/1.1 request and response structures onto the
transport data units of the protocol in question is outside the scope
of this specification.

An RFC (Request for Comments) is a memorandum published by the Internet Engineering Task Force (IETF) describing methods, behaviors, research, or innovations applicable to the working of the Internet, along with Internet-connected systems. In short, it is a document where engineers document current methods.

Now, about servers. A server is a piece of computer hardware or software that provides functionality for clients. In the context of HTTP and the web, a server is a piece of software that accepts HTTP requests from clients on the WWW and returns an HTTP response based on each request. Here’s a diagram representing the client-server model:

Client-Server model

HTTP Protocol

Hypertext Transfer Protocol, or simply HTTP, functions as a request-response protocol in the client-server model. For example, a web browser or curl may be the client, and Nginx or an application running on the computer hosting a website may be the server. The client sends an HTTP request message to the server. The server receives the request, locates the requested resource (such as an HTML file), and sends it back to the client in a response.

Anatomy of HTTP Requests

An HTTP request is a message in the following format:

  1. Request line
  2. Zero or more headers + CRLF
  3. CRLF (Empty line)
  4. Optional message body

Here, CRLF is \r\n. A request line begins with a method, followed by the request URI, the HTTP version, and CRLF.

GET /hello HTTP/1.1\r\n

Here, GET is the HTTP method, /hello is the request URI, and HTTP/1.1 is the HTTP version.
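The request line above can be split into its three parts with a few lines of Python. This is only a sketch: the names RequestLine and parse_request_line are mine, and a real server would need to be more careful with malformed input.

```python
from typing import NamedTuple


class RequestLine(NamedTuple):
    method: str
    uri: str
    version: str


def parse_request_line(line: bytes) -> RequestLine:
    """Split 'METHOD SP URI SP VERSION CRLF' into its three parts."""
    if not line.endswith(b"\r\n"):
        raise ValueError("request line must end with CRLF")
    # Drop the trailing CRLF, then split on single spaces.
    parts = line[:-2].decode("ascii").split(" ")
    if len(parts) != 3:
        raise ValueError(f"expected 3 parts, got {len(parts)}")
    return RequestLine(*parts)


parsed = parse_request_line(b"GET /hello HTTP/1.1\r\n")
print(parsed.method, parsed.uri, parsed.version)
```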

A complete list of request methods can be found here, but we will only deal with GET requests. A GET request asks for a resource, and the server typically returns it (for example, an HTML file) in the response.

HTTP requests then include zero or more headers that store information about the client and the request. Here is the complete list of header values. We are not going to deal with headers now.

Then follows an optional message body. A GET request normally has no body, while methods such as POST and PUT usually carry one. Since we are going to implement only GET requests, we will not deal with the request body.

An example request:

Sample request

Here, the request method is GET, the request URI is “/”, which usually refers to the index page, and the HTTP version is HTTP/1.1.
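To tie the four parts of the request format together, here is a rough sketch that splits a raw request into its request line, headers, and body. The sample bytes and the function name split_request are assumptions of mine (the header values are made up), not something taken from a real capture.

```python
from typing import Dict, Tuple


def split_request(raw: bytes) -> Tuple[str, Dict[str, str], bytes]:
    """Split a raw HTTP request into (request line, headers, body)."""
    # The empty line (CRLF CRLF) separates the head from the optional body.
    head, _sep, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    request_line = lines[0]
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, _colon, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return request_line, headers, body


# Hypothetical sample request; the header values are illustrative only.
sample = (
    b"GET / HTTP/1.1\r\n"
    b"Host: localhost:8080\r\n"
    b"User-Agent: curl/8.0\r\n"
    b"Accept: */*\r\n"
    b"\r\n"
)

request_line, headers, body = split_request(sample)
print(request_line)
print(headers)
print(body)
```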

Anatomy of HTTP Responses

After receiving an HTTP request, the server prepares and sends an HTTP response in the following format:

  • Status line
  • Zero or more headers + CRLF
  • CRLF
  • Message body

A response begins with a status line. The status line consists of an HTTP version, status code, and status message.

HTTP/1.1 200 OK

Here, HTTP/1.1 is the HTTP version, 200 is the status code, and OK is the status message. A complete list of status codes and messages can be found here.
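If you want to produce status lines in Python, the standard library already knows the standard status messages through http.HTTPStatus. The helper name status_line below is my own; treat it as a small sketch rather than the way any particular server does it.

```python
from http import HTTPStatus


def status_line(code: int, version: str = "HTTP/1.1") -> str:
    """Build 'VERSION SP CODE SP MESSAGE CRLF' for a standard status code."""
    status = HTTPStatus(code)  # raises ValueError for unknown codes
    return f"{version} {status.value} {status.phrase}\r\n"


ok_line = status_line(200)
missing_line = status_line(404)
print(repr(ok_line))
print(repr(missing_line))
```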

Again, we don’t need to deal with most headers; however, there are a few of them that we should implement in responses. They are Content-Length, Content-Type, and Server.

Sample HTTP Response
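As a rough sketch of the response format, the function below assembles a status line, the three headers mentioned above, the empty line, and the body. The function name build_response, the placeholder server name, and the tiny HTML body are all assumptions for illustration.

```python
def build_response(
    body: bytes,
    content_type: str = "text/html",
    server_name: str = "yourName",  # placeholder value, pick your own
) -> bytes:
    """Assemble a minimal 200 OK response around the given body."""
    head_lines = [
        "HTTP/1.1 200 OK",
        f"Server: {server_name}",
        f"Content-Type: {content_type}",
        # Content-Length is the size of the body in bytes, not characters.
        f"Content-Length: {len(body)}",
    ]
    # Each line ends with CRLF, and one extra CRLF separates head and body.
    head = "\r\n".join(head_lines) + "\r\n\r\n"
    return head.encode("ascii") + body


response = build_response(b"<h1>Hi</h1>")
print(response.decode("ascii"))
```

Note that Content-Length is computed from the encoded bytes, which matters once the body contains non-ASCII characters.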

HTTP Flow

When a client wants to communicate with the server, it performs the following steps:

  1. Open a TCP connection: The connection is used to send requests to and receive responses from the server. We will see how to open a TCP connection in part II.
  2. Send an HTTP request: The client sends an HTTP request through the TCP connection. The request format is discussed above.
  3. Receive an HTTP response: The server processes the HTTP request that the client sent, prepares a response, and sends it back over the same connection. Again, the response format is discussed above.
  4. Close the TCP connection

Note: If the client sends a request containing the Connection: Keep-Alive header, we don’t need to close and reopen a connection for every request. Instead, we can reuse the same TCP connection for all requests from that client.
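The four steps can be sketched end to end on a single machine. Everything below is an illustrative assumption: serve_once is a throwaway stand-in for a real server, fetch is a made-up client helper, and both talk over the loopback address on a port picked by the operating system. Both sides send Connection: close, so the connection is not reused.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Toy server: accept one connection and answer with a fixed page."""
    conn, _addr = listener.accept()
    with conn:
        data = b""
        # Read until the empty line that ends the request head.
        while b"\r\n\r\n" not in data:
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
        body = b"<h1>Hello</h1>"
        head = (
            "HTTP/1.1 200 OK\r\n"
            "Content-Type: text/html\r\n"
            f"Content-Length: {len(body)}\r\n"
            "Connection: close\r\n"
            "\r\n"
        ).encode("ascii")
        conn.sendall(head + body)


def fetch(host: str, port: int, path: str) -> bytes:
    """Client side of the flow: open, send, receive, close."""
    # 1. Open a TCP connection.
    with socket.create_connection((host, port), timeout=5) as sock:
        # 2. Send an HTTP request.
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n"
            "\r\n"
        )
        sock.sendall(request.encode("ascii"))
        # 3. Receive the HTTP response until the server closes its side.
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    # 4. The with-block has closed the TCP connection at this point.
    return b"".join(chunks)


# Port 0 asks the operating system for any free port.
listener = socket.create_server(("127.0.0.1", 0))
port = listener.getsockname()[1]
thread = threading.Thread(target=serve_once, args=(listener,), daemon=True)
thread.start()

response = fetch("127.0.0.1", port, "/about")
thread.join()
listener.close()
print(response.decode("ascii"))
```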

It is the client’s duty to initiate the connection with a server, and one common way to trigger a TCP connection is to enter a URL in a browser. Here, the browser is the client. For example, when we type www.medium.com in the browser, the browser fetches index.html or another HTML file from the web server.

For example, if you enter the URL app.example.com/about, the browser will add http:// in front of this URL because the connection uses the HTTP protocol. The new URL will be http://app.example.com/about:

  • http:// is the protocol name
  • example.com is the domain of the server
  • app is the subdomain of the server
  • /about is the path that the browser (client) requests
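The same breakdown can be reproduced with Python's urllib.parse. The helper names with_default_scheme and describe are made up for this sketch, and the split into subdomain and domain is deliberately naive: it assumes the domain is always the last two labels, which is not true for names like example.co.uk.

```python
from typing import Dict
from urllib.parse import urlsplit


def with_default_scheme(url: str, scheme: str = "http") -> str:
    """Prepend a scheme when the user typed a bare host and path."""
    return url if "://" in url else f"{scheme}://{url}"


def describe(url: str) -> Dict[str, str]:
    """Break a URL into the four pieces listed above."""
    parts = urlsplit(with_default_scheme(url))
    labels = (parts.hostname or "").split(".")
    return {
        "scheme": parts.scheme,
        # Naive assumption: the last two labels form the domain.
        "subdomain": ".".join(labels[:-2]),
        "domain": ".".join(labels[-2:]),
        "path": parts.path or "/",
    }


info = describe("app.example.com/about")
print(info)
```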

The browser constructs an HTTP request, and the request may be:

  • GET /about HTTP/1.1

It is the server’s duty to map the path /about to a resource on its local file system, find the correct HTML file, and return it to the client. We’ll talk more about this in later parts.
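As a preview of that lookup, here is one possible sketch. The whole convention is an assumption of mine: requests are resolved inside a document root, “/” maps to index.html, and a path like /about falls back to about.html. The function name resolve_request_path is hypothetical, and the demo uses a temporary directory as the document root.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_request_path(doc_root: Path, request_path: str) -> Optional[Path]:
    """Map a request path to a file inside doc_root, or None if not found."""
    root = doc_root.resolve()
    relative = request_path.lstrip("/") or "index.html"
    # Try the path as given, then with an .html suffix (assumed convention).
    for name in (relative, relative + ".html"):
        candidate = (root / name).resolve()
        # Refuse anything that escapes the document root (e.g. via "..").
        if root in candidate.parents and candidate.is_file():
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "about.html").write_text("<h1>About</h1>", encoding="utf-8")
    found = resolve_request_path(root, "/about")
    escaped = resolve_request_path(root, "/../etc/passwd")
    found_name = found.name if found else None

print("Resolved /about to:", found_name)
print("Path escaping the root resolved to:", escaped)
```

When the function returns None, a server would answer with a 404 Not Found response instead of a file.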

The response may look like this:

HTTP/1.1 200 OK
Server: yourName
Date: ...
Content-Type: text/html
Content-Length: 1687

<!DOCTYPE html>
...
</html>

That’s it! This is all we need to know to implement our own HTTP server from scratch!

If you have any questions, please comment below.
