WWW for Developers, Simple as That!

May 15, 2023

“Mom, I can’t sleep.”

- “Well, maybe you should try reading a book instead of being on the internet all day.”

You’ve probably witnessed a similar scene at home or elsewhere, all thanks to the darn internet. But hold on, is “darn” really the right word to describe it?

Hmm… let’s see. The internet has become a crucial part of people’s daily lives, and that’s indisputable, because we’re aware of the value it provides: from instant communication to serving as a second brain for many. No one needs to memorize anything anymore, because “it’s all on the internet” and available at all times.

But that’s what everyone already knows. Everyone knows how to use the internet, but as a web developer, that’s not enough for you: you need to understand how everything works behind the scenes in order to build scalable, reliable, and high-performance software. Everyone knows how to eat cake, but making it? That’s your job.

Internet vs WWW

First and foremost, let’s clarify a point as simple as it is important: Is the internet the same thing as the web?

NO! The internet is not the web. The web (World Wide Web) is just one piece of the cake that is the internet, but it’s the piece everyone wants, the tastiest one, and that’s why we usually confuse the two.

The internet is simply the largest network of interconnected computers around the globe, while the web is an information system, together with a set of technologies, for sharing resources, and it uses the internet’s infrastructure to transmit them.

With that settled, we can move on to the web’s architecture. How does the web work?

The Web

Designed to be simple, reliable, and scalable, the web’s architecture comes down to two main elements:

Client

A client is any element that requests a resource from the server.

Server

As the holder of the system’s resources, the server is the element responsible for handling client requests.

So, as we can see, the scheme is very simple. But how is data sent back and forth? How do we know what the client requested, which client requested what, and whether it actually received a response? This is where the internet comes in!

Web meets The Internet

Thanks to its protocols, the internet can connect devices at any distance and at a reasonable cost, so the web looks at it and says: “Why not use this?”

With the help of protocols, the internet ensures that computers can communicate. Its architecture is based on the TCP/IP network model, which provides reliability and scalability: unlike UDP (User Datagram Protocol), TCP (Transmission Control Protocol) establishes a connection between the sender and the receiver before sending any data, then acknowledges and retransmits data as needed, so that it arrives complete and in order.
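To make the contrast concrete, here is a minimal Python sketch using the standard `socket` module on the local machine (127.0.0.1). It assumes nothing is listening on the port returned by the hypothetical helper `free_port()`: the UDP datagram is accepted without any connection step, while the TCP connect fails because there is no receiver to complete the handshake with. The behavior shown is typical of common operating systems but may vary in details.

```python
import socket


def free_port() -> int:
    """Hypothetical helper: ask the OS for a port number nobody is using right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        return s.getsockname()[1]


port = free_port()  # assumption: nothing starts listening on it in the meantime

# UDP: no connection step; the datagram is simply handed to the network.
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
    sent = udp.sendto(b"hello", ("127.0.0.1", port))
print("UDP sendto accepted", sent, "bytes, with no delivery confirmation")

# TCP: data can only flow after a connection (handshake) with the receiver.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as tcp:
    tcp.settimeout(5)
    try:
        tcp.connect(("127.0.0.1", port))
        tcp_connected = True
    except ConnectionRefusedError:
        tcp_connected = False  # no receiver, so no connection and no data sent
print("TCP connection established:", tcp_connected)
```

UDP reports success even though nobody received the data, while TCP refuses to send anything until a receiver has accepted the connection.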

We can think of protocols with the following analogy:

Imagine a room with students from different parts of the world who speak different native languages. Could they communicate? In the best case, where every student, besides their own language, knows every other language spoken in the room, yes, they could.

But how costly would that be? And what if a new student arrived who only knew their native language? That wouldn’t work. Now imagine the school sets a rule that only English, a language everyone knows, is spoken within the school. Much better, right? Everyone would understand each other, and communication would flow.

A protocol is exactly that: an agreed-upon rule that makes a particular interaction succeed, which in this case is communication.

Therefore, the final scheme of the web would look like this:

Client → Internet → Server

With this scheme in place, we can move on to how these resources actually travel back and forth. Although the web uses the TCP/IP structure of the internet, it has its own protocol for carrying information: HTTP.

HTTP

The Hypertext Transfer Protocol is the primary protocol used to transfer resources between clients and servers. HTTP messages are carried over TCP/IP.

The communication flow between the client and the server becomes as follows:

Client establishes a connection with the server via TCP/IP
Client sends an HTTP request to the server
Server sends the HTTP response back to the client
The connection is closed
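The four steps above can be sketched in Python with only the standard library. This is an illustration, not a production client: it starts a tiny local server (the hypothetical `HelloHandler`, on a port chosen by the OS), opens a TCP connection with `socket.create_connection`, writes an HTTP request by hand, and reads until the server closes the connection. The `Connection: close` header asks the server to close after responding, since HTTP/1.1 otherwise keeps connections open for reuse.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical demo server: answers every GET with a short text body."""

    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default request logging


def run_transaction(host: str, port: int) -> bytes:
    # Step 1: establish a TCP connection with the server.
    with socket.create_connection((host, port), timeout=5) as conn:
        # Step 2: send the HTTP request (an empty line ends the headers).
        request = (
            "GET / HTTP/1.1\r\n"
            f"Host: {host}:{port}\r\n"
            "Connection: close\r\n"
            "\r\n"
        )
        conn.sendall(request.encode("ascii"))
        # Step 3: read the response until the server closes the connection (step 4).
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # empty read: the server closed its side
                break
            chunks.append(data)
    return b"".join(chunks)


server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: let the OS choose
threading.Thread(target=server.serve_forever, daemon=True).start()
raw = run_transaction("127.0.0.1", server.server_address[1])
server.shutdown()
server.server_close()

print(raw.decode("ascii").split("\r\n")[0])  # the response's start line
```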

Important points:

HTTP is stateless: each request is independent, and the server keeps no memory of previous requests. This lets the server handle many requests simultaneously without keeping track of each client, which enhances scalability.
The connection can be closed by either the client or the server:
1. Closed by the server: this can happen because the interaction completed successfully, or because of an internal problem such as a software error or overload, in which case the server sends an error code to the client.
2. Closed by the client: this can happen for several reasons, such as a network error or the receipt of a complete response, and no error code is issued.
A request (from the client to the server) together with its response (from the server to the client) is called a transaction. A transaction is carried out using data blocks called HTTP messages: requests and responses are both HTTP messages.

Okay, but how does this work in practice? How is an HTTP request made?

HTTP Messages

An HTTP request is a data block with three parts:

Start Line
Headers
Body

Start Line

States what the request intends to do (its method), the path to the resource (remember that the web’s goal is sharing resources?), and the protocol version.

GET /api/users/123 HTTP/1.1

Headers

Carries the information needed to handle the resource itself, such as the type of resource (e.g. text, video), language preference, host information, and so on.

Host: example.com
Accept: application/json

Body

This is where the actual resource is located. The body is optional, because not every request needs one.

N/A //It doesn't need a body in this case

So, in the end, an HTTP request looks like this:

GET /api/users/123 HTTP/1.1
Host: example.com
Accept: application/json
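On the wire, each line of the message ends with a carriage return and line feed (CRLF), and an empty line marks the end of the headers, even when there is no body. The hypothetical `build_request` function below is a minimal Python sketch of that serialization, not a full HTTP client; it reproduces the request above byte for byte. A caller sending a body would also need to add a `Content-Length` header.

```python
from typing import Mapping


def build_request(
    method: str, target: str, headers: Mapping[str, str], body: bytes = b""
) -> bytes:
    """Serialize an HTTP/1.1 request: start line, headers, empty line, optional body."""
    start_line = f"{method} {target} HTTP/1.1"
    header_lines = [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends with CRLF, and an empty line separates the headers from the body.
    head = "\r\n".join([start_line, *header_lines]) + "\r\n\r\n"
    return head.encode("ascii") + body


raw = build_request("GET", "/api/users/123", {"Host": "example.com", "Accept": "application/json"})
print(raw)
# b'GET /api/users/123 HTTP/1.1\r\nHost: example.com\r\nAccept: application/json\r\n\r\n'

# The start line splits back into its three pieces: method, target, and version.
method, target, version = raw.split(b"\r\n", 1)[0].decode("ascii").split(" ")
print(method, target, version)  # GET /api/users/123 HTTP/1.1
```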

After receiving the request, the server processes it and sends a response to the client in a data block with the same structure, but containing the response information.

Start Line
Contains the protocol version, followed by the response code and a short descriptive message.

HTTP/1.1 200 OK

Headers
Contains information related to the response, such as resource type and size.

Content-Type: application/json
Content-Length: 65

Body
As with the request, this is where the requested resource is located.

{"id": "123", "name": "John Doe", "email": "johndoe@example.com"}
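The client does the reverse: it splits the response at the first empty line, reads the start line, and uses the headers to interpret the body. The hypothetical `parse_response` function below is a simplified Python sketch (it ignores details such as chunked transfer encoding and repeated headers). It rebuilds the example response above, computing `Content-Length` from the body’s byte length rather than writing it by hand.

```python
import json


def parse_response(raw: bytes):
    """Split a raw HTTP/1.1 response into version, status code, reason, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")  # an empty line ends the headers
    status_line, *header_lines = head.decode("latin-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return version, int(code), reason, headers, body


body = b'{"id": "123", "name": "John Doe", "email": "johndoe@example.com"}'
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: application/json\r\n"
    b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n"
    b"\r\n" + body
)

version, code, reason, headers, payload = parse_response(raw)
print(version, code, reason)        # HTTP/1.1 200 OK
print(headers["content-length"])    # 65
print(json.loads(payload)["name"])  # John Doe
```

Deriving `Content-Length` from the encoded body avoids a mismatch between the declared and actual size, which could make a client cut the body short or wait for bytes that never arrive.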

A few questions probably arise from this, such as:

- How do I define the purpose of the request?

- What are response codes?

- How do I define the resource type?

HTTP Methods

The purpose of the request is defined by its HTTP method. Every HTTP request must contain a method, because it tells the server what action to take (e.g. retrieve a resource, delete it, run an application, etc.).

Here are the most common HTTP methods:

GET - used to retrieve data from a specified resource. It is a safe and idempotent method: it should not have side effects on the server, and calling it multiple times has the same effect as calling it once.

POST - used to submit data to be processed to a specified resource. It is commonly used to create new resources on the server.

PUT - used to replace the target resource with the data sent in the request, creating it if it does not exist yet. It replaces the entire resource with the new representation provided and, unlike POST, it is idempotent.

DELETE - used to remove a specified resource from the server.

HEAD - similar to the GET method, but it only retrieves the headers of a response without the actual body content. It is often used to check the status or metadata of a resource without transferring the entire response.
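In Python’s standard library, the method is one field of `urllib.request.Request`; when none is given, it defaults to GET, or to POST if a body is attached. The sketch below only builds request objects (the URLs and JSON bodies are illustrative); passing one to `urllib.request.urlopen` would actually send it.

```python
from urllib.request import Request

# Illustrative URLs based on the earlier example; nothing is sent over the network.
USER_URL = "http://example.com/api/users/123"
USERS_URL = "http://example.com/api/users"
JSON = {"Content-Type": "application/json"}

examples = [
    Request(USER_URL),  # no body: defaults to GET
    Request(USERS_URL, data=b'{"name": "John Doe"}', headers=JSON),  # body: defaults to POST
    Request(USER_URL, data=b'{"id": "123", "name": "Jane Doe"}', headers=JSON, method="PUT"),
    Request(USER_URL, method="DELETE"),
    Request(USER_URL, method="HEAD"),
]

for req in examples:
    print(req.get_method(), req.full_url)
```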

HTTP Response Codes

As for response codes, they are 3-digit numbers the server uses to inform the client of the status of its request. Note that the client interprets the status of the response from the code, not from the descriptive message. Therefore:

200 ok = 200 let’s go!

Here are some common response codes:

302 - this is a redirection response. It indicates that the requested resource has been temporarily moved to a different location. The client should typically follow the redirection and make a new request to the URL given in the response’s Location header.

404 - indicates that the requested resource could not be found on the server. It means that the server was unable to locate the requested resource, often because it does not exist or the URL is incorrect.

500 - represents an internal server error. It indicates that an unexpected error occurred on the server, and the request could not be completed. This status code is typically used for server-side errors that are not caused by the client.

401 - indicates that the client request requires authentication. It means that the client must provide valid credentials (such as a username and password) to access the requested resource.

503 - indicates that the server is temporarily unavailable or overloaded. It is often used for server maintenance or when the server cannot handle the current request load.
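Status codes are grouped by their first digit: 1xx informational, 2xx success, 3xx redirection, 4xx client errors, and 5xx server errors. The Python sketch below uses the standard `http.HTTPStatus` enum to look up the registered reason phrase for each code mentioned above, and a hypothetical `describe` helper to attach its class. The class is derived from the number alone, which is exactly how a client should decide what to do.

```python
from http import HTTPStatus

# The first digit of a status code gives its class (RFC 9110).
CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code: int) -> str:
    """Hypothetical helper: return 'code phrase (class)' for a registered status code."""
    status = HTTPStatus(code)  # raises ValueError for codes Python does not know
    return f"{status.value} {status.phrase} ({CLASSES[code // 100]})"


for code in (200, 302, 401, 404, 500, 503):
    print(describe(code))
```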

MIME Types

And now, what about defining the resource type? Resources are stored on the server, and if the type is not specified, it can be hard for the server to tell which resource the client wants.

What would happen if the server had the files party.mp3 and party.png, but the request only asked for the file ‘party’? How would the server handle this?

With situations like this in mind, HTTP uses a tagging system called MIME types.

Content-Type: application/json //This is the MIME Type for JSON responses
Content-Length: 65

Multipurpose Internet Mail Extensions (MIME) were adopted by HTTP after proving to be an ideal solution to the problem of exchanging messages between different email systems. Servers therefore tag every resource with a MIME type, in the Content-Type header, before sending it to the client.

Here are a few more examples of MIME Types:

Content-Type: text/html //This is the MIME Type for HTML documents

Content-Type: application/pdf //This is the MIME Type for PDF documents

Content-Type: image/jpeg //This is the MIME Type for JPEG images

Content-Type: audio/mpeg //This is the MIME Type for MPEG audio files

Content-Type: video/mp4 //This is the MIME Type for MP4 video files

Content-Type: application/xml //This is the MIME Type for XML documents
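On the server side, the type is often inferred from the file extension. Python’s standard `mimetypes` module holds one such extension-to-type table, used below purely as an illustration (real web servers rely on their own configuration). A fresh `MimeTypes()` instance is used so that only Python’s built-in table is consulted, not the operating system’s files. The extensionless name `party` from the earlier question gets no type at all.

```python
import mimetypes

# Built only from Python's bundled defaults, so results do not depend on
# the operating system's mime.types files.
guesser = mimetypes.MimeTypes()

for name in ("party.mp3", "party.png", "paper.pdf", "index.html", "party"):
    mime_type, _encoding = guesser.guess_type(name)
    print(f"{name:<10} -> {mime_type}")
```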

URI: URL vs URN

Last but not least, it’s important to know how resources are addressed on the server, and for that, the Web uses URI technology.

Uniform Resource Identifier, or simply URI, as the name suggests, is a way to uniquely identify a resource so that the client can know how to reach it, much like the address of someone’s house, from which you know how to find them.

For example,

http://home.com/bedroom/table/paper.txt

shows what the URI for the resource paper.txt would look like.

But isn’t that a URL? An interesting question…

There are two types of URI: the well-known Uniform Resource Locator (URL) and the lesser-known Uniform Resource Name (URN).

http://home.com/bedroom/table/paper.txt

is indeed a URL. In this example, the URL is made up of three parts:

Scheme: http (defines the protocol used to access the resource)
Host (domain or IP): home.com (defines where to look for the resource)
Path: /bedroom/table/paper.txt (defines the exact path to the resource)
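Python’s standard `urllib.parse.urlsplit` splits a URL into these parts. The second URL is a hypothetical variant of the example with an explicit port, a query string, and a fragment, to show that URLs can carry these optional components too, each in its own field.

```python
from urllib.parse import urlsplit

url = urlsplit("http://home.com/bedroom/table/paper.txt")
print(url.scheme)  # http
print(url.netloc)  # home.com
print(url.path)    # /bedroom/table/paper.txt

# Optional extra parts: a port, a query string, and a fragment.
full = urlsplit("http://home.com:8080/bedroom/table/paper.txt?lang=en#top")
print(full.hostname, full.port, full.query, full.fragment)  # home.com 8080 lang=en top
```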

On the other hand, URNs take a very different approach: they give resources unique names on the web, independent of their location, so a resource can move from server to server while keeping the same name.

For example,

urn:example:home.com:bedroom:table:paper.txt

URNs ended up being used less than URLs due to the lack of standardization and of an infrastructure to support them. However, they are still used in specific situations, such as the International Standard Book Number (ISBN), which has its own URN namespace, or the Digital Object Identifier (DOI), a URN-like persistent identifier commonly used for scientific articles.

Therefore, most browsers were designed to work primarily with URLs.
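The difference shows up when both kinds of identifier are parsed. In the Python sketch below, `urlsplit` finds a scheme but no host in the URN from the example above, because a URN names a resource without saying where it lives; the hypothetical `split_urn` helper then separates the namespace (such as `example` or `isbn`) from the name inside it. The ISBN shown is only an illustrative value.

```python
from urllib.parse import urlsplit


def split_urn(urn: str):
    """Hypothetical helper: split 'urn:<namespace>:<name>' into (namespace, name)."""
    scheme, namespace, name = urn.split(":", 2)
    if scheme.lower() != "urn":
        raise ValueError(f"not a URN: {urn!r}")
    return namespace.lower(), name  # namespace identifiers are case-insensitive


urn = "urn:example:home.com:bedroom:table:paper.txt"
parts = urlsplit(urn)
print(parts.scheme, repr(parts.netloc))  # urn '' (no host: nothing says where it lives)
print(split_urn(urn))                    # ('example', 'home.com:bedroom:table:paper.txt')
print(split_urn("urn:isbn:0451450523"))  # ('isbn', '0451450523')
```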

Conclusion

Keeping all of this in mind, it’s easy to see how simple the architecture of the web is, and how simple it was always meant to be: at the end of the day, it’s just a conversation happening behind the scenes, nothing scary.

Knowing how it works is the first step for you as a web developer to understand how to design and develop reliable, functional, and scalable systems on the web.

Thank you very much for reading the article; I hope you found it useful.

Feel free to contact énio, whether to discuss the article or just to say hello 👋🏿!