The Complete Guide to System Scalability: Scale Up, Scale Out, or Go Diagonal?


Scalability describes a system's elasticity; in simple terms, it is the system's ability to grow. To grow, a system can scale up or scale out. To downsize, which can happen when load drops (for example, when a business is cyclical and alternates between periods of heavy and light traffic), a system can scale down or scale in.
Scalability is relevant because it is almost impossible to predict a system's growth rate, particularly for one that depends on engagement from end users. While it is possible to go 'big' from the beginning and build a super powerful system that can handle any scenario you can think of, this approach is rarely taken because of the significant cost involved. The more common approach is to build a system that fits your budget and the traffic or load you expect at the time, and then scale up, out, down, or in as demand changes.
Horizontal Scaling (Scaling out) is the process of adding nodes or machines to an infrastructure, increasing its capacity to handle rising demand. This process adds complexity, because you have to decide which node does what and how new nodes coordinate with the rest of the infrastructure. To visualise how this works, imagine the workload for a single employee becoming too much, so you hire more employees and delegate the work across the group.
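To make the "more employees" picture concrete, here is a minimal, illustrative Python sketch of a node pool with round-robin dispatch. The NodePool class and node names are hypothetical stand-ins for a real load balancer and fleet, not a production implementation.

```python
# A minimal sketch of how added nodes absorb extra load: a round-robin
# dispatcher that spreads requests across whatever nodes are registered.
# Node names and the dispatch behaviour are illustrative placeholders.

class NodePool:
    def __init__(self, nodes):
        self.nodes = list(nodes)   # current fleet of servers
        self._next = 0

    def add_node(self, node):
        """Scaling out: register another machine with the pool."""
        self.nodes.append(node)

    def dispatch(self, request):
        """Send the request to the next node in round-robin order."""
        node = self.nodes[self._next % len(self.nodes)]
        self._next += 1
        return f"{node} handled {request}"


pool = NodePool(["node-1"])
pool.add_node("node-2")          # demand rose, so we scaled out
pool.add_node("node-3")

for req in ["req-a", "req-b", "req-c", "req-d"]:
    print(pool.dispatch(req))
# Requests rotate across node-1, node-2 and node-3 instead of piling
# onto a single machine.
```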

Vertical Scaling (Scaling up) is the process of increasing the resources of a single system so that it can meet the demand of the work it needs to perform. If a server's processor is being overwhelmed by the volume of work sent to it, upgrading to a more powerful processor that completes the same work more quickly is the logical fix, and doing so is an example of vertical scaling. The same applies to other components in the server, e.g. RAM, storage, or networking capacity. Replacing an existing server with a more powerful one is also a form of vertical scaling.
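As a rough illustration of what triggers a scale-up decision, the sketch below checks whether a single machine's CPU or RAM is saturated. It assumes the third-party psutil package is installed, and the 80% thresholds are arbitrary example values rather than recommendations.

```python
# A rough sketch of the kind of check that motivates a vertical upgrade:
# if one box is consistently saturated, the fix is a bigger box rather
# than more boxes. Thresholds are illustrative only.

import psutil

def needs_vertical_upgrade(cpu_threshold=80.0, mem_threshold=80.0):
    cpu = psutil.cpu_percent(interval=1)      # CPU load sampled over 1 second
    mem = psutil.virtual_memory().percent     # RAM in use, as a percentage
    if cpu > cpu_threshold:
        print(f"CPU at {cpu}% - consider a faster processor or more cores")
    if mem > mem_threshold:
        print(f"Memory at {mem}% - consider adding RAM")
    return cpu > cpu_threshold or mem > mem_threshold

if __name__ == "__main__":
    needs_vertical_upgrade()
```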

Here’s a quick breakdown of the differences between horizontal scaling and vertical scaling.
| Feature | Horizontal Scaling (Scale-Out) | Vertical Scaling (Scale-Up) |
|---|---|---|
| Description | Adds more server instances (nodes) to the system. | Adds resources (CPU, RAM, storage) to a single existing node. |
| Limit | Theoretically unlimited; you can keep adding servers indefinitely. | Limited by the maximum hardware capacity of a single machine. |
| Downtime | Minimal to none; servers can be added while the system is running. | Required; upgrading hardware usually needs a restart or maintenance window. |
| Complexity | High. Requires load balancing, distributed computing logic, and complex data consistency management. | Low. The architecture remains the same; the software doesn't necessarily need to change. |
| Cost | Higher upfront but cost-efficient long term; uses cheaper, commodity hardware and costs scale linearly. | Low cost initially but expensive long term. |
| Resilience | High availability. If one server fails, the load balancer redirects traffic to the others. | Single point of failure. If the one large server goes down, the entire application goes down. |
| Best For | Distributed systems, microservices, web applications with massive traffic. | Monolithic applications, small-to-medium datasets, early-stage startups. |
Diagonal scaling is a hybrid approach that combines the best of both worlds. You start by scaling vertically (adding more RAM, a better CPU, etc.) until you reach a certain performance or cost threshold. At that point, you shift to a horizontal approach by cloning the optimised machine and adding those clones to the infrastructure.
This approach gives a team the flexibility to scale vertically at first to optimise costs and resources, then scale out horizontally to a distributed system as demand grows. It limits the downsides of pure vertical scaling, such as hardware limits and the single point of failure, while delaying the complexity and higher costs of horizontal scaling until they are necessary.
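One way to picture the diagonal decision is a helper that climbs an instance-size ladder until it hits a ceiling and only then starts adding clones. The sizes and ceiling in this sketch are made-up values, not a prescription.

```python
# A toy sketch of the diagonal-scaling decision: grow the instance until a
# size ceiling (or cost threshold) is reached, then add clones instead.
# Instance sizes and the ceiling are illustrative values only.

INSTANCE_SIZES = ["small", "medium", "large", "xlarge"]   # vertical ladder
VERTICAL_CEILING = "xlarge"                               # after this, go horizontal

def next_scaling_step(current_size: str, node_count: int) -> str:
    if current_size != VERTICAL_CEILING:
        bigger = INSTANCE_SIZES[INSTANCE_SIZES.index(current_size) + 1]
        return f"scale up: resize node(s) to '{bigger}'"
    return f"scale out: clone the '{current_size}' node, fleet becomes {node_count + 1}"

print(next_scaling_step("medium", 1))   # scale up: resize node(s) to 'large'
print(next_scaling_step("xlarge", 1))   # scale out: clone the 'xlarge' node, fleet becomes 2
```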

The distinction between a stateful and a stateless application is crucial in deciding how, and whether, you can scale. It dictates whether you can add more servers in a horizontal setup or are forced to buy a bigger server in a vertical setup.
Stateless applications do not store session or user data on the server; each request contains all the information needed to process it independently (e.g., web APIs and microservices). This makes them easy to scale, as there is no dependency on prior interactions and any node can process any request, which is perfect for horizontal scaling.
Stateful applications retain session information between requests because they require context to function correctly (e.g., databases and auth servers). This makes them challenging to scale horizontally due to data consistency requirements: when a user is switched to a new server, that server has no idea who the user is or what they were doing.
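The toy handlers below contrast the two styles: the stateful one depends on a session dictionary living in one server's RAM, while the stateless one gets everything it needs from the request itself. The function names and the token field are hypothetical.

```python
# Stateful: the server remembers the user between requests.
SESSIONS = {}   # lives in this particular server's RAM only

def stateful_handler(session_id, action):
    user = SESSIONS.get(session_id)            # depends on prior requests to THIS server
    if user is None:
        return "401: unknown session (it lives on another server, or is gone)"
    return f"{user} performed {action}"

# Stateless: every request carries everything needed to process it.
def stateless_handler(request):
    user = request["auth_token_subject"]       # identity travels with the request
    return f"{user} performed {request['action']}"

# Any clone of the stateless server can answer this; only the server holding
# SESSIONS can answer the stateful one.
print(stateless_handler({"auth_token_subject": "alice", "action": "list_orders"}))
```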
As we’ve covered in this post, horizontal scaling relies on the ability to distribute traffic freely across a fleet of servers.
Stateless applications make horizontal scaling smooth. The load balancer can send each request to any server in the fleet, which processes the request and performs the action. If one server crashes, the load balancer sends subsequent requests to any working node, and the user experiences zero interruption.
Stateful applications pose a "sticky session" problem: if a user logs into server 1, the load balancer must keep sending that user to server 1 until the session ends. This makes scaling hard because you cannot easily balance the load; if server 1 is overloaded but server 2 is empty, you cannot simply move a user to server 2 or they will be logged out. Furthermore, if server 1 crashes, the session data of the users on that server is gone, so they are logged out and their work is lost.
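The sketch below illustrates the routing difference: stateless requests can go to whichever server is next, while sticky sessions must map the session ID back to the same server every time, even if that server is overloaded. Server names are placeholders for a real fleet, and the hashing scheme is just one possible way to implement stickiness.

```python
# Why sticky sessions constrain the load balancer: stateless traffic can go
# anywhere, but stateful traffic must hash back to the same server each time.

import hashlib
import itertools

SERVERS = ["server-1", "server-2", "server-3"]
round_robin = itertools.cycle(SERVERS)

def route_stateless(_request):
    # Any server will do; just take the next one in rotation.
    return next(round_robin)

def route_sticky(session_id):
    # The same session must always land on the same server,
    # even if that server is currently the busiest in the fleet.
    digest = hashlib.sha256(session_id.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

print(route_sticky("user-42-session"))           # always the same server for this session
print(route_sticky("user-42-session"))
print(route_stateless({}), route_stateless({}))  # free to alternate across the fleet
```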
Stateless applications can scale vertically, but it’s usually a waste of money. You don’t need a super powerful computer to handle simple, independent requests.
Stateful applications are great for vertical scaling. Since the application struggles to run across multiple machines, the easiest solution is to keep it on one machine and upgrade the machine to a more powerful configuration.
A common approach to scaling stateful apps horizontally is to rearchitect the app to use an external state store. Instead of keeping sessions in each server's RAM, you store the session data in a shared, fast data store such as Redis.
The workflow for this would be:
1. The load balancer sends the user's request to any available application server.
2. The server looks up the session in the shared store using the session ID sent with the request.
3. The server processes the request using that session data.
4. The server writes any updated session data back to the shared store before responding.
The result of this is that the application servers become stateless and scalable while the state is managed by a specialised database.
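Here is a hedged sketch of that workflow using the redis-py client. It assumes a Redis instance is reachable on localhost:6379, and the key prefix, TTL, and session payload are illustrative choices rather than requirements.

```python
# Externalised session state: any application server can read or write the
# session, so the servers themselves stay stateless and interchangeable.

import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
SESSION_TTL_SECONDS = 1800   # sessions expire after 30 minutes of inactivity

def save_session(session_id, data):
    # Any application server can write the session to the shared store...
    r.setex(f"session:{session_id}", SESSION_TTL_SECONDS, json.dumps(data))

def load_session(session_id):
    # ...and any other server can read it back on the next request.
    raw = r.get(f"session:{session_id}")
    return json.loads(raw) if raw else None

def handle_request(session_id, action):
    session = load_session(session_id) or {"history": []}
    session["history"].append(action)
    save_session(session_id, session)
    return f"done: {action}"
```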