The Complete Guide to System Scalability: Scale Up, Scale Out, or Go Diagonal?


Scalability describes a system's elasticity; in simple terms, it is the system's ability to grow. To grow, a system can scale up or scale out. To downsize, which can happen when load drops (for example, when a business is cyclical and alternates between periods of heavy and light traffic), a system can scale down or scale in.
Scalability is relevant because it is almost impossible to predict a system's growth rate, particularly for one that depends on engagement from end users. While it is possible to go 'big' from the beginning and build a super powerful system that can handle any scenario you can think of, this approach is rarely taken because of the significant cost involved. The more common approach is to build a system that fits your budget and the traffic or load you expect at the time, and then scale up, out, down, or in as demand changes.
Horizontal Scaling (Scaling out) is the process of adding nodes or machines to an infrastructure, increasing its capacity to handle rising demand. This process adds complexity, because you have to decide which node does what and how new nodes coordinate with the rest of the infrastructure. To visualise how this works, imagine the workload for a single employee becoming too much, so you hire more employees and delegate the work across the group.
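To make the "more employees" picture concrete, here is a minimal, illustrative Python sketch of a node pool with round-robin dispatch. The NodePool class and node names are hypothetical stand-ins for a real load balancer and fleet, not a production implementation.

```python
# A minimal sketch of how added nodes absorb extra load: a round-robin
# dispatcher that spreads requests across whatever nodes are registered.
# Node names and the dispatch behaviour are illustrative placeholders.

class NodePool:
    def __init__(self, nodes):
        self.nodes = list(nodes)   # current fleet of servers
        self._next = 0

    def add_node(self, node):
        """Scaling out: register another machine with the pool."""
        self.nodes.append(node)

    def dispatch(self, request):
        """Send the request to the next node in round-robin order."""
        node = self.nodes[self._next % len(self.nodes)]
        self._next += 1
        return f"{node} handled {request}"


pool = NodePool(["node-1"])
pool.add_node("node-2")          # demand rose, so we scaled out
pool.add_node("node-3")

for req in ["req-a", "req-b", "req-c", "req-d"]:
    print(pool.dispatch(req))
# Requests rotate across node-1, node-2 and node-3 instead of piling
# onto a single machine.
```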

Vertical Scaling (Scaling up) is the process of increasing the resources of a single system so that it can meet the demand of the work it needs to perform. If a server's processor is being overwhelmed by the volume of work sent to it, upgrading to a more powerful processor that completes the same work more quickly is the logical fix, and doing so is an example of vertical scaling. The same applies to other components in the server, e.g. RAM, storage, or networking capacity. Replacing an existing server with a more powerful one is also a form of vertical scaling.
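As a rough illustration of what triggers a scale-up decision, the sketch below checks whether a single machine's CPU or RAM is saturated. It assumes the third-party psutil package is installed, and the 80% thresholds are arbitrary example values rather than recommendations.

```python
# A rough sketch of the kind of check that motivates a vertical upgrade:
# if one box is consistently saturated, the fix is a bigger box rather
# than more boxes. Thresholds are illustrative only.

import psutil

def needs_vertical_upgrade(cpu_threshold=80.0, mem_threshold=80.0):
    cpu = psutil.cpu_percent(interval=1)      # CPU load sampled over 1 second
    mem = psutil.virtual_memory().percent     # RAM in use, as a percentage
    if cpu > cpu_threshold:
        print(f"CPU at {cpu}% - consider a faster processor or more cores")
    if mem > mem_threshold:
        print(f"Memory at {mem}% - consider adding RAM")
    return cpu > cpu_threshold or mem > mem_threshold

if __name__ == "__main__":
    needs_vertical_upgrade()
```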

Here’s a quick breakdown of the differences between horizontal scaling and vertical scaling.
| Feature | Horizontal Scaling (Scale-Out) | Vertical Scaling (Scale-Up) |
|---|---|---|
| Description | Adds more server instances (nodes) to the system. | Adds resources (CPU, RAM, storage) to a single existing node. |
| Limit | Theoretically unlimited; you can keep adding servers indefinitely. | Limited by the maximum hardware capacity of a single machine. |
| Downtime | Minimal to none; servers can be added while the system is running. | Required; upgrading hardware usually needs a restart or maintenance window. |
| Complexity | High. Requires load balancing, distributed computing logic, and complex data consistency management. | Low. The architecture remains the same; the software doesn't necessarily need to change. |
| Cost | Higher upfront but cost-efficient long term; uses cheaper, commodity hardware and costs scale linearly. | Low cost initially but expensive long term. |
| Resilience | High availability. If one server fails, the load balancer redirects traffic to the others. | Single point of failure. If the one large server goes down, the entire application goes down. |
| Best For | Distributed systems, microservices, web applications with massive traffic. | Monolithic applications, small-to-medium datasets, early-stage startups. |
Diagonal scaling is a hybrid approach that combines the best of both worlds. You start by scaling vertically (adding more RAM, a better CPU, etc.) until you reach a certain performance or cost threshold. At that point, you shift to a horizontal approach by cloning the optimised machine and adding those clones to the infrastructure.
This approach gives a team the flexibility to scale vertically at first to optimise costs and resources, then scale out horizontally to a distributed system as demand grows. It limits the downsides of pure vertical scaling, such as hardware limits and the single point of failure, while delaying the complexity and higher costs of horizontal scaling until they are necessary.
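One way to picture the diagonal decision is a helper that climbs an instance-size ladder until it hits a ceiling and only then starts adding clones. The sizes and ceiling in this sketch are made-up values, not a prescription.

```python
# A toy sketch of the diagonal-scaling decision: grow the instance until a
# size ceiling (or cost threshold) is reached, then add clones instead.
# Instance sizes and the ceiling are illustrative values only.

INSTANCE_SIZES = ["small", "medium", "large", "xlarge"]   # vertical ladder
VERTICAL_CEILING = "xlarge"                               # after this, go horizontal

def next_scaling_step(current_size: str, node_count: int) -> str:
    if current_size != VERTICAL_CEILING:
        bigger = INSTANCE_SIZES[INSTANCE_SIZES.index(current_size) + 1]
        return f"scale up: resize node(s) to '{bigger}'"
    return f"scale out: clone the '{current_size}' node, fleet becomes {node_count + 1}"

print(next_scaling_step("medium", 1))   # scale up: resize node(s) to 'large'
print(next_scaling_step("xlarge", 1))   # scale out: clone the 'xlarge' node, fleet becomes 2
```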

The distinction between a stateful and a stateless application is crucial in deciding how, and whether, you can scale. It dictates whether you can add more servers in a horizontal setup or are forced to buy a bigger server in a vertical setup.
Stateless applications do not store session or user data on the server; each request contains all the information needed to process it independently (e.g., web APIs and microservices). This makes them easy to scale, as there is no dependency on prior interactions and any node can process any request, which is perfect for horizontal scaling.
Stateful applications retain session information between requests because they require context to function correctly (e.g., databases and auth servers). This makes them challenging to scale horizontally due to data consistency requirements: when a user is switched to a new server, that server has no idea who the user is or what they were doing.
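The toy handlers below contrast the two styles: the stateful one depends on a session dictionary living in one server's RAM, while the stateless one gets everything it needs from the request itself. The function names and the token field are hypothetical.

```python
# Stateful: the server remembers the user between requests.
SESSIONS = {}   # lives in this particular server's RAM only

def stateful_handler(session_id, action):
    user = SESSIONS.get(session_id)            # depends on prior requests to THIS server
    if user is None:
        return "401: unknown session (it lives on another server, or is gone)"
    return f"{user} performed {action}"

# Stateless: every request carries everything needed to process it.
def stateless_handler(request):
    user = request["auth_token_subject"]       # identity travels with the request
    return f"{user} performed {request['action']}"

# Any clone of the stateless server can answer this; only the server holding
# SESSIONS can answer the stateful one.
print(stateless_handler({"auth_token_subject": "alice", "action": "list_orders"}))
```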
As we’ve covered in this post, horizontal scaling relies on the ability to distribute traffic freely across a fleet of servers.
Stateless applications make horizontal scaling smooth. The load balancer can send each request to any server in the fleet, which processes the request and performs the action. If one server crashes, the load balancer sends subsequent requests to any working node, and the user experiences zero interruption.
Stateful applications pose a "sticky session" problem: if a user logs into server 1, the load balancer must keep sending that user to server 1 until the session ends. This makes scaling hard because you cannot easily balance the load; if server 1 is overloaded but server 2 is empty, you cannot simply move a user to server 2 or they will be logged out. Furthermore, if server 1 crashes, the session data of the users on that server is gone, so they are logged out and their work is lost.
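The sketch below illustrates the routing difference: stateless requests can go to whichever server is next, while sticky sessions must map the session ID back to the same server every time, even if that server is overloaded. Server names are placeholders for a real fleet, and the hashing scheme is just one possible way to implement stickiness.

```python
# Why sticky sessions constrain the load balancer: stateless traffic can go
# anywhere, but stateful traffic must hash back to the same server each time.

import hashlib
import itertools

SERVERS = ["server-1", "server-2", "server-3"]
round_robin = itertools.cycle(SERVERS)

def route_stateless(_request):
    # Any server will do; just take the next one in rotation.
    return next(round_robin)

def route_sticky(session_id):
    # The same session must always land on the same server,
    # even if that server is currently the busiest in the fleet.
    digest = hashlib.sha256(session_id.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

print(route_sticky("user-42-session"))           # always the same server for this session
print(route_sticky("user-42-session"))
print(route_stateless({}), route_stateless({}))  # free to alternate across the fleet
```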
Stateless applications can scale vertically, but it’s usually a waste of money. You don’t need a super powerful computer to handle simple, independent requests.
Stateful applications are great for vertical scaling. Since the application struggles to run across multiple machines, the easiest solution is to keep it on one machine and upgrade the machine to a more powerful configuration.
A common approach to scaling stateful apps horizontally is to rearchitect the app to use an external state store. Instead of keeping sessions in each server's RAM, you store the session data in a shared, fast data store such as Redis.
The workflow for this would be:
1. The load balancer sends the user's request to any available application server.
2. The server looks up the session in the shared store using the session ID sent with the request.
3. The server processes the request using that session data.
4. The server writes any updated session data back to the shared store before responding.
The result of this is that the application servers become stateless and scalable while the state is managed by a specialised database.
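Here is a hedged sketch of that workflow using the redis-py client. It assumes a Redis instance is reachable on localhost:6379, and the key prefix, TTL, and session payload are illustrative choices rather than requirements.

```python
# Externalised session state: any application server can read or write the
# session, so the servers themselves stay stateless and interchangeable.

import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
SESSION_TTL_SECONDS = 1800   # sessions expire after 30 minutes of inactivity

def save_session(session_id, data):
    # Any application server can write the session to the shared store...
    r.setex(f"session:{session_id}", SESSION_TTL_SECONDS, json.dumps(data))

def load_session(session_id):
    # ...and any other server can read it back on the next request.
    raw = r.get(f"session:{session_id}")
    return json.loads(raw) if raw else None

def handle_request(session_id, action):
    session = load_session(session_id) or {"history": []}
    session["history"].append(action)
    save_session(session_id, session)
    return f"done: {action}"
```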