System Design Primer
A GitHub repository by Donne Martin, a tech lead at Facebook/Meta. The repository is a guide on how to design large-scale systems and how to prepare for system design and object-oriented design interviews.
In this section, I will summarize what I have learned so far.
Link to the GitHub Repository: https://github.com/donnemartin/system-design-primer
1. Prerequisite knowledge
Basic understanding of common system design principles, learning about what they are, how they are used, and their pros and cons.
1.1 CS75 (Summer 2012) Lecture 9 - Scalability, Harvard Web Development
Web hosting service
is a type of Internet Hosting Service (a service that runs servers on the Internet) that hosts websites for clients.
A good web hosting service should ensure security (encrypted usernames and passwords).
An alternative is a VPS (virtual private server), which runs multiple virtual machines on a single physical machine.
Vertical Scaling
If a machine is low on resources, the solution is to add more of them (CPU, disk, RAM).
It is not necessarily a complete solution, because real-world constraints cap how far a single machine can be upgraded.
Horizontal Scaling
Using multiple servers to scale the system instead of upgrading one server.
Load Balancer
The traffic is distributed to various backend servers. We return the public IP address of the load balancer to clients, and backend servers can have private IP addresses.
The difference between public and private IP addresses is that private IP addresses cannot be reached from the internet, which adds security and reduces cost by conserving scarce IPv4 addresses.
Sending the request to a server based on
load (complex)
dedicated servers (less complex)
round-robin, a.k.a. enumerating the servers' IP addresses and returning a different one for each request (reasonable!). Problem: requests are assigned without regard to load, so one server can end up disproportionately heavy (see the sketch after this list)
Software: ELB, HAProxy, LVS
Hardware: Barracuda, Cisco, Citrix
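A minimal sketch of the round-robin idea in Python (the backend IPs are hypothetical; itertools.cycle does the enumerating):

```python
from itertools import cycle

# Hypothetical private IPs of the backend servers behind the load balancer.
BACKENDS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

# cycle() endlessly enumerates the list: each request gets the next server.
_next_backend = cycle(BACKENDS)

def route_request():
    """Return the backend that should handle the next incoming request."""
    return next(_next_backend)

# Requests go to .1, .2, .3, then wrap around to .1 again.
for _ in range(4):
    print(route_request())
```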
Shared Session State
Sessions tend to stay on one specific machine, since they are implemented per server.
Factor session state out of the individual servers and put it in one shared session store that every server (via the load balancer) can reach => weakness in the network topology: if the session store or the load balancer dies, sessions are lost => use RAID (redundant array of inexpensive disks) for redundancy
RAID is a data storage virtualization technology that combines multiple physical disk drive components into one or more logical units for the purposes of data redundancy
RAID 0 - performance by striping data
RAID 1 - mirroring data across two drives, fault-tolerant
RAID is fault-tolerant, avoids data loss and downtime
Shared Storage
FC, iSCSI, MySQL, NFS, etc.
Session Affinity
Sticky session => ending up in the same backend server
Cookies => finite size; do not store an IP in a cookie, both for security and because IPs can change => put a big random number in the cookie that the load balancer can translate back into the real private IP (this is how it remembers which backend server to send the user to)
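A sketch of that cookie trick, assuming the load balancer keeps an in-memory table (all names here are made up):

```python
import secrets

# Load balancer's private table: opaque cookie value -> backend private IP.
sessions = {}

def assign_backend(backend_ip):
    """On first visit, generate a big random number to store in the cookie."""
    session_id = secrets.token_hex(16)  # reveals nothing about the backend
    sessions[session_id] = backend_ip
    return session_id

def backend_for(session_id):
    """On later visits, translate the cookie back into the real private IP."""
    return sessions.get(session_id)

cookie = assign_backend("10.0.0.2")
assert backend_for(cookie) == "10.0.0.2"  # sticky: same server every time
```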
Data replication
Solves the single point of failure by making automatic copies, usually from a master database to slave databases.
Paradigms:
Master-Slave - balances read requests and is fault-tolerant
Master-Master - more redundancy
Partitioning
Partitioning is a form of horizontal scaling in which requests are routed not by load or round-robin but by some attribute of the data, e.g., a user's last name (users A-M on one cluster, N-Z on another).
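A sketch of that last-name partitioning scheme (the server names are hypothetical):

```python
def shard_for(last_name):
    """Route a user to a server based on the first letter of their last name."""
    first = last_name[0].upper()
    return "server-A-M" if first <= "M" else "server-N-Z"

print(shard_for("Martin"))  # server-A-M
print(shard_for("Smith"))   # server-N-Z
```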
High-availability
If one server fails, we can use the backup server.
Data center redundancy
If one data center fails, we can use a backup server in another data center. Load balancing can also be done based on geo-location.
Security
from the internet to the load balancers: TCP port 80, TCP port 443 (SSL/TLS certificates), and SSH port 22
from the load balancers to the web servers: TCP port 80, unencrypted (you can do SSL termination in the load balancers)
from the web servers to the databases: TCP port 3306 (MySQL)
1.2 Scalability for Dummies
Clones
Every server contains the same codebase and does not store user-related data, like sessions or profile pictures, on a local disc or memory.
Sessions must be stored in a centralized data store, accessible to all your application servers. It can be an external database or an external persistent cache, like Redis. An external persistent cache will have better performance than an external database. By external, I mean that the data store does not reside on the application servers. Instead, it is somewhere in or near the data center of your application servers.
After outsourcing the sessions and serving the same codebase from all servers => we can create images or containers from one server and clone it to spin up new ones.
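A minimal sketch of the centralized session store using the redis-py client (the host name and the 30-minute TTL are assumptions):

```python
import json
import redis  # pip install redis

# One external Redis instance shared by every application server.
store = redis.Redis(host="sessions.internal", port=6379)

def save_session(session_id, data):
    # setex writes the value with an expiry; idle sessions vanish after 30 min.
    store.setex(f"session:{session_id}", 1800, json.dumps(data))

def load_session(session_id):
    raw = store.get(f"session:{session_id}")
    return json.loads(raw) if raw is not None else None
```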
Database
The scenario: Having a MySQL database, and somewhere down the road, the application gets slower and slower and finally breaks down. There are two solutions:
Keep the database running, and the database administrator (DBA) will do master-slave replication and upgrade the master server by continuously adding more RAM.
Master-slave architecture helps in balancing the load. The master database is used for the write operations, while read operations may be spread on multiple slave databases.
Denormalize the database from the beginning and include no more Joins in any database query. We can stay with MySQL and use it like a NoSQL database or switch to a better and easier-to-scale NoSQL database like MongoDB. Joins will now need to be done with the application code. Sooner or later, database requests will again be slower and slower. We will then need to introduce a cache.
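A sketch of what "joins in application code" means: fetch two result sets with separate queries and stitch them together in Python (the tables and fields are made up):

```python
# Hypothetical result sets fetched with two separate queries, e.g.:
#   SELECT id, user_id, total FROM orders
#   SELECT id, name FROM users WHERE id IN (1, 2)
orders = [{"id": 10, "user_id": 1, "total": 99.0},
          {"id": 11, "user_id": 2, "total": 25.0}]
users = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Linus"}]

# The "join" now happens in application code, not in the database.
users_by_id = {u["id"]: u for u in users}
for order in orders:
    order["user_name"] = users_by_id[order["user_id"]]["name"]

print(orders)
```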
Cache
With "cache," we always mean in-memory caches like Memcached or Redis. Never do file-based caching.
A cache is a simple key-value store that acts as a buffering layer between the application and the data storage. Whenever your application has to read data, it should first try to retrieve the data from the cache.
Cached Database Queries - storing the result dataset in the cache, with a hashed version of the query as the cache key. The next time the query runs, you first check the cache to see if there is already a result (see the sketch below).
This pattern has an issue: expiration. When one piece of data changes, it is hard to find and delete all the cached results whose queries included that data.
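A sketch of the cached-query pattern, with the hashed query as the cache key (a plain dict stands in for Memcached/Redis, and run_query for whatever function actually talks to your database):

```python
import hashlib

cache = {}  # stand-in for Memcached/Redis

def cached_query(sql, run_query):
    """Check the cache first; only hit the database on a miss."""
    key = hashlib.sha256(sql.encode()).hexdigest()  # hashed query = cache key
    if key in cache:
        return cache[key]        # cache hit
    result = run_query(sql)      # cache miss: go to the database
    cache[key] = result
    return result

rows = cached_query("SELECT * FROM users WHERE id = 42",
                    lambda q: [("alice",)])  # dummy database call
```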
Cached Objects - seeing the data as an object (classes, instances, etc.) -> storing the complete instance of the class or the assembled dataset in the cache.
This allows us to easily remove the object from the cache whenever the underlying data changes, making operations faster and more logical.
Additionally, this makes asynchronous operation possible.
Asynchronism
To avoid the dreaded "please wait a while" experience, there are two ways/paradigms in which asynchronism can be done.
In the context of a web app, we can do time-consuming work in advance and serve the finished work with a low request time.
E.g., turning dynamic content into static content, meaning that pages are pre-rendered and stored locally as static HTML. We can then upload those files to AWS S3 or CloudFront or another CDN.
In the context of the web service, we can do tasks asynchronously.
A very computing-intensive task is sent into a job queue, which is constantly checked by workers for new jobs. If there is a new job, then the worker does the job and sends back a signal that it is done.
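A minimal sketch of that flow using Python's standard-library queue (a production system would use a dedicated broker such as RabbitMQ or Redis):

```python
import queue
import threading

jobs = queue.Queue()  # the job queue

def worker():
    """Constantly check the queue for new jobs; do each one and signal done."""
    while True:
        job = jobs.get()        # blocks until a new job arrives
        print(f"done: {job}")   # stand-in for the "job is done" signal
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

# The frontend enqueues work and returns immediately; no user is kept waiting.
jobs.put("resize image #1")
jobs.put("resize image #2")
jobs.join()  # demo only: wait until the worker has drained the queue
```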
2. Performance vs. Scalability
A service is scalable if adding resources results in performance increasing in proportion to the resources added. Generally, increasing performance means serving more units of work, but it can also mean handling larger units of work, such as when datasets grow.
Another way to look at performance vs. scalability:
If you have a performance problem, your system is slow for a single user.
If you have a scalability problem, your system is fast for a single user but slow under heavy load.
3. Latency vs. Throughput
Latency is the time to perform some action or to produce some result.
Throughput is the number of such actions or results per unit of time.
Generally, you should aim for maximal throughput with acceptable latency.
4. Availability vs. Consistency
4.1 CAP Theorem
In a distributed computer system, you can only support two of the following guarantees:
Consistency - Every read receives the most recent write or an error
Availability - Every request receives a response, without guarantee that it contains the most recent version of the information
Partition Tolerance - The system continues to operate despite arbitrary partitioning due to network failures
Networks aren't reliable, so you'll need to support partition tolerance. You'll need to make a software tradeoff between consistency and availability.
4.1.1 CP - Consistency and Partition Tolerance
Waiting for a response from the partitioned node might result in a timeout error. CP is a good choice if your business needs require atomic reads and writes.
(Atomicity ensures that each transaction is treated as a single unit, which either completely succeeds or completely fails.)
In CP, atomic reads and writes ensure data consistency across a distributed system. In the event of a network partition (where parts of the system cannot communicate with each other), a CP system will maintain data consistency even if it means sacrificing availability (resulting in timeouts or errors for some operations).
4.1.2 AP - Availability and Partition Tolerance
Responses return the most readily available version of the data available on any node, which might not be the latest. Writes might take some time to propagate when the partition is resolved.
AP is a good choice if the business needs to allow for eventual consistency or when the system needs to continue working despite external errors.
4.2 Consistency Patterns
With multiple copies of the same data, we are faced with options on how to synchronize them so clients have a consistent view of the data. Every read receives the most recent write or an error.
4.2.1 Weak Consistency
After a write, reads may or may not see it. A best-effort approach is taken.
This approach is seen in systems such as memcached. Weak consistency works well in real-time use cases.
E.g., if you are on a phone call and lose reception for a few seconds, when you regain connection you do not hear what was spoken during connection loss.
4.2.2 Eventual Consistency
After a write, reads will eventually see it (typically within milliseconds). Data is replicated asynchronously.
This approach is seen in systems such as DNS and email. Eventual consistency works well in highly available systems.
4.2.3 Strong Consistency
After a write, reads will see it. Data is replicated synchronously.
This approach is seen in file systems and RDBMSes. Strong consistency works well in systems that need transactions.
4.3 Availability Patterns
Standby Modes
Hot standby: The passive server is already running and kept in sync with the active server; it can take over almost immediately, minimizing downtime.
Cold standby: The passive server is not running; it may need to boot up and load data before it can take over, leading to longer downtime.
4.3.1 Fail-over
Active-passive
With active-passive fail-over, heartbeats are sent between the active server and the passive server on standby. If the heartbeat is interrupted, the passive server takes over the active server's IP address and resumes service. Since that IP address is what clients use to connect to the service, taking it over means clients can keep connecting without any changes on their part, minimizing the disruption of service.
The length of downtime is determined by whether the passive server is already running in 'hot standby' or whether it needs to start up from 'cold' standby.
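A sketch of the heartbeat check from the passive server's point of view (the health URL, interval, and missed-beat threshold are all arbitrary assumptions):

```python
import time
import urllib.request

ACTIVE_HEALTH_URL = "http://active.internal/health"  # hypothetical endpoint
MISSED_LIMIT = 3  # missed heartbeats tolerated before failing over

def heartbeat_ok():
    try:
        with urllib.request.urlopen(ACTIVE_HEALTH_URL, timeout=2) as resp:
            return resp.status == 200
    except OSError:  # connection refused, timeout, DNS failure, ...
        return False

missed = 0
while missed < MISSED_LIMIT:
    missed = 0 if heartbeat_ok() else missed + 1
    time.sleep(5)  # heartbeat interval

print("heartbeat lost: taking over the active IP address and resuming service")
```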
Active-active
In active-active, both servers are managing traffic, spreading the load between them.
If the servers are public-facing, the DNS would need to know about the public IPs of both servers. If the servers are internally facing, application logic would need to know about both servers.
Active-active failover can also be referred to as master-master failover.
Disadvantage(s): failover
Fail-over adds more hardware and additional complexity.
There is a potential for loss of data if the active system fails before any newly written data can be replicated in the passive.
4.3.2 Replication
Master-slave and master-master
4.4 Availability in numbers
Availability is often quantified by uptime (or downtime) as a percentage of time the service is available. Availability is generally measured in a number of 9s - a service with 99.99% availability is described as having four 9s.
| Availability | Downtime per year | Downtime per month | Downtime per week | Downtime per day |
| --- | --- | --- | --- | --- |
| 99.9% (three 9s) | 8h 45min 57s | 43m 49.7s | 10m 4.8s | 1m 26.4s |
| 99.99% (four 9s) | 52min 35.7s | 4m 23s | 1m 5s | 8.6s |
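The numbers above fall out of simple arithmetic; a quick sketch:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def downtime_per_year(availability):
    """Seconds of allowed downtime per year at a given availability level."""
    return (1 - availability) * SECONDS_PER_YEAR

print(downtime_per_year(0.999))   # ~31557.6 s, i.e., about 8h 46min
print(downtime_per_year(0.9999))  # ~3155.8 s, i.e., about 52min 36s
```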
Availability in parallel vs. in sequence
If a service consists of multiple components prone to failure, the service's overall availability depends on whether the components are in sequence or in parallel.
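The primer quantifies this. In sequence, overall availability decreases:

Availability (Total) = Availability (Foo) * Availability (Bar)

If both Foo and Bar each had 99.9% availability, their total availability in sequence would be 99.8%.

In parallel, overall availability increases:

Availability (Total) = 1 - (1 - Availability (Foo)) * (1 - Availability (Bar))

If both Foo and Bar each had 99.9% availability, their total availability in parallel would be 99.9999%.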
5. Domain Name System (DNS)
A Domain Name System (DNS) translates a domain name such as "www.example.com" to an IP address. Routers or ISPs provide information about which DNS server(s) to contact when doing a lookup. Lower-level DNS servers cache mappings, which could become stale due to DNS propagation delays.
NS record (name server) - specifies the DNS servers for your domain/subdomain
MX record (mail exchange) - specifies the mail servers for accepting messages
A record (address) - points a name to an IP address
CNAME (canonical) - points a name to another name or CNAME (example.com to www.example.com) or to an A record
Services such as CloudFlare and Route 53 provide managed DNS services. Some DNS services can route traffic through various methods:
Weighted round robin
Latency-based
Geolocation-based
Disadvantage(s): DNS
Accessing a DNS server introduces a slight delay, although mitigated by caching described above
DNS server management could be complex and is generally managed by governments, ISPs, and large companies
DNS services have recently come under DDoS attack, preventing users from accessing websites such as Twitter without knowing Twitter's IP address(es)
6. Content Delivery Network
The longer a page takes to load after a user requests it, the less likely customers are to buy from you or even stay on the page.
A one-second delay in page load time yields 11% fewer page views, a 16% decrease in customer satisfaction, and a 7% loss in conversions.
This annoying delay is called latency, and it is where website drop-offs take place. CDNs were invented to shorten the physical distance between the website user and the website's hosting server.
A content delivery network (CDN) is a globally distributed network of proxy servers, serving content from locations closer to the user. Generally, static files such as HTML/CSS/JS, photos, and videos are served from CDN. The site's DNS resolution will tell clients which server to contact.
Serving content from CDNs can significantly improve performance in two ways:
Users receive content from data centers close to them
Your servers do not have to serve requests that the CDN fulfills
Push CDNs
Push CDNs receive new content whenever changes occur on your server. You take full responsibility for providing content, uploading directly to the CDN, and rewriting URLs to point to the CDN. You can configure when content expires and when it is updated. Content is uploaded only when it is new or changed, minimizing traffic, but maximizing storage.
Sites with a small amount of traffic or sites with content that isn't often updated work well with push CDNs. Content is placed on the CDNs once, instead of being re-pulled at regular intervals.
Pull CDNs
Pull CDNs grab new content from your server when the first user requests the content. You leave the content on your server and rewrite URLs to point to the CDN. This results in a slower request until the content is cached on the CDN.
A time-to-live (TTL) determines how long content is cached. Pull CDNs minimize storage space on the CDN, but can create redundant traffic if files expire and are pulled before they have actually changed.
Summary
Push CDNs require manual management and are best for content that doesn't change often or for sites with lower traffic. They emphasize storage over bandwidth, as content stays on the CDN until explicitly updated or removed.
Pull CDNs automate the caching process, pulling content from the origin server as needed. This approach is efficient for sites with high traffic or frequently updated content. It prioritizes reducing the load on the origin server, potentially at the cost of increased bandwidth usage due to the CDN re-pulling expired content.
Disadvantage(s): CDN
CDN costs could be significant depending on traffic, although this should be weighed with additional costs you would incur not using a CDN
Content might be stale if it is updated before the TTL expires
CDNs require changing URLs for static content to point to the CDN