[System Design Pulse #1] Understanding Latency and Throughput: Critical Factors in System Design and Performance Tuning

For your tech interviews, with examples and code

Aug 07, 2024

Latency and throughput are two fundamental concepts in computing and networking that play crucial roles in determining system performance.

I'll cover the topic in this order:

Introduction to Latency and Throughput

How It Works
Latency vs Throughput
How It Works with Application Example
- Example: Web Server
Step-by-Step Process Flow
Platform Example
- Platform Example: Online Shopping Website
Short Story Line: "The Busy Bakery"
- Story Summary
- Story Line
Latency vs Throughput Explained Using Specific Platforms
- Latency Platforms
  - Pingdom
  - Datadog
- Throughput Platforms
  - Grafana
  - Prometheus

Let's get started.

Introduction to Latency and Throughput

Latency is the delay between initiating a request and receiving its response, usually measured in milliseconds. It includes both the time data spends travelling between source and destination and the time the system spends processing the request. In simpler terms, latency is about speed and responsiveness.
Throughput, on the other hand, measures the amount of work or information a system completes in a given time period. It's often expressed as operations per second or data transferred per second. Throughput is about capacity and volume.
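
To make the two definitions concrete, here is a minimal sketch that measures both for the same workload. The `handle_request` function and its 1 ms sleep are stand-ins assumed for illustration, not part of any real server.

```python
import time
from typing import Tuple


def handle_request() -> None:
    """Stand-in for real work; the 1 ms sleep is an arbitrary assumption."""
    time.sleep(0.001)


def measure(num_requests: int) -> Tuple[float, float]:
    """Return (average latency in seconds, throughput in requests/second)."""
    latencies = []
    start = time.perf_counter()
    for _ in range(num_requests):
        t0 = time.perf_counter()
        handle_request()
        latencies.append(time.perf_counter() - t0)  # time for this one request
    elapsed = time.perf_counter() - start           # time for the whole batch
    return sum(latencies) / len(latencies), num_requests / elapsed


avg_latency, throughput = measure(50)
print(f"average latency: {avg_latency * 1000:.2f} ms")
print(f"throughput: {throughput:.0f} requests/sec")
```

Latency is computed per request, while throughput is computed over the whole batch. In this sequential loop the two are tightly coupled; once requests are handled in parallel, throughput can rise without any single request getting faster.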


How It Works with Application Example

Example: Web Server

Scenario: A web server handles incoming HTTP requests from users and serves web pages.

Latency:
- Definition: The time taken for the server to respond to a single HTTP request. For example, if a user requests a web page, the latency is the time from when the user hits "Enter" until the page starts loading in their browser.
- Impact: Lower latency means faster page loads and a better user experience.
Throughput:
- Definition: The number of HTTP requests the server can handle per second. For example, if a server processes 500 requests per second, its throughput is 500 requests/sec.
- Impact: Higher throughput means the server can serve more requests in the same amount of time, and so support more concurrent users.

Step-by-Step Process Flow

User Request:
- A user sends a request to the server (e.g., accessing a webpage).
Request Arrival:
- The server receives the request.
Request Processing:
- The server processes the request:
  - Latency: Measure the time from receiving the request to sending a response.
  - Throughput: Measure how many requests the server processes in a given period.
Response Generation:
- The server generates a response based on the request.
Response Sent:
- The server sends the response back to the user.
Metrics Collection:
- Collect data on:
  - Latency: Time taken for individual requests.
  - Throughput: Number of requests handled per second.
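
The flow above can be sketched as a small metrics collector wrapped around a request handler. `RequestMetrics` and `serve` are hypothetical names chosen for this example, and the "processing" step is just string formatting.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RequestMetrics:
    """Collects per-request latencies and derives throughput."""
    latencies: List[float] = field(default_factory=list)
    started_at: float = field(default_factory=time.perf_counter)

    def record(self, latency_seconds: float) -> None:
        self.latencies.append(latency_seconds)

    def average_latency(self) -> float:
        return sum(self.latencies) / len(self.latencies) if self.latencies else 0.0

    def throughput(self, now: Optional[float] = None) -> float:
        """Requests completed per second since the collector started."""
        now = time.perf_counter() if now is None else now
        elapsed = now - self.started_at
        return len(self.latencies) / elapsed if elapsed > 0 else 0.0


def serve(request: str, metrics: RequestMetrics) -> str:
    """Receive a request, build a response, and record how long it took."""
    received = time.perf_counter()                   # request arrival
    response = f"<html>{request}</html>"             # processing and response generation
    metrics.record(time.perf_counter() - received)   # response ready to send
    return response


metrics = RequestMetrics()
for path in ["/home", "/cart", "/checkout"]:
    serve(path, metrics)
print(f"requests: {len(metrics.latencies)}")
print(f"average latency: {metrics.average_latency() * 1e6:.1f} microseconds")
print(f"throughput: {metrics.throughput():.0f} requests/sec")
```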

Platform Example: Online Shopping Website

Scenario: An online shopping website needs to handle thousands of user requests during a flash sale.

Latency Considerations:
- Definition: How quickly can the website respond to a user’s actions? For instance, how fast can the website display search results after a user enters a query?
- Implementation: To reduce latency, the website might use a Content Delivery Network (CDN) to cache static resources (like images and CSS files) closer to the user. It may also speed up database queries, for example by adding indexes, and tune server performance to minimize the time taken to process each request.
Throughput Considerations:
- Definition: How many requests can the website handle per second during peak times? For example, during a flash sale, the website must handle thousands of simultaneous requests for product searches, order placements, and payments.
- Implementation: To improve throughput, the website might use load balancing to distribute incoming traffic across multiple servers. It might also scale horizontally, adding servers to handle more traffic, and optimize its database to sustain high query loads.
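
As an illustration of the load-balancing idea, here is a minimal round-robin sketch. Round-robin is only one possible policy, chosen here for simplicity, and the server names are hypothetical.

```python
from collections import Counter
from itertools import cycle
from typing import Iterable


class RoundRobinBalancer:
    """Hands each incoming request to the next server in a fixed rotation."""

    def __init__(self, servers: Iterable[str]) -> None:
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation = cycle(servers)

    def route(self, request_id: int) -> str:
        # The request itself is ignored; only arrival order decides the target.
        return next(self._rotation)


balancer = RoundRobinBalancer(["server-a", "server-b", "server-c"])
assignments = Counter(balancer.route(i) for i in range(9))
print(dict(assignments))  # {'server-a': 3, 'server-b': 3, 'server-c': 3}
```

Each server ends up with an equal share of the nine requests, which is what keeps any single machine from becoming the bottleneck.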

Example Implementation:

Before Flash Sale:
- Latency: Average response time is 200 ms.
- Throughput: The website can handle 1000 requests per second.
During Flash Sale:
- Latency: Response time might increase to 500 ms due to the high load. Measures like caching and database optimizations can help mitigate this.
- Throughput: The system scales up to handle 10,000 requests per second by adding more servers and optimizing server configuration.
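
The numbers above can be turned into a rough capacity estimate. The sketch below assumes, purely for illustration, that a single server sustains 1,000 requests per second (the figure in the example is for the whole website, not one machine). It also uses Little's Law (average requests in flight = throughput × latency) to show how the two metrics combine.

```python
import math


def servers_needed(target_rps: int, per_server_rps: int) -> int:
    """Smallest number of servers whose combined capacity meets the target."""
    return math.ceil(target_rps / per_server_rps)


def in_flight_requests(throughput_rps: int, latency_ms: int) -> float:
    """Little's Law: average concurrency = throughput x latency."""
    return throughput_rps * latency_ms / 1000


PER_SERVER_RPS = 1_000  # assumed capacity of one server, not from the article

print(servers_needed(10_000, PER_SERVER_RPS))  # 10
print(in_flight_requests(1_000, 200))          # 200.0  (before the sale)
print(in_flight_requests(10_000, 500))         # 5000.0 (during the sale)
```

Under these assumptions the flash sale needs about ten servers, and the number of requests being processed at any instant grows 25-fold, because both throughput and latency went up.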

Platform Strategies:

Caching: Implementing caching mechanisms to store frequently accessed data and reduce the time required to generate responses.
Load Balancing: Distributing incoming requests across multiple servers to prevent any single server from becoming a bottleneck.
Database Optimization: Using faster queries and database indexing to handle high query loads efficiently.
Horizontal Scaling: Adding more servers to handle increased load during peak times.
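
The caching strategy can be sketched with Python's built-in `functools.lru_cache`. The `slow_lookup` function and its 50 ms sleep are assumptions standing in for a slow database query.

```python
import time
from functools import lru_cache
from typing import Callable, Tuple


def slow_lookup(product_id: int) -> str:
    """Simulated slow database query; the 50 ms delay is an assumption."""
    time.sleep(0.05)
    return f"product-{product_id}"


@lru_cache(maxsize=1024)
def cached_lookup(product_id: int) -> str:
    """Same result as slow_lookup, but repeated calls are served from memory."""
    return slow_lookup(product_id)


def timed(fn: Callable[[int], str], arg: int) -> Tuple[str, float]:
    """Call fn(arg) and return its result with the elapsed time in seconds."""
    t0 = time.perf_counter()
    result = fn(arg)
    return result, time.perf_counter() - t0


_, cold = timed(cached_lookup, 42)  # first call goes to the "database"
_, warm = timed(cached_lookup, 42)  # second call is a cache hit
print(f"cold: {cold * 1000:.1f} ms, warm: {warm * 1000:.3f} ms")
```

The second call skips the slow path entirely, which lowers latency for that request and frees the backend to serve other requests, helping throughput as well.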

Simplified Version: "The Busy Bakery"

Imagine a bakery that bakes and sells cookies. During a special event, the bakery gets very busy with lots of cookie orders. The bakery’s goal is to serve as many customers as quickly as possible while making sure each cookie is perfect.

Characters

Baker Benny: The main baker who makes delicious cookies.
Customer Cathy: A customer who orders cookies.
Helper Hannah: Benny’s helper who assists with orders.

Story Line

The Bakery Opens:
- Baker Benny and Helper Hannah are ready to start baking cookies.
- Customer Cathy arrives and places an order for cookies.
Cookie Making:
- Baker Benny starts baking cookies for Cathy.
- Latency: This is how long it takes Benny to bake and pack Cathy’s cookies. If Benny bakes quickly, the latency is low, meaning Cathy gets her cookies faster.
Serving More Customers:
- As more customers arrive, Helper Hannah helps by taking orders and preparing more baking trays.
- Throughput: This is the number of cookie orders Benny and Hannah can handle in an hour. If they handle lots of orders quickly, the throughput is high.
Special Event Time:
- During a big event, lots of customers come in at once. Benny and Hannah need to make cookies fast to keep up with everyone’s orders.
- Latency might increase because each order waits in line longer before Benny can start baking it.
- Throughput must be high to handle all the orders coming in.
Optimizing Cookie Making:
- To keep up with the high demand, Benny and Hannah use extra ovens and baking trays. They also organize the bakery to make the cookie-making process faster.
Happy Customers:
- By managing latency (making cookies quickly) and throughput (handling many orders), Benny and Hannah make sure all customers are happy and get their cookies as soon as possible.

1. Latency Platforms

1.1. Pingdom

How It Works:

Pingdom is a web performance monitoring service that helps track the latency of websites by measuring how long it takes for a web page to load from various locations around the world.

Application Example:

When monitoring a website, Pingdom sends requests to the website from multiple geographic locations. It measures the time taken for the entire web page to load from each location. The collected latency data is then used to identify slow-loading regions or issues with the website's performance.

Step-by-Step Process Flow:

Request Initiation:
- Pingdom initiates requests to the website from various locations.
Time Measurement:
- It measures how long it takes for the website to load fully from each location.
Data Collection:
- Latency data (time to load) is collected and averaged.
Reporting:
- Reports are generated showing average, minimum, and maximum latency values.
Analysis:
- Analyze the reports to identify slow regions and optimize the website accordingly.
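
The collection, reporting, and analysis steps above can be sketched in plain Python. This does not use Pingdom's API; the region names, sample values, and 500 ms threshold are all made up for illustration.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical page-load times in milliseconds, one list per probe location.
samples_ms: Dict[str, List[int]] = {
    "us-east": [180, 210, 195],
    "eu-west": [240, 260, 250],
    "ap-south": [620, 580, 640],
}


def summarize(samples: Dict[str, List[int]]) -> Dict[str, Dict[str, float]]:
    """Average, minimum, and maximum load time per location."""
    return {
        region: {"avg": mean(values), "min": min(values), "max": max(values)}
        for region, values in samples.items()
    }


def slow_regions(summary: Dict[str, Dict[str, float]], threshold_ms: float) -> List[str]:
    """Locations whose average load time exceeds the threshold."""
    return [region for region, stats in summary.items() if stats["avg"] > threshold_ms]


report = summarize(samples_ms)
print(slow_regions(report, threshold_ms=500))  # ['ap-south']
```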

1.2. Datadog

How It Works:

Datadog is an observability platform that monitors infrastructure performance, including latency for services, by collecting and analyzing various metrics from your applications and infrastructure.

Application Example:

Datadog tracks the latency of API responses or other service interactions within an application. It collects metrics on how long different services take to respond to requests and visualizes this data on dashboards.

Step-by-Step Process Flow:

Install Agents:
- Datadog agents are installed on servers or within the application.
Metric Collection:
- Metrics related to service response times (latency) are collected.
Data Aggregation:
- Collected data is aggregated and stored.
Dashboard Creation:
- Latency metrics are visualized on dashboards.
Alerting:
- Alerts are set up to notify when latency exceeds acceptable thresholds.
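
The alerting step can be sketched without any vendor API. The example below is not Datadog code; it assumes a nearest-rank 95th percentile and a 500 ms threshold, both chosen only for illustration.

```python
import math
from typing import List


def percentile(values: List[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest value with pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def should_alert(latencies_ms: List[float], threshold_ms: float, pct: int = 95) -> bool:
    """True when the chosen latency percentile exceeds the threshold."""
    return percentile(latencies_ms, pct) > threshold_ms


one_outlier = [100] * 19 + [900]       # a single slow request out of 20
two_outliers = [100] * 18 + [900, 900]  # slow requests now reach the 95th percentile

print(should_alert(one_outlier, threshold_ms=500))   # False
print(should_alert(two_outliers, threshold_ms=500))  # True
```

Alerting on a high percentile rather than the maximum avoids paging on a single slow request, while still catching slowdowns that an average would hide.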

2. Throughput Platforms

2.1. Grafana

How It Works:

Grafana is an open-source platform for monitoring and observability that visualizes metrics from various data sources. Throughput is one of the metrics Grafana can display by integrating with data sources like Prometheus.

Application Example:

Grafana visualizes throughput data, such as the number of requests per second to a web service, by querying data sources like Prometheus. This helps in understanding how many requests the service handles over time.

Step-by-Step Process Flow:

Data Source Integration:
- Connect Grafana to data sources like Prometheus.
Query Metrics:
- Query throughput metrics such as requests per second.
Visualize Data:
- Display the throughput data on graphs and dashboards.
Analyze:
- Use visualizations to analyze throughput and identify performance issues.
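
A "requests per second" graph is typically derived from a cumulative request counter that is sampled at intervals. The sketch below shows that arithmetic in plain Python with made-up samples. It is a simplification: Prometheus's own `rate()` function also handles counter resets and extrapolation, which this code ignores.

```python
from typing import List, Tuple


def per_second_rate(samples: List[Tuple[float, int]]) -> float:
    """Average requests/second between the first and last counter samples.

    Each sample is (timestamp_seconds, cumulative_request_count), oldest first.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    return (c1 - c0) / (t1 - t0)


# Hypothetical counter readings taken every 15 seconds.
scrapes = [(0, 1_000), (15, 8_500), (30, 16_000)]
print(per_second_rate(scrapes))  # 500.0
```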

Platform Usage Example: Online Video Streaming Service

Scenario: A streaming service needs to ensure it handles many viewers simultaneously (throughput) and provides a smooth viewing experience (latency).

Latency Monitoring with Pingdom and Datadog:
- Pingdom: Measures how long it takes for the streaming website to load fully from different locations. Helps identify if viewers are experiencing delays when accessing the service.
- Datadog: Monitors API response times for streaming data requests. Alerts if latency increases, indicating potential issues in the streaming pipeline.
Throughput Monitoring with Grafana and Prometheus:
- Prometheus: Collects metrics on the number of concurrent streams or requests per second. Tracks how many viewers are being served at any given moment.
- Grafana: Visualizes throughput data, such as the number of streams being handled. Helps in understanding if the system is meeting the demand and scaling effectively during peak times.



Things you must know in System Design -

System design basics : https://bit.ly/3SuUR0Y
Horizontal and vertical scaling : https://bit.ly/3slq5xh
Load balancing and Message queues: https://bit.ly/3sp0FP4
High level design and low level design, Consistent Hashing, Monolithic and Microservices architecture : https://bit.ly/3DnEfEm
Caching, Indexing, Proxies : https://bit.ly/3SvyVDc
Networking, How Browsers work, Content Delivery Network (CDN) : https://bit.ly/3TOHQRb
Database Sharding, CAP Theorem, Database schema Design : https://bit.ly/3CZtfLN
Concurrency, API, Components + OOP + Abstraction : https://bit.ly/3sqQrhj
Estimation and Planning, Performance : https://bit.ly/3z9dSPN
Map Reduce, Patterns and Microservices : https://bit.ly/3zcsfmv
SQL vs NoSQL and Cloud : https://bit.ly/3z8Aa49



Let us know what you think.

Happy learning!
