Ignito

[Most Asked: ML System Design Case Studies #25] Millions of Posts: How LinkedIn Personalizes Your Homepage Feed at Scale: How It Actually Works

Uncovering all the engineering insights and technical details.

Aug 14, 2025

Table of Contents

  1. User Session Flow

  2. Technical Architecture Deep Dive

  3. Understanding the Core Personalization Problem

  4. From Simple Feeds to AI-Powered Personalization

  5. The Large-Scale Ranking (LaR) Architecture

  6. Feature Engineering at Scale

  7. Real-time Model Serving

  8. A/B Testing

  9. Engineering Insights & Conclusions

  10. TL;DR

The Complete User Session Flow

Before diving into the neural networks and infrastructure, let me start with what matters most - what actually happens when you tap the LinkedIn app icon.

When software engineer Maya opens LinkedIn during her morning commute, a sophisticated orchestration of AI systems analyzes millions of data points to deliver a personalized feed of relevant content in under 800 milliseconds. Here's exactly what happens:

┌─────────────────────────────────────────────────────────────┐
│                    USER SESSION FLOW                          │
├─────────────────────────────────────────────────────────────┤
│                                                               │
│   1. APP LAUNCH & AUTHENTICATION                             │
│   ┌──────────────┐                                          │
│   │ User: Maya   │ → Taps LinkedIn app icon                 │
│   │ Device: iOS  │ → Biometric/cached auth check            │
│   │ Location: SF │ → Network connection established         │
│   └──────────────┘                                          │
│          ↓                                                   │
│                                                               │
│   2. USER CONTEXT LOADING (< 100ms)                         │
│   ┌──────────────────────────────────┐                      │
│   │ • User ID: maya_eng_2019         │                      │
│   │ • Profile: Senior SWE at Meta    │                      │
│   │ • Network: 2,847 connections     │                      │
│   │ • Interests: ML, Python, React   │                      │
│   │ • Last active: 6 hours ago       │                      │
│   └──────────────────────────────────┘                      │
│          ↓                                                   │
│                                                               │
│   3. CANDIDATE GENERATION (< 200ms)                         │
│   ┌──────────────────────────────────┐                      │
│   │ • Network posts: 1,247 new      │                      │
│   │ • Following posts: 892 new      │                      │
│   │ • Trending topics: 156 posts    │                      │
│   │ • Sponsored content: 45 ads     │                      │
│   │ • Recommendations: 234 posts    │                      │
│   │ Total candidates: 2,574 posts   │                      │
│   └──────────────────────────────────┘                      │
│          ↓                                                   │
│                                                               │
│   4. AI RANKING & SCORING (< 300ms)                         │
│   ┌──────────────────────────────────┐                      │
│   │ Feature extraction per post:     │                      │
│   │ • Author relevance: 0.89         │                      │
│   │ • Content type match: 0.76       │                      │
│   │ • Engagement prediction: 0.82    │                      │
│   │ • Recency factor: 0.94           │                      │
│   │ • Diversity score: 0.71          │                      │
│   │ Neural network ranking...        │                      │
│   └──────────────────────────────────┘                      │
│          ↓                                                   │
│                                                               │
│   5. FEED ASSEMBLY & OPTIMIZATION (< 150ms)                 │
│   ┌──────────────────────────────────┐                      │
│   │ Top 50 posts selected:           │                      │
│   │ 1. ML paper by Stanford prof     │                      │
│   │ 2. React best practices thread   │                      │
│   │ 3. Meta colleague's promotion    │                      │
│   │ 4. Python performance tips       │                      │
│   │ 5. Tech industry news...         │                      │
│   │ + Ads insertion (positions 3,8)  │                      │
│   └──────────────────────────────────┘                      │
│          ↓                                                   │
│                                                               │
│   6. CONTENT RENDERING & DELIVERY (< 50ms)                  │
│   ┌──────────────────────────────────┐                      │
│   │ • Image optimization for iOS     │                      │
│   │ • Text rendering preparation     │                      │
│   │ • Interaction handlers setup     │                      │
│   │ • Analytics tracking enabled     │                      │
│   │ • A/B test variants applied      │                      │
│   └──────────────────────────────────┘                      │
│                                                               │
│   Total Response Time: < 800ms 🚀                            │
└─────────────────────────────────────────────────────────────┘

Breaking Down Each Critical Step

Step 1: App Launch and Authentication (Instant)

The moment Maya taps the LinkedIn app, the client begins establishing a connection to LinkedIn's edge servers. Since she used biometric authentication yesterday, her session token is still valid and cached locally. The app immediately identifies her device, location (San Francisco), and network conditions. LinkedIn's global edge network routes her to the nearest data center to minimize latency. The app also loads the basic profile cache stored locally from her previous session.
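
Conceptually, the warm-start path looks something like the Python sketch below. The cached-token structure and the re-authentication hook are illustrative assumptions, not LinkedIn's client code.

import time

# Minimal sketch: reuse a locally cached session token if it has not expired,
# otherwise fall back to an interactive login. Everything here is hypothetical.
CACHED_SESSION = {"token": "eyJhbGciOi...", "expires_at": time.time() + 6 * 3600}

def resolve_session(cache):
    if cache and cache["expires_at"] > time.time():
        return cache["token"]          # warm path: no interactive auth needed
    return full_reauthentication()     # cold path: biometric / credential flow

def full_reauthentication():
    # Placeholder for the device's biometric or credential login flow.
    raise NotImplementedError("trigger interactive login")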

Step 2: User Context Loading (< 100ms)

With Maya authenticated, the system rapidly assembles her user context from multiple data stores. Her user profile service returns her role (Senior Software Engineer at Meta), her social graph service loads her 2,847 connections, and her interest inference service provides her top interests derived from past interactions: Machine Learning, Python, React, startup culture, and tech industry news. The system also notes she was last active 6 hours ago, suggesting she'll want to catch up on what she missed overnight.
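
Because those lookups are independent, they can be issued in parallel. Here's a minimal asyncio sketch of that fan-out; the three service functions are stubs standing in for LinkedIn's internal data stores, not its real APIs.

import asyncio

# Sketch: fetch profile, connections, and inferred interests concurrently,
# then merge them into one context object handed to candidate generation.
async def get_profile(user_id):
    return {"role": "Senior SWE", "company": "Meta"}

async def get_connections(user_id):
    return ["conn_%d" % i for i in range(2847)]

async def get_interests(user_id, k=5):
    return ["ML", "Python", "React", "startups", "tech news"][:k]

async def load_user_context(user_id):
    profile, connections, interests = await asyncio.gather(
        get_profile(user_id), get_connections(user_id), get_interests(user_id))
    return {"user_id": user_id, "profile": profile,
            "connections": connections, "interests": interests}

print(asyncio.run(load_user_context("maya_eng_2019"))["profile"])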

Step 3: Candidate Generation (< 200ms)

Now comes the first major AI component. LinkedIn doesn't just show Maya every possible post - that would be millions of pieces of content. Instead, multiple candidate generation systems run in parallel. The "network posts" generator finds 1,247 new posts from her direct connections. The "following" generator identifies 892 posts from pages and influencers she follows. The "trending topics" system, which analyzes viral content across the platform, suggests 156 relevant trending posts. The ads targeting system identifies 45 sponsored posts that match her profile. Finally, the recommendation system suggests 234 posts from outside her immediate network but aligned with her interests. This creates a candidate pool of 2,574 potential posts, as sketched below.
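
A simplified sketch of that fan-out and merge: the generator counts are the ones quoted above, while the stub implementations are illustrative placeholders, not LinkedIn's retrieval services.

import asyncio, itertools

# Sketch: run the candidate generators in parallel and merge their output
# into one pool, de-duplicated by post id.
def stub(source, n):
    async def gen(ctx):
        return [{"id": f"{source}_{i}", "source": source} for i in range(n)]
    return gen

network_posts   = stub("network", 1247)
followed_posts  = stub("following", 892)
trending_posts  = stub("trending", 156)
sponsored_posts = stub("ads", 45)
recommended     = stub("recs", 234)

async def generate_candidates(ctx):
    batches = await asyncio.gather(network_posts(ctx), followed_posts(ctx),
                                   trending_posts(ctx), sponsored_posts(ctx),
                                   recommended(ctx))
    pool = {p["id"]: p for p in itertools.chain.from_iterable(batches)}
    return list(pool.values())

print(len(asyncio.run(generate_candidates({}))))  # 2574 candidates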

Step 4: AI Ranking and Scoring (< 300ms)

This is where LinkedIn's Large-scale Ranking (LaR) neural network model works its magic. Each of the 2,574 candidate posts gets transformed into a feature vector with hundreds of signals. For a machine learning paper posted by a Stanford professor, the system calculates: author relevance score (0.89 - Maya follows ML experts), content type match (0.76 - she engages with technical papers), predicted engagement probability (0.82 - likely to read and comment), recency factor (0.94 - posted 2 hours ago), and diversity score (0.71 - fits her content mix). The neural network, trained on billions of user interactions, processes all these features simultaneously and outputs a relevance score for each post.
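
In miniature, per-post scoring looks roughly like the sketch below. The five feature names mirror the example above; the tiny two-layer network with random weights is a stand-in, since the real LaR architecture and parameters aren't public.

import numpy as np

# Sketch: turn a post's signals into a feature vector and pass it through a
# small network that outputs a relevance probability.
FEATURES = ["author_relevance", "content_type_match",
            "engagement_prob", "recency", "diversity"]

def score_post(post_features, w1, w2):
    x = np.array([post_features[f] for f in FEATURES])   # feature vector
    hidden = np.maximum(0.0, w1 @ x)                      # ReLU hidden layer
    logit = float(w2 @ hidden)
    return 1.0 / (1.0 + np.exp(-logit))                   # relevance in (0, 1)

# Example with the scores quoted for the Stanford ML paper:
rng = np.random.default_rng(0)
w1, w2 = rng.normal(size=(16, 5)), rng.normal(size=16)
print(score_post({"author_relevance": 0.89, "content_type_match": 0.76,
                  "engagement_prob": 0.82, "recency": 0.94,
                  "diversity": 0.71}, w1, w2))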

Step 5: Feed Assembly and Optimization (< 150ms)

The ranking model produces scores for all candidates, but Maya's feed needs careful curation. LinkedIn's feed assembly algorithm selects the top 50 posts but doesn't just rank by score. It applies diversity constraints (no more than 3 posts from the same person), freshness requirements (at least 30% from the last 24 hours), and content type mixing (balancing articles, videos, and text posts). It also strategically inserts sponsored content at positions 3 and 8, where engagement rates are optimal but user experience isn't degraded. The ML paper from Stanford lands at position 1, followed by a React best practices thread that matches her recent JavaScript interactions.
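
A toy version of those assembly rules is shown below, assuming posts arrive already sorted by model score. The per-author cap and the ad slots come from the text; the field names and helper structure are illustrative, and the freshness and content-type mixing rules are omitted for brevity.

# Sketch: take the top posts by score, cap posts per author, and insert
# sponsored items at fixed slots.
MAX_PER_AUTHOR = 3
AD_POSITIONS = (3, 8)  # 1-indexed feed slots reserved for sponsored content

def assemble_feed(ranked_posts, ads, size=50):
    feed, per_author = [], {}
    for post in ranked_posts:                      # assumed sorted by score desc
        author = post["author_id"]
        if per_author.get(author, 0) >= MAX_PER_AUTHOR:
            continue                               # diversity constraint
        per_author[author] = per_author.get(author, 0) + 1
        feed.append(post)
        if len(feed) == size:
            break
    for slot, ad in zip(AD_POSITIONS, ads):        # strategic ad insertion
        feed.insert(slot - 1, ad)
    return feed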

This post is for paid subscribers
