[Most Important LLM System Design #9] Understanding LLMs: Breaking Down Neural Networks: How They Actually Work - Part 9
All the technical details you need to know...
Why Neural Networks Matter More Than Ever
Imagine teaching a computer to recognize your face in a crowd of thousands, translate languages in real-time, or even compose music that moves people to tears. This isn't science fiction—it's the reality of neural networks today.
Neural networks have become the backbone of artificial intelligence, powering everything from the recommendation algorithms on Netflix to the autonomous vehicles navigating our streets. But what exactly are these mysterious "networks," and how do they work their magic?
Read the LLM System Design series (completing the previous parts first is recommended):
Understanding Transformers & Large Language Models: How They Actually Work - Part 1
Understanding Transformers & Large Language Models: How They Actually Work - Part 2
[LLM System Design #3] Large Language Models: Pre-Training LLMs: How They Actually Work - Part 3
1. The Basic Building Block: The Artificial Neuron
What is an Artificial Neuron?
An artificial neuron, the simplest form of which is the perceptron, is the fundamental unit of a neural network, inspired by the biological neurons in our brains. Just as biological neurons receive signals, process them, and fire when stimulated enough, artificial neurons receive numerical inputs, apply mathematical operations, and produce outputs.
Core Components of a Neuron:
Inputs (x₁, x₂, ..., xₙ): The data fed into the neuron
Weights (w₁, w₂, ..., wₙ): Values that determine the importance of each input
Bias (b): An additional parameter that shifts the activation threshold
Activation Function: A mathematical function that determines the neuron's output
Output (y): The final result after processing
Mathematical Foundation
The neuron performs this calculation:
z = (x₁ × w₁) + (x₂ × w₂) + ... + (xₙ × wₙ) + b
y = activation_function(z)
Diagram 1: Single Neuron Architecture
x₁ ──(w₁)──┐
x₂ ──(w₂)──┤
x₃ ──(w₃)──┼──> Σ ──> f(z) ──> y
...        │
xₙ ──(wₙ)──┤
bias(b) ───┘
Legend:
- Inputs (x) flow through weighted connections (w)
- Summed at Σ with bias (b)
- Passed through activation function f(z)
- Produces output (y)
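To make that flow concrete, here is a minimal sketch of a single neuron in Python. Everything in it (the function name neuron_output, the example input values, the use of NumPy) is an illustrative choice, not any specific library's API:

import numpy as np

def neuron_output(x, w, b):
    """Single artificial neuron: weighted sum of inputs plus bias, then sigmoid."""
    z = np.dot(x, w) + b              # z = x₁w₁ + x₂w₂ + ... + xₙwₙ + b
    return 1.0 / (1.0 + np.exp(-z))   # y = f(z), using sigmoid as f

# Example: three inputs flowing through weighted connections
x = np.array([0.5, 0.3, 0.8])   # inputs x₁, x₂, x₃
w = np.array([0.4, 0.7, 0.2])   # weights w₁, w₂, w₃
b = 0.1                         # bias

print(neuron_output(x, w, b))   # output y, always between 0 and 1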
Diagram 2: Weight Learning Process
Before Learning:                       After Learning:
Input: Email Features                  Input: Same Email Features
x₁ = 5   (exclamation marks)           x₁ = 5   (exclamation marks)
x₂ = 1   (has "FREE")                  x₂ = 1   (has "FREE")
x₃ = 0.2 (short length)                x₃ = 0.2 (short length)
Weights: w₁=0.1, w₂=0.1, w₃=0.1        Weights: w₁=0.8, w₂=0.9, w₃=0.3
Bias: b = 0                            Bias: b = -0.5
Calculation:                           Calculation:
z = (5×0.1)+(1×0.1)+(0.2×0.1)+0        z = (5×0.8)+(1×0.9)+(0.2×0.3)-0.5
z = 0.62                               z = 4.46
Output = σ(0.62) = 0.65 (Maybe)        Output = σ(4.46) = 0.99 (SPAM!)
The neuron learned to give higher weights to spam indicators!
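Plugging the diagram's numbers into the same formula reproduces both verdicts; the feature values, weights, and biases below come straight from Diagram 2:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([5.0, 1.0, 0.2])   # exclamation marks, has "FREE", short length

# Before learning: uniform weights, zero bias
z_before = np.dot(x, np.array([0.1, 0.1, 0.1])) + 0.0
print(z_before, sigmoid(z_before))   # 0.62, ≈0.65 ("Maybe")

# After learning: spam indicators weighted heavily, bias shifted down
z_after = np.dot(x, np.array([0.8, 0.9, 0.3])) - 0.5
print(z_after, sigmoid(z_after))     # 4.46, ≈0.99 (SPAM!)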
Diagram 3: Biological vs Artificial Neuron Comparison
BIOLOGICAL NEURON:

Dendrites ───┐
(inputs)     ├──▶ Cell Body ───▶ Axon
Synapses ────┘   (processing)   (output)
(weights)

ARTIFICIAL NEURON:

Inputs (x₁,x₂,x₃) ───┐
(numerical values)   ├──▶ Σ ────▶ f(z) ─────────▶ Output
Weights & Bias ──────┘   (sum)   (activation)     (y)
Similarities:
• Both receive multiple inputs
• Both have processing mechanisms
• Both produce outputs based on input strength
• Both can "learn" by adjusting connection strengths
Key Differences:
• Biological: Electrochemical signals, analog
• Artificial: Mathematical operations, digital
Diagram 4: Step-by-Step Neuron Calculation
Step 1: Input Reception               Step 2: Weighted Sum
┌───────────────────┐                 ┌───────────────────┐
│ x₁ = 0.8 (age)    │                 │ w₁×x₁ = 0.5×0.8   │
│ x₂ = 0.6 (income) │    ────────▶    │ w₂×x₂ = 0.3×0.6   │ = z
│ x₃ = 0.9 (credit) │                 │ w₃×x₃ = 0.7×0.9   │
└───────────────────┘                 │ bias  = 0.1       │
                                      └───────────────────┘
z = 0.4 + 0.18 + 0.63 + 0.1 = 1.31

Step 3: Activation Function           Step 4: Final Output
┌───────────────────┐                 ┌───────────────────┐
│ f(z) = σ(1.31)    │                 │ y = 0.79          │
│ σ(z) = 1/(1+e^-z) │    ────────▶    │ Interpretation:   │
│ σ(1.31) = 0.79    │                 │ 79% chance of     │
└───────────────────┘                 │ loan approval     │
                                      └───────────────────┘
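The same four steps in code, with every value taken from Diagram 4:

import math

x = [0.8, 0.6, 0.9]   # Step 1: inputs (age, income, credit score)
w = [0.5, 0.3, 0.7]
b = 0.1

z = sum(wi * xi for wi, xi in zip(w, x)) + b   # Step 2: weighted sum
print(z)                                       # 0.4 + 0.18 + 0.63 + 0.1 = 1.31

y = 1 / (1 + math.exp(-z))                     # Step 3: sigmoid activation
print(round(y, 2))                             # Step 4: 0.79 → 79% chance of approval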
Why Artificial Neurons?
The genius of artificial neurons lies in their simplicity and power:
Learning Capability: By adjusting weights and biases, neurons can learn patterns from data
Non-linearity: Activation functions introduce non-linear behavior, enabling complex pattern recognition
Scalability: Simple neurons can be combined to solve incredibly complex problems
Biological Inspiration: Mimicking brain structure provides intuitive understanding
When and How to Use Single Neurons
Use Cases:
Linear Classification: Separating data into two categories
Simple Regression: Predicting continuous values
Feature Detection: Identifying specific patterns in data
Example Implementation Scenario: Imagine building a spam email detector. A single neuron could take inputs like:
Number of exclamation marks
Presence of words like "FREE" or "URGENT"
Email length
Sender reputation score
The neuron learns optimal weights for each feature to classify emails as spam or not spam.
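As a sketch of how those optimal weights might actually be learned, here is a tiny gradient-descent loop for a sigmoid neuron (equivalent to logistic regression). The four toy emails, their labels, and the learning rate are invented purely for illustration:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy dataset: [exclamation marks, has "FREE", shortness, sender reputation]
X = np.array([[5.0, 1.0, 0.9, 0.1],   # spam
              [0.0, 0.0, 0.2, 0.9],   # not spam
              [8.0, 1.0, 0.8, 0.2],   # spam
              [1.0, 0.0, 0.4, 0.8]])  # not spam
y = np.array([1.0, 0.0, 1.0, 0.0])    # 1 = spam, 0 = not spam

w = np.zeros(4)
b = 0.0
lr = 0.1   # learning rate (an illustrative choice)

for _ in range(1000):
    pred = sigmoid(X @ w + b)          # forward pass over all emails
    error = pred - y                   # gradient of cross-entropy loss w.r.t. z
    w -= lr * (X.T @ error) / len(y)   # nudge each weight against its gradient
    b -= lr * error.mean()

print(w, b)   # spam signals get positive weights, sender reputation a negative one

After training, classifying a new email is just the single forward pass from Section 1.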
2. Activation Functions: The Decision Makers
What are Activation Functions?
Activation functions are mathematical functions that determine whether a neuron should be activated (fire) or not. They introduce non-linearity into the network, enabling it to learn complex patterns that linear functions cannot capture.
Without activation functions, no matter how many layers you stack, your neural network would behave like a single linear function—severely limiting its power.
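You can verify that collapse numerically: two stacked linear layers with no activation in between are exactly equivalent to one combined linear layer. The weights below are random, purely for demonstration:

import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # layer 1
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)   # layer 2
x = rng.normal(size=3)

two_layers = W2 @ (W1 @ x + b1) + b2       # stack two linear layers...

W, b = W2 @ W1, W2 @ b1 + b2               # ...or merge them into one
one_layer = W @ x + b

print(np.allclose(two_layers, one_layer))  # True: the extra depth bought nothing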
Most Important Activation Functions and How to Use Them
2.1 Sigmoid Function
σ(z) = 1 / (1 + e^(-z))
Range: (0, 1)
Shape: S-shaped curve
Properties: Smooth, differentiable, outputs probabilities
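A minimal implementation, together with the derivative that makes sigmoid so convenient during training, since σ'(z) = σ(z)(1 - σ(z)) can be computed from the forward output alone:

import numpy as np

def sigmoid(z):
    """σ(z) = 1 / (1 + e^(-z)): squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    """σ'(z) = σ(z) · (1 - σ(z)), reusing the forward value."""
    s = sigmoid(z)
    return s * (1.0 - s)

print(sigmoid(np.array([-10.0, 0.0, 10.0])))   # ≈ [0.00005, 0.5, 0.99995]
print(sigmoid_derivative(0.0))                 # 0.25, the curve's steepest point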