[Most Important LLM System Design #9] Understanding LLMs: Breaking Down Neural Networks: How They Actually Work - Part 9
All the technical details you need to know...
Why Neural Networks Matter More Than Ever
Imagine teaching a computer to recognize your face in a crowd of thousands, translate languages in real-time, or even compose music that moves people to tears. This isn't science fiction—it's the reality of neural networks today.
Neural networks have become the backbone of artificial intelligence, powering everything from the recommendation algorithms on Netflix to the autonomous vehicles navigating our streets. But what exactly are these mysterious "networks," and how do they work their magic?
Read the LLM System Design series (completing the previous parts first is recommended):
Understanding Transformers & Large Language Models: How They Actually Work - Part 1
Understanding Transformers & Large Language Models: How They Actually Work - Part 2
[LLM System Design #3] Large Language Models: Pre-Training LLMs: How They Actually Work - Part 3
1. The Basic Building Block: The Artificial Neuron
What is an Artificial Neuron?
An artificial neuron, the simplest form of which is the perceptron, is the fundamental unit of a neural network, inspired by the biological neurons in our brains. Just as biological neurons receive signals, process them, and fire when stimulated enough, artificial neurons receive numerical inputs, apply mathematical operations, and produce outputs.
Core Components of a Neuron:
Inputs (x₁, x₂, ..., xₙ): The data fed into the neuron
Weights (w₁, w₂, ..., wₙ): Values that determine the importance of each input
Bias (b): An additional parameter that shifts the activation threshold
Activation Function: A mathematical function that determines the neuron's output
Output (y): The final result after processing
Mathematical Foundation
The neuron performs this calculation:
z = (x₁ × w₁) + (x₂ × w₂) + ... + (xₙ × wₙ) + b
y = activation_function(z)
Diagram 1: Single Neuron Architecture
x₁ ──(w₁)──┐
x₂ ──(w₂)──┤
x₃ ──(w₃)──┼──> Σ ──> f(z) ──> y
...        │
xₙ ──(wₙ)──┤
bias(b) ───┘
Legend:
- Inputs (x) flow through weighted connections (w)
- Summed at Σ with bias (b)
- Passed through activation function f(z)
- Produces output (y)
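To make that flow concrete, here is a minimal sketch of a single neuron in Python. Everything in it (the function name neuron_output, the example input values, the use of NumPy) is an illustrative choice, not any specific library's API:

import numpy as np

def neuron_output(x, w, b):
    """Single artificial neuron: weighted sum of inputs plus bias, then sigmoid."""
    z = np.dot(x, w) + b              # z = x₁w₁ + x₂w₂ + ... + xₙwₙ + b
    return 1.0 / (1.0 + np.exp(-z))   # y = f(z), using sigmoid as f

# Example: three inputs flowing through weighted connections
x = np.array([0.5, 0.3, 0.8])   # inputs x₁, x₂, x₃
w = np.array([0.4, 0.7, 0.2])   # weights w₁, w₂, w₃
b = 0.1                         # bias

print(neuron_output(x, w, b))   # output y, always between 0 and 1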
Diagram 2: Weight Learning Process
Before Learning:                       After Learning:
Input: Email Features                  Input: Same Email Features
x₁ = 5   (exclamation marks)           x₁ = 5   (exclamation marks)
x₂ = 1   (has "FREE")                  x₂ = 1   (has "FREE")
x₃ = 0.2 (short length)                x₃ = 0.2 (short length)
Weights: w₁=0.1, w₂=0.1, w₃=0.1        Weights: w₁=0.8, w₂=0.9, w₃=0.3
Bias: b = 0                            Bias: b = -0.5
Calculation:                           Calculation:
z = (5×0.1)+(1×0.1)+(0.2×0.1)+0        z = (5×0.8)+(1×0.9)+(0.2×0.3)-0.5
z = 0.62                               z = 4.46
Output = σ(0.62) = 0.65 (Maybe)        Output = σ(4.46) = 0.99 (SPAM!)
The neuron learned to give higher weights to spam indicators!
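Plugging the diagram's numbers into the same formula reproduces both verdicts; the feature values, weights, and biases below come straight from Diagram 2:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([5.0, 1.0, 0.2])   # exclamation marks, has "FREE", short length

# Before learning: uniform weights, zero bias
z_before = np.dot(x, np.array([0.1, 0.1, 0.1])) + 0.0
print(z_before, sigmoid(z_before))   # 0.62, ≈0.65 ("Maybe")

# After learning: spam indicators weighted heavily, bias shifted down
z_after = np.dot(x, np.array([0.8, 0.9, 0.3])) - 0.5
print(z_after, sigmoid(z_after))     # 4.46, ≈0.99 (SPAM!)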
Diagram 3: Biological vs Artificial Neuron Comparison
BIOLOGICAL NEURON:

Dendrites ───┐
(inputs)     ├──▶ Cell Body ───▶ Axon
Synapses ────┘   (processing)   (output)
(weights)

ARTIFICIAL NEURON:

Inputs (x₁,x₂,x₃) ───┐
(numerical values)   ├──▶ Σ ────▶ f(z) ─────────▶ Output
Weights & Bias ──────┘   (sum)   (activation)     (y)
Similarities:
• Both receive multiple inputs
• Both have processing mechanisms
• Both produce outputs based on input strength
• Both can "learn" by adjusting connection strengths
Key Differences:
• Biological: Electrochemical signals, analog
• Artificial: Mathematical operations, digital
Diagram 4: Step-by-Step Neuron Calculation
Step 1: Input Reception               Step 2: Weighted Sum
┌───────────────────┐                 ┌───────────────────┐
│ x₁ = 0.8 (age)    │                 │ w₁×x₁ = 0.5×0.8   │
│ x₂ = 0.6 (income) │    ────────▶    │ w₂×x₂ = 0.3×0.6   │ = z
│ x₃ = 0.9 (credit) │                 │ w₃×x₃ = 0.7×0.9   │
└───────────────────┘                 │ bias  = 0.1       │
                                      └───────────────────┘
z = 0.4 + 0.18 + 0.63 + 0.1 = 1.31

Step 3: Activation Function           Step 4: Final Output
┌───────────────────┐                 ┌───────────────────┐
│ f(z) = σ(1.31)    │                 │ y = 0.79          │
│ σ(z) = 1/(1+e^-z) │    ────────▶    │ Interpretation:   │
│ σ(1.31) = 0.79    │                 │ 79% chance of     │
└───────────────────┘                 │ loan approval     │
                                      └───────────────────┘
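The same four steps in code, with every value taken from Diagram 4:

import math

x = [0.8, 0.6, 0.9]   # Step 1: inputs (age, income, credit score)
w = [0.5, 0.3, 0.7]
b = 0.1

z = sum(wi * xi for wi, xi in zip(w, x)) + b   # Step 2: weighted sum
print(z)                                       # 0.4 + 0.18 + 0.63 + 0.1 = 1.31

y = 1 / (1 + math.exp(-z))                     # Step 3: sigmoid activation
print(round(y, 2))                             # Step 4: 0.79 → 79% chance of approval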
Why Artificial Neurons?
The genius of artificial neurons lies in their simplicity and power:
Learning Capability: By adjusting weights and biases, neurons can learn patterns from data
Non-linearity: Activation functions introduce non-linear behavior, enabling complex pattern recognition
Scalability: Simple neurons can be combined to solve incredibly complex problems
Biological Inspiration: Mimicking brain structure provides intuitive understanding
When and How to Use Single Neurons
Use Cases:
Linear Classification: Separating data into two categories
Simple Regression: Predicting continuous values
Feature Detection: Identifying specific patterns in data
Example Implementation Scenario: Imagine building a spam email detector. A single neuron could take inputs like:
Number of exclamation marks
Presence of words like "FREE" or "URGENT"
Email length
Sender reputation score
The neuron learns optimal weights for each feature to classify emails as spam or not spam.
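As a sketch of how those optimal weights might actually be learned, here is a tiny gradient-descent loop for a sigmoid neuron (equivalent to logistic regression). The four toy emails, their labels, and the learning rate are invented purely for illustration:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy dataset: [exclamation marks, has "FREE", shortness, sender reputation]
X = np.array([[5.0, 1.0, 0.9, 0.1],   # spam
              [0.0, 0.0, 0.2, 0.9],   # not spam
              [8.0, 1.0, 0.8, 0.2],   # spam
              [1.0, 0.0, 0.4, 0.8]])  # not spam
y = np.array([1.0, 0.0, 1.0, 0.0])    # 1 = spam, 0 = not spam

w = np.zeros(4)
b = 0.0
lr = 0.1   # learning rate (an illustrative choice)

for _ in range(1000):
    pred = sigmoid(X @ w + b)          # forward pass over all emails
    error = pred - y                   # gradient of cross-entropy loss w.r.t. z
    w -= lr * (X.T @ error) / len(y)   # nudge each weight against its gradient
    b -= lr * error.mean()

print(w, b)   # spam signals get positive weights, sender reputation a negative one

After training, classifying a new email is just the single forward pass from Section 1.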
2. Activation Functions: The Decision Makers
What are Activation Functions?
Activation functions are mathematical functions that determine whether a neuron should be activated (fire) or not. They introduce non-linearity into the network, enabling it to learn complex patterns that linear functions cannot capture.
Without activation functions, no matter how many layers you stack, your neural network would behave like a single linear function—severely limiting its power.
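You can verify that collapse numerically: two stacked linear layers with no activation in between are exactly equivalent to one combined linear layer. The weights below are random, purely for demonstration:

import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # layer 1
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)   # layer 2
x = rng.normal(size=3)

two_layers = W2 @ (W1 @ x + b1) + b2       # stack two linear layers...

W, b = W2 @ W1, W2 @ b1 + b2               # ...or merge them into one
one_layer = W @ x + b

print(np.allclose(two_layers, one_layer))  # True: the extra depth bought nothing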
Most Important Activation Functions and How to Use Them
2.1 Sigmoid Function
σ(z) = 1 / (1 + e^(-z))
Range: (0, 1)
Shape: S-shaped curve
Properties: Smooth, differentiable, outputs probabilities
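A minimal implementation, together with the derivative that makes sigmoid so convenient during training, since σ'(z) = σ(z)(1 - σ(z)) can be computed from the forward output alone:

import numpy as np

def sigmoid(z):
    """σ(z) = 1 / (1 + e^(-z)): squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(z):
    """σ'(z) = σ(z) · (1 - σ(z)), reusing the forward value."""
    s = sigmoid(z)
    return s * (1.0 - s)

print(sigmoid(np.array([-10.0, 0.0, 10.0])))   # ≈ [0.00005, 0.5, 0.99995]
print(sigmoid_derivative(0.0))                 # 0.25, the curve's steepest point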