MUST know things for ML System Design

Jan 22, 2024

Hi All,

Previously, we covered 200+ system design case studies. In this new segment, we move on to how to solve any ML system design problem. As we develop the new ML System Design series with 300+ case studies, we will cover the topics below in detail.

Prerequisites for starting ML System Design:

You MUST complete these data science and ML courses, since the ML system design case studies involve many concepts covered in them.

Start here:

Day 1 of the ML System Design Case Studies Series: ML System Design Basics

Chapter 1: Introduction and Overview
1. Data warehouse
2. Structured vs. unstructured data
3. Bagging technique in ensemble learning
4. Boosting technique in ensemble learning
5. Stacking technique in ensemble learning
6. Interpretability in Machine Learning
7. Traditional machine learning algorithms
8. Sampling strategies
9. Data splitting techniques
10. Class-balanced loss
11. Focal loss paper
12. Focal loss
13. Data parallelism
14. Model parallelism
15. Cross entropy loss
16. Mean squared error loss
17. Mean absolute error loss
18. Huber loss
19. L1 and L2 regularization
20. Entropy regularization
21. K-fold cross-validation
22. Dropout paper
23. Overview of optimization algorithms
24. Stochastic gradient descent
25. AdaGrad optimization algorithm
26. Momentum optimization algorithm
27. RMSProp optimization algorithm
28. ELU activation function
29. ReLU activation function
30. Tanh activation function
31. Sigmoid activation function
32. FID score
33. Inception score
34. BLEU metrics
35. METEOR metrics
36. ROUGE score
37. CIDEr score
38. SPICE score
39. Quantization-aware training
40. Model compression survey
41. Shadow deployment
42. A/B testing
43. Canary release
Chapter 2: Visual Search System
1. Visual search at Pinterest
2. Visual embeddings for search at Pinterest
3. Representation learning
4. ResNet paper
5. Transformer paper
6. Vision Transformer paper
7. SimCLR paper
8. MoCo paper
9. Contrastive representation learning methods
10. Dot product
11. Cosine similarity
12. Euclidean distance
13. Curse of dimensionality
14. Curse of dimensionality issues in ML
15. Cross-entropy loss
16. Vector quantization
17. Product quantization
18. R-Trees
19. KD-Tree
20. Annoy
21. Locality-sensitive hashing
22. Faiss library
23. ScaNN library
24. Content moderation with ML
25. Bias in image and recommendation systems
26. Positional bias
27. Smart crop
28. Better search with GNNs
29. Active learning
30. Human-in-the-loop ML
Chapter 3: Google Street View Blurring System
1. Google Street View
2. DETR
3. RCNN family
4. Fast R-CNN paper
5. Faster R-CNN paper
6. YOLO family
7. SSD
8. Data augmentation techniques
9. CNN
10. Object detection details
11. Forward pass and backward pass
12. MSE
13. Log loss
14. Pascal VOC
15. COCO dataset evaluation
16. Object detection evaluation
17. NMS
18. PyTorch implementation of NMS
19. Recent object detection models
20. Distributed training in TensorFlow
21. Distributed training in PyTorch
22. GDPR and ML
23. Bias and fairness in face detection
24. AI fairness
25. Continual learning
26. Active learning
27. Human-in-the-loop ML
Chapter 4: YouTube Video Search
1. Elasticsearch
2. Preprocessing text data
3. NFKD normalization
4. Tokenization summary
5. Hash collision
6. Deep learning for NLP
7. TF-IDF
8. Word2Vec models
9. Continuous bag of words
10. Skip-gram model
11. BERT model
12. GPT-3 model
13. BLOOM model
14. Transformer implementation from scratch
15. 3D convolutions
16. Vision Transformer
17. Query understanding for search engines
18. Multimodal video representation learning
19. Multilingual language models
20. Near-duplicate video detection
21. Generalizable search relevance
22. Freshness in search and recommendation systems
23. Semantic product search by Amazon
24. Ranking relevance in Yahoo search
25. Semantic product search in E-Commerce
Chapter 5: Harmful Content Detection
1. Facebook’s inauthentic behavior
2. LinkedIn’s professional community policies
3. Twitter’s civic integrity policy
4. Facebook’s integrity survey
5. Pinterest’s violation detection system
6. Abuse detection at LinkedIn
7. WPIE method
8. BERT paper
9. Multilingual DistilBERT
10. Multilingual language models
11. CLIP model
12. SimCLR paper
13. VideoMoCo paper
14. Hyperparameter tuning
15. Overfitting
16. Focal loss
17. Gradient blending in multimodal systems
18. ROC curve vs precision-recall curve
19. Bias introduced by human labeling
20. Facebook’s approach to quickly tackling trending harmful content
21. Facebook’s TIES approach
22. Temporal interaction embedding
23. Building and scaling human review system
24. Abusive account detection framework
25. Borderline content
26. Efficient harmful content detection
27. Linear Transformer paper
28. Efficient AI models to detect hate speech
Chapter 6: Video Recommendation System
1. YouTube recommendation system
2. DNN for YouTube recommendation
3. CBOW paper
4. BERT paper
5. Matrix factorization
6. Stochastic gradient descent
7. WALS optimization
8. Instagram multi-stage recommendation system
9. Exploration and exploitation trade-offs
10. Bias in AI and recommendation systems
11. Ethical concerns in recommendation systems
12. Seasonality in recommendation systems
13. A multitask ranking system
14. Benefit from negative feedback
Chapter 7: Event Recommendation System
1. Learning to rank methods
2. RankNet paper
3. LambdaRank paper
4. LambdaMART paper
5. SoftRank paper
6. ListNet paper
7. AdaRank paper
8. Batch processing vs stream processing
9. Leveraging location data in ML systems
10. Logistic regression
11. Decision tree
12. Random forests
13. Bias/variance trade-off
14. AdaBoost
15. XGBoost
16. Gradient boosting
17. XGBoost in Kaggle competitions
18. GBDT
19. An introduction to GBDT
20. Introduction to neural networks
21. Bias issues and solutions in recommendation systems
22. Feature crossing to encode non-linearity
23. Freshness and diversity in recommendation systems
24. Privacy and security in ML
25. Unique challenges of two-sided marketplaces
26. Data leakage
27. Online training frequency
Chapter 8: Ad Click Prediction on Social Platforms
1. Addressing delayed feedback
2. AdTech basics
3. SimCLR paper
4. Feature crossing
5. Feature extraction with GBDT
6. DCN paper
7. DCN V2 paper
8. Microsoft’s deep crossing network paper
9. Factorization Machines
10. Deep Factorization Machines
11. Kaggle’s winning solution in ad click prediction
12. Data leakage in ML systems
13. Time-based dataset splitting
14. Model calibration
15. Field-aware Factorization Machines
16. Catastrophic forgetting problem in continual learning
Chapter 9: Similar Listings on Vacation Rental Platforms
1. Instagram’s Explore recommender system
2. Listing embeddings in search ranking
3. Word2Vec
4. Negative sampling in recommendation systems
5. Airbnb’s content-based recommendations
6. Instagram’s hybrid recommendation system
7. Diversity in recommendation systems
8. TF-IDF
9. Learning to rank
10. Balancing user satisfaction and business objectives
11. Adversarial training
12. Adversarial training in recommendation systems
13. Fairness in recommendation systems
14. Fairness in machine learning
15. A/B testing in recommendation systems

Project videos:

Ignito YouTube channel

Subscribe and start today!

GitHub: https://bit.ly/3jFzW01

Learn how to use Python's built-in data structures efficiently

Let’s get started with the new system design case studies.

More ML system design case studies are coming soon!


Thanks,

Team Ignito
