Applied ML Notebook
Technical articles on applied ML and large-scale systems
Arbiter: Solving Head-of-Line Blocking in Production ML Inference Systems
A priority-aware request gateway that eliminates head-of-line blocking in high-throughput ML services while maintaining sub-10ms latency overhead and improving batch throughput by 40%. Open-sourced at github.com/skpulipaka26/arbiter.
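For intuition only, here is a minimal Python sketch of priority-aware dispatch. It is not Arbiter's actual code; PrioritizedRequest and PriorityGateway are hypothetical names. The idea it illustrates: a heap-ordered queue lets small, latency-sensitive requests bypass large requests instead of waiting behind them in FIFO order.

```python
# Illustrative sketch only (not Arbiter's code); PrioritizedRequest and
# PriorityGateway are hypothetical names. A heap-ordered queue lets small,
# latency-sensitive requests bypass large requests instead of queuing behind them.
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Any, List

@dataclass(order=True)
class PrioritizedRequest:
    priority: int                       # lower value = more urgent
    seq: int                            # tie-breaker keeps FIFO order within a priority
    payload: Any = field(compare=False)

class PriorityGateway:
    def __init__(self) -> None:
        self._heap: List[PrioritizedRequest] = []
        self._counter = itertools.count()

    def submit(self, payload: Any, priority: int = 10) -> None:
        heapq.heappush(self._heap, PrioritizedRequest(priority, next(self._counter), payload))

    def next_batch(self, max_size: int) -> List[Any]:
        # Drain up to max_size requests in priority order, so a slow low-priority
        # request no longer holds up the high-priority requests behind it.
        batch: List[Any] = []
        while self._heap and len(batch) < max_size:
            batch.append(heapq.heappop(self._heap).payload)
        return batch

if __name__ == "__main__":
    gw = PriorityGateway()
    gw.submit({"prompt": "bulk offline scoring job"}, priority=50)
    gw.submit({"prompt": "interactive chat turn"}, priority=1)
    print(gw.next_batch(max_size=8))    # the interactive request comes out first
```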
Demystifying Entropy, Cross-Entropy, and KL Divergence in Modern Machine Learning
Entropy, cross-entropy, and KL divergence are key tools for reasoning about uncertainty in ML. This article unpacks these concepts and their practical significance for model training and evaluation.
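For quick reference, these are the standard definitions for discrete distributions p (data) and q (model); the article's notation may differ. The identity in the last line is what ties the three quantities together in training objectives.

```latex
\begin{align}
  H(p)    &= -\sum_{x} p(x)\,\log p(x)  && \text{entropy} \\
  H(p, q) &= -\sum_{x} p(x)\,\log q(x)  && \text{cross-entropy} \\
  D_{\mathrm{KL}}(p \,\|\, q)
          &= \sum_{x} p(x)\,\log \frac{p(x)}{q(x)}
           = H(p, q) - H(p)             && \text{KL divergence}
\end{align}
```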
Enhancing Concurrent Traffic Handling in Managed ML Services Using Batching-Ring Buffers
Managed machine learning (ML) services face significant challenges under high-concurrency traffic. As operations scale to thousands or millions of simultaneous requests, sequential request processing and simple queuing schemes become performance bottlenecks. This article explores how batching-ring buffers can be implemented to handle concurrent requests in ML services efficiently.
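As a rough sketch of the technique (illustrative only; RingBatchBuffer and its methods are hypothetical names, not taken from the article), concurrent producers write requests into a fixed-capacity ring while a consumer drains them in batches, so the model sees fewer, larger calls and the ring's capacity provides natural back-pressure.

```python
# Minimal sketch of a batching ring buffer (hypothetical names; illustrative only):
# producers enqueue requests into a fixed-capacity ring, and a consumer drains
# them in batches so the model is invoked with fewer, larger calls.
import threading
from typing import Any, List, Optional

class RingBatchBuffer:
    def __init__(self, capacity: int) -> None:
        self._buf: List[Optional[Any]] = [None] * capacity
        self._capacity = capacity
        self._head = 0          # next slot to read
        self._tail = 0          # next slot to write
        self._size = 0
        self._lock = threading.Lock()
        self._not_empty = threading.Condition(self._lock)
        self._not_full = threading.Condition(self._lock)

    def put(self, item: Any) -> None:
        with self._not_full:
            while self._size == self._capacity:
                self._not_full.wait()              # back-pressure when the ring is full
            self._buf[self._tail] = item
            self._tail = (self._tail + 1) % self._capacity
            self._size += 1
            self._not_empty.notify()

    def take_batch(self, max_batch: int, timeout: float = 0.005) -> List[Any]:
        with self._not_empty:
            if self._size == 0:
                self._not_empty.wait(timeout)      # wait briefly for work to accumulate
            batch: List[Any] = []
            while self._size > 0 and len(batch) < max_batch:
                batch.append(self._buf[self._head])
                self._buf[self._head] = None
                self._head = (self._head + 1) % self._capacity
                self._size -= 1
            self._not_full.notify_all()
            return batch

if __name__ == "__main__":
    ring = RingBatchBuffer(capacity=1024)
    for i in range(10):
        ring.put({"request_id": i})
    print(len(ring.take_batch(max_batch=8)))       # -> 8
```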