Applied ML Notebook
Technical articles on applied ML and large-scale systems
Arbiter: Solving Head-of-Line Blocking in Production ML Inference Systems
A priority-aware request gateway that eliminates head-of-line blocking in high-throughput ML services while maintaining sub-10ms latency overhead and improving batch throughput by 40%. Open-sourced at github.com/skpulipaka26/arbiter.
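For intuition only, here is a minimal Python sketch of priority-aware dispatch. It is not Arbiter's actual code; PrioritizedRequest and PriorityGateway are hypothetical names. The idea it illustrates: a heap-ordered queue lets small, latency-sensitive requests bypass large requests instead of waiting behind them in FIFO order.

```python
# Illustrative sketch only (not Arbiter's code); PrioritizedRequest and
# PriorityGateway are hypothetical names. A heap-ordered queue lets small,
# latency-sensitive requests bypass large requests instead of queuing behind them.
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Any, List

@dataclass(order=True)
class PrioritizedRequest:
    priority: int                       # lower value = more urgent
    seq: int                            # tie-breaker keeps FIFO order within a priority
    payload: Any = field(compare=False)

class PriorityGateway:
    def __init__(self) -> None:
        self._heap: List[PrioritizedRequest] = []
        self._counter = itertools.count()

    def submit(self, payload: Any, priority: int = 10) -> None:
        heapq.heappush(self._heap, PrioritizedRequest(priority, next(self._counter), payload))

    def next_batch(self, max_size: int) -> List[Any]:
        # Drain up to max_size requests in priority order, so a slow low-priority
        # request no longer holds up the high-priority requests behind it.
        batch: List[Any] = []
        while self._heap and len(batch) < max_size:
            batch.append(heapq.heappop(self._heap).payload)
        return batch

if __name__ == "__main__":
    gw = PriorityGateway()
    gw.submit({"prompt": "bulk offline scoring job"}, priority=50)
    gw.submit({"prompt": "interactive chat turn"}, priority=1)
    print(gw.next_batch(max_size=8))    # the interactive request comes out first
```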
Demystifying Entropy, Cross-Entropy, and KL Divergence in Modern Machine Learning
Entropy, cross-entropy, and KL divergence are key tools for reasoning about uncertainty in ML. This article unpacks these concepts and their practical significance for model training and evaluation.
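For quick reference, these are the standard definitions for discrete distributions p (data) and q (model); the article's notation may differ. The identity in the last line is what ties the three quantities together in training objectives.

```latex
\begin{align}
  H(p)    &= -\sum_{x} p(x)\,\log p(x)  && \text{entropy} \\
  H(p, q) &= -\sum_{x} p(x)\,\log q(x)  && \text{cross-entropy} \\
  D_{\mathrm{KL}}(p \,\|\, q)
          &= \sum_{x} p(x)\,\log \frac{p(x)}{q(x)}
           = H(p, q) - H(p)             && \text{KL divergence}
\end{align}
```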
Enhancing Concurrent Traffic Handling in Managed ML Services Using Batching-Ring Buffers
Managed machine learning (ML) services face significant challenges under high-concurrency traffic. As operations scale to thousands or millions of simultaneous requests, sequential request processing and simple queuing schemes become performance bottlenecks. This article explores how batching-ring buffers can be implemented to handle concurrent requests in ML services efficiently.
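As a rough sketch of the technique (illustrative only; RingBatchBuffer and its methods are hypothetical names, not taken from the article), concurrent producers write requests into a fixed-capacity ring while a consumer drains them in batches, so the model sees fewer, larger calls and the ring's capacity provides natural back-pressure.

```python
# Minimal sketch of a batching ring buffer (hypothetical names; illustrative only):
# producers enqueue requests into a fixed-capacity ring, and a consumer drains
# them in batches so the model is invoked with fewer, larger calls.
import threading
from typing import Any, List, Optional

class RingBatchBuffer:
    def __init__(self, capacity: int) -> None:
        self._buf: List[Optional[Any]] = [None] * capacity
        self._capacity = capacity
        self._head = 0          # next slot to read
        self._tail = 0          # next slot to write
        self._size = 0
        self._lock = threading.Lock()
        self._not_empty = threading.Condition(self._lock)
        self._not_full = threading.Condition(self._lock)

    def put(self, item: Any) -> None:
        with self._not_full:
            while self._size == self._capacity:
                self._not_full.wait()              # back-pressure when the ring is full
            self._buf[self._tail] = item
            self._tail = (self._tail + 1) % self._capacity
            self._size += 1
            self._not_empty.notify()

    def take_batch(self, max_batch: int, timeout: float = 0.005) -> List[Any]:
        with self._not_empty:
            if self._size == 0:
                self._not_empty.wait(timeout)      # wait briefly for work to accumulate
            batch: List[Any] = []
            while self._size > 0 and len(batch) < max_batch:
                batch.append(self._buf[self._head])
                self._buf[self._head] = None
                self._head = (self._head + 1) % self._capacity
                self._size -= 1
            self._not_full.notify_all()
            return batch

if __name__ == "__main__":
    ring = RingBatchBuffer(capacity=1024)
    for i in range(10):
        ring.put({"request_id": i})
    print(len(ring.take_batch(max_batch=8)))       # -> 8
```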