How Google (Probably) Leverages Old TPUs for Cheap LLM Inference
Exploring how Google likely uses older, cheaper TPU generations for profitable LLM inference through fleet strategy, XLA/JAX, quantisation, model routing, and prefill/decode disaggregation.