How Google (Probably) Leverages Old TPUs for Cheap LLM Inference
Exploring how Google likely uses older, cheaper TPU generations for profitable LLM inference through fleet strategy, XLA/JAX, quantisation, model routing, and prefill/decode disaggregation.