Language at the edge, logic at the core
I write deep technical essays about how large language models behave in real systems, with a focus on observability, reliability, and the boundary between narrative and proof. The goal is simple: keep the magic, remove the nonsense.
Most articles are intentionally long. Each entry links to a dedicated page.
Newest first
A step-by-step inside-the-box explanation of what is cached, what is not, and why reuse is limited. Especially relevant for multiple questions over the same large document.
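The core constraint that essay walks through can be sketched in a few lines: attention state (the KV cache) is reusable only for an exact token-for-token prefix match, so putting the question before the document, or changing anything early in the prompt, invalidates everything after the first differing token. A minimal illustrative sketch, not any provider's actual API (all names here are hypothetical, and real inference servers do far more):

```python
def common_prefix_len(a, b):
    """Length of the shared token prefix of two sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

class PrefixKVCache:
    """Toy model of prefix-based reuse: only an exact token
    prefix of the previously processed request is 'free'."""

    def __init__(self):
        self.cached_tokens = []

    def process(self, tokens):
        """Return (reused, recomputed) token counts for a request."""
        reused = common_prefix_len(self.cached_tokens, tokens)
        recomputed = len(tokens) - reused
        self.cached_tokens = list(tokens)  # cache the newest request
        return reused, recomputed

cache = PrefixKVCache()
doc = list(range(1000))  # stand-in for a large tokenized document

# First question over the document: everything computed from scratch.
r1 = cache.process(doc + [1, 2, 3])   # (0, 1003)

# Second question over the same document: the document prefix is
# reused; only the new question's tokens are recomputed.
r2 = cache.process(doc + [7, 8, 9])   # (1000, 3)

# Put the question *before* the document and the prefix no longer
# matches: reuse collapses and the whole request is recomputed.
r3 = cache.process([7, 8, 9] + doc)   # (0, 1003)
```

This is why "many questions over the same large document" only benefits when the document sits at the front of the prompt, byte-for-byte identical each time.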
A practical argument for why LLMs are exceptional at explanation but unreliable as causal engines. Includes concrete failure modes: arithmetic errors, long-context decay, prompt bias, and the cost of fake reasoning.
A step-by-step inside-the-box explanation of how meaning emerges from training, why dimensions have no names, and why rotated spaces still think the same.
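The rotation claim is easy to check numerically: apply the same orthogonal transform to every embedding and all dot products, and hence cosine similarities, are unchanged, while every individual coordinate changes. A small sketch with toy 2-D vectors (the names are illustrative, not real model embeddings; the argument is identical in hundreds of dimensions):

```python
import math

def rotate(v, theta):
    """Rotate a 2-D vector by angle theta (an orthogonal transform)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def cosine(a, b):
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))

# Toy "embeddings" for two related concepts.
king, queen = (2.0, 1.0), (1.8, 1.4)

theta = 1.234  # an arbitrary rotation angle
king_r, queen_r = rotate(king, theta), rotate(queen, theta)

# Cosine similarity survives the rotation (up to float error)...
assert abs(cosine(king, queen) - cosine(king_r, queen_r)) < 1e-12

# ...while each individual coordinate changes, so "dimension 0"
# carries no meaning on its own.
assert abs(king[0] - king_r[0]) > 0.1
```

Since only the relative geometry is preserved under rotation, no dimension can have a name: meaning lives in the angles and distances between vectors, not in any coordinate axis.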