what running LLMs locally taught me about abstraction

apr 2026


i spent a few weeks running models locally. not because i needed to — API costs were fine, latency was acceptable — but because i wanted to understand what was actually happening underneath the convenience layer.

the first thing you notice is how much the API is doing for you that you’d never thought about. token counting, context management, error handling, rate limiting — all abstracted away into clean request/response cycles. when you run locally, you’re suddenly holding all of that yourself.
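here's a minimal sketch of one piece of that bookkeeping: counting tokens and trimming chat history to fit a context window. it assumes hugging face transformers; the gpt2 tokenizer is just a stand-in, and the window size, output reserve, and fit_history helper are all made-up placeholders, not anyone's API:

```python
# a sketch of the context management an API does silently: count
# tokens, drop the oldest messages until the prompt fits the window.
# gpt2 tokenizer is a stand-in; the budget numbers are placeholders.
from transformers import AutoTokenizer

CONTEXT_WINDOW = 4096        # assumed model context limit
RESERVED_FOR_OUTPUT = 512    # leave room for the reply

tokenizer = AutoTokenizer.from_pretrained("gpt2")

def fit_history(messages: list[str]) -> list[str]:
    """Keep the most recent messages that fit the token budget."""
    budget = CONTEXT_WINDOW - RESERVED_FOR_OUTPUT
    kept, used = [], 0
    for msg in reversed(messages):       # walk from newest to oldest
        n = len(tokenizer.encode(msg))
        if used + n > budget:
            break
        kept.append(msg)
        used += n
    return list(reversed(kept))
```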

the second thing you notice is how the infrastructure assumptions shift. memory becomes a real constraint in a way it never was when the model was someone else's problem. you start thinking about quantisation, about batching, about whether you actually need all those parameters or whether a smaller model would do fine for your use case.
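the arithmetic behind those questions is simple once you do it yourself. a rough sketch using the standard bytes-per-weight for each format (weight_memory_gb is an illustrative helper name; real usage adds KV cache and activation overhead on top of the weights):

```python
# back-of-the-envelope memory math for model weights alone.
# bytes per parameter: fp16 = 2, int8 = 1, 4-bit quant = 0.5.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "q4": 0.5}

def weight_memory_gb(n_params: float, fmt: str) -> float:
    return n_params * BYTES_PER_PARAM[fmt] / 1e9

for fmt in BYTES_PER_PARAM:
    print(f"7B model @ {fmt}: {weight_memory_gb(7e9, fmt):.1f} GB")
# fp16: 14.0 GB, int8: 7.0 GB, q4: 3.5 GB
```

that one table is most of the case for quantisation: it's the difference between a model that fits on a laptop GPU and one that doesn't.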

but the most interesting part wasn’t the infrastructure. it was what happens to your relationship with the outputs when you’re closer to the model. when you’re using an API, there’s a temptation to treat the output as something that came from somewhere sophisticated and unknowable. when you’re running the weights yourself, the outputs feel more mechanical. more understandable. you stop anthropomorphising and start seeing patterns.
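part of what drives that shift is seeing the sampling step itself. a minimal sketch, assuming numpy and a vector of last-layer logits (sample_next_token is an illustrative name, not any library's API): the model emits scores, temperature rescales them, and the next token is a weighted dice roll.

```python
import numpy as np

def sample_next_token(logits: np.ndarray, temperature: float = 0.8) -> int:
    """Turn raw logits into one sampled token id."""
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))
```

run in a loop, that's the whole generation process: forward pass, scale, sample, append, repeat.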

i’m not saying everyone needs to run models locally. the convenience of APIs is real and worth it most of the time. but there’s value in doing it at least once — not just for the technical understanding, but for the epistemic shift it forces. you stop trusting the abstraction and start understanding it.
