reflections on building in AI. written when something is worth saying.
the metrics that made sense for web apps are being applied to AI systems that work completely differently. that mismatch is hiding a lot of problems.
when you remove the API layer and run things yourself, you see everything you were previously trusting someone else to handle. some of it is reassuring. some of it isn't.
a model can top every leaderboard and still be frustrating to work with day to day. i've been thinking about why that gap exists and whether it's getting smaller.
some of the most interesting things i've built started as a feeling that something was wrong or missing — not a clear idea. i'm trying to get better at following that instinct earlier.
the way i learn anything is to build something with it. i don't think there's a faster path to actually understanding. the hard part is getting comfortable being wrong in public while you're doing it.