Ask HN: What's your biggest LLM cost multiplier?
4 by teilom | 2 comments on Hacker News.
"Tokens per request" has been a misleading cost model for us in production. The real drivers seem to be multipliers: retries/429s, tool fanout, P95 context growth, and safety passes. What’s been the biggest cost multiplier in your prod LLM systems, and what policies worked (caps, degraded mode, fallback, hard fail)?