Ask HN: What are your worst war stories bringing agentic applications into prod
3 by yaoke259 | 0 comments on Hacker News.

For a bit of context, I’m currently building a team of AI agents at work that generates reports by fanning out into a large number of subagents to process a large volume of transcript data. When the analysis fails midway because of some individual step (for example, an API call returns an error or the machine runs out of memory), it creates cascading errors that break the entire generation with almost no visibility. I’ve just spent the past month rewriting the individual jobs as durable execution jobs on DBOS, but I’m wondering whether there are better solutions out there and whether others have run into similar issues.

Then there is the issue of reflecting progress back to the users, which I’ve honestly just been coding ad hoc…

When an agent fails at step 9 of 12, how do you handle that? Roughly how many engineer-weeks have you sunk into agent infrastructure (durability, monito...
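For readers unfamiliar with durable execution, the "fails at step 9 of 12" problem above can be illustrated with a minimal sketch of step-level checkpointing: each step's output is persisted as soon as it completes, so a rerun of the same workflow skips the finished steps and resumes at the one that failed. This is only an illustration of the general idea, not DBOS's API; `CheckpointStore`, `run_workflow`, and the `on_progress` callback are hypothetical names, and the local SQLite table and JSON-serializable step outputs are assumptions. The progress callback is one simple way to surface "step N of M" to users.

```python
import json
import sqlite3
from typing import Any, Callable, Optional, Sequence, Tuple

# Hypothetical step signature: takes the previous step's output, returns the next one.
Step = Callable[[Any], Any]


class CheckpointStore:
    """Hypothetical store persisting each completed step's output, keyed by (workflow_id, step)."""

    def __init__(self, path: str = ":memory:") -> None:
        # Assumption: a local SQLite file stands in for a real durable backend.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS steps ("
            "workflow_id TEXT, step INTEGER, output TEXT, "
            "PRIMARY KEY (workflow_id, step))"
        )

    def load(self, workflow_id: str, step: int) -> Tuple[bool, Any]:
        # Return (found, value) so a step that legitimately returned None is still "done".
        row = self.conn.execute(
            "SELECT output FROM steps WHERE workflow_id = ? AND step = ?",
            (workflow_id, step),
        ).fetchone()
        return (False, None) if row is None else (True, json.loads(row[0]))

    def save(self, workflow_id: str, step: int, output: Any) -> None:
        # "with self.conn" commits the insert, so the checkpoint survives a later crash.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO steps VALUES (?, ?, ?)",
                (workflow_id, step, json.dumps(output)),
            )


def run_workflow(
    workflow_id: str,
    steps: Sequence[Step],
    initial_input: Any,
    store: CheckpointStore,
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> Any:
    """Run steps in order, skipping any step whose output is already checkpointed."""
    value = initial_input
    total = len(steps)
    for i, step in enumerate(steps):
        found, cached = store.load(workflow_id, i)
        if found:
            value = cached  # completed in an earlier attempt; do not re-run it
        else:
            value = step(value)  # may raise; earlier checkpoints are kept
            store.save(workflow_id, i, value)
        if on_progress is not None:
            on_progress(i + 1, total)  # e.g. push "9 of 12" to a progress UI
    return value


if __name__ == "__main__":
    store = CheckpointStore()
    attempts = {"step9": 0}

    def make_step(i: int) -> Step:
        def step(x: int) -> int:
            # Simulate a transient API error on the 9th step's first attempt only.
            if i == 8 and attempts["step9"] == 0:
                attempts["step9"] += 1
                raise RuntimeError("simulated API error at step 9")
            return x + 1
        return step

    steps = [make_step(i) for i in range(12)]
    try:
        run_workflow("report-1", steps, 0, store)
    except RuntimeError as err:
        print("first attempt failed:", err)
    result = run_workflow(
        "report-1", steps, 0, store,
        on_progress=lambda done, total: print(f"{done}/{total}"),
    )
    print("result:", result)
```

On the retry, only steps 9 through 12 actually execute; steps 1 through 8 are read back from the store. In a fan-out design, the same idea would apply per subagent (for instance, one workflow ID per transcript chunk), so one failed chunk can be retried without re-running the others.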