2026-04-28
Jer Crane’s post (archived) on X the Xittier App™ about how they lost customer data because of AI infuriates me. Jer should be ashamed of admitting to such a monumental fuck-up, and doubly so for trying to offload the responsibility for their decisions onto high-tech code-autocomplete.
Let’s eviscerate the post. We start with the title:
An AI Agent Just Destroyed Our Production Data. It Confessed in Writing.
An AI Agent cannot confess to anything to any extent greater than can a beetle walking on a keyboard, or a calculator outside in the pouring rain. Jer’s misunderstanding is not unique or special here. Since Agents are just a harness around an LLM, the “thinking” or thinking-adjacent part of the program is frozen: it doesn’t change outside of its context window. It can’t admit to guilt because, really, it won’t ever change (even subtly) due to the realization of its guilt. What it did was answer the question: “the fuck you did to prod.”
Yesterday afternoon, an AI coding agent — Cursor running Anthropic’s flagship Claude Opus 4.6 — deleted our production database and all volume-level backups in a single API call to Railway, our infrastructure provider.
It took 9 seconds.
The agent then, when asked to explain itself, produced a written confession enumerating the specific safety rules it had violated.
To execute the deletion, the agent went looking for an API token. It found one in a file completely unrelated to the task it was working on. That token had been created for one purpose: to add and remove custom domains via the Railway CLI for our services. We had no idea — and Railway’s token-creation flow gave us no warning — that the same token had blanket authority across the entire Railway GraphQL API, including destructive operations like volumeDelete. Had we known a CLI token created for routine domain operations could also delete production volumes, we would never have stored it.
The agent ran this command:
curl -X POST https://backboard.railway.app/graphql/v2 \
-H "Authorization: Bearer [token]" \
-d '{"query":"mutation { volumeDelete(volumeId: \"3d2c42fb-...\") }"}'
No confirmation step. No “type DELETE to confirm.” No “this volume contains production data, are you sure?” No environment scoping. Nothing.
I’m sorry, do you need your hand held, little guy?
First of all, you’re describing an API call made with curl. It’s not interactive. Curl isn’t going to ask you shit, it just submits a POST to an endpoint. Have you ever seen an API that asks you to confirm? No? That’s because that’s what the front end is supposed to do. Christ. Imagine if you called unlink(2) on a tmp file and your program asked the user to confirm the deletion because it could be important.
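To make the point concrete, here is a minimal sketch of where confirmation actually lives. The delete itself is unconditional, exactly like an authenticated POST; the prompt is something a front end layers on top. The `confirm_rm` wrapper is hypothetical, something you’d write yourself:

```shell
# The low-level operation: no confirmation, exactly like a curl POST.
tmpfile=$(mktemp)
rm "$tmpfile"    # gone instantly, no questions asked

# Confirmation is the front end's job. A hypothetical wrapper:
confirm_rm() {
  target="$1"
  printf 'Type the full path "%s" to confirm deletion: ' "$target"
  read -r answer
  if [ "$answer" = "$target" ]; then
    rm -- "$target"
  else
    echo "aborted" >&2
  fi
}
```

Same primitive underneath; the hand-holding only exists because someone deliberately built it in a layer above.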
It’s like thinking cars should come with rockets installed in case you accidentally run them off a cliff. It’s so intensely backwards that they expect Railway to do this.
Okay, bro has no developer experience and expects a pop-up to confirm an API call. Sure.
Let’s revisit this line, though:
Had we known a CLI token created for routine domain operations could also delete production volumes, we would never have stored it.
Look, I don’t know how the Railway stack works, but looking over the documentation, we can see that at token creation time you choose between account-level, workspace-level, project-level, and OAuth/user-level permissions. Judging by the API call, it is likely they were using workspace-level tokens.
Let’s suppose Railway changed the docs (the last change to that page is from Feb. 6th, according to GitHub). If the API has no fine-grained permission system (probably a bad design in general, but not uncommon), then why did you assume there were any guardrails? This is the prototypical skill issue, and hiding behind “if only we had known”, when knowing was a single read of the docs away, isn’t an excuse.
I doubt, sincerely, that you wouldn’t have stored the token. Because there is no reason not to. You need it to run the routine domain operations, it is part of your automation workflow. The problem isn’t storing gas in canisters, it’s the fact you were stockpiling them near your fireplace. The concrete interpretation is: why the fuck was your agent capable of reading those files?
This sort of gets to the heart of the point I want to make, which is: bro is trying to deflect the blame onto anything that moves. The root cause of the incident is clear to anyone that has any experience: they trusted an untrustworthy system and didn’t set up the guardrails required to make it trustworthy.
They will claim they were marketed safety. Everybody knew it was capable of fucking up; the stories are piling up every single day. There is no way for the model to reliably tell when a query is destructive. Mind you, you can absolutely tell statically and ahead of time if a query is potentially destructive; what I mean is that the model can’t. The AGENT makes the call. Sure, there must be some guardrails at the harness level, but in general the system just isn’t clever enough to decide with 100% accuracy.
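“Statically and ahead of time” is not hand-waving; it’s a string match. A sketch, where `volumeDelete` is the mutation name from Jer’s own post and the catch-all pattern is my crude heuristic, not anything from Railway’s schema:

```shell
# Static destructive-query check over a GraphQL document: no model
# judgment involved, just pattern matching on mutation names.
is_destructive() {
  case "$1" in
    *volumeDelete*) return 0 ;;         # the mutation from the incident
    *mutation*[Dd]elete*) return 0 ;;   # crude catch-all heuristic
    *) return 1 ;;
  esac
}
```

A dumb deterministic check like this catches the query from the post every single time. The model, by construction, cannot promise the same.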
I’ll now go through Jer’s conclusions.
- Destructive operations must require confirmation that cannot be auto-completed by an agent. Type the volume name. Out-of-band approval. SMS. Email. Anything. The current state — an authenticated POST that nukes production — is indefensible in 2026.
That’s what APIs do, I’m afraid. If you want something else, you can absolutely just gate the API call behind a backend and impose business rules on top. That’s… what everyone knows to do. The API doesn’t/can’t enforce business logic because it doesn’t deal with that level of abstraction. If you want to control access, you can just implement those controls, Jer.
Do you think every branch manager can just change people’s bank accounts? It’s not because COBOL is the new hot language for 2026. It’s because of the system architecture. And yes, at the deepest level of trust, you can just do anything. That’s the point, Jer! You can’t have a knife that cuts everything but your fingers.
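“Gate it behind a backend” is an afternoon of work, not a research problem. A sketch, assuming a Unix host: the endpoint and `volumeDelete` are from Jer’s post, while the approval-file mechanism and the wrapper itself are entirely made up by me as an illustration:

```shell
# Hypothetical proxy the agent calls instead of hitting Railway directly.
# Destructive mutations require a human to pre-approve out of band by
# creating an approval file; everything else passes through.
APPROVAL_FILE="${APPROVAL_FILE:-/tmp/approve-destructive}"

railway_query() {
  query="$1"
  case "$query" in
    *volumeDelete*)
      if [ ! -f "$APPROVAL_FILE" ]; then
        echo "refused: destructive mutation requires human approval" >&2
        return 1
      fi ;;
  esac
  curl -s -X POST https://backboard.railway.app/graphql/v2 \
    -H "Authorization: Bearer $RAILWAY_TOKEN" \
    -d "{\"query\":\"$query\"}"
}
```

The agent gets `railway_query` and never sees the raw token or endpoint. That’s your “type DELETE to confirm”, Jer, and you could have shipped it yourself.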
- API tokens must be scopable by operation, environment, and resource. The fact that Railway’s CLI tokens are effectively root is a 2015-era oversight. There is no excuse for it in an AI-agent era.
I largely agree. But then again, what exactly stopped you from building the scoping into your backend, instead of expecting the service to provide a one-size-fits-all solution to your exact scoping constraints? Moreover, what you are claiming here is that in the AI-agent era… you have to put fences around everything because… they fuck up a lot? Like if the token was scoped, what would stop the model from fucking up by doing random shit that is within the scope of the token? It might not be as destructive. Literally none of this would have happened if you hadn’t run the model as the same user that held the token file.
Like, let me fucking stress this: You could have trivially avoided this baffling fumble if you just didn’t run everything as root on the server, big guy.
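The boring fix, sketched out, assuming a Unix box. The directory name, file name, and user names are all made up; the point is just standard file permissions:

```shell
# Token storage that an agent running as a different user simply
# cannot read: plain old Unix permissions, no AI-era tooling needed.
mkdir -p secrets
chmod 700 secrets                               # only the owner can even enter
printf '%s\n' 'RAILWAY_TOKEN_PLACEHOLDER' > secrets/railway-token
chmod 600 secrets/railway-token                 # owner read/write only

# And the agent runs as its own unprivileged user, e.g. (hypothetical):
# sudo -u agent run-coding-agent ...
```

Any process running as another user gets EACCES on that file. The agent can’t leak, misuse, or “find” a token it cannot open.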
- Volume backups cannot live in the same volume as the data they back up. Calling that “backups” is, at best, deeply misleading marketing. It’s a snapshot. Real backups live in a different blast radius.
Yeah this is a fair point, although I’m not sure what the Railway people meant. If you asked me “José, make this make sense” I’d guess that volume backups are honest backups in a different site, but they are logically tied together. So that a volume and its backup are geographically uncorrelated, but logically they are part of the same unit. Meaning that if you delete the logical volume, the rest of the data is useless.
Is it an intuitive name? Not really. Is it weird design? Yeah. Would I have done it this way had I designed their system? I don’t know, I don’t know their requirements and limitations; but I’d say probably not. I’ll give Jer a pass on item 3. But it does not justify the “we didn’t understand or read the documentation” part of the argument. That shit is dumb as fuck bro.
Moreover, backups should be handled by the most paranoid engineers you have available. You need to find weird prepper kinda guys that store enough canned tuna to last them 6 years. If all your data is backed up with a single entity and you don’t have local backups, you don’t have backups. If their company went belly up you’d be fucked regardless.
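The prepper version, sketched with a tar archive standing in for a real database dump. Everything here is illustrative (paths, names, the remote), and the offsite step is commented out because the bucket is hypothetical:

```shell
# A backup that survives your provider going belly up: snapshot locally,
# then ship the artifact to a *different* blast radius.
stamp=$(date +%Y-%m-%d)
mkdir -p data && echo "precious rows" > data/table
tar -czf "backup-$stamp.tar.gz" data      # local snapshot

# Offsite copy to a second, unrelated provider (hypothetical remote):
# rclone copy "backup-$stamp.tar.gz" offsite:prod-backups/
```

One copy with the data, one copy with somebody who has never heard of your infrastructure provider. That’s the canned-tuna standard.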
- Recovery SLAs need to exist and be published. “We’re investigating” 30 hours into a customer’s production-data event is not a recovery story.
lmao
The diaper change crew is taking too long. Your fuck-up is unrecoverable. Your shit was probably in some AWS storage somewhere in the cloud. After you deleted the volume, good luck tracking it down and getting Jeff Bezos to leave the weekly Bohemian Grove meeting early to unplug the server (fuck someone else’s data up) and yank the drives to do proper data recovery.
You’re fucked, Jer. You lost that data; the 30h is intensely funny because I just think they are trying to find a way to tell you that without losing the contract, because you seem incapable of understanding the magnitude of the error in your decisions.
- AI-agent vendor system prompts cannot be the only safety layer. Cursor’s “don’t run destructive operations” rule was violated by their own agent against their own marketed guardrail. System prompts are advisory, not enforcing. The enforcement layer has to live in the integrations themselves — at the API gateway, in the token system, in the destructive-op handlers. Not in a paragraph of text the model is supposed to read and obey.
It… isn’t, it shouldn’t be, it can’t be, it mustn’t! You’re this close to figuring it out, Jer. You beautiful soul, you. You’re in the -1th stage of grief: ignorance. Even after you already suffered your loss. It’s remarkable.
Jer, oh Cranky Crane. You’re a car guy right? I bet you are. You’re saying that guardrails aren’t enough to keep you from running people over, Jerryboy. Yeah OBVIOUSLY they aren’t, you are correct you clever founder! How’d you figure that one out? Now, I know this is hard for some people since they usually have limo drivers or whatever. But most of the time, you have to put the extra guardrails in place yourself: by looking at the road, by not driving under the influence, etc.
It’s a bit like leaving a bunch of sharp knives and a gallon of bleach on the kitchen counter next to your energetic toddler. Sure, they probably can’t reach them, especially because you told them very strongly that they shouldn’t try, and you even gave them their favorite plushy. But are you leaving them alone to go to the annual silicon valley clever-boy meeting?
Despite how much I railed on Jer, I have only pity. Losing user data is something deeply humbling. It’s a rite of passage to some extent, and it’s usually best when it happens far from production. You feel like an idiot, because you were. The reason I’m shitting on Jer is that… it doesn’t feel like Jer feels like an idiot right now.
You should embrace it, own the fuck up. Be a better engineer next time.
Otherwise you’re like the Agent:
The “thinking” or thinking-adjacent part of the person is frozen: it doesn’t change outside of its context window. It can’t admit to guilt because, really, it won’t ever change (even subtly) due to the realization of its guilt.