Can we actually trust an LLM agent with a bash shell and a corporate credit card? Yes, but only if we stop treating the system prompt like a legally binding contract.
The current state of “AI safety” is largely a joke. We tell an agent to “be helpful and harmless,” and then we act shocked when it deletes a production S3 bucket because it thought it was performing a “cleanup” task. The problem is that system prompts are just suggestions: they nudge the model’s output probabilities, but they are not hard constraints. You can spend a hundred hours crafting the perfect prompt to keep an agent out of a specific directory, and the model will still find a way to hallucinate a path around it or simply ignore the instruction when the context window gets crowded. We are basically trying to secure a data center by putting a “Please Do Not Enter” sign on the front door.
The paper on Deontic Policies argues that we need a runtime layer that doesn’t care about the agent’s “intent” or its “reasoning” and cares only about the action it actually attempts. It is the difference between asking a teenager to be careful with the family car and installing a mechanical governor that caps the speed at 55 mph. One is a request for good behavior; the other makes speeding physically impossible. By using deontic logic, the formal study of what is obligatory, permitted, and forbidden, this framework places a formal gatekeeper between the LLM and the tools it invokes. If the model tries to run rm -rf against a restricted path, the governance layer rejects the call before the command ever reaches the shell.
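To make the gatekeeper idea concrete, here is a minimal sketch in Python, assuming a shell tool that receives a raw command string. The rule format (`Rule`, `Modality`, `check`, `RULES`) is hypothetical and is not the paper’s; it models only two deontic modalities, permission and prohibition, and leaves out obligation. Two design choices carry the weight: prohibitions override permissions, and anything not explicitly permitted is denied.

```python
import posixpath
import shlex
from dataclasses import dataclass
from enum import Enum
from fnmatch import fnmatchcase


class Modality(Enum):
    PERMITTED = "permitted"
    FORBIDDEN = "forbidden"


@dataclass(frozen=True)
class Rule:
    modality: Modality
    program: str    # executable name such as "rm", or "*" for any program
    path_glob: str  # fnmatch-style pattern; note that "*" also matches "/"


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def check(rules: list[Rule], command_line: str) -> Decision:
    """Judge a shell call from its text alone; the model's stated intent is never consulted."""
    try:
        argv = shlex.split(command_line)
    except ValueError:  # e.g. an unclosed quote
        return Decision(False, "unparseable command")
    if not argv:
        return Decision(False, "empty command")
    program = argv[0]
    # Treat every non-flag argument as a path and normalize it, so that
    # "/tmp/agent/../../srv/prod" is judged as "/srv/prod".
    operands = [posixpath.normpath(a) for a in argv[1:] if not a.startswith("-")]
    applicable = [r for r in rules if r.program in (program, "*")]
    permits = [r for r in applicable if r.modality is Modality.PERMITTED]
    forbids = [r for r in applicable if r.modality is Modality.FORBIDDEN]
    if not operands and not permits:
        return Decision(False, f"{program} is not explicitly permitted")
    for path in operands:
        # Prohibition overrides permission.
        for r in forbids:
            if fnmatchcase(path, r.path_glob):
                return Decision(False, f"{program} on {path} is forbidden by {r.path_glob!r}")
        # Default deny: whatever is not explicitly permitted is forbidden.
        if not any(fnmatchcase(path, r.path_glob) for r in permits):
            return Decision(False, f"{program} on {path} is not explicitly permitted")
    return Decision(True, "permitted")


# Hypothetical policy: the agent may clean its own scratch space and list files,
# but nothing may touch anything under /srv/prod.
RULES = [
    Rule(Modality.PERMITTED, "rm", "/tmp/agent/*"),
    Rule(Modality.PERMITTED, "ls", "/*"),
    Rule(Modality.FORBIDDEN, "*", "/srv/prod*"),
]

print(check(RULES, "rm -rf /tmp/agent/cache").allowed)           # True
print(check(RULES, "rm -rf /tmp/agent/../../srv/prod").allowed)  # False
print(check(RULES, "rm -rf /").allowed)                          # False
```

The path handling is deliberately naive. It collapses `..` segments, but it does not resolve symlinks, expand environment variables or shell globs, or look inside commands smuggled through `bash -c` (which default deny rejects here only because `bash` has no permit rule). A real policy engine would have to deal with all of these.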
Of course, this isn’t a free lunch. (I suspect the operational overhead will be a nightmare.) If every single tool call has to pass through a logic checker, we are adding latency that will make “real-time” agents feel like they are thinking through a straw. There is real-world friction here: every millisecond spent in the governance proxy is a millisecond the user spends staring at a blinking cursor. It is essentially adding a corporate compliance officer to the loop who has to sign off on every single purchase before the card is swiped. If the logic engine is slow, the entire agentic workflow collapses into a series of stuttering pauses.
The real friction, however, will be the policy definition. Who actually writes these deontic rules? If the rules are too strict, the agent becomes a glorified paperweight; if they are too loose, you are right back where you started. We have seen this movie before with early RBAC systems in cloud infrastructure: everyone eventually just gave the service account the admin role because they couldn’t figure out the granular permissions. Despite this, demand for “hard” guardrails is surging. By Q4, we will see the first enterprise-grade “governance proxy” emerge as a standalone SaaS product, probably marketed as an AI Firewall.
Moving toward formal runtime governance is the only way out of the current mess. Prompt engineering for safety is a game of whack-a-mole that we are losing. We need a hard-coded circuit breaker that lives outside the LLM’s weights and doesn’t suffer from the same probabilistic drift as the model itself. If the governance layer is decoupled from the model, we can actually audit why a request was blocked without wondering if the agent just had a hallucination about its own rules. We stop asking the model to police itself and instead build a cage that actually holds.
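The auditability argument is easier to see in code. Below is a minimal sketch of a proxy that writes one record per tool call, naming the rule that decided it. `GovernanceProxy`, `Rule`, and the rule IDs are hypothetical, and the in-memory list of JSON lines stands in for whatever append-only store a real deployment would use. Because the verdict depends only on the rule table and the requested path, the same call always gets the same answer, which is exactly what a model policing itself cannot promise.

```python
import json
import posixpath
import time
from dataclasses import asdict, dataclass
from pathlib import PurePosixPath
from typing import Callable, Optional


@dataclass(frozen=True)
class Rule:
    rule_id: str
    prefix: str   # directory the rule covers, including everything beneath it
    allowed: bool


@dataclass(frozen=True)
class AuditRecord:
    timestamp: float
    tool: str
    path: str     # the path exactly as the agent requested it
    allowed: bool
    rule_id: str  # which rule decided the call


class GovernanceProxy:
    """Sits between the agent and its tools; the rules live here, not in the prompt."""

    def __init__(self, rules: list[Rule]) -> None:
        self.rules = rules
        self.audit_log: list[str] = []  # JSON lines; stands in for append-only storage

    def _decide(self, path: str) -> Rule:
        # Collapse ".." so "/tmp/agent/../../srv/prod" cannot sneak past a prefix check.
        target = PurePosixPath(posixpath.normpath(path))
        for rule in self.rules:  # first match wins, so list narrow prohibitions first
            if target.is_relative_to(rule.prefix):
                return rule
        return Rule("default-deny", "", False)

    def invoke(self, tool: str, path: str, execute: Callable[[str], str]) -> Optional[str]:
        rule = self._decide(path)
        record = AuditRecord(time.time(), tool, path, rule.allowed, rule.rule_id)
        self.audit_log.append(json.dumps(asdict(record)))
        # The tool only runs if the rule table says so; otherwise nothing executes.
        return execute(path) if rule.allowed else None


proxy = GovernanceProxy([
    Rule("no-prod", "/srv/prod", allowed=False),
    Rule("scratch-ok", "/tmp/agent", allowed=True),
])
proxy.invoke("delete", "/tmp/agent/../../srv/prod", lambda p: f"deleted {p}")
print(proxy.audit_log[-1])  # the blocked call, attributed to the "no-prod" rule
```

When someone asks why a request was blocked, the answer is a rule ID in a log line, not the agent’s after-the-fact account of its own reasoning.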
Formal logic is a boring solution, but it is the only one that actually works.