In brief
- A newly proposed “Agentic Risk Standard” separates AI jobs into fee-only tasks protected by escrow and fund-handling tasks that require underwriting.
- In simulations, underwriting reduced user losses by up to 61%, though premiums priced with no safety margin (zero loading) left underwriters insolvent.
- Accurate failure-rate estimates remain the main challenge as both over- and underestimation create systemic risks.
As AI agents begin to handle payments, financial trades, and other transactions, there’s growing concern over the financial risks that fall on the human behind the agent when those systems fail. A consortium of researchers argues that current AI safety techniques do not address that risk and that new insurance-style mechanisms are needed.
In a recent paper, researchers from Microsoft, Google DeepMind, Columbia University, and startups Virtuals Protocol and t54.ai proposed the Agentic Risk Standard, a settlement-layer framework designed to compensate users when an AI agent misexecutes a task, fails to deliver a service, or causes financial loss.
“Technical safeguards can offer only probabilistic reliability, whereas users in high-stakes settings often require enforceable guarantees over outcomes,” the paper said.
The authors argue that most current AI research focuses on improving how models behave, including reducing bias, making systems harder to manipulate, and making their decisions easier to understand, rather than on the financial risks users bear when an agent fails.
“These risks are fundamentally product-level and cannot be eliminated by technical safeguards alone because agent behavior is inherently stochastic,” they wrote. “To address this gap between model-level reliability and user-facing assurance, we propose a complementary framework based on risk management.”
The Agentic Risk Standard adds financial safeguards to how AI agent jobs are settled. For simple tasks where the user only risks paying a service fee, payment is held in escrow and released only after the work is confirmed. For higher-risk tasks that require releasing money upfront, such as trading or currency exchanges, the system brings in an underwriter. The underwriter evaluates the risk, requires the service provider to post collateral, and repays the user if a covered failure occurs.
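The paper does not publish reference code, but the two settlement paths can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the class names, the premium formula (estimated failure rate plus a loading margin), and the collateral rules below are placeholders, not the authors' specification.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Outcome(Enum):
    DELIVERED = auto()        # task verified as completed
    COVERED_FAILURE = auto()  # a failure the policy agrees to cover

@dataclass
class FeeOnlyTask:
    """Low-stakes job: the user only risks the service fee."""
    fee: float

@dataclass
class FundHandlingTask:
    """High-stakes job: user funds are released to the agent upfront."""
    principal: float          # funds handed to the agent (e.g., for a trade)
    est_failure_rate: float   # underwriter's estimate of the failure probability
    loading: float = 0.2      # safety margin on top of the estimated expected loss

def settle_fee_only(task: FeeOnlyTask, outcome: Outcome) -> dict:
    """Escrow path: the fee is held and released only on verified delivery."""
    if outcome is Outcome.DELIVERED:
        return {"provider_receives": task.fee, "user_refund": 0.0}
    return {"provider_receives": 0.0, "user_refund": task.fee}

def settle_fund_handling(task: FundHandlingTask, outcome: Outcome,
                         posted_collateral: float) -> dict:
    """Underwritten path: the provider posts collateral, the user pays a
    premium, and the underwriter makes the user whole on a covered failure."""
    premium = task.principal * task.est_failure_rate * (1 + task.loading)
    if outcome is Outcome.DELIVERED:
        # No claim: collateral returns to the provider, underwriter keeps the premium.
        return {"user_payout": 0.0, "underwriter_net": premium,
                "collateral_returned": posted_collateral}
    # Covered failure: the user is repaid in full; the claim is drawn first
    # from the provider's collateral, and the underwriter absorbs any shortfall.
    shortfall = max(0.0, task.principal - posted_collateral)
    return {"user_payout": task.principal,
            "underwriter_net": premium - shortfall,
            "collateral_returned": max(0.0, posted_collateral - task.principal)}

if __name__ == "__main__":
    print(settle_fee_only(FeeOnlyTask(fee=5.0), Outcome.COVERED_FAILURE))
    print(settle_fund_handling(
        FundHandlingTask(principal=1_000.0, est_failure_rate=0.02),
        Outcome.COVERED_FAILURE, posted_collateral=600.0))
```

In the fee-only path the user's downside is capped at the escrowed fee, while in the fund-handling path the claim is paid first from the provider's collateral, with the underwriter covering any gap.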
The paper noted that non-financial harms such as hallucination, defamation, or psychological harm remain outside the framework.
The researchers said the system was tested using a simulation that ran 5,000 trials, adding that the experiment was limited and not designed to reflect real-world failure rates.
“These results motivate future work on risk modeling for diverse failure modes, empirical measurement of failure frequencies under deployment-like conditions, and the design of underwriting and collateral schedules that remain robust under detector error and strategic behavior,” the study said.
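To see why premium sizing and failure-rate estimation matter, here is a minimal Monte Carlo sketch in the spirit of that experiment. The 5,000-trial count comes from the article, but the failure rates, principal, and loading values below are arbitrary placeholders rather than the paper's parameters.

```python
import random

def run_underwriter(n_trials: int, true_failure_rate: float,
                    est_failure_rate: float, loading: float,
                    principal: float = 1_000.0, seed: int = 0) -> float:
    """Return the underwriter's reserve after n_trials underwritten jobs.

    Premium per job = principal * est_failure_rate * (1 + loading).
    A zero loading prices the premium at the *estimated* expected loss, so
    any underestimate of the true failure rate steadily drains the reserve.
    """
    rng = random.Random(seed)
    premium = principal * est_failure_rate * (1 + loading)
    reserve = 0.0
    for _ in range(n_trials):
        reserve += premium                 # collect the premium up front
        if rng.random() < true_failure_rate:
            reserve -= principal           # pay the user's claim in full
    return reserve

if __name__ == "__main__":
    # Zero loading plus an underestimated failure rate: the reserve goes negative.
    print(run_underwriter(5_000, true_failure_rate=0.03,
                          est_failure_rate=0.02, loading=0.0))
    # A modest loading with an accurate estimate keeps the underwriter solvent.
    print(run_underwriter(5_000, true_failure_rate=0.03,
                          est_failure_rate=0.03, loading=0.2))
```

With zero loading and an underestimated failure rate, the reserve drifts negative, broadly mirroring the insolvency result summarized above; a modest loading paired with an accurate estimate keeps it positive.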