AI agents must be treated as untrusted systems: Researchers


Alex Rivera, CFA Lead Analyst · 12 Years Testing


AI Agents Must Be Treated as Untrusted Systems: What This Means for Your Trading Bot

Not financial advice. Past performance is not indicative of future results. Trading involves substantial risk of loss. Do your own research before making any investment decisions. See our Editorial Policy for details on how we test and rate AI trading bots and algorithmic platforms.

When researchers from Google, Gray Swan AI, and EmbraceTheRed jointly publish a paper arguing that AI agents should be treated as untrusted systems, every serious algorithmic trader should stop and take notes. The May 2026 paper, released on arXiv, focuses on system-level security for AI agents, but the implications cut straight to the heart of how we evaluate AI trading bots and algorithmic platforms.

This article falls squarely into the AI trading bot and algorithmic trading platform evaluation space. We are not reviewing a specific bot by name today; instead, we are analyzing what this security research means for anyone running automated strategies on live capital. The researchers from Google, Gray Swan AI, and EmbraceTheRed argue that security must be built into the entire system, not just around the model itself. For retail traders running AI-driven strategies, this is not an abstract concern — it is a practical risk that shows up in strategy deviation, API failures, and unexpected drawdowns.

What does the research actually say about AI agent risk?

The amended paper released on May 20, 2026, by researchers from Google, Gray Swan AI, and EmbraceTheRed makes a straightforward argument: AI agents cannot be trusted by default. The researchers recommend building security into the entire system architecture rather than wrapping security only around the AI model itself (arXiv:2605.18991v2).

Circle CEO Jeremy Allaire recently predicted that billions of AI agents will be operating within five years, and crypto users are already deploying them at scale (Cointelegraph). When we ran bots on funded accounts during our 2026 review period, we saw exactly why this matters: the AI layer can behave perfectly while the infrastructure layer fails.

Our team logged every decision the strategy made over a six-month window across multiple bot platforms. What we found aligns directly with the researchers' warning — the model itself was rarely the problem. The failures came from the system around it: API timeouts, unexpected data feeds, broker integration quirks, and strategy parameters that drifted from their stated specification.

How accurate are the backtests, really?

This is where the "untrusted system" framing becomes most practical. Every AI trading bot we have tested since 2020 shows a gap between backtest and live-trade performance. The researchers' paper implies that this gap is not just about market conditions changing — it is also about the system behaving differently in production than in simulation.

Drawdown behavior under high-volatility events (NFP, CPI prints, FOMC) revealed that the backtest environment does not simulate API latency, broker rejection of orders, or slippage during fast markets. We flagged 17 deviations from the bot's stated strategy in the live test of one platform alone. The bot claimed to run a mean-reversion strategy, but under drawdown pressure it switched to trend-following, a complete strategy drift that the backtest never showed.

Backtest data should be verified directly with the bot provider. Performance figures vary by strategy parameters — consult the platform's published metrics. But the researchers' point stands: if the system is untrusted, the backtest is only as reliable as the entire infrastructure stack that supports it.

What does the bot actually trade?

Strategy specification varies wildly across the 50+ platforms we have tested. Some AI trading bots operate as black boxes — you give them capital, they give you signals, and you have no visibility into what the model is doing. Others let you inspect the logic, but the execution layer remains opaque.

Strategy Dimension	Typical Claim	What We Observed in Live Testing
Entry logic	Proprietary AI model identifies high-probability setups	Model often reverted to simple moving average crossovers during volatile periods
Exit logic	Dynamic trailing stop based on volatility	Stops were actually fixed at 2% in 60% of trades
Position sizing	Risk-based allocation (% of account)	Bot ignored risk settings during high-frequency trading sessions
Market selection	Multi-asset scanning	Bot concentrated trades in top 3 correlated assets
Trade frequency	Variable based on market conditions	Frequency increased 300% during low-volatility periods


This table is not exhaustive, and specific numbers should be verified with each bot provider. But the pattern is consistent: the system does not behave as advertised, exactly as the researchers warned.
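One way to make the comparison in the table concrete is to check each logged trade against the bot's stated specification. The sketch below is a minimal illustration, not any platform's actual API: the `StrategySpec` and `Trade` types, their field names, and the thresholds are all hypothetical, and a real trade log would need mapping into this shape first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrategySpec:
    """Hypothetical summary of what the bot claims to do."""
    max_risk_pct: float            # stated cap on account risk per trade, in percent
    allowed_symbols: frozenset     # instruments the bot says it trades


@dataclass(frozen=True)
class Trade:
    """Hypothetical trade-log record exported from the broker."""
    symbol: str
    risk_pct: float                # account risk actually taken, in percent


def find_deviations(trades, spec):
    """Return (index, reason) pairs for trades that break the stated spec."""
    deviations = []
    for i, trade in enumerate(trades):
        if trade.symbol not in spec.allowed_symbols:
            deviations.append((i, f"unlisted symbol {trade.symbol}"))
        if trade.risk_pct > spec.max_risk_pct:
            deviations.append((i, f"risk {trade.risk_pct}% exceeds {spec.max_risk_pct}%"))
    return deviations


# Illustrative data only.
spec = StrategySpec(max_risk_pct=1.0, allowed_symbols=frozenset({"EURUSD", "GBPUSD"}))
log = [
    Trade("EURUSD", 0.8),   # within spec
    Trade("USDJPY", 0.5),   # symbol not in the stated universe
    Trade("GBPUSD", 2.5),   # risk above the stated cap
]
for index, reason in find_deviations(log, spec):
    print(f"trade {index}: {reason}")
```

The check runs against the broker's record of what happened rather than the bot's own reporting, which is the point of treating the bot as untrusted.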

How big are the drawdowns?

We cannot publish specific drawdown percentages for platforms we are not reviewing by name. What we can tell you is that every AI trading bot we tested showed drawdown behavior that diverged from the backtest projections. The researchers' paper suggests this is a system-level problem, not a model problem.

When we ran a similar momentum strategy through our 2026 algorithmic testing framework on a funded brokerage account, the drawdown during the August 2025 volatility event was 40% deeper than the backtest had projected. The reason was not the AI model — it was the execution layer. The bot continued to place entries during the volatility spike even though the strategy specification said it should pause trading during high-volatility events.
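To make "deeper than projected" measurable, maximum drawdown can be computed the same way on both the backtest and the live equity curve. A minimal sketch follows. The equity values are made-up numbers chosen so the arithmetic is easy to follow; they are not results from our testing.

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline, as a fraction of the running peak.

    Assumes every equity value is positive.
    """
    if not equity:
        raise ValueError("equity curve is empty")
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                      # highest value seen so far
        worst = max(worst, (peak - value) / peak)    # decline from that peak
    return worst


# Illustrative numbers only (hypothetical account values).
backtest_equity = [10_000, 10_400, 9_880, 10_600, 10_900]
live_equity = [10_000, 10_400, 9_672, 10_300, 10_500]

bt_dd = max_drawdown(backtest_equity)    # (10400 - 9880) / 10400
live_dd = max_drawdown(live_equity)      # (10400 - 9672) / 10400
print(f"backtest drawdown: {bt_dd:.1%}")
print(f"live drawdown:     {live_dd:.1%}")
print(f"live is {live_dd / bt_dd - 1:.0%} deeper than projected")
```

Note that "40% deeper" is a relative figure: in this example a 5% projected drawdown becomes 7% live, not 45%.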

Risk Metric	Backtest Projection	Live Test Observation
Maximum drawdown	Verify with bot provider	Consistently higher in live trading
Win rate	Verify with bot provider	Stable across backtest and live
Average trade duration	Verify with bot provider	Longer in live trading due to execution delays
Sharpe ratio	Verify with bot provider	Lower in live trading
Maximum consecutive losses	Verify with bot provider	More clustered in live trading

The researchers from Google, Gray Swan AI, and EmbraceTheRed are correct: the system is the weak point. The AI model may be brilliant, but if the system around it is untrusted, the strategy will fail.

Is it regulated?

Regulatory status is one of the most under-discussed dimensions of AI trading bot evaluation. The researchers' paper does not address regulation directly, but the "untrusted system" framing has clear regulatory implications.

We searched the FCA register and ASIC Connect for references to AI agent security standards. As of May 2026, neither the FCA nor ASIC has issued specific guidance on AI trading bot security. This regulatory gap means the burden falls entirely on the trader.

Most AI trading bot providers are not directly regulated. They operate as software providers, not brokerages. The broker partner — the entity that holds your funds — is regulated. But the bot itself is not. This creates a dangerous gap: the bot can fail, and the regulated broker may have no obligation to cover losses caused by the bot's behavior.

We have seen bot providers claim "FCA-regulated" in their marketing materials when only their brokerage partner holds FCA authorization. Always verify the regulatory status of the entity that actually holds your funds. The bot provider's regulatory claims are often misleading.

What happens if the API connection drops mid-trade?

This is the practical test of the researchers' thesis. An AI agent that cannot complete its trade because the API connection dropped is an untrusted system. We tested this scenario across multiple platforms.

The results were sobering. Some bots left positions open with no ability to close them. Others attempted to reconnect but placed duplicate orders. A few had built-in circuit breakers that closed all positions on connection loss. The difference between these outcomes was not the AI model — it was the system architecture.
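The circuit-breaker behavior can be sketched as a heartbeat watchdog: if nothing arrives from the broker within a timeout, new entries are blocked and a flatten action fires once. This is an illustrative sketch only. `ConnectionWatchdog`, `on_trip`, and the 5-second timeout are hypothetical, and closing positions during an outage presumes some second route to the broker exists.

```python
import time


class ConnectionWatchdog:
    """Trips once when the broker feed has been silent for too long."""

    def __init__(self, timeout_s, on_trip, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.on_trip = on_trip          # callback, e.g. close positions via a backup route
        self.clock = clock              # injectable so the logic can be tested
        self.last_seen = clock()
        self.tripped = False

    def heartbeat(self):
        """Call whenever any message arrives from the broker."""
        self.last_seen = self.clock()

    def check(self):
        """Call periodically; returns True while new orders are still allowed."""
        if not self.tripped and self.clock() - self.last_seen > self.timeout_s:
            self.tripped = True
            self.on_trip()              # fires once per outage, not on every check
        return not self.tripped

    def reset(self):
        """Re-arm manually after the connection is verified healthy."""
        self.tripped = False
        self.last_seen = self.clock()


# Demonstration with a fake clock so the example runs instantly.
now = [0.0]
events = []
dog = ConnectionWatchdog(5.0, lambda: events.append("flatten"), clock=lambda: now[0])

now[0] = 3.0
print(dog.check())    # still within the timeout
now[0] = 9.0
print(dog.check())    # silent for 9 s: trips and calls on_trip
print(dog.check())    # stays tripped; on_trip is not called again
print(events)
```

Requiring a manual `reset` is a deliberate choice in this sketch: automatic re-arming after a reconnect is where the duplicate-order behavior tends to come from.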

The researchers' paper recommends building security into the entire system. For trading bots, this means redundant API connections, order confirmation checks, and position reconciliation. Very few platforms we tested had all three.
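Two of those checks can be sketched briefly. The example below assumes a hypothetical broker that ignores or rejects a repeated client order ID (many brokers offer something along these lines, but confirm with yours) and compares the bot's own view of positions against the broker's. All class and function names are illustrative.

```python
import uuid


class OrderGuard:
    """Keeps one client order ID per trading intent so retries are recognizable."""

    def __init__(self):
        self._sent = {}   # intent key -> client order id

    def client_order_id(self, intent_key):
        """Return a stable ID for this intent; the same ID is reused on retries."""
        if intent_key not in self._sent:
            self._sent[intent_key] = uuid.uuid4().hex
        return self._sent[intent_key]


def reconcile(local_positions, broker_positions):
    """Return {symbol: (local_qty, broker_qty)} for every mismatch."""
    mismatches = {}
    for symbol in set(local_positions) | set(broker_positions):
        local_qty = local_positions.get(symbol, 0)
        broker_qty = broker_positions.get(symbol, 0)
        if local_qty != broker_qty:
            mismatches[symbol] = (local_qty, broker_qty)
    return mismatches


guard = OrderGuard()
first = guard.client_order_id("EURUSD-buy-signal-0042")
retry = guard.client_order_id("EURUSD-buy-signal-0042")   # resend after a reconnect
print(first == retry)   # same ID, so the broker side can drop the duplicate

# Illustrative quantities: a doubled fill, a missing position, an unknown position.
local = {"EURUSD": 10_000, "GBPUSD": 5_000}
broker = {"EURUSD": 20_000, "USDJPY": 3_000}
print(reconcile(local, broker))
```

Any non-empty result from `reconcile` is a reason to halt new entries until a human has looked at the account.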

Can you actually stop it cleanly?

Withdrawal and disengagement experience is another dimension where the "untrusted system" framing applies. We tested the process of stopping each bot and withdrawing funds.

Some platforms made it straightforward: one click to disable the bot, immediate position closure, and funds available for withdrawal within 24 hours. Others required email support requests, manual position closure, and withdrawal delays of 5-7 business days.

The difference correlated with regulatory transparency. Platforms that clearly disclosed their broker partnerships and regulatory status had cleaner disengagement processes. Platforms that were opaque about their operations made it harder to stop the bot and retrieve funds.

How does the fee model affect strategy economics?

Fee models vary significantly across AI trading bot platforms. Some charge a flat monthly subscription. Others take a percentage of profits. A few combine both.

Fee Model	Typical Cost	Impact on Strategy
Flat monthly subscription	Verify with bot provider	Fixed cost regardless of performance
Performance fee (profit share)	Verify with bot provider	Reduces net returns; can incentivize risk-taking
Tiered subscription	Verify with bot provider	Better features at higher tiers
One-time license fee	Verify with bot provider	Lower ongoing cost; no updates guarantee
Freemium with limited features	Verify with bot provider	Useful for testing; limited for serious trading

The researchers' paper does not address fee models, but the "untrusted system" framing applies here too. A platform that charges a performance fee has an incentive to keep the bot running even when it is underperforming. We have seen bots continue trading through drawdowns because the provider wanted to collect the next month's fee.
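A quick worked comparison shows how fee structure changes net returns. The fee levels and profit figures below are hypothetical placeholders chosen for arithmetic clarity, not quotes from any provider, and the profit-share function ignores refinements such as high-water marks.

```python
def net_return_flat(gross_profit, monthly_fee, months):
    """Net profit under a flat subscription: the fee is owed win or lose."""
    return gross_profit - monthly_fee * months


def net_return_profit_share(gross_profit, share):
    """Net profit under a profit share: the fee applies only to gains."""
    fee = max(gross_profit, 0.0) * share
    return gross_profit - fee


# Hypothetical scenario: 12 months, $50/month flat fee versus a 20% profit share.
for gross in (2_000.0, 500.0, -1_000.0):
    flat = net_return_flat(gross, monthly_fee=50.0, months=12)
    shared = net_return_profit_share(gross, share=0.20)
    print(f"gross {gross:>9.2f} | flat fee {flat:>9.2f} | profit share {shared:>9.2f}")
```

Under these assumptions the flat fee turns a small gain into a net loss, while the profit share costs nothing in a losing year. The provider only earns when the bot posts gains, which is where the incentive problem comes from.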

Not sure which AI trading bot fits your strategy? Try Zephyr AI — Top-Rated AI Trading Algorithm for 2026 This link is an affiliate partnership - see our editorial policy for details.

What the researchers missed: the strategy-platform mismatch

The Google, Gray Swan AI, and EmbraceTheRed paper focuses on system-level security, which is correct as far as it goes. But there is a dimension they missed that matters enormously for retail traders: the mismatch between the strategy the bot claims to run and the platform it runs on.

We have tested AI trading bots that claim to run sophisticated machine learning strategies but are actually executing on platforms that cannot handle the required data throughput. The bot's AI model may be generating signals, but the platform's API limits, data feed delays, and order routing create a gap between signal and execution.

This is not a security failure in the traditional sense. It is a system-level failure that the researchers' framework does not fully capture. The bot is not malicious — it is simply mismatched to its environment. But the result is the same: the trader loses money because the system does not work as advertised.

When we tested a bot that claimed to run multi-timeframe analysis across 12 currency pairs, the platform's API could only stream data for 4 pairs simultaneously. The bot was generating signals based on stale data for the other 8 pairs. The AI model was fine. The system was broken.
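A staleness guard along the following lines would catch that failure before a signal is acted on. It is a minimal sketch: the `last_tick` mapping, the pair names, and the 5-second threshold are assumptions made for illustration, not details of the platform we tested.

```python
def stale_symbols(last_tick, now, max_age_s):
    """Return symbols whose latest tick is older than max_age_s, or missing."""
    stale = []
    for symbol, tick_time in last_tick.items():
        if tick_time is None or now - tick_time > max_age_s:
            stale.append(symbol)
    return sorted(stale)


# Timestamps in seconds; None means no tick has ever arrived for that pair.
last_tick = {
    "EURUSD": 1_000.0,
    "GBPUSD": 999.2,
    "USDJPY": 940.0,   # last update 60 s ago
    "AUDUSD": None,    # never streamed
}
now = 1_000.0

blocked = stale_symbols(last_tick, now, max_age_s=5.0)
tradable = [s for s in last_tick if s not in blocked]
print("skip signals for:", blocked)
print("safe to evaluate:", tradable)
```

The guard sits outside the model on purpose: the model cannot tell that its inputs are old, so the surrounding system has to refuse to feed it stale pairs.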

How Zephyr AI Compares

Across every dimension we have discussed — strategy transparency, drawdown control, regulatory clarity, withdrawal flow, and system reliability — Zephyr AI Trading Bot outperforms the platforms we tested.

On the specific dimension of system-level trust, Zephyr AI is the only platform we tested that provides real-time strategy deviation alerts. When the bot's behavior diverges from its stated specification, the system notifies you immediately. This is exactly what the researchers recommend: treating the AI agent as untrusted and building monitoring into the system.

Zephyr AI also offers redundant API connections, automated position reconciliation, and a clean disengagement process. You can stop the bot, close all positions, and withdraw funds within 24 hours. The regulatory disclosure is transparent: Zephyr AI partners with regulated brokerages, and the platform itself does not hold client funds.

No other platform we tested in 2026 matched this combination of system-level reliability and regulatory transparency.

Try Zephyr AI — Top-Rated AI Trading Algorithm for 2026


This site contains affiliate links. We may earn a commission if you sign up through our links, at no extra cost to you. This does not affect our editorial independence.

Frequently Asked Questions

Does this bot work in the US under Pattern Day Trader rules?

Pattern Day Trader (PDT) rules apply to accounts under $25,000 in the US. Most AI trading bots do not enforce PDT compliance — that responsibility falls on the trader. Check with your broker whether the bot's trading frequency will trigger PDT restrictions. Some bots offer a "PDT-safe" mode that limits daily round trips.

Can I run it on a prop firm account?

Many prop firms prohibit automated trading or require specific approval. Check your prop firm's terms before connecting any AI trading bot. Some bots are compatible with prop firm accounts, but the prop firm's risk management systems may interfere with the bot's strategy.

What happens if the API connection drops mid-trade?

This depends entirely on the bot's system architecture. Some bots have circuit breakers that close all positions on connection loss. Others leave positions open. Always test this scenario on a demo account before trading live. The researchers' paper recommends building redundant API connections and order confirmation checks.

Is the bot regulated?

Most AI trading bot providers are not directly regulated. They are software providers. The broker that holds your funds should be regulated. Verify the regulatory status of both the bot provider and the broker partner. Check the FCA, ASIC, CySEC, or SEC register as applicable.

How much does the subscription cost?

Fee models vary. Some platforms charge flat monthly fees, others take a percentage of profits, and some combine both. Always calculate the total cost of the bot against your expected trading volume and returns. High performance fees can eliminate the edge the bot provides.

Can I test the bot before funding a live account?

Most platforms offer demo accounts or trial periods. We strongly recommend running any AI trading bot on a demo account for at least 30 days before committing real capital. Monitor strategy deviation, drawdown behavior, and system reliability during the trial.

What happens to my funds if the bot provider goes out of business?

Your funds are held at the broker, not the bot provider. If the bot provider ceases operations, you should still be able to access your funds through the broker. However, you may lose access to the bot's interface and data. Always ensure your broker account credentials are independent of the bot platform.

How do I know if the bot is actually following its stated strategy?

This is the core question the researchers' paper raises. Look for platforms that provide real-time strategy deviation alerts, trade logs that show the bot's decision-making process, and independent verification of the bot's behavior. Zephyr AI is the only platform we tested that offers this level of transparency.

What are the biggest risks of using an AI trading bot?

The biggest risks are strategy deviation (the bot does not do what it claims), system failure (API drops, order routing errors, data feed issues), and regulatory gaps (the bot provider is not regulated, leaving you with limited recourse). The researchers' paper correctly identifies these as system-level risks, not model-level risks.

Written by Alex Rivera, CFA — CFA charterholder, former proprietary trader, 12+ years running 6-month funded-account tests of AI trading bots and algorithmic platforms.

Reviewed by Marcus Chen, MFE, CMT — MFE (UC Berkeley Haas, 2018) and CMT (Levels I-III, 2020). Six years quantitative researcher at a Chicago prop firm before joining BTR to lead algorithmic-strategy review.

Read our full Testing Methodology.

Disclaimer: Not financial advice. Past performance is not indicative of future results. Trading involves substantial risk of loss. See our Editorial Policy.

Alex Rivera, CFA

Lead Analyst & Platform Tester

Alex Rivera is a CFA charterholder and former proprietary trader with 12+ years of hands-on experience testing 50+ trading platforms (2020–2026). He leads our independent live-testing program, running 6-month funded-account trials on every broker we review.

