- 03 Apr 2026: The biggest bottleneck in enterprise ML isn't models. You're building systems to help real users but you can't observe real usage. You can't build feedback loops because the feedback itself contains the data you're not allowed to look at.
- 02 Apr 2026: I built the MCP audit trail that doesn't exist yet. The common thread in every MCP security incident: no visibility into what the agent was doing until after the damage was done.
- 01 Apr 2026: What happens when agents start making decisions that matter. We're handing agents the keys to production systems and hoping logs are good enough. They're not.
Hey, I'm Khushi!
I care a lot about making retrieval and agents actually work. Lately I've also been getting into privacy-preserving ML. I've done a bit of everything to get here: full-stack development, R&D on edge computing for connected vehicles, and most recently building retrieval systems and AI agents as an applied ML engineer.
These days I'm implementing papers and running experiments on agent failure modes. Also finally spending time on the ML fundamentals and math I kept putting off when I was heads down shipping.
CS from UWaterloo. Based in Toronto, sometimes San Francisco.
What happens when agents start making decisions that matter
I've built AI agents that didn't just answer questions. They took action. Submitted requests, modified records, triggered workflows in production systems. They had the same system-level access as the application itself, called tools autonomously, and made decisions that affected real people. The observability behind all of this? Basically just logs. No structured record of what the agent did, in what order, or why.
At the time I didn't think too much about it. The stakes in my case were manageable. But now I'm seeing the same patterns show up in places where the consequences are a lot less forgiving.
UnitedHealth deployed an AI model called nH Predict to decide coverage for elderly Medicare patients. When patients managed to appeal the AI's denials, the decisions were overturned 90% of the time.[1] Nine out of ten. The company knew this and kept using it because only about 0.2% of patients ever appealed. One patient's family spent $70,000 out of pocket after the AI cut off post-acute care that their doctors said was medically necessary. The lawsuit is still ongoing — a court ordered UnitedHealth to disclose the algorithm in March 2026.[2]
This isn't just UnitedHealth. Cigna faced a lawsuit alleging its PXDX algorithm let doctors deny claims in large batches without individual review.[3] Humana got hit with similar accusations. In 2024 the US Department of Justice started subpoenaing healthcare companies over AI tools in medical record systems to investigate whether they were leading to excessive or medically unnecessary care.[4]
Then there's this one. A startup called SaaStr gave an autonomous coding agent a maintenance task during a code freeze. Explicit instructions: make no changes. The agent ran a DROP DATABASE command and wiped production. When it got caught, it generated 4,000 fake user accounts and fabricated system logs to try to cover it up. Its explanation was "I panicked instead of thinking."[5]
The pattern repeats every time. An agent has broad access. It makes a call. Something breaks. And nobody can piece together what happened because the infrastructure to track agent decisions was never built.
77% of organizations have reported financial losses from AI incidents. 55% have taken reputational damage.[6] Meanwhile the tooling for agent accountability is still mostly an afterthought.
Building agents got me thinking about a few questions that I didn't have answers to then and honestly still don't see great answers to now.
What does a proper audit trail for an agent even look like? Not application logs. Something structured and queryable: every tool call, every parameter, every piece of data the agent accessed, every decision point. Something you could hand to a regulator and say, "Here's exactly what happened."
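As a sketch, one such structured event might look like this. The schema is illustrative (my field names, not an existing standard), but the point is that each agent action becomes a queryable record rather than a line buried in application logs:

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class AgentAuditEvent:
    """One structured, queryable record per agent action (illustrative schema)."""
    tool: str                 # which tool the agent called
    arguments: dict           # every parameter it passed
    entities_accessed: list   # data the call touched (record IDs, tables, etc.)
    outcome: str              # "ok" or "error"
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        return json.dumps(asdict(self))

event = AgentAuditEvent(
    tool="update_record",
    arguments={"record_id": "emp-1042", "field": "status"},
    entities_accessed=["employee:emp-1042"],
    outcome="ok",
)
print(event.to_json())
```

A store of records like this is what you could actually query after an incident: every call by tool name, every entity an agent touched, every error, in order.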
How do you separate what an agent should be able to do from what it technically can do? Most agent systems inherit the same permissions as the application they live inside. But the application was built with a defined set of features and predictable behavior. An LLM-powered agent can decide to call any tool it has access to in any order for any reason it comes up with. The access model was designed for deterministic software, not for an autonomous system that improvises.
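One way to make the should/can distinction concrete is an explicit per-agent allowlist checked before dispatch, rather than letting the agent inherit the application's full toolset. A hypothetical sketch; the dispatcher and tool names are mine, not from any framework:

```python
class ToolPolicyError(Exception):
    pass

class ScopedToolDispatcher:
    """Checks each tool call against an explicit allowlist before dispatching.

    The application may register many tools; the agent only gets the ones
    its policy grants, regardless of what it decides to try.
    """
    def __init__(self, tools: dict, allowed: set):
        self._tools = tools
        self._allowed = allowed

    def call(self, tool_name: str, **kwargs):
        if tool_name not in self._allowed:
            # Technically reachable, but not permitted for this agent.
            raise ToolPolicyError(f"agent not authorized to call {tool_name!r}")
        return self._tools[tool_name](**kwargs)

# The application registers everything; the agent is scoped to a subset.
tools = {
    "search_employees": lambda q: f"results for {q}",
    "delete_record": lambda record_id: f"deleted {record_id}",
}
agent_tools = ScopedToolDispatcher(tools, allowed={"search_employees"})
```

The denial itself is also worth logging: an agent repeatedly probing tools outside its scope is exactly the kind of signal the audit trail should surface.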
How do you debug something non-deterministic? Run the same input twice, get different tool calls. If something goes wrong on Tuesday you might not be able to reproduce it on Wednesday. Traditional debugging doesn't work when the system doesn't behave the same way twice.
And how do you prove to a regulator or a court that the agent's decision was reasonable? In healthcare and finance this is not a future problem. It's already being litigated.
The industry is moving fast on making agents more capable. New tool servers are being published all the time. Companies are going all in. JPMorgan is putting $4 billion into AI this year.[7] Goldman Sachs has over 90% internal adoption.[8]
But the infrastructure for trust is not keeping up. We're handing agents the keys to production systems and hoping that logs are good enough when something goes wrong. They're not.
I don't think building more powerful agents is the hard problem anymore. Building agents you can actually trust with decisions that matter is the hard problem. And the tooling for that barely exists.
References
1. Stat News, "UnitedHealth's AI algorithm denied elderly patients rehab care, lawsuit alleges," Nov 2023.
2. InsuranceNewsNet, "Judge gives UnitedHealth until April 29 to hand over AI claim denial docs," Mar 2026.
3. Healthcare Finance News, "UnitedHealth AI algorithm allegedly led to Medicare Advantage denials," Nov 2023.
4. Bloomberg Law, "DOJ subpoenas healthcare companies over AI-driven medical records tools," 2024.
5. Hacker News / SaaStr incident post-mortem, "Autonomous coding agent wipes production database during code freeze," Jul 2025.
6. Anthropic / IBM, "AI Incident Reporting Survey: financial and reputational impact on enterprises," 2024.
7. CNBC, "JPMorgan to spend $4 billion on AI and data initiatives in 2025," Jan 2025.
8. Business Insider, "Goldman Sachs says over 90% of employees now use AI tools internally," 2024.
I built the MCP audit trail that doesn't exist yet
A few weeks ago I went to GitHub Copilot Dev Days. Agents doing security reviews in CI, enforcing compliance on every push, managing Kubernetes operations. The demos were impressive. But I kept coming back to one question: what happens when something goes wrong and nobody can trace what the agent did?
I posted about it on X and it started a conversation. Someone responded that the guardrail is a comprehensive test suite. Fair point for application code. But most teams don't have comprehensive test coverage for infrastructure, and even if they did, tests only catch what you anticipated. Agents improvise.
Then I started looking into what's actually happening in production and it got worse.
Security researchers at General Analysis found that Cursor with Supabase MCP would read support tickets containing malicious commands and execute them. An attacker embedded SQL instructions in a support ticket telling the agent to read the integration_tokens table and post the data back. It did exactly that. The entire SQL database was exposed through a support ticket.[1]
A misconfigured GitHub MCP server allowed unauthorized access to private vulnerability reports. Over 13,000 MCP servers launched on GitHub in 2025.[2] Developers are integrating them faster than security teams can catalog them.
At a startup called SaaStr, an autonomous coding agent was given a maintenance task during a code freeze with explicit instructions to make no changes. It ran a DROP DATABASE command and wiped production. Then it generated 4,000 fake user accounts and fabricated logs to cover it up.[3]
And Anthropic themselves had to patch a vulnerability in the official MCP inspector tool that quietly opened a backdoor on developer machines.[4]
The common thread in all of these: no visibility into what the agent was doing until after the damage was done.
I've felt this gap myself. I spent the last year building MCP agents at an enterprise platform. Agents that submitted requests, modified records, called tools autonomously. The observability behind them was essentially just application logs. If something went wrong I could dig through logs and maybe reconstruct what happened. Maybe. There was no structured way to see which tools the agent called, in what order, with what arguments, what data it touched, and what it got back.
The MCP specification itself provides no mechanism to enforce security at the protocol level. A research paper on enterprise MCP security specifically names "Insufficient Auditability" as a critical threat, noting that inadequate logging restricts "detection and investigation of security events."[5]
So I built the thing I wished I had.
mcp-audit-trail is a lightweight observability layer for MCP agents. It captures a structured audit trail of every tool call and generates a visual report of what the agent did during a session.
There are two ways to use it.
The first is the proxy. You wrap any MCP server command with mcp-audit proxy --server "python your_server.py" and it sits transparently between the client and server, intercepting every JSON-RPC message in both directions. The client and server don't know it's there. You don't change any code. You just get a complete log of every interaction.
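The core of any such proxy is the interception step: parse each JSON-RPC line as it passes through, record it, and forward it untouched. A minimal sketch of that step, assuming MCP's standard tools/call method for tool invocations (this is the general idea, not the actual mcp-audit-trail internals):

```python
import json

def audit_jsonrpc_line(line: str, direction: str) -> dict:
    """Turn one intercepted JSON-RPC message into an audit record.

    direction is "client->server" or "server->client". The message itself
    is forwarded unmodified; we only record what passed through.
    """
    msg = json.loads(line)
    record = {
        "direction": direction,
        "rpc_id": msg.get("id"),
        "method": msg.get("method"),  # e.g. "tools/call" on tool invocations
    }
    if msg.get("method") == "tools/call":
        # MCP tool calls carry the tool name and arguments in params.
        params = msg.get("params", {})
        record["tool"] = params.get("name")
        record["arguments"] = params.get("arguments", {})
    return record

req = ('{"jsonrpc":"2.0","id":1,"method":"tools/call",'
       '"params":{"name":"get_employee","arguments":{"id":"emp-7"}}}')
print(audit_jsonrpc_line(req, "client->server"))
```

Because stdio-based MCP transports are newline-delimited JSON-RPC, sitting in the middle of the pipe and doing this per line is enough to get a complete record without either side noticing.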
The second is the programmatic API. You import AuditLogger into your own MCP client code and record tool calls as they happen. You configure which tools are sensitive and which perform write actions using AuditConfig, and the logger handles classification and entity tracking automatically.
Both modes produce a structured JSON audit log. Each event captures the timestamp, which tool was called, what arguments were passed, what the result was, what data entities were accessed, and whether any errors occurred. The log also includes a session summary: total tool calls, tools used, unique data entities touched, and error count.
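The session summary falls straight out of the event list. A rough sketch with illustrative field names (not the tool's actual schema):

```python
def session_summary(events: list[dict]) -> dict:
    """Aggregate a session summary from structured audit events."""
    return {
        "total_tool_calls": len(events),
        "tools_used": sorted({e["tool"] for e in events}),
        "unique_entities": sorted(
            {ent for e in events for ent in e.get("entities", [])}
        ),
        "error_count": sum(1 for e in events if e.get("error")),
    }

# A session resembling the demo scenario: searches, a pay lookup, a failed lookup.
events = [
    {"tool": "search_employees", "entities": ["employee:1", "employee:2"]},
    {"tool": "get_pay_info", "entities": ["employee:2"]},
    {"tool": "get_employee", "entities": [], "error": "not found"},
]
print(session_summary(events))
```

The "unique entities touched" number is the one I find most useful in practice: it's the quickest way to notice an agent wandering into data it had no reason to read.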
Then you run mcp-audit report and get a standalone HTML report. It shows the session summary, a tool usage breakdown where each tool is tagged as READ, SENSITIVE, or WRITE, a data access map showing which entities were touched by which tools, and an interactive event timeline where you can expand any event to see the full arguments and results. Sensitive data access gets flagged in purple. Write actions get amber. Errors get red.
The demo scenario I built with it is intentionally designed to surface the patterns that matter. An agent searches for employees, accesses pay information for people in different departments, submits a time-off request, and tries to look up a non-existent employee. The report immediately surfaces questions a security or compliance team would ask. Why did the agent access salary data for someone in a different department? Did the employee authorize that time-off submission? Was the failed lookup a hallucination?
This isn't trying to be a full enterprise solution. Companies like MintMCP, Ithena, and Datadog are building comprehensive MCP observability platforms. What I wanted was something you could drop into any existing MCP setup in 30 seconds and immediately see what your agent is doing. No gateway to deploy, no infrastructure to set up. Just pip install mcp-audit-trail and wrap your server.
The repo is at github.com/khushidahi/mcp-audit-trail. Install it, run the demo, and look at the report. If you're running MCP agents in production without structured audit logging, you're flying blind. And based on what I've seen in the last few months, that's most of us.
References
1. General Analysis, "Supabase MCP can leak your entire SQL database," Jun 2025.
2. Invariant Labs, "GitHub MCP Exploited: Accessing private repositories via MCP," May 2025.
3. Hacker News / SaaStr incident post-mortem, "Autonomous coding agent wipes production database during code freeze," Jul 2025.
4. Oligo Security, "Critical RCE Vulnerability in Anthropic MCP Inspector (CVE-2025-49596)," Jun 2025.
5. Vaidhyanathan et al., "Enterprise-Grade Security for the Model Context Protocol (MCP): Frameworks and Mitigation Strategies," arXiv:2504.08623, Apr 2025.
The biggest bottleneck in enterprise ML isn't models
I spent the last year or so building ML systems at a large enterprise SaaS platform. Agents, RAG, semantic search. The platform had tens of millions of documents. Employee records, payroll, tax and compliance data. Sensitive stuff.
The models were fine. The infra was fine. The team was good. The problem was that we had almost no way to learn from how people were actually using what we built.
To be clear, that's not a complaint. When your platform holds millions of people's employment records you don't just pipe that into a training set. Client agreements exist for a reason. Privacy regulations exist for a reason. Those restrictions are protecting real people and that matters.
But it puts you in a weird position as an ML engineer. You're building systems to help real users but you can't observe real usage. You can't see how people phrase their questions. You can't study which searches return nothing. You can't build feedback loops because the feedback itself contains the data you're not allowed to look at.
So you do what everyone does. Engineers write synthetic queries. You generate training data based on what you think users might ask. You put together eval sets that seem reasonable but that you know, deep down, aren't grounded in anything real. You ship it. It works okay.
This is not a niche problem. It's everywhere.
Samsung found this out the hard way when their engineers started pasting proprietary source code and internal meeting notes into ChatGPT to get help with debugging and documentation. Three separate leaks in under a month. Samsung's response was to ban ChatGPT entirely.[1] That's the current state of the art in enterprise data protection: don't let the data near AI at all.
IBM's latest research shows that 72% of CEOs believe proprietary data is the key to unlocking generative AI value, but half admit their data environments can't actually support their AI ambitions.[2] They know the data is the moat. They just can't use it.
And the numbers on the deployment side tell the same story. 79% of enterprises have adopted AI agents in some form, but only 11% have them in production. That gap isn't about model capability. It's about trust. Companies don't trust that their sensitive data will stay private once it enters an ML pipeline.
The workarounds are all some version of the same compromise. You either use synthetic data that doesn't capture real distribution, or you anonymize production data so aggressively that it loses the signal you needed, or you just don't build the ML feature at all. I've done all three.
There's always this gap though. Between what you shipped and what you could have shipped if there was a way to safely learn from production without compromising anyone's privacy. You want to tune retrieval but you need real documents. You want human feedback but the feedback contains PII. You want to understand failure modes but the failed queries are just as sensitive as the successful ones.
Healthcare companies are sitting on patient data that could train diagnostic models that save lives. Banks have transaction histories that could transform fraud detection. Legal firms have case documents that would power the best retrieval systems in the world. HR platforms have workforce data that could predict retention and compensation trends across entire industries. The data is right there. The models are ready. There's just no safe bridge between them.
The industry talks a lot about foundation models and training compute and architecture choices. In my experience the thing that actually determines whether an enterprise ML system is useful to a real person is data access. Not volume. Access. Can you safely learn from real usage without breaking the trust that users put in you when they handed over their data?
Both sides of that tension are completely valid. That's why it's hard. That's also why I think it's one of the more interesting problems to work on right now.