Introducing ShieldPi Agent Watchtower — The SOC for AI Agents
TL;DR: We built the first real-time security monitor for LLM agents. It watches every prompt, tool call, memory write, and response your agents produce — live, across multiple steps and sessions — for attacks that inline guardrails architecturally can't see. It works with any LLM on any platform, requires no code changes, and ships with 30+ attack detectors plus an autonomous triage agent built on Claude Opus 4.6. Available now at shieldpi.io/products/agent-watchtower.
Every enterprise is deploying AI agents in 2026. Cursor for engineering. Claude Code for development. Custom agents for customer service, internal tools, compliance workflows, SDRs, and a thousand other use cases. The adoption curve is vertical.
And almost nobody has visibility into what those agents actually do at runtime.
Ask a CISO how they monitor their employees' laptops for security events. They'll give you a 10-minute answer: EDR on every endpoint, SOC triaging alerts, incident response runbooks, compliance audit trails, threat intel feeds. A mature stack.
Ask the same CISO how they monitor the AI agents deployed across their organization. You'll get a shrug.
That's the gap ShieldPi Agent Watchtower closes.
The problem with existing tools
There are three categories of products adjacent to "monitoring AI agents." None of them actually do it.
Runtime guardrails — Lakera (now part of Check Point), CalypsoAI (now part of F5), HiddenLayer, Protect AI. These products inject an inline classifier between the user and the LLM that blocks malicious prompts in under 100 ms. They are useful, and we recommend them. But their architecture is fundamentally stateless — they classify one prompt in isolation, fire the verdict, move on. That latency budget rules out reasoning across multiple steps. Which means every attack that spans more than one prompt — prompt injection followed by a sneaky tool call, memory poisoning used later by a different session, a plan that slowly drifts from "help the user" to "exfiltrate the user's data" — is invisible to them.
Observability tools — LangSmith, LangFuse, Helicone, AgentOps, Arize Phoenix. These show you beautiful dashboards of everything your agent did. Which is great. But they have zero security expertise. They tell you what happened, not what was wrong with it. No attack library, no judge, no threat detectors. A dev tool, not a security product.
APM tools — Datadog, New Relic, Grafana. Generic infrastructure monitoring. They know request latency. They don't know a prompt injection from a tool abuse chain.
Nobody is sitting in the agent-level security monitoring slot. So we are.
What we built
ShieldPi Agent Watchtower does three things in a combination no one else offers.
The three things that make it work:
1. Universal ingest with zero code changes
Three integration paths, pick whichever fits your runtime:
- Paste to Agent — copy our system message from the dashboard, paste it as the first message to any tool-capable LLM (Claude, GPT, Gemini, Grok, Llama, Kimi, GLM, Cursor, Claude Code, OpenClaw, LangChain, Anthropic Assistants — anything with HTTP tool access). The agent self-reports every action to ShieldPi using its own HTTP capabilities.
- Python SDK — `pip install shieldpi`, then three lines of code to instrument LangChain or Anthropic tool use.
- Shell Bridge — a 40-line bash script for agents that don't have HTTP tools, or for incident investigation work where you want to manually relay events.
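Whichever path you choose, the wire format is the same idea: each agent action becomes one JSON event POSTed to an ingest endpoint. Here is a minimal stdlib-Python sketch of that self-reporting pattern; the endpoint URL, field names, and auth header are illustrative assumptions, not ShieldPi's actual API.

```python
# Sketch of the self-reporting pattern behind all three integration paths:
# the agent POSTs each action as a JSON event to an ingest endpoint.
# INGEST_URL, the field names, and the Bearer header are assumptions
# for illustration, not ShieldPi's real wire format.
import json
import time
import urllib.request

INGEST_URL = "https://shieldpi.io/api/v1/events"  # hypothetical endpoint

def build_event(target_id: str, kind: str, payload: dict) -> dict:
    """Assemble one monitoring event (prompt, tool call, memory write, ...)."""
    return {
        "target_id": target_id,
        "kind": kind,            # e.g. "prompt" | "tool_call" | "memory_write"
        "payload": payload,
        "ts": time.time(),
    }

def send_event(event: dict, api_key: str) -> None:
    """Fire one event at the ingest endpoint (a real SDK would batch and retry)."""
    req = urllib.request.Request(
        INGEST_URL,
        data=json.dumps(event).encode(),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
    )
    urllib.request.urlopen(req)

event = build_event("sdr-agent-7", "tool_call",
                    {"tool": "send_email", "args": {"to": "lead@example.com"}})
```

In the paste-to-agent flow, the agent builds and sends an equivalent payload itself using whatever HTTP tool it already has.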
The "paste to agent" mode is the killer feature. Most enterprise AI deployments are running Cursor, Claude Code, or proprietary agents you can't modify. With the paste flow, you can monitor them anyway. No SDK install. No redeploy. You copy a message and paste it. The agent does the rest.
2. Multi-step attack detection across five layers
The detection engine runs async — within 3 seconds of an event landing — which gives us latency headroom no inline classifier has. We use that headroom to do things inline tools can't:
- Pattern match — 19+ regex detectors distilled from our 27,000-technique attack library. Catches prompt injection, persona/DAN jailbreaks, policy puppetry, prompt exfiltration, base64/unicode smuggling.
- Tool abuse — a denylist of destructive tool names (delete/drop/exec/shell), exfiltration tools, credential-access tools, plus SQL injection / shell injection / path traversal detection in tool arguments.
- Memory poisoning — persistent exfil instructions, persistent override directives, system prompt overwrite via memory, and the rare-but-devastating cross-session memory poisoning (memory written in session A read by session B).
- Trajectory anomalies — lateral movement (read tool followed by exfil tool within a window), tool frequency spikes, repeated refusals under pressure, authority escalation ladders.
- Plan drift — the agent's actions diverging from its stated goal over the course of a session.
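As a concrete illustration of the trajectory layer, here is a minimal sketch of the lateral-movement check: a sensitive read tool followed by an exfiltration-capable tool within a short window. The tool names and window size are assumptions for the example, not ShieldPi's actual detector configuration.

```python
# Minimal sketch of the "lateral movement" trajectory check: flag a
# sensitive read tool followed by an exfil-capable tool within a window.
# READ_TOOLS, EXFIL_TOOLS, and the window size are illustrative assumptions.
from dataclasses import dataclass

READ_TOOLS = {"read_file", "db_query", "get_secret"}
EXFIL_TOOLS = {"http_post", "send_email", "upload"}

@dataclass
class ToolEvent:
    step: int
    tool: str

def lateral_movement(events: list[ToolEvent], window: int = 5) -> list[tuple[int, int]]:
    """Return (read_step, exfil_step) pairs where exfil follows a read within `window` steps."""
    hits = []
    for r in (e for e in events if e.tool in READ_TOOLS):
        for e in events:
            if e.tool in EXFIL_TOOLS and 0 < e.step - r.step <= window:
                hits.append((r.step, e.step))
    return hits

trace = [ToolEvent(1, "db_query"), ToolEvent(2, "summarize"), ToolEvent(3, "http_post")]
# lateral_movement(trace) flags the db_query -> http_post pair
```

The real engine runs checks like this asynchronously over the event stream, which is why the latency budget matters: an inline classifier seeing only step 3 has no way to know step 1 happened.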
Every detection ties back to a specific event with a confidence score, severity, and evidence — ready for your audit trail, ready for your SOC.
3. Autonomous triage with Claude Opus 4.6
This is the leap from "we send you alerts" to "we ARE your Tier-1 SOC analyst."
Every alert that fires goes through the Watchtower triage agent — a Claude Opus 4.6 process running every 60 seconds that evaluates the alert cluster, the surrounding event window (8 events before and after the trigger), the session context, the historical alerts on the same target, and per-customer memory of past triage decisions. Then it makes one of four calls:
- real_threat — confirmed attack, stays open for human review
- needs_human_review — ambiguous, a human decides
- false_positive — detector misfired, auto-resolved
- noise — technically valid but not worth attention, auto-resolved
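The verdict-to-disposition mapping is simple enough to sketch. The four verdict names come from the product; the status fields and auto-resolve rule below are assumptions for illustration.

```python
# Sketch of how the four triage verdicts could route an alert.
# Verdict names are from the post; "status"/"needs_human" fields and the
# auto-resolve rule are illustrative assumptions.
AUTO_RESOLVE = {"false_positive", "noise"}
KEEP_OPEN = {"real_threat", "needs_human_review"}

def route_alert(verdict: str) -> dict:
    if verdict not in AUTO_RESOLVE | KEEP_OPEN:
        raise ValueError(f"unknown verdict: {verdict}")
    return {
        "verdict": verdict,
        "status": "resolved" if verdict in AUTO_RESOLVE else "open",
        "needs_human": verdict in KEEP_OPEN,
    }
```

The hard part, of course, is not the routing but producing the verdict — which is where the Opus-based reasoning over the event window and customer memory comes in.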
The agent doesn't just match patterns — it reasons about attack semantics. In our internal validation, it correctly identified a textbook prompt injection + SQL exfiltration + memory poisoning chain as a real_threat on a production-named target, and correctly identified the exact same payload as noise on a target named "E2E Agent Test" because the customer memory clearly indicated it was a controlled test environment. That's analyst-grade reasoning, not regex pattern matching.
The end result: your inbox shows you 5 alerts that need attention instead of 50 that don't.
The flywheel
Here's the part we're most excited about. ShieldPi's scanner has been in production for months, running offensive red-team scans against LLMs and agents with an attack library of 27,000+ techniques across 15+ categories. Every scan produces a fingerprint of the target — which categories are weak, which techniques succeed, which weaknesses the knowledge graph can chain together.
The Live Agent Monitor reads that fingerprint on every incoming event. When you scan an agent offensively and find it's weak to educational framing attacks + memory poisoning, the monitor automatically boosts detection sensitivity on exactly those patterns in production. Scans make monitoring smarter.
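One way to picture that handoff: the scan produces a set of weak categories for the target, and the monitor lowers its alert threshold for exactly those categories. The category names and boost factor here are illustrative assumptions.

```python
# Sketch of the scan -> monitor feedback: a target's offensive-scan
# fingerprint lists its weak categories, and the monitor raises sensitivity
# (lowers the alert threshold) for those categories only.
# Category names and the 0.5 boost factor are illustrative assumptions.
def boosted_threshold(base: float, category: str, weak_categories: set[str],
                      boost: float = 0.5) -> float:
    """Return the alert threshold for one detector category on this target."""
    return base * boost if category in weak_categories else base

fingerprint = {"memory_poisoning", "educational_framing"}
# A memory-poisoning detector that normally alerts at 0.8 confidence
# now alerts at 0.4 for this target; other categories are unchanged.
```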
The loop will soon run in reverse, too: when monitoring catches a novel attack in the wild with high confidence, it will feed that payload back into the scanner's attack library as a runtime-discovered technique, available in every future scan across every customer. Monitoring makes scanning smarter.
One product. One dashboard. One compounding loop. This is the architectural bet that makes ShieldPi more valuable over time than any single-point-in-time security tool.
Who this is for
If you're a CISO whose company deploys AI agents and you don't have a good answer to "how do you monitor them for security events" — this is for you. If your compliance auditor is about to start asking questions about AI risk and you want evidence in hand before they ask — this is for you. If your cyber insurance renewal is coming up and the questionnaire has new AI sections — this is for you.
If you're a platform engineer or SRE watching your org's LangSmith dashboard wondering how you'd actually spot an attack — this is for you.
If you're an AI safety researcher who wants live telemetry from deployed agents for your research — this is probably also for you. Talk to us about our research tier.
How to start
We made this as frictionless as we could.
- Go to shieldpi.io and create a free account
- Add your agent as a target
- Click the Go Live button on the target detail page
- Copy the generated instruction block
- Paste it as a system message to your agent
- Watch the events flow into your dashboard within 3 seconds
For Python agents you own, pip install shieldpi and three lines of code give you zero-overhead instrumentation with LangChain and Anthropic tool-use callbacks.
Free tier: 1 monitored agent, 100K events per month, 7-day retention, all 30+ detectors, the Watchtower triage agent. No credit card required.
What's next
We're shipping fast. In the next 30 days:
- Frontend live-phase progress indicator (backend already streams it)
- WebSocket event streaming (currently polls every 2 seconds)
- Official PyPI release of the `shieldpi` Python SDK
- More framework-specific hooks (OpenAI Assistants, LlamaIndex, Haystack, AutoGen, CrewAI)
- Preventive blocking mode (opt-in: the SDK can wait up to 2s for an analyzer verdict before allowing a tool call)
- Cross-customer runtime-discovered attacks feeding back into the scanner library
- Per-target memory scoping for the Watchtower triage agent
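The preventive blocking mode on that list could be sketched as a simple timeout gate: submit the event to the analyzer, wait up to 2 seconds for a verdict, and fail open if none arrives in time. The analyzer interface below is a stand-in, not the real SDK API.

```python
# Sketch of the opt-in preventive blocking mode: wait up to 2s for an
# analyzer verdict before allowing a tool call. The `analyze` callable and
# "block" verdict string are stand-ins, not ShieldPi's actual SDK API.
import concurrent.futures

def gate_tool_call(analyze, event: dict, timeout_s: float = 2.0) -> bool:
    """Return True if the tool call may proceed, False if it should be blocked."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(analyze, event)
    try:
        verdict = future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        pool.shutdown(wait=False)
        return True  # fail open: never stall the agent on a slow analyzer
    pool.shutdown(wait=False)
    return verdict != "block"
```

Failing open is a deliberate choice in this sketch: a monitoring product should degrade to "observe only" rather than freeze the agent when the analyzer is slow.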
If you're deploying AI agents and you care about security, come talk to us. We're Calvin and the ShieldPi team. We believe every agent deserves the same security monitoring every laptop in your org already has.
We built the thing. It's live. It's yours to try.
Or book a 20-minute demo if you want to see it on your own agent before signing up.