Security Guide
AI Agent Security — Best Practices Guide
Comprehensive framework for governing autonomous AI agents across your enterprise — from cryptographic identity issuance to real-time enforcement, behavioral monitoring, and incident response.
1. Establish Agent Identity Before Deployment
- Every agent must receive a cryptographic identity (SPIFFE/X.509 or Ed25519) at registration — no anonymous agents in production.
- Rotate agent credentials on a short TTL schedule (24–72 hours maximum). Long-lived tokens are the primary attack surface.
- Enforce mutual TLS for all agent-to-service communication. One-way TLS is insufficient for agentic workloads.
2. Define Least-Privilege Policies Before Launch
- Scope every agent's permissions to the minimum required for its declared task. Avoid "admin" roles as a convenience shortcut.
- Express policy as code (OPA Rego). Human-readable policy descriptions must compile to machine-enforced rules.
- Review effective permissions after every deployment. Drift between declared and effective access is the most common misconfiguration.
3. Enforce at Runtime — Not Just at Deploy Time
- Deploy an inline enforcement layer (AI Firewall) on all LLM calls — bidirectional DLP, PII masking, prompt injection detection at under 50ms p99.
- Set per-agent token and cost budgets with automated circuit breakers. A single runaway agent can consume an entire monthly AI budget in hours.
- Apply behavioral baselines after initial deployment. Deviation from baseline triggers risk score escalation — not just static rule violations.
4. Maintain an Immutable Audit Trail
- Log every agent action with cryptographic timestamps. Mutable logs are not audit evidence.
- Export compliance evidence continuously — not only when auditors ask. Retroactive evidence collection is error-prone and slow.
- Map logged events to framework controls (SOC 2, FedRAMP, HIPAA) in real time so gaps surface before review cycles.
5. Prepare an Incident Response Plan for Agents
- Implement a kill switch with sub-100ms broadcast latency. Human-speed investigation is too slow when an agent is actively causing harm.
- Isolate compromised agents without taking down the entire deployment. Blast radius control is as important as detection speed.
- Post-incident: revoke the agent's identity, rotate any credentials it had access to, and re-audit its action history before reinstating.