Security Best Practices
Production hardening for Agent Secret Store — from namespace design to incident response. Follow these patterns to run a credential store that survives real adversarial conditions.
Least-privilege namespace design
The most important security decision is your namespace structure. Namespaces directly control how granularly you can scope tokens. A flat namespace means all agents get access to everything in the same scope; a service-hierarchical namespace lets you issue tokens that cover exactly one service's credentials.
# ❌ BAD: One flat namespace — all agents can access all secrets
production/openai-key
production/stripe-key
production/db-password
production/github-token
# ✅ GOOD: Service-scoped namespaces
# Scope tokens to the minimum path prefix needed
payments-service/stripe/secret-key
payments-service/stripe/webhook-secret
ml-platform/openai/api-key
ml-platform/pinecone/api-key
auth-service/google/client-secret
data-pipeline/postgres/connection-string
# ✅ GOOD: Environment isolation prevents cross-env leakage
production/payments/stripe/secret-key # token: secrets:read:production/payments/*
staging/payments/stripe/secret-key # token: secrets:read:staging/payments/*
# A staging agent CANNOT read production secrets
Namespace changes are breaking
Restructuring namespaces after go-live requires updating every scoped token and agent configuration. Design your namespace hierarchy before your first production deployment.
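The prefix rule above can be made concrete. The sketch below is plain Python for illustration, not part of the Agent Secret Store API: `scope_covers` is a hypothetical helper showing how a scope string like `secrets:read:staging/payments/*` might be matched against a requested secret path.

```python
from fnmatch import fnmatchcase

def scope_covers(scope: str, action: str, path: str) -> bool:
    """Check whether a token scope like 'secrets:read:production/payments/*'
    permits the given action on the given secret path."""
    _, scope_action, scope_path = scope.split(":", 2)
    if scope_action not in ("*", action):
        return False
    # '*' in the scope path matches any suffix, fnmatch-style
    return fnmatchcase(path, scope_path)

staging = "secrets:read:staging/payments/*"
print(scope_covers(staging, "read", "staging/payments/stripe/secret-key"))     # True
print(scope_covers(staging, "read", "production/payments/stripe/secret-key"))  # False
```

A service-hierarchical namespace is what makes this check useful: with a flat namespace, every scope path degenerates to the same prefix and the match always succeeds.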
Token best practices
Scoped tokens are the primary access control mechanism. Issue the narrowest possible scope for the shortest necessary TTL:
from agentsecretstore import AgentVault

async def issue_minimal_tokens():
    async with AgentVault() as vault:
        # ❌ BAD: overly broad scope
        bad_token = await vault.request_token(
            scope="secrets:*:*",   # Full access — never do this
            ttl_seconds=86400,     # 24 hours — too long
        )

        # ✅ GOOD: narrowly scoped, short TTL
        good_token = await vault.request_token(
            scope="secrets:read:production/ml-platform/openai/*",
            ttl_seconds=1800,  # 30 minutes — matches task duration
            description="GPT-4 inference batch job #4827",
            allowed_ips=["10.0.1.50"],  # Only from the known agent host
        )

        # ✅ GOOD: single-use for high-risk one-off operations
        single_use = await vault.request_token(
            scope="secrets:read:production/payments/stripe/secret-key",
            ttl_seconds=300,  # 5 minutes
            max_uses=1,       # Burns after one read
            description="One-time payment intent creation",
        )

| Use case | Recommended TTL | Notes |
|---|---|---|
| One-off task (payment, send email) | 5–15 minutes | Use max_uses=1 for truly single-use ops |
| Short batch job (< 1 hour) | 30–60 minutes | Match TTL to expected job duration |
| Long-running agent session | 2–8 hours | Request a new token before TTL expires |
| Service worker (e.g. API server) | Max 24 hours | Rotate token daily via cron; never share master key |
| CI/CD pipeline | Duration of pipeline + buffer | Use dedicated service account key per pipeline |
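For the long-running-session row, "request a new token before TTL expires" deserves a concrete rule of thumb. A minimal sketch, assuming you track the token's issue time client-side; `should_renew` is a hypothetical helper, not a vault API:

```python
from datetime import datetime, timedelta, timezone

def should_renew(issued_at: datetime, ttl_seconds: int,
                 now: datetime, margin_fraction: float = 0.2) -> bool:
    """Renew once less than `margin_fraction` of the TTL remains,
    so the old token never expires mid-request."""
    expires_at = issued_at + timedelta(seconds=ttl_seconds)
    remaining = (expires_at - now).total_seconds()
    return remaining < ttl_seconds * margin_fraction

issued = datetime(2025, 6, 1, 12, 0, tzinfo=timezone.utc)
# 8-hour token: renew inside the last ~96 minutes
print(should_renew(issued, 8 * 3600, issued + timedelta(hours=3)))  # False
print(should_renew(issued, 8 * 3600, issued + timedelta(hours=7)))  # True
```

A fractional margin scales with the TTL, so the same check works for a 30-minute batch token and an 8-hour session token.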
Setting appropriate access tiers
Assign access tiers based on the blast radius if the credential were compromised. When in doubt, err toward a higher tier — you can always relax it after observing false-positive approval friction.
standard: Read-only keys, public data sources, sandbox credentials, rate-limited free-tier keys, non-production secrets.
sensitive: Write-capable API keys, OAuth tokens, staging database passwords, service-to-service tokens with meaningful access.
critical: Production databases, payment processor secrets, admin API keys, SSH private keys, signing keys, KMS credentials.
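The blast-radius rule can be approximated in code. This is an illustrative sketch only; `suggest_tier` and its three input flags are hypothetical, and the mapping simply paraphrases the tier descriptions above, erring toward the higher tier on any production write access:

```python
def suggest_tier(*, production: bool, write_capable: bool,
                 payment_or_admin: bool) -> str:
    """Map blast-radius signals to an access tier, erring high when in doubt."""
    if payment_or_admin or (production and write_capable):
        return "critical"
    if write_capable or production:
        return "sensitive"
    return "standard"

print(suggest_tier(production=True, write_capable=True, payment_or_admin=False))   # critical
print(suggest_tier(production=False, write_capable=False, payment_or_admin=False)) # standard
```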
IP allowlisting for production agents
IP allowlisting is one of the strongest defense-in-depth measures available: a stolen token is useless from the wrong IP address. Always specify allowed_ips for production token requests when your agents run from known infrastructure:
from agentsecretstore import AgentVault

async def configure_ip_restrictions():
    async with AgentVault() as vault:
        # Restrict a token to a specific agent host
        token = await vault.request_token(
            scope="secrets:read:production/stripe/*",
            ttl_seconds=3600,
            allowed_ips=["10.0.1.50"],  # Single agent
        )

        # Restrict to a subnet (GCP/AWS private subnet)
        subnet_token = await vault.request_token(
            scope="secrets:read:production/*",
            ttl_seconds=3600,
            allowed_ips=["10.128.0.0/20"],  # GCP us-central1 subnet
        )

        # Restrict to multiple known hosts
        multi_host_token = await vault.request_token(
            scope="secrets:read:production/ml-platform/*",
            ttl_seconds=7200,
            allowed_ips=["10.0.1.50", "10.0.1.51", "10.0.1.52"],
        )

Cloud provider IP ranges
GCP, AWS, and Azure publish their IP ranges as JSON files. For agents running on managed compute, allowlist the subnet CIDR of your VPC rather than individual IPs. This handles pod autoscaling without manual token updates.
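The semantics of a mixed allowlist (bare IPs plus CIDR blocks) are easy to verify with the standard library. This sketch shows how such a check might be evaluated server-side; `ip_allowed` is a hypothetical helper, not the vault's actual enforcement code:

```python
import ipaddress

def ip_allowed(caller_ip: str, allowed: list[str]) -> bool:
    """True if caller_ip matches any entry: a bare IP or a CIDR block."""
    addr = ipaddress.ip_address(caller_ip)
    for entry in allowed:
        # strict=False lets a bare IP like "10.0.1.50" parse as a /32
        network = ipaddress.ip_network(entry, strict=False)
        if addr in network:
            return True
    return False

allowed_ips = ["10.0.1.50", "10.128.0.0/20"]
print(ip_allowed("10.128.3.7", allowed_ips))   # True: inside the /20
print(ip_allowed("10.129.0.1", allowed_ips))   # False: outside both entries
```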
Single-use tokens for high-risk operations
For one-time operations like payment processing, signing, or sending a notification, use max_uses=1 to create a burn-after-reading token. After the credential is retrieved once, the token is invalidated server-side, so even if the token string leaks afterward, it's already dead.
Combine with a short TTL (5 minutes) for maximum security: the token is useless after one use or after 5 minutes, whichever comes first.
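"Whichever comes first" can be stated precisely. A plain-Python sketch of the validity rule, illustrative only and not the vault's server-side implementation:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class TokenState:
    issued_at: datetime
    ttl_seconds: int
    max_uses: int
    uses: int = 0

    def is_valid(self, now: datetime) -> bool:
        """Valid only while BOTH limits hold: unexpired and uses remaining."""
        unexpired = now < self.issued_at + timedelta(seconds=self.ttl_seconds)
        return unexpired and self.uses < self.max_uses

issued = datetime(2025, 6, 1, 12, 0, tzinfo=timezone.utc)
t = TokenState(issued_at=issued, ttl_seconds=300, max_uses=1)
print(t.is_valid(issued + timedelta(minutes=1)))   # True: fresh, unused
t.uses += 1
print(t.is_valid(issued + timedelta(minutes=1)))   # False: burned after one use
```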
Approval workflows for production secrets
For critical-tier secrets, require approval even when the requesting agent has appropriate permissions. This adds a human check-in point for sensitive operations:
from agentsecretstore import AgentVault

async def request_with_approval():
    """
    For critical-tier secrets, the vault holds the request and sends
    a notification to your approver. The agent waits (or polls) for
    the decision before the token is returned.
    """
    async with AgentVault() as vault:
        # This call blocks until approved (or times out)
        token = await vault.request_token(
            scope="secrets:read:production/payments/stripe/secret-key",
            ttl_seconds=600,
            description="Payment processing for invoice #INV-2025-0891",
            require_approval=True,         # Force approval even if not critical
            approval_timeout_seconds=300,  # 5-minute approval window
        )
        # Token only returned after approval
        print(f"Approved! Token: {token.value[:20]}...")

Rotation schedule recommendations
| Credential type | Recommended rotation | Priority |
|---|---|---|
| Payment processor keys (Stripe, PayPal) | Every 14 days | 🔴 Critical |
| Production database credentials | Every 30 days | 🔴 Critical |
| SSH private keys | Every 30 days | 🔴 Critical |
| Production API keys (OpenAI, Anthropic) | Every 90 days | 🟡 High |
| OAuth tokens | On provider expiry + proactive 60-day | 🟡 High |
| Staging/dev API keys | Every 180 days | 🟢 Medium |
| Webhook secrets | Every 180 days | 🟢 Medium |
| Read-only data source keys | Annually | ⚪ Low |
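The schedule above is easy to enforce with a periodic check. A minimal sketch assuming you record each credential's last rotation time; the `ROTATION_DAYS` mapping and `rotation_overdue` helper are hypothetical, mirroring the table:

```python
from datetime import datetime, timedelta, timezone

# Rotation intervals from the schedule above (days)
ROTATION_DAYS = {
    "payment_processor_key": 14,
    "production_db_credential": 30,
    "ssh_private_key": 30,
    "production_api_key": 90,
    "staging_api_key": 180,
    "webhook_secret": 180,
    "readonly_data_key": 365,
}

def rotation_overdue(kind: str, last_rotated: datetime, now: datetime) -> bool:
    """True if the credential has exceeded its recommended rotation interval."""
    return now - last_rotated > timedelta(days=ROTATION_DAYS[kind])

last = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(rotation_overdue("payment_processor_key", last, last + timedelta(days=20)))  # True
print(rotation_overdue("production_api_key", last, last + timedelta(days=20)))     # False
```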
Monitoring the audit trail for anomalies
Set up a recurring check (e.g. every 15 minutes) that queries the audit API for suspicious patterns. Key signals to watch:
from datetime import datetime, timezone, timedelta
from agentsecretstore import AgentVault

async def check_anomalies():
    async with AgentVault() as vault:
        one_hour_ago = datetime.now(timezone.utc) - timedelta(hours=1)

        # 1. Spike in reads — possible exfiltration attempt
        recent_reads = await vault.audit.query(
            event_types=["secret.read"],
            since=one_hour_ago,
        )
        if len(recent_reads) > 500:
            alert(f"Unusual read volume: {len(recent_reads)} reads in 1 hour")

        # 2. Denied access attempts — possible probing
        denied = await vault.audit.query(
            event_types=["secret.read"],
            status="denied",
            since=one_hour_ago,
        )
        if len(denied) > 10:
            alert(f"Multiple denied reads: {len(denied)} in 1 hour")

        # 3. Off-hours access to critical secrets
        after_hours = [
            e for e in recent_reads
            if e.resource_tier == "critical"
            and not (9 <= e.timestamp.hour <= 18)
        ]
        if after_hours:
            alert(f"After-hours critical access: {len(after_hours)} events")

        # 4. Access from unexpected IPs
        known_ips = {"10.0.1.50", "10.0.1.51", "10.128.0.5"}
        unexpected = [e for e in recent_reads if e.ip not in known_ips]
        if unexpected:
            alert(f"Access from unexpected IPs: {[e.ip for e in unexpected]}")

def alert(message: str):
    print(f"🚨 ALERT: {message}")
    # Send to PagerDuty, Slack, etc.

If a key is compromised
Execute this runbook immediately when you suspect a credential has been leaked or compromised. Speed matters — do steps 1 and 2 before anything else:
from agentsecretstore import AgentVault

async def respond_to_compromise(compromised_path: str):
    """
    Incident response runbook — execute immediately when a key is suspected compromised.
    """
    async with AgentVault() as vault:
        # STEP 1: Revoke all active tokens that cover the compromised path
        tokens = await vault.tokens.list(resource_path=compromised_path)
        for token in tokens:
            await vault.tokens.revoke(token.id)
            print(f"Revoked token: {token.id} ({token.description})")

        # STEP 2: Rotate the secret immediately
        # You'll need the new value from your provider first
        # await vault.update_secret(path=compromised_path, value="new-key-here")

        # STEP 3: Pull the full audit log for the compromised key
        history = await vault.audit.query(
            resource_path=compromised_path,
            # No time limit — get everything
        )
        print(f"Total accesses: {len(history)}")

        # Find all actor IDs that read this secret
        actors = {e.actor_id for e in history if e.event == "secret.read"}
        print(f"Actors that read this secret: {actors}")

        # Get IP addresses involved
        ips = {e.ip for e in history if e.event == "secret.read"}
        print(f"Source IPs: {ips}")

        # STEP 4: Export the audit log for the incident report
        csv_data = await vault.audit.export(
            resource_path=compromised_path,
            format="csv",
        )
        with open(f"incident-{compromised_path.replace('/', '-')}.csv", "wb") as f:
            f.write(csv_data)

Team member access control
Human access to the vault dashboard is managed through roles. Apply the same least-privilege principle to humans as to agents:
| Role | Can do | Who should have it |
|---|---|---|
| Admin | All actions including member management and billing | Vault owner only (1–2 people) |
| Editor | Create, update, rotate secrets; manage approval policies | Platform/DevOps leads |
| Viewer | View secret metadata (never plaintext values) | Developers, on-call engineers |
| Auditor | Read-only audit log access; CSV export | Security team, compliance officers |
Never share admin credentials
Each team member should have their own account. Shared admin credentials make it impossible to attribute changes in the audit log — a core SOC 2 requirement.
Scoped Tokens →
Deep dive into scope format, TTLs, and IP allowlisting.
Audit Trail →
Query and export the full event history for compliance.
Secret Rotation →
Automate credential rotation with zero downtime.
Compliance Roadmap →
SOC 2, GDPR, HIPAA, and PCI-DSS status and plans.