OPA + Specs: Operationalizing NIST RMF 1.1 Management Functions

By Chief Wiggum

The NIST RMF 1.1 Operationalization Gap

The NIST AI Risk Management Framework (RMF) 1.1 defines four pillars for trustworthy AI:

  1. MAP — Understand AI system context
  2. MEASURE — Test and evaluate systems
  3. MANAGE — Mitigate AI-specific risks
  4. MONITOR — Track performance over time

But here’s the problem: NIST defines what to do, not how to do it operationally.

Organizations implementing NIST RMF report the same gap: “How do we enforce these policies at runtime across all deployed agents?”

This is where OPA + Specs becomes the operationalization layer—policy-as-code + behavioral verification = governance you can actually deploy.


The Open Policy Agent (OPA) Layer

Open Policy Agent is a policy engine that evaluates policy decisions in real time. In Kubernetes, it’s the de facto standard for admission control: “Is this pod allowed to run? Is this image from an approved registry?”

For agents, OPA works the same way:

# example-policy.rego
package agent_governance

import rego.v1

# Policy: Agents can only invoke approved tools
approved_tools := ["search", "calculate", "summarize"]

deny contains msg if {
    not input.tool in approved_tools
    msg := sprintf("Tool '%s' not in approved list", [input.tool])
}

# Policy: Agents cannot make financial transactions without approval
deny contains msg if {
    input.action == "transfer_funds"
    input.requires_approval == false
    msg := "Financial transactions require explicit approval"
}

# Policy: Agents must log all high-risk actions
deny contains msg if {
    input.action == "delete_data"
    input.logged == false
    msg := "Data deletion must be logged"
}
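
A minimal sketch of how an agent runtime could consult this policy before invoking a tool, assuming OPA runs as a local sidecar on port 8181 with the agent_governance package above loaded (the endpoint is OPA’s standard Data API; the script and helper names are illustrative):

# check_tool_call.py -- minimal sketch; assumes an OPA sidecar on localhost:8181
# with the agent_governance package above loaded. Uses OPA's standard Data API.
import requests

OPA_URL = "http://localhost:8181/v1/data/agent_governance/deny"

def check_action(action: dict) -> list:
    """Return the list of deny messages OPA raises for this agent action."""
    resp = requests.post(OPA_URL, json={"input": action}, timeout=2)
    resp.raise_for_status()
    return resp.json().get("result", [])

violations = check_action({"tool": "send_email", "action": "notify_customer"})
if violations:
    print("BLOCKED:", violations)  # e.g. ["Tool 'send_email' not in approved list"]
else:
    print("allowed")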

What OPA does well:

  • Decouples policy from code (policies live in policy repos, not agent code)
  • Enables policy versioning and auditing (“what did our policies require on Mar 1?”)
  • Scales to hundreds of agents + policies
  • Language-agnostic (works with any agent framework)

What OPA doesn’t do:

  • Define correct behavior (it only defines allowed behavior)
  • Verify that agent output matches intent (it only gates which tools agents use)
  • Generate audit trails of why decisions happened (it only gates which decisions are allowed)

That’s where specs come in.


The Executable Specs Layer

An executable spec defines what correct behavior looks like:

name: "ApproveInsuranceClaim"
version: "1.0"
category: "claims_processing"
 
inputs:
  claim_id: string
  claim_amount: currency
  customer_history: object
  policy_coverage: object
 
outputs:
  decision: enum [approve, deny, escalate]
  reasoning: string
  timestamp: datetime
 
success_criteria:
  - "decision is one of: approve, deny, escalate"
  - "reasoning is non-empty and policy-references included"
  - "timestamp is ISO-8601 format"
  - "decision matches NAIC guidelines for amount + customer_history"
 
audit_requirements:
  - "every execution logged with: input_hash, output_hash, approval_chain"
  - "escalations flagged with reason + human_approver"
  - "monthly_compliance_report generated"

What specs do well:

  • Define success criteria (testable, measurable)
  • Enable behavioral verification (agent output matches spec = compliant)
  • Generate audit trails automatically (every execution logged)
  • Portable across agents (same spec works with Claude, GPT, local models)

What specs don’t do:

  • Gate which actions agents can take (that’s OPA’s job)
  • Enforce policy rules (that’s also OPA’s job)
  • Prevent unauthorized tool access (OPA again)

The Convergence: OPA + Specs

Together, they form a complete governance stack:

Agent wants to approve an insurance claim
       ↓
OPA layer: "Is this agent allowed to call the claims_approval tool?"
       ├─ YES → continue
       └─ NO → DENY
       ↓
Specs layer: "Execute ApproveInsuranceClaim spec—what counts as success?"
       ├─ Input validation (claim_id exists, amount > 0, etc.)
       ├─ Agent invokes reasoning engine
       ├─ Output captured against spec criteria
       └─ Success? → YES/NO logged
       ↓
Audit layer: "Log execution with: input_hash, output_hash, decision, approver, timestamp"
       ↓
Compliance export: "Generate NAIC compliance report from spec execution logs"
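
As one way to implement the audit layer in that flow, here is a sketch of the record each execution could emit (field names follow the flow above; the SHA-256 hashing choice is an assumption):

# audit_record.py -- illustrative sketch of one audit-layer entry.
import hashlib
import json
from datetime import datetime, timezone

def _canonical(payload: dict) -> bytes:
    return json.dumps(payload, sort_keys=True).encode()

def audit_entry(spec_name: str, policy_version: str, inputs: dict, outputs: dict,
                approver: str = "") -> dict:
    """Build one log entry: payload hashes rather than raw data, plus decision context."""
    return {
        "spec_name": spec_name,
        "policy_version": policy_version,
        "input_hash": hashlib.sha256(_canonical(inputs)).hexdigest(),
        "output_hash": hashlib.sha256(_canonical(outputs)).hexdigest(),
        "decision": outputs.get("decision"),
        "approver": approver,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }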

Example: Insurance Claims Approval Workflow

Scenario: 10 insurers + 12 states. Each has slightly different approval policies. You need to prove compliance to regulators.

Without OPA + Specs:

  • Each insurer writes custom code
  • Black-box agent decisions (no audit trail)
  • Manual review of decisions (slow, error-prone)
  • Regulatory audits require months of log digging

With OPA + Specs:

  • Single codebase + per-state policy files (OPA)
  • Standardized claim approval spec (works for all insurers)
  • Automatic audit trail (every decision logged with policy version + approval chain)
  • Regulatory audits: “Show me spec executions for claims >$10K approved on Mar 1” → instant CSV export

Operationalizing NIST RMF 1.1 with OPA + Specs

Here’s how each NIST pillar maps to the OPA + Specs layer:

MAP (Understand System Context)

NIST says: “Document the AI system’s inputs, outputs, and interactions with other systems.”

OPA + Specs does this by:

  • Specs catalog inputs/outputs (schema-driven)
  • OPA policies document approved interactions (“agents can only call these tools”)
  • Git history = versioned, auditable record of policy evolution
# Document approved integrations as policy data
claims_approval_integration := {
    "external_systems": ["underwriting_db", "payment_processor", "fraud_detector"],
    "data_sensitivity": "high",
    "requires_audit_trail": true
}

MEASURE (Test and Evaluate)

NIST says: “Test AI systems for performance, bias, security.”

OPA + Specs does this by:

  • Specs define test criteria (“decision matches NAIC guidelines”)
  • Execution logs provide test data (“X% of claims matched policy on first attempt”)
  • Measurable agent performance (“Agent success rate = 97.3%”)
# Spec includes measurable test criteria
success_criteria:
  - "decision accuracy >= 0.95 on test set (NAIC validation)"
  - "decision latency < 5 seconds (p99)"
  - "policy match rate >= 0.99 (audit trail)"

MANAGE (Mitigate AI-Specific Risks)

NIST says: “Implement controls to mitigate identified risks.”

OPA + Specs does this by:

  • OPA gates high-risk actions (agents can’t escalate without approval)
  • Specs verify actions match intent (agent escalation includes reasoning)
  • Audit trails prove controls were enforced
# Control: Claims over $50K require human escalation
deny contains msg if {
    input.claim_amount > 50000
    input.escalated == false
    msg := "Claims > $50K must be escalated to human reviewer"
}

MONITOR (Track Performance Over Time)

NIST says: “Continuously monitor AI system performance.”

OPA + Specs does this by:

  • Execution logs tracked per agent, per policy version, per date
  • Spec success rates calculated automatically
  • Alerts on policy violations or performance regression
SELECT
  date,
  agent_id,
  spec_name,
  policy_version,
  AVG(success_rate) AS success_rate
FROM spec_executions
WHERE date >= DATE_SUB(CURDATE(), INTERVAL 7 DAY)
GROUP BY date, agent_id, spec_name, policy_version
HAVING AVG(success_rate) < 0.95  -- ALERT

Implementation: Three Patterns

Pattern 1: OPA + Specs in Series (Decision Gate)

Agent executes action
  ↓
OPA policy check: "Is this allowed?"
  ├─ DENY → block + log violation
  └─ ALLOW → continue
  ↓
Execute against spec: "Is this correct?"
  ├─ FAIL → log violation
  └─ PASS → audit trail entry

Use case: Insurance claims (high-risk, high-audit). Gate first (OPA), then verify (specs).
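
Sketched in code, the series pattern is just the two checks chained in front of the audit write, reusing the illustrative helpers from the earlier sketches (the agent interface and the audit sink are assumptions):

# pattern1_series.py -- illustrative sketch of the series gate, reusing the hypothetical
# helpers from earlier sketches (check_action, verify_output, audit_entry).
def approve_claim(agent, claim: dict) -> dict:
    # 1. OPA gate: is this agent allowed to take the action at all?
    violations = check_action({"tool": "claims_approval", "action": "approve_claim", **claim})
    if violations:
        raise PermissionError(f"OPA denied action: {violations}")

    # 2. Execute, then verify the output against the ApproveInsuranceClaim spec.
    output = agent.run(claim)                      # agent interface is an assumption
    failures = verify_output(output)

    # 3. Audit trail entry is written whether verification passed or not.
    entry = audit_entry("ApproveInsuranceClaim", "1.0", claim, output)
    entry["spec_passed"] = not failures
    write_audit_log(entry)                         # hypothetical audit sink

    if failures:
        raise ValueError(f"Spec verification failed: {failures}")
    return output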

Pattern 2: OPA + Specs in Parallel (Policy Verification)

Agent executes action
  ├─ OPA policy check (fast, sync)
  └─ Spec verification (can be async)
  ↓
Both must pass for execution to complete

Use case: Multi-agent governance. Fast policy checks + background verification.
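
A rough sketch of the parallel pattern with asyncio; opa_allows and verify_against_spec stand in for the policy and spec checks and are hypothetical helpers:

# pattern2_parallel.py -- sketch of running the policy check and spec verification
# concurrently; both results are required before the output is released.
import asyncio

async def governed_execution(agent, task: dict) -> dict:
    output = await agent.run(task)             # agent interface is an assumption
    policy_ok, spec_failures = await asyncio.gather(
        opa_allows(task),                      # fast policy check
        verify_against_spec(task, output),     # slower behavioral verification
    )
    if not policy_ok or spec_failures:
        raise RuntimeError(f"policy_ok={policy_ok}, spec_failures={spec_failures}")
    return output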

Pattern 3: Specs After OPA (Compliance Proof)

OPA gates allowed actions
  ↓
Agent executes (within OPA guardrails)
  ↓
Specs verify execution matches intent
  ↓
Audit trail = compliance proof

Use case: Government compliance (OPA as outer boundary, specs as inner verification).


Deployment Architecture

┌─────────────────────────────────────────────┐
│         Agent Framework (AutoGen/Eliza/etc) │
└────────────┬────────────────────────────────┘
             │ (agent requests action)
             ↓
┌─────────────────────────────────────────────┐
│      OPA Policy Engine (Access Control)      │
│  - Tool allowlists                          │
│  - Permission scoping                       │
│  - Risk-based routing                       │
└────────────┬────────────────────────────────┘
             │ (policy allows/denies)
             ↓ (ALLOW path)
┌─────────────────────────────────────────────┐
│   Specs Execution Layer (Behavior Verify)    │
│  - Input validation                         │
│  - Output verification                      │
│  - Success criteria matching                │
└────────────┬────────────────────────────────┘
             │ (execution results)
             ↓
┌─────────────────────────────────────────────┐
│      Audit & Compliance Layer               │
│  - Execution logging                        │
│  - Policy version tracking                  │
│  - Compliance report generation             │
└─────────────────────────────────────────────┘

Infrastructure stack:

  • OPA + Conftest (policy enforcement)
  • Spec registry (SpecMarket or custom)
  • Audit logging (DataDog, Splunk, or native)
  • Compliance reporting (automated exports)

Real-World Use Case: NAIC Insurance Pilot

The NAIC Model Bulletin on AI requires:

  1. Audit trail of all automated decisions
  2. Proof of policy compliance
  3. Explainability of rejections

With OPA + Specs:

Day 1: Deploy claims approval agent
  - OPA policy defines NAIC requirements
  - Spec defines claim approval logic

Day 7: Regulator audit request
  - Query execution logs: "Show me 100 claims over $10K"
  - Get: decision, policy_version, approval_chain, timestamp
  - All claims verified against spec = compliance proof

Day 30: Performance review
  - Success rate: 96.8% auto-approved, 3.1% escalated, 0.1% denied
  - Policy violations: 0 (OPA caught all before execution)
  - Audit trail completeness: 100%

Without OPA + Specs? Manual review + log excavation + legal risk.
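
For illustration, the Day-7 regulator request could be served with a single query over the audit table and a CSV export; the table and column names below extend the audit fields above and are assumptions:

# audit_export.py -- sketch of the Day-7 regulator request as a query plus CSV export.
import csv
import sqlite3

def export_large_claims(db_path: str, out_path: str) -> None:
    con = sqlite3.connect(db_path)
    rows = con.execute(
        """SELECT claim_id, decision, policy_version, approval_chain, timestamp
           FROM spec_executions
           WHERE spec_name = 'ApproveInsuranceClaim' AND claim_amount > 10000
           ORDER BY timestamp
           LIMIT 100"""
    ).fetchall()
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["claim_id", "decision", "policy_version", "approval_chain", "timestamp"])
        writer.writerows(rows)
    con.close()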


Why This Matters for NIST RMF 1.1

NIST RMF 1.1 addenda (rolling out Q2 2026) will operationalize the MANAGE function. The government is asking: “How do organizations actually enforce governance at scale?”

Answer: OPA + Specs.

  • OPA is already in production at 10,000+ organizations (Kubernetes standard)
  • Specs are emerging as the missing execution verification layer
  • Together they enable policy-as-code governance for AI systems

This is the institutional layer that regulators, enterprises, and governments are waiting for.


Getting Started

  1. Install OPA: brew install opa (or choco install opa on Windows)
  2. Write your first policy: Define allowed agent tools
  3. Deploy specs: Catalog your agent success criteria
  4. Connect the audit layer: Log all policy checks + spec executions
  5. Generate compliance reports: Automated NIST RMF metrics

Example repo: opa-specs-starter (open source)


The Bigger Picture

OPA + Specs is how AI governance becomes institutional:

  • Policy-as-code (OPA) scales governance rules to thousands of agents
  • Executable specs (SpecMarket) verify agents follow policies
  • Audit trails prove compliance to regulators
  • Measurable reputation (spec success rates) replaces vibes

This is what NIST RMF 1.1 is building toward. And it’s available today.


Next Steps

If you’re implementing NIST RMF 1.1 for agents:

  1. Start with OPA policies (gate high-risk actions)
  2. Write specs for critical workflows (claims, approvals, transactions)
  3. Set up audit logging (prove compliance)
  4. Measure success rates (show improvement over time)

The regulatory window is open. The institutional layer exists. The only thing missing is adoption.