PROJECT 01 · Featured

TPRM Copilot — Audit-Grade Vendor Risk Assessment Engine

An AI-augmented agent pipeline that automates third-party vendor controls testing, exception analysis, and audit workpaper generation. Built to mirror the workflow a TPRM analyst performs manually — running end-to-end in seconds, with every conclusion traceable to the underlying evidence.

TPRM · GRC · Controls Testing · AI Agent · Claude API · Python · Pydantic · NIST · COSO · TPRM Lifecycle
Section · Overview

The shape of the problem.

Modern TPRM teams spend most of their time on three repetitive activities: testing vendor controls, analyzing exceptions, and writing audit workpapers. All three are pattern-matching tasks well suited to an LLM operating under audit-grade constraints: deterministic schemas, traceable evidence references, and no hallucinated conclusions. TPRM Copilot wraps Claude in exactly those constraints.

01 · Problem
Manual vendor controls testing doesn't scale.
A typical TPRM analyst spends days reading vendor policies, walking sample transactions through every control attribute, and writing up findings with root cause. As vendor portfolios grow, the work compounds while headcount doesn't.
02 · Approach
Four specialized agents, deterministic schema.
A pipeline of four scoped Claude agents — policy parser, control tester, exception analyzer, workpaper writer — each returning a typed Pydantic object. The LLM proposes reasoning; deterministic Python writes the workpaper layout. The split is what makes the output audit-defensible.
03 · Outcome
End-to-end RCM in <10 seconds.
On the bundled sample case (5 vendor engagements), the pipeline produces a complete Risk Control Matrix workpaper, identifies the 2 seeded fieldwork exceptions, and surfaces 1 design gap — all with full evidence traceability and a --demo mode that runs without an API key.
7
controls identified (C1–C7)
5
sample transactions tested
3
findings auto-generated
100%
tests passing in regression CI
Section · Architecture

How the pipeline runs.

Every input is loaded into a typed model. Every agent's output is persisted to JSON. The final workpaper references those JSON artifacts so the chain of evidence is fully traceable — a finding in the RCM points to a TestResult, which points to a Transaction record, which points to the source file.
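The evidence chain described above can be sketched with plain dataclasses standing in for the project's Pydantic models. This is a minimal illustration, not the project's actual schema: the field names (`txn_id`, `result_id`, `result_ids`, `source_file`) and the `trace` helper are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    txn_id: str
    source_file: str  # file the raw record was loaded from


@dataclass(frozen=True)
class TestResult:
    result_id: str
    txn_id: str  # points at a Transaction
    status: str  # "Pass" or "Exception"


@dataclass(frozen=True)
class Finding:
    finding_id: str
    result_ids: tuple[str, ...]  # points at one or more TestResults


def trace(finding, results, transactions):
    """Walk Finding -> TestResult -> Transaction -> source file."""
    by_result = {r.result_id: r for r in results}
    by_txn = {t.txn_id: t for t in transactions}
    chain = []
    for rid in finding.result_ids:
        txn = by_txn[by_result[rid].txn_id]
        chain.append((finding.finding_id, rid, txn.txn_id, txn.source_file))
    return chain


# Hypothetical identifiers, chosen only to show the lookup
transactions = [Transaction("T2", "data/transactions.json")]
results = [TestResult("R2", "T2", "Exception")]
finding = Finding("F-1", ("R2",))
chain = trace(finding, results, transactions)
print(chain)
```

Because each record stores only the identifier of the record beneath it, a reviewer can follow any finding back to its source file without trusting free-text narrative.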

Inputs
policy.yaml: free-text or YAML
org_chart.json: Table of Supervisors
controls.yaml: RCM definitions
transactions.json: Portal + invoices

Pipeline
AGENT 1 · Claude: PolicyParser → ParsedPolicy (typed) → 01_control_rules.json (parsed policy)
AGENT 2 · Deterministic + LLM: ControlTester → TestResult[] (Pass/Exception) → 02_test_results.json (per-sample attributes)
AGENT 3 · Claude: ExceptionAnalyzer → Finding[] (root cause + remediation) → 03_findings.json (root cause + remediation)
AGENT 4 · Deterministic Python: WorkpaperWriter → RCM (xlsx + md + json) → 04_RCM.xlsx (audit workpaper)

AUDITOR DELIVERABLE
Risk Control Matrix + Test of Controls + Findings Report (.xlsx · .md · .json)
The audit-defensible design choice. LLMs propose reasoning. Deterministic Python writes layout. The LLM never decides whether a date is before another date — it only gets to phrase the explanation. This split is what makes the output workpaper survive review by a senior auditor.
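To make the split concrete, here is a minimal sketch of a timing attribute test decided entirely in Python. The function name, the returned fields, the "approval on or before invoice date" rule, and the dates are all assumptions for illustration; only the 8-day gap mirrors the sample case.

```python
from datetime import date


def timing_attribute(approval_date: date, invoice_date: date) -> dict:
    """Deterministically test that approval happened on or before the invoice date."""
    days_late = (approval_date - invoice_date).days
    status = "Pass" if days_late <= 0 else "Fail"
    return {"attribute": "Timing", "status": status, "days_late": max(days_late, 0)}


# Hypothetical dates chosen to produce an 8-day gap
result = timing_attribute(
    approval_date=date(2024, 3, 20),
    invoice_date=date(2024, 3, 12),
)
print(result)
```

In a design like this, the LLM would receive the already-computed `status` and `days_late` and would only be asked to phrase the rationale, never to compare the dates itself.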
Section · Code

A look at the agent loop.

Each agent is a thin orchestrator around a Claude call with a strictly-typed Pydantic output schema. Here's the agent that classifies exceptions into findings — the LLM gets a system prompt scoped to a single responsibility and a schema that constrains the response.

// src/tprm_copilot/agents/exception_analyzer.py
class ExceptionAnalyzer:
    """Classify exceptions into Findings with root cause + remediation."""

    def __init__(self, *, demo: bool = False) -> None:
        self.demo = demo

    def analyze(self, test_results: list[TestResult]) -> list[Finding]:
        # Demo mode = deterministic; live mode = Claude
        if self.demo or not claude.is_available():
            return self._demo_analyze(test_results)
        return self._llm_analyze(test_results)

    def _llm_analyze(self, test_results) -> list[Finding]:
        class FindingsResponse(BaseModel):
            findings: list[Finding]

        payload = "\n\n".join(tr.model_dump_json(indent=2) for tr in test_results)
        return claude.call_structured(
            system=SYSTEM_PROMPT,  # scoped role
            user=f"Classify these test results:\n\n{payload}",
            response_model=FindingsResponse,  # schema-validated
        ).findings
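The body of `_demo_analyze` is not shown in the excerpt. One plausible shape for a deterministic fallback is a fixed rule table that maps a failed attribute to a finding, sketched below with plain dictionaries. The `RULES` table, the dictionary keys, and the `demo_analyze` function are hypothetical, and the sketch covers only the two fieldwork exceptions, not the design-gap finding.

```python
# Hypothetical rule table: failed attribute -> (finding title, severity)
RULES = {
    "Routing": ("Wrong Approver", "High"),
    "Timing": ("Retroactive Approval", "High"),
}


def demo_analyze(test_results: list[dict]) -> list[dict]:
    """Map failed attributes to findings using fixed rules, with no LLM call."""
    findings = []
    for tr in test_results:
        for attribute, status in tr["attributes"].items():
            if status == "Fail" and attribute in RULES:
                title, severity = RULES[attribute]
                findings.append({
                    "id": f"F-{len(findings) + 1}",
                    "title": title,
                    "severity": severity,
                    "sample": tr["sample"],
                })
    return findings


# Attribute results copied from samples 2, 3, and 5 of the demo run
samples = [
    {"sample": 2, "attributes": {"Required?": "Pass", "Routing": "Fail", "Timing": "Pass"}},
    {"sample": 3, "attributes": {"Required?": "Pass", "Routing": "Pass", "Timing": "Pass"}},
    {"sample": 5, "attributes": {"Required?": "Pass", "Routing": "Pass", "Timing": "Fail"}},
]
findings = demo_analyze(samples)
print(findings)
```

Since the output depends only on the input order and the rule table, repeated runs produce identical findings, which is the property a demo mode needs to serve as a regression baseline.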
// src/tprm_copilot/tools/claude.py — the schema enforcement
def call_structured(
    system: str,
    user: str,
    response_model: Type[T],
) -> T:
    """Call Claude expecting a JSON response that validates against `response_model`."""
    schema_hint = json.dumps(response_model.model_json_schema(), indent=2)
    full_system = (
        f"{system}\n\n"
        "You must respond with ONLY a JSON object validating this schema:\n\n"
        f"```json\n{schema_hint}\n```"
    )
    resp = client.messages.create(
        model=DEFAULT_MODEL,
        max_tokens=4096,  # required by the Messages API; the value here is illustrative
        system=full_system,
        messages=[{"role": "user", "content": user}],
    )
    # Validate against Pydantic; raise if Claude drifts from the schema
    return response_model.model_validate_json(_extract_json(resp))
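The `_extract_json` helper is referenced but not shown. A minimal sketch of what such a step might do is below, assuming the reply text has already been pulled out of the API response as a string. The name `extract_json_text` and its behavior (tolerating a code fence or surrounding chatter) are assumptions, not the project's implementation.

```python
import json
import re

# Build the fence marker programmatically so this example stays fence-safe
_TICKS = "`" * 3
FENCE = re.compile(_TICKS + r"(?:json)?\s*(.*?)" + _TICKS, re.DOTALL)


def extract_json_text(text: str) -> str:
    """Return the JSON object embedded in a model reply, tolerating code fences."""
    match = FENCE.search(text)
    candidate = match.group(1) if match else text
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    snippet = candidate[start : end + 1]
    json.loads(snippet)  # raises json.JSONDecodeError (a ValueError) if malformed
    return snippet


reply = "Here is the result:\n" + _TICKS + 'json\n{"findings": []}\n' + _TICKS
print(extract_json_text(reply))
```

Failing loudly at this step keeps malformed replies from reaching schema validation with a misleading error.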
Section · Demo

A live pipeline run.

Real output from tprm-copilot run --demo against the bundled Sample Tech Co. case. Demo mode is deterministic — no API key required — so this output is byte-stable across runs and acts as the regression test for the agent pipeline.

  tprm-copilot · demo run · sample_outputs/
$ tprm-copilot run --demo --out sample_outputs/

──────────── TPRM Copilot — Pipeline Start ────────────
Mode      : DEMO
Policy    : config/policy.yaml
Org chart : config/org_chart.json
Controls  : config/controls.yaml
Txns      : data/transactions.json
Output    : sample_outputs

[1/4] PolicyParser — parsing vendor engagement policy
      extracted 6 thresholds, 4 rules
      loaded 10 org entries, 7 controls, 5 transactions
[2/4] ControlTester — applying control attribute tests
      tested 5 transactions, 2 exception(s) raised

Test of Controls — Per Sample
# │ Requestor      │ Vendor                │ Required? │ Routing │ Timing │ Result
──┼────────────────┼───────────────────────┼───────────┼─────────┼────────┼───────────
1 │ Kelly Smith    │ Betty's Diner         │ N/A       │ N/A     │ N/A    │ Pass
2 │ Mike Jones     │ Worldwide Computers   │ Pass      │ Fail    │ Pass   │ Exception
3 │ Dave White     │ Top Talent Consult.   │ Pass      │ Pass    │ Pass   │ Pass
4 │ Kathy Wood     │ Software Solutions    │ Pass      │ Pass    │ Pass   │ Pass
5 │ Mary Thompson  │ Brownstone Restaurant │ Pass      │ Pass    │ Fail   │ Exception

[3/4] ExceptionAnalyzer — classifying exceptions, assigning root cause
      produced 3 finding(s)
[4/4] WorkpaperWriter — generating audit workpapers
      wrote 04_RCM.xlsx
      wrote 04_test_of_controls.md
      wrote 04_findings_report.md

──────────────── Pipeline Complete ────────────────
Deviation rate : 40%
Findings       : 3
Outputs in     : sample_outputs

Test of details performed over 5 vendor engagement transactions. 2 exception(s) identified (40% deviation rate). 3 control weakness(es) documented in Section B with root cause and remediation. Control reliance NOT supported — substantive procedures should be extended.
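The 40% deviation rate in the run above is simply exceptions divided by samples tested (2 of 5). A minimal sketch of that arithmetic follows; the `TOLERABLE_RATE` threshold is a hypothetical value, since the rule the tool uses to conclude on control reliance is not shown.

```python
def deviation_rate(results: list[str]) -> float:
    """Fraction of tested samples whose overall result is an exception."""
    if not results:
        raise ValueError("no samples tested")
    return sum(r == "Exception" for r in results) / len(results)


# Overall results for the five samples in the demo table
results = ["Pass", "Exception", "Pass", "Pass", "Exception"]
rate = deviation_rate(results)

# Hypothetical tolerable rate; the tool's actual threshold is not documented here
TOLERABLE_RATE = 0.10
reliance_supported = rate <= TOLERABLE_RATE
print(f"Deviation rate: {rate:.0%}, reliance supported: {reliance_supported}")
```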

Findings auto-generated by the pipeline

ID · Severity · Affected Controls
Finding

F-1 · High · C3, C7
Wrong Approver: Incorrect Reviewer Assignment by Portal.
Intra-year personnel reassignment not reflected in the Portal Table of Employee Supervisors; auto-routed to previous (incorrect) supervisor.

F-2 · High · C6
Retroactive Approval: Vendor Engaged Before Required Approval.
Vendor invoice predates Portal approval by 8 days. Pre-engagement timing control is policy-based with no system-level preventive check.

F-3 · Medium (design gap)
Absent Control: No Approved Vendor Listing / Independent Vendor Vetting.
Vendor appropriateness assessed solely by individual supervisor judgment at point of approval. No preventive vendor-vetting control exists.
Section · Deliverables

What the pipeline produces.

Every run emits machine-readable JSON artifacts (for downstream tooling and audit trail) plus auditor-facing Excel + Markdown workpapers. The bundled sample outputs are committed to the repo for inspection.

01 · JSON
Parsed Policy
Structured representation of the vendor engagement policy — thresholds + rules — extracted by PolicyParser.
View JSON →
02 · JSON
Test Results
Per-transaction attribute results with expected vs. observed values and a one-sentence rationale for each determination.
View JSON →
03 · JSON
Findings
Classified exceptions with severity, root cause, affected controls, and recommended remediation.
View JSON →
04 · XLSX
Risk Control Matrix
Auditor-facing Excel workpaper with Cover + Section A (Controls) + Section B (Findings) tabs. Brand-styled.
Download XLSX →
04 · MD
Test of Controls Report
Markdown workpaper with per-sample attribute detail, status icons, and rationale traceable to source evidence.
View Markdown →
04 · MD
Findings Report
Markdown findings report with severity, description, root cause, and recommended remediation per finding.
View Markdown →
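As an illustration of deterministic layout, the sketch below renders findings into a Markdown table with a fixed column order and a stable sort. The `render_findings_md` function and the dictionary keys are assumptions for the example, not the project's WorkpaperWriter.

```python
def render_findings_md(findings: list[dict]) -> str:
    """Render findings as a Markdown table with a fixed column order."""
    lines = [
        "| ID | Finding | Severity | Affected Controls |",
        "|----|---------|----------|-------------------|",
    ]
    # Sorting by ID keeps the output independent of input order
    for f in sorted(findings, key=lambda f: f["id"]):
        controls = ", ".join(f["controls"]) or "n/a"
        lines.append(f"| {f['id']} | {f['title']} | {f['severity']} | {controls} |")
    return "\n".join(lines) + "\n"


# Titles, severities, and controls taken from findings F-1 and F-2 above
findings = [
    {"id": "F-2", "title": "Retroactive Approval", "severity": "High", "controls": ["C6"]},
    {"id": "F-1", "title": "Wrong Approver", "severity": "High", "controls": ["C3", "C7"]},
]
md = render_findings_md(findings)
print(md)
```

Keeping layout in code like this means the model's text can only fill cells; it cannot reorder rows, drop columns, or change the structure an auditor expects.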
Section · Stack

Tech stack.

Modern Python with strict type contracts. Claude as the reasoner; deterministic Python everywhere else.

AI / Agent layer
anthropic (Claude API)
Structured-output prompts with JSON schema enforcement
Per-agent demo-mode fallback for offline / CI
Data contracts
pydantic v2 for every input/output
Versioned schemas — changes are explicit events
Validation at every agent boundary
Workpaper output
openpyxl for branded XLSX RCM
Markdown for Test of Controls + Findings
JSON for downstream tooling + audit trail
CLI & UX
click CLI · rich for live progress tables
python-dotenv for credential management
Pip-installable: pip install tprm-copilot
Testing & CI
pytest regression suite over demo mode
Schema validation = type-check at agent boundary
Sample case = golden output for CI
Frameworks referenced
NIST SP 800-53 Rev. 5 (Supply Chain Risk Management family, SR)
COSO Internal Control
TPRM Lifecycle (intake → monitoring → offboarding)
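The "golden output" idea in the testing entries above can be sketched as a byte-level comparison between a fresh artifact and a committed reference. The helper names and the payload below are hypothetical; the sketch covers JSON artifacts only and assumes serialization is made stable with sorted keys.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def digest(path: Path) -> str:
    """SHA-256 of the file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def write_artifact(path: Path, payload: dict) -> None:
    # sort_keys and a fixed indent keep serialization stable across runs
    text = json.dumps(payload, sort_keys=True, indent=2) + "\n"
    path.write_text(text, encoding="utf-8")


def matches_golden(actual: Path, golden: Path) -> bool:
    return digest(actual) == digest(golden)


with tempfile.TemporaryDirectory() as tmp:
    golden = Path(tmp) / "golden.json"
    actual = Path(tmp) / "actual.json"
    write_artifact(golden, {"findings": 3, "deviation_rate": 0.4})
    # Same content with a different key order still serializes identically
    write_artifact(actual, {"deviation_rate": 0.4, "findings": 3})
    stable = matches_golden(actual, golden)
    # Any change in content shows up as a digest mismatch
    write_artifact(actual, {"deviation_rate": 0.4, "findings": 2})
    drifted = not matches_golden(actual, golden)

print(stable, drifted)
```

In a pytest suite, the same comparison would typically run once per artifact against the sample outputs committed to the repository.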
Section · Roadmap

Where this is going.

v0.1 is a reference implementation aimed at a single workflow (vendor engagement). The shape generalizes to the broader TPRM lifecycle — intake, ongoing monitoring, and offboarding.

v0.2
Multi-policy support
Load multiple policies (Vendor Engagement + Data Privacy + Software Procurement) into a single test run with cross-policy conflict detection.
v0.3
ERP connector framework
Pull transaction data directly from SAP, Oracle, or NetSuite instead of static JSON exports, closing the loop with the source of truth.
v0.4
SOC 2 / ISO 27001 ingestion
Auto-extract control evidence from vendor SOC reports for TPRM intake due diligence. Map evidence to controls programmatically.
v0.5
Continuous monitoring
Watch for new transactions and re-test on a schedule. Trigger findings as drift is detected, not at quarter-end.
v0.6
Evals + guardrails
Synthetic test cases covering edge controls; LLM-as-judge eval scoring against a held-out finding set.
v1.0
Production hosting
Hosted version with multi-tenant isolation, audit log, and a reviewer UI for accept/reject of agent-proposed findings.