TPRM Copilot — Audit-Grade Vendor Risk Assessment Engine

Section · Overview

The shape of the problem.

Modern TPRM teams spend most of their time on three repetitive activities: reading vendor policies, walking sample transactions through control attributes, and writing up findings. All three are pattern-matching tasks well suited to an LLM operating under audit-grade constraints: deterministic schemas, traceable evidence references, and no hallucinated conclusions. TPRM Copilot wraps Claude in exactly those constraints.

01 · Problem

Manual vendor controls testing doesn't scale.

A typical TPRM analyst spends days reading vendor policies, walking sample transactions through every control attribute, and writing up findings with root cause. As vendor portfolios grow, the work compounds while headcount doesn't.

02 · Approach

Four specialized agents, deterministic schema.

A pipeline of four scoped Claude agents — policy parser, control tester, exception analyzer, workpaper writer — each returning a typed Pydantic object. The LLM proposes reasoning; deterministic Python writes the workpaper layout. The split is what makes the output audit-defensible.

03 · Outcome

End-to-end RCM in <10 seconds.

On the bundled sample case (5 vendor engagements), the pipeline produces a complete Risk Control Matrix workpaper, identifies the 2 seeded fieldwork exceptions, and surfaces 1 design gap — all with full evidence traceability and a --demo mode that runs without an API key.

7

controls identified (C1–C7)

5

sample transactions tested

3

findings auto-generated

100%

tests passing in regression CI

Section · Architecture

How the pipeline runs.

Every input is loaded into a typed model. Every agent's output is persisted to JSON. The final workpaper references those JSON artifacts so the chain of evidence is fully traceable — a finding in the RCM points to a TestResult, which points to a Transaction record, which points to the source file.
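
The evidence chain described here can be sketched roughly as follows. This is a minimal illustration, not the project's code: it uses standard-library dataclasses as a stand-in for the Pydantic models, and the field names (`transaction_id`, `test_result_ids`, `source_file`) and record IDs (`T-5`, `TR-5-timing`) are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Transaction:
    transaction_id: str
    source_file: str  # file the record was loaded from


@dataclass(frozen=True)
class TestResult:
    test_result_id: str
    transaction_id: str  # points at a Transaction
    attribute: str
    passed: bool


@dataclass(frozen=True)
class Finding:
    finding_id: str
    test_result_ids: tuple[str, ...]  # points at one or more TestResults


def trace(finding, results, transactions):
    """Walk Finding -> TestResult -> Transaction -> source file."""
    chain = []
    for rid in finding.test_result_ids:
        result = results[rid]
        txn = transactions[result.transaction_id]
        chain.append((finding.finding_id, rid, txn.transaction_id, txn.source_file))
    return chain


# Hypothetical records, loosely modeled on the retroactive-approval finding.
txn = Transaction("T-5", "data/transactions.json")
res = TestResult("TR-5-timing", "T-5", "timing", passed=False)
fnd = Finding("F-2", ("TR-5-timing",))

# Each artifact can be persisted as JSON so the chain survives the run.
artifact = json.dumps(asdict(fnd), sort_keys=True)

print(trace(fnd, {res.test_result_id: res}, {txn.transaction_id: txn}))
```

Because every link is an explicit ID rather than free text, a reviewer can follow a finding back to its source file without relying on anything the model wrote.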

The audit-defensible design choice. LLMs propose reasoning. Deterministic Python writes layout. The LLM never decides whether a date is before another date — it only gets to phrase the explanation. This split is what makes the output workpaper survive review by a senior auditor.
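
One way to picture that split is the sketch below. It is an assumption-laden illustration, not the project's implementation: the function names, the dictionary shape, the sample dates, and the rule "approval must be on or before the invoice date" are all hypothetical.

```python
from datetime import date


def timing_attribute(approval_date: date, invoice_date: date) -> dict:
    """Decide the timing attribute in plain Python; no model is consulted."""
    passed = approval_date <= invoice_date
    days_late = 0 if passed else (approval_date - invoice_date).days
    return {"attribute": "timing", "passed": passed, "days_late": days_late}


def explanation_prompt(verdict: dict) -> str:
    """Build the text a model would be asked to phrase; the verdict is fixed input."""
    outcome = "Pass" if verdict["passed"] else "Exception"
    return (
        f"The timing test result is {outcome} "
        f"(approval {verdict['days_late']} day(s) after invoice). "
        "Explain this result in one sentence. Do not change the result."
    )


# Hypothetical dates chosen so approval trails the invoice by 8 days.
verdict = timing_attribute(date(2024, 3, 9), date(2024, 3, 1))
print(verdict)
print(explanation_prompt(verdict))
```

The comparison happens before any prompt is built, so the model only ever sees a settled result and is asked to describe it.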

Section · Code

A look at the agent loop.

Each agent is a thin orchestrator around a Claude call with a strictly-typed Pydantic output schema. Here's the agent that classifies exceptions into findings — the LLM gets a system prompt scoped to a single responsibility and a schema that constrains the response.

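# src/tprm_copilot/agents/exception_analyzer.py

from pydantic import BaseModel

from tprm_copilot.tools import claude
# Finding, TestResult, and SYSTEM_PROMPT are defined elsewhere in the package (not shown).


class ExceptionAnalyzer: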
    """Classify exceptions into Findings with root cause + remediation."""

    def __init__(self, *, demo: bool = False) -> None:
        self.demo = demo

    def analyze(self, test_results: list[TestResult]) -> list[Finding]:
        # Demo mode = deterministic; live mode = Claude
        if self.demo or not claude.is_available():
            return self._demo_analyze(test_results)
        return self._llm_analyze(test_results)

    def _llm_analyze(self, test_results: list[TestResult]) -> list[Finding]:
        class FindingsResponse(BaseModel):
            findings: list[Finding]

        payload = "\n\n".join(tr.model_dump_json(indent=2) for tr in test_results)
        return claude.call_structured(
            system=SYSTEM_PROMPT,                  # scoped role
            user=f"Classify these test results:\n\n{payload}",
            response_model=FindingsResponse,        # schema-validated
        ).findings

# src/tprm_copilot/tools/claude.py (schema enforcement)

import json
from typing import Type, TypeVar

from pydantic import BaseModel

T = TypeVar("T", bound=BaseModel)

# `client`, `DEFAULT_MODEL`, and `_extract_json` are defined elsewhere in this module (not shown).
MAX_TOKENS = 4096  # assumed value; the Messages API requires max_tokens on every call


def call_structured(
    system: str,
    user: str,
    response_model: Type[T],
) -> T:
    """Call Claude expecting a JSON response that validates against `response_model`."""
    schema_hint = json.dumps(response_model.model_json_schema(), indent=2)
    full_system = (
        f"{system}\n\n"
        "You must respond with ONLY a JSON object validating this schema:\n\n"
        f"```json\n{schema_hint}\n```"
    )
    resp = client.messages.create(
        model=DEFAULT_MODEL,
        max_tokens=MAX_TOKENS,
        system=full_system,
        messages=[{"role": "user", "content": user}],
    )
    # Validate against Pydantic; raises ValidationError if Claude drifts from the schema
    return response_model.model_validate_json(_extract_json(resp))
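
The `_extract_json` helper is not shown in the excerpt. A plausible sketch of what such a helper might do is given below, under the assumption that it receives the reply text (the original takes the response object) and that the model may wrap its JSON in a code fence or add commentary around it. The name `extract_json_text` is hypothetical.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code-fence marker
_FENCED = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_json_text(reply: str) -> str:
    """Return the first JSON object found in a model reply, re-serialized."""
    fenced = _FENCED.search(reply)
    candidate = fenced.group(1) if fenced else reply
    start = candidate.find("{")
    if start == -1:
        raise ValueError("no JSON object found in reply")
    # raw_decode parses one JSON value and stops there,
    # so trailing commentary after the object is ignored.
    obj, _end = json.JSONDecoder().raw_decode(candidate[start:])
    return json.dumps(obj)


reply = "Here you go:\n" + FENCE + 'json\n{"findings": []}\n' + FENCE + "\nLet me know."
print(extract_json_text(reply))
```

Malformed JSON raises `json.JSONDecodeError`, a subclass of `ValueError`, so a caller can treat both failure modes as a schema drift signal.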

Section · Demo

A live pipeline run.

Real output from tprm-copilot run --demo against the bundled Sample Tech Co. case. Demo mode is deterministic — no API key required — so this output is byte-stable across runs and acts as the regression test for the agent pipeline.
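
A byte-stable demo run lends itself to a golden-output regression test. The sketch below shows the general idea only: `run_demo_pipeline` is a hypothetical stand-in for the real demo pipeline, and the golden string is invented for illustration (a real suite would presumably read it from a committed file).

```python
import json

# Hypothetical golden value; in practice this would live in a committed file.
GOLDEN = '{"exceptions":2,"findings":["F-1","F-2","F-3"],"transactions_tested":5}'


def run_demo_pipeline() -> dict:
    """Stand-in for the demo pipeline: fixed inputs, no model call."""
    return {
        "transactions_tested": 5,
        "exceptions": 2,
        "findings": ["F-1", "F-2", "F-3"],
    }


def canonical(output: dict) -> str:
    """Serialize with sorted keys and fixed separators so the bytes are stable."""
    return json.dumps(output, sort_keys=True, separators=(",", ":"))


def test_demo_output_matches_golden() -> None:
    assert canonical(run_demo_pipeline()) == GOLDEN


test_demo_output_matches_golden()
print("demo output matches golden")
```

Canonical serialization matters here: without sorted keys and fixed separators, two semantically equal outputs could differ byte for byte and fail the comparison.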

tprm-copilot · demo run · sample_outputs/

$ tprm-copilot run --demo --out sample_outputs/

──────────── TPRM Copilot — Pipeline Start ────────────
Mode      : DEMO
Policy    : config/policy.yaml
Org chart : config/org_chart.json
Controls  : config/controls.yaml
Txns      : data/transactions.json
Output    : sample_outputs

[1/4] PolicyParser — parsing vendor engagement policy
      extracted 6 thresholds, 4 rules
      loaded 10 org entries, 7 controls, 5 transactions
[2/4] ControlTester — applying control attribute tests
      tested 5 transactions, 2 exception(s) raised

Test of Controls — Per Sample
# │ Requestor      │ Vendor               │ Required? │ Routing │ Timing │ Result
──┼────────────────┼──────────────────────┼───────────┼─────────┼────────┼───────────
1 │ Kelly Smith    │ Betty's Diner        │ N/A       │ N/A     │ N/A    │ Pass
2 │ Mike Jones     │ Worldwide Computers  │ Pass      │ Fail    │ Pass   │ Exception
3 │ Dave White     │ Top Talent Consult.  │ Pass      │ Pass    │ Pass   │ Pass
4 │ Kathy Wood     │ Software Solutions   │ Pass      │ Pass    │ Pass   │ Pass
5 │ Mary Thompson  │ Brownstone Restaurant│ Pass      │ Pass    │ Fail   │ Exception

[3/4] ExceptionAnalyzer — classifying exceptions, assigning root cause
      produced 3 finding(s)
[4/4] WorkpaperWriter — generating audit workpapers
      wrote 04_RCM.xlsx
      wrote 04_test_of_controls.md
      wrote 04_findings_report.md

──────────────── Pipeline Complete ────────────────
Deviation rate : 40%
Findings       : 3
Outputs in     : sample_outputs

Test of details performed over 5 vendor engagement transactions. 2 exception(s) identified (40% deviation rate). 3 control weakness(es) documented in Section B with root cause and remediation. Control reliance NOT supported — substantive procedures should be extended.
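
The 40% deviation rate reported in the run follows directly from the Result column: 2 exceptions over 5 samples. A minimal sketch of that arithmetic, assuming every sample counts in the denominator (including the one where approval was not required), looks like this; the function name is hypothetical.

```python
# Result column from the demo run's per-sample table, in sample order.
results = ["Pass", "Exception", "Pass", "Pass", "Exception"]


def deviation_rate(results: list[str]) -> float:
    """Share of tested samples that ended in an exception."""
    if not results:
        raise ValueError("no samples tested")
    exceptions = sum(1 for r in results if r == "Exception")
    return exceptions / len(results)


print(f"{deviation_rate(results):.0%}")
```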

Findings auto-generated by the pipeline

ID	Finding	Severity	Affected Controls
F-1	Wrong Approver — Incorrect Reviewer Assignment by Portal. Intra-year personnel reassignment not reflected in the Portal Table of Employee Supervisors; auto-routed to previous (incorrect) supervisor.	High	`C3, C7`
F-2	Retroactive Approval — Vendor Engaged Before Required Approval. Vendor invoice predates Portal approval by 8 days. Pre-engagement timing control is policy-based with no system-level preventive check.	High	`C6`
F-3	Absent Control — No Approved Vendor Listing / Independent Vendor Vetting. Vendor appropriateness assessed solely by individual supervisor judgment at point of approval. No preventive vendor-vetting control exists.	Medium	`(design gap)`

Section · Deliverables

What the pipeline produces.

Every run emits machine-readable JSON artifacts (for downstream tooling and audit trail) plus auditor-facing Excel + Markdown workpapers. The bundled sample outputs are committed to the repo for inspection.

01 · JSON

Parsed Policy

Structured representation of the vendor engagement policy — thresholds + rules — extracted by PolicyParser.

02 · JSON

Test Results

Per-transaction attribute results with expected vs. observed values and a one-sentence rationale for each determination.

03 · JSON

Findings

Classified exceptions with severity, root cause, affected controls, and recommended remediation.

04 · XLSX

Risk Control Matrix

Auditor-facing Excel workpaper with Cover + Section A (Controls) + Section B (Findings) tabs. Brand-styled.

04 · MD

Test of Controls Report

Markdown workpaper with per-sample attribute detail, status icons, and rationale traceable to source evidence.

04 · MD

Findings Report

Markdown findings report with severity, description, root cause, and recommended remediation per finding.

Section · Stack

Tech stack.

Modern Python with strict type contracts. Claude as the reasoner; deterministic Python everywhere else.

AI / Agent layer

anthropic (Claude API)
Structured-output prompts with JSON schema enforcement
Per-agent demo-mode fallback for offline / CI

Data contracts

pydantic v2 for every input/output
Versioned schemas — changes are explicit events
Validation at every agent boundary
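
As a rough illustration of validation at an agent boundary with an explicit schema version, consider the sketch below. It uses standard-library dataclasses in place of pydantic v2, and the version tag, field names, and the allowed severity set (the sample findings only show High and Medium; Low is assumed) are hypothetical.

```python
from dataclasses import dataclass

SCHEMA_VERSION = "1.0"  # hypothetical version tag
ALLOWED_SEVERITIES = {"High", "Medium", "Low"}  # "Low" is an assumption


@dataclass(frozen=True)
class FindingRecord:
    schema_version: str
    finding_id: str
    severity: str


def load_finding(raw: dict) -> FindingRecord:
    """Validate a payload at an agent boundary before passing it downstream."""
    if raw.get("schema_version") != SCHEMA_VERSION:
        raise ValueError(f"unsupported schema_version: {raw.get('schema_version')!r}")
    if raw.get("severity") not in ALLOWED_SEVERITIES:
        raise ValueError(f"invalid severity: {raw.get('severity')!r}")
    if not raw.get("finding_id"):
        raise ValueError("finding_id is required")
    return FindingRecord(raw["schema_version"], raw["finding_id"], raw["severity"])


record = load_finding({"schema_version": "1.0", "finding_id": "F-1", "severity": "High"})
print(record)
```

Rejecting a mismatched version outright, rather than coercing it, is what turns a schema change into an explicit event instead of a silent drift.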

Workpaper output

openpyxl for branded XLSX RCM
Markdown for Test of Controls + Findings
JSON for downstream tooling + audit trail

CLI & UX

click CLI · rich for live progress tables
python-dotenv for credential management
Pip-installable: pip install tprm-copilot

Testing & CI

pytest regression suite over demo mode
Schema validation = type-check at agent boundary
Sample case = golden output for CI

Frameworks referenced

NIST SP 800-53 Rev. 5 (Supply Chain Risk Management and System and Services Acquisition families)
COSO Internal Control
TPRM Lifecycle (intake → monitoring → offboarding)

Section · Roadmap

Where this is going.

v0.1 is a reference implementation aimed at a single workflow (vendor engagement). The shape generalizes to the broader TPRM lifecycle — intake, ongoing monitoring, and offboarding.

v0.2

Multi-policy support

Load multiple policies (Vendor Engagement + Data Privacy + Software Procurement) into a single test run with cross-policy conflict detection.

v0.3

ERP connector framework

Pull transaction data directly from SAP, Oracle, NetSuite instead of CSV/JSON — close the loop with the source of truth.

v0.4

SOC 2 / ISO 27001 ingestion

Auto-extract control evidence from vendor SOC reports for TPRM intake due diligence. Map evidence to controls programmatically.

v0.5

Continuous monitoring

Watch for new transactions and re-test on a schedule. Trigger findings as drift is detected, not at quarter-end.

v0.6

Evals + guardrails

Synthetic test cases covering edge controls; LLM-as-judge eval scoring against a held-out finding set.

v1.0

Production hosting

Hosted version with multi-tenant isolation, audit log, and a reviewer UI for accept/reject of agent-proposed findings.