SOC 2 and ISO 27001 Still Matter, But They Cannot Assure AI Safety

SOC 2 and ISO 27001 are strong signals for baseline security maturity. They tell you whether a vendor likely has disciplined access controls, change management, incident response, and auditability.

They do not tell you whether an AI system will do the wrong thing for the right-sounding reasons.

That gap is not a failure of the standards. It is a failure of our current assurance model. We are using frameworks built for predictable interfaces and bounded systems to evaluate probabilistic systems whose primary interface is language and whose effective perimeter expands with every integration.

When the interface is a conversation, “security” is not only about who can log in. It becomes a matter of what the system can be convinced to believe, what it can be induced to reveal, and what actions it can be tricked into taking.

That is the AI-era vendor risk problem in plain terms.

VERDICT MODE engaged

Your core thesis is directionally correct. Traditional assurance does not cover the highest-risk AI failure modes well, and bolting “AI governance” onto legacy assessments often turns into compliance theater.

Hidden assumptions worth stress-testing:

Assurance can be made control-based in a stable way. Some AI risk is inherently model- and use-case-specific. A single “universal” control list can create false confidence if it ignores deployment context.
Prompt injection is the dominant risk. In many real environments, data exfiltration via over-broad tool permissions, insecure connectors, and poor tenant isolation can be higher impact than classic prompt injection.
New standards will fix it. Standards lag. Vendor marketing moves faster. A new badge can become the next theater unless it demands measurable evidence and repeatable tests.

A strong skeptic’s counterpoint:

“AI-specific controls will be expensive, slow down procurement, and still won’t guarantee behavior. We should rely on contractual limits, monitoring, and incident response rather than chasing an impossible certification.”

The best response to that skeptic is to separate the two goals:

Reduce known, testable failure modes with explicit controls and evidence.
Contain residual risk with monitoring, kill switches, and operational guardrails.

Both are required.

Why AI Systems Fail Differently

Legacy SaaS threats often map cleanly to known patterns: credential theft, injection attacks, misconfigurations, insecure APIs, poor logging, and weak segmentation. The control objective is to reduce the risk of unauthorized access and prevent misuse within defined boundaries.

AI systems introduce failure modes that are less about breaking in and more about steering behavior:

Prompt injection (direct and indirect): The attacker uses language to override intent, policies, and tool-use constraints.
Retrieval leakage: RAG and memory features return sensitive context across projects, users, or tenants.
Tool abuse: The model is allowed to call actions (refunds, account changes, data exports) and can be socially engineered into doing so.
Model or data poisoning: Corrupted training, fine-tuning, feedback loops, embeddings, or knowledge bases alter behavior and trustworthiness over time.
Hallucination with authority: The system fabricates but presents outputs with confidence, and downstream automations treat outputs as truth.

The key shift: authorization becomes conversational unless you explicitly prevent it.

If a workflow allows “Approve refund” because the model interpreted a polite request as legitimate, the failure is not a missing patch. It is a missing control boundary.

Standard Nomenclature: If You Cannot Name It, You Cannot Control It

Vendor assessments fail fast when teams use the exact words to mean different things. AI assurance needs shared definitions across security, legal, procurement, and product.

Here is a practical nomenclature baseline you can standardize internally and require from vendors:

Model: The statistical engine that generates outputs.
System: The complete application, including model, prompts, retrieval, tools, policies, UI, and logs.
Agent: A system configured to plan and execute multi-step tasks, often with tool access.
Tool / Connector: Any integration that can read or write data, or take an action in another system.
Action: A tool call that changes state (refund, delete, modify, send, approve).
Context boundary: The explicit scope of data the model can access at run time.
Trust boundary: Where the system transitions from untrusted input to privileged operations.
Prompt injection: Inputs designed to override system instructions or induce unsafe behavior.
Indirect prompt injection: Malicious instructions embedded in content the model retrieves (web pages, emails, tickets, documents).
RAG store: The retrieval index, embeddings, and documents used to supply context.
Tenant isolation: Guarantees that one customer’s data cannot influence or be retrieved by another customer’s sessions.
Human-in-the-loop gate: A required approval step enforced by system design, not by policy language.
Assurance evidence: Artifacts that prove controls exist and work, including test results, logs, configs, and bypass metrics.

If a vendor cannot map their system cleanly to these terms, your assessment will be guesswork.

What SOC 2 and ISO 27001 Actually Tell You About AI

A clean SOC 2 opinion and ISO 27001 certificate can still be valuable:

They increase confidence that the vendor can run disciplined operations.
They indicate audit trails exist and change is managed.
They suggest the vendor can sustain compliance over time.

But those attestations usually do not prove:

The model resists prompt injection in realistic conditions.
RAG does not leak cross-tenant context.
Tool access is least-privilege and action-scoped.
The agent cannot be socially engineered into approvals.
Model updates and knowledge base updates are provenance-controlled.
Safety and security claims are continuously tested, not point-in-time.

So the right question becomes: What AI-specific assurance evidence closes the gap?

What AI-Specific Control Assurance Should Include

Think of this as an “AI Control Annex” that sits alongside SOC 2 and ISO 27001 and is evaluated using evidence, not statements.

1) Model and Data Provenance Controls

You want a supply chain story you can audit.

Evidence-based requirements:

Documented model lineage: base model, versions, fine-tunes, adapters.
Training and tuning data sources, with licensing and sensitive data exclusions.
Controls for poisoning risk in feedback loops and data ingestion pipelines.
Signed artifacts or integrity checks for models, embeddings, and retrieval corpora.
Documented rollback procedures and tested rollback exercises.

What to ask for:

“Show me your model release process and rollback evidence.”
“Show me what changes between model versions and how you test regressions.”

2) Explicit Context Boundaries and Tenant Isolation

This is where most enterprise AI risk actually lives.

Evidence-based requirements:

Clear definition of what data the model can access at run time, by role and workflow.
Hard tenant isolation in retrieval, memory, logs, and analytics.
Controls preventing “context bleed” across sessions and customers.
Data minimization at retrieval time, not only at storage time.

What to ask for:

“Demonstrate that one tenant cannot retrieve another tenant’s context under adversarial prompting.”
“Provide architecture diagrams showing isolation for RAG, memory, telemetry, and caches.”

3) Tool and Action Authorization That Is Not Conversational

This is the refund problem, generalized.

Evidence-based requirements:

Tools are least-privilege, scoped per workflow, and time-bound.
Actions require cryptographic or system-enforced authorization, not natural language confirmation.
Step-up verification for high-risk actions, with enforced approvals and immutable logs.
Separation between “suggest” and “execute.”

What to ask for:

“Show your policy that prevents the model from self-approving actions.”
“Show a live demo: the model must fail closed when asked to bypass approval.”

4) Prompt Injection and Indirect Injection Defenses

This is not a checkbox, it is a testing discipline.

Evidence-based requirements:

Input handling that treats all user content and retrieved content as untrusted.
Instruction hierarchy that cannot be overridden by retrieved text.
Tool-use constraints that are validated outside the model (policy enforcement layer).
Sandboxing of browsing and document parsing.
Continuous red team testing with published bypass rates over time.

What to ask for:

“Provide your last three red team reports and the trend line of bypass rates.”
“Show mitigations for indirect injection from retrieved documents.”

5) Output Monitoring, Drift Detection, and Safety Telemetry

Because even good controls degrade.

Evidence-based requirements:

Detection for hallucination risk in high-stakes workflows.
Behavioral drift monitoring after model updates, prompt changes, RAG updates.
Canary tests and regression suites that run continuously.
Clear incident response playbooks for AI-caused harm, including forced degradation mode.

What to ask for:

“Show me your production monitoring dashboards for unsafe outputs and tool misuse.”
“Show me the last time you detected drift and what you changed.”

6) Assurance Metrics That Matter

Not “we test.” Show results.

Examples of metrics that procurement can actually use:

Prompt injection bypass rate by scenario category.
Cross-tenant retrieval leakage rate under adversarial testing.
Tool misuse rate and false-positive blocking rate.
Time to revoke a tool permission across all agents.
Mean time to detect unsafe behavior changes after a release.
Percentage of high-risk actions gated by non-conversational authorization.

A Practical Vendor Assessment: The “AI Assurance Packet”

If you want this to be usable at scale, standardize an evidence request package. For AI-era vendors, require an “AI Assurance Packet” with:

System architecture diagrams (model, RAG, memory, tools, enforcement layer, telemetry)
Data flow map with context boundaries and trust boundaries
Tool catalog with permissions, action scopes, and approval gates
Tenant isolation description and test evidence
Red team methodology and results with trend metrics
Model release notes, regression testing approach, rollback evidence
Incident response plan specific to AI misbehavior and data exposure
Independent assessment results, if available, and scope limitations

Then score vendors on:

Control strength (design)
Evidence quality (proof)
Operational maturity (monitoring and response)
Residual risk (what is still possible)

This turns procurement from vibes into a decision.

Where ISO 42001 Fits, And Where It Does Not

ISO 42001 helps because it forces organizations to define governance. It can improve internal discipline, decision rights, and accountability.

But your critique is valid: a management system standard does not guarantee a vendor built the right technical controls. Governance tells you there is a steering wheel. It does not prove the brakes work.

So use it the same way you use ISO 27001: a maturity signal, not a safety guarantee.

Failure Modes You Should Plan For Even With “Good” Vendors

A realistic risk posture accepts that some failures will happen. Your goal is to reduce probability and limit blast radius.

Plan for:

A novel prompt injection that bypasses known patterns.
A connector misconfiguration that exposes sensitive data.
A RAG update that introduces toxic or malicious instructions.
A model update that changes refusal behavior or tool-use behavior.
A logging or analytics pipeline that unintentionally stores sensitive prompts or outputs.

Your minimum requirement should be:

Rapid disablement of tools and connectors
Forced “read-only mode”
Immutable audit logs
Clear customer notification thresholds
Contractual language tied to evidence and response SLAs

The Closing Question: What Would You Need to Trust It?

A defensible answer is:

You should trust an AI vendor with your data, customers, or money only when they can demonstrate, with evidence and repeatable tests, that:

Context access is explicitly bounded and tenant-isolated.
Tool-based actions are authorized through system-enforced gates, not conversation.
Prompt injection and indirect injection are continuously tested, with measurable bypass rates and improvement trends.
Model and data supply chains are provenance-controlled, with rollbacks and regression testing.
Monitoring detects drift and unsafe behavior, and the vendor can fail closed quickly.

That is what AI-specific control assurance needs to look like before the badges mean what buyers think they mean.

SOC 2 and ISO 27001 Still Matter, But They Cannot Assure AI Safety

VERDICT MODE engaged

Why AI Systems Fail Differently

Standard Nomenclature: If You Cannot Name It, You Cannot Control It

What SOC 2 and ISO 27001 Actually Tell You About AI

What AI-Specific Control Assurance Should Include

1) Model and Data Provenance Controls

2) Explicit Context Boundaries and Tenant Isolation

3) Tool and Action Authorization That Is Not Conversational

4) Prompt Injection and Indirect Injection Defenses

5) Output Monitoring, Drift Detection, and Safety Telemetry

6) Assurance Metrics That Matter

A Practical Vendor Assessment: The “AI Assurance Packet”

Where ISO 42001 Fits, And Where It Does Not

Failure Modes You Should Plan For Even With “Good” Vendors

The Closing Question: What Would You Need to Trust It?

Reply

Keep Reading

The Northern Signal