SR 11-7, the Federal Reserve's supervisory guidance on model risk management, issued jointly with the OCC, remains the foundational framework for model governance at US financial institutions. It was written in 2011 for an environment dominated by traditional statistical models: logistic regression credit scorecards, market risk VaR models, liquidity stress testing frameworks. The guidance establishes principles for model development, validation, and governance that have served the industry well for over a decade.
The deployment of large language models in banking workflows creates a specific problem: SR 11-7's framework assumes a model architecture that LLMs do not fit. The guidance's core concepts — defined inputs and outputs, statistical performance metrics, backtesting against historical data, parameter stability — were engineered for models with bounded, interpretable behavior. LLMs are not bounded in the same sense, are not interpretable in the same sense, and do not backtest in the same sense.
None of this means SR 11-7 does not apply to LLMs. It applies fully. What changes is implementation: meeting SR 11-7's requirements for LLM deployments requires architectural choices that the guidance's authors did not anticipate. This article identifies the five principal gaps and explains how a pre-execution enforcement layer addresses each one.
Why SR 11-7 Was Written for Statistical Models
SR 11-7 defines a "model" as "a quantitative method, system, or approach that applies statistical, economic, financial, or mathematical theories, techniques, and assumptions to process input data into quantitative estimates." The definition continues: the model "consists of three components: an information input component, which delivers assumptions and data to the model; a processing component, which transforms inputs into estimates; and a reporting component, which translates the estimates into useful business information."
This three-component definition maps cleanly onto traditional quantitative models. A credit scorecard takes structured application data as input, applies a logistic regression or similar transformation, and produces a numerical score. The score can be backtested against historical outcomes. The model's parameters can be examined. The validation team can construct a challenger model, compare performance statistics, and make a defensible determination about whether the production model performs adequately.
An LLM does not fit this architecture. Its "inputs" may be unstructured natural language that varies in length, format, and semantic content. Its "processing component" is a multi-billion-parameter neural network whose internal computations are not interpretable in a meaningful sense. Its "outputs" are probability distributions over tokens, which are then sampled to produce text, rather than deterministic numerical estimates.
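To make the contrast with a scorecard concrete, the following minimal sketch shows why identical input does not imply identical output once sampling is involved. The three-token vocabulary, the scores, and the function names are illustrative assumptions and are not taken from any real model.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution over tokens."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(vocab, logits, rng, temperature=1.0):
    """Draw one token at random, weighted by its probability."""
    return rng.choices(vocab, weights=softmax(logits, temperature), k=1)[0]


# Hypothetical vocabulary and scores for a single decoding step.
VOCAB = ["approve", "refer", "decline"]
LOGITS = [2.0, 1.5, 0.5]

rng = random.Random(0)  # seeded only so the sketch is repeatable
draws = [sample_token(VOCAB, LOGITS, rng) for _ in range(1000)]
counts = {token: draws.count(token) for token in VOCAB}
print(counts)  # the same input yields a mix of tokens, not one fixed answer
```

A logistic regression scorecard given the same application data returns the same score every time. Here the input never changes, yet the drawn token does, which is the property that complicates backtesting.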
The practical implication is that the standard validation toolkit — statistical performance metrics, backtesting, parameter stability analysis, challenger model comparison — cannot be applied to LLMs in the same way. Model risk managers who attempt to apply SR 11-7 as written to LLM deployments will encounter five specific gaps.
The Five Gaps: SR 11-7 Applied to LLMs
Gap Analysis Summary Table
| SR 11-7 Requirement | Gap for LLMs | Enforcement Layer Coverage | Residual Risk |
|---|---|---|---|
| Conceptual soundness validation | HIGH — No access to model internals | Validate enforcement layer rules (fully testable) | MEDIUM — Vendor due diligence required |
| Outcome analysis / backtesting | MEDIUM — Causal chain ambiguity | Per-decision audit record enables outcome linkage | LOW — With consistent audit record collection |
| Model stability monitoring | HIGH — Provider may update model silently | Policy trigger rate monitoring detects behavioral changes | MEDIUM — Not equivalent to parameter monitoring |
| Independent validation | HIGH — Cannot validate closed-source model | Full validation of enforcement layer is possible | MEDIUM — Foundation model provenance risk remains |
| Audit trail / documentation | MEDIUM — Raw transcripts lack governance context | HMAC-signed certificates provide governance audit records | LOW — Certificate chain fully satisfies documentation |
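The "fully testable" entries in the table rest on the enforcement layer being ordinary deterministic code. A minimal sketch of what that could look like follows. The rule IDs, action kinds, and field names are hypothetical and do not describe any particular product's policy format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedAction:
    """A hypothetical action the LLM workflow wants to take."""
    kind: str
    reason_codes: tuple = ()
    human_reviewed: bool = False


@dataclass(frozen=True)
class RuleResult:
    rule_id: str
    passed: bool


def rule_adverse_action_has_reasons(action):
    """Hypothetical rule ADV-001: adverse action notices must carry reason codes."""
    ok = action.kind != "adverse_action_notice" or len(action.reason_codes) > 0
    return RuleResult("ADV-001", ok)


def rule_denial_is_human_reviewed(action):
    """Hypothetical rule REV-001: credit denials require human review."""
    ok = action.kind != "credit_denial" or action.human_reviewed
    return RuleResult("REV-001", ok)


RULES = (rule_adverse_action_has_reasons, rule_denial_is_human_reviewed)


def evaluate(action):
    """Apply every rule; the action is allowed only if all rules pass."""
    results = [rule(action) for rule in RULES]
    return all(r.passed for r in results), results


# Each case pairs an input with the expected allow/block outcome.
TEST_CASES = [
    (ProposedAction("adverse_action_notice"), False),
    (ProposedAction("adverse_action_notice", reason_codes=("R01",)), True),
    (ProposedAction("credit_denial"), False),
    (ProposedAction("credit_denial", human_reviewed=True), True),
    (ProposedAction("summary_memo"), True),
]


def run_policy_tests():
    """Return (kind, expected, actual) rows for a test report."""
    return [(a.kind, expected, evaluate(a)[0]) for a, expected in TEST_CASES]


for kind, expected, actual in run_policy_tests():
    print(f"{kind:24} expected={expected!s:5} actual={actual!s:5}")
```

Because each rule is a pure function of the proposed action, the same input always produces the same decision, and a validation team can enumerate cases in a way that is not possible for the foundation model itself.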
Model Validation Documentation for LLM Deployments
For institutions that must produce SR 11-7-compliant model validation documentation for LLM deployments, the enforcement layer architecture changes what the documentation needs to demonstrate. Rather than documenting traditional statistical validation results, the validation package for an LLM deployment governed by an enforcement layer should include:
- Use case specification. A precise description of the business function the LLM performs, the data it accesses, the outputs it produces, and the human decisions it informs. The validation team needs to understand the intended purpose to evaluate whether the governance controls are adequate for that purpose.
- Policy pack documentation. A complete description of the enforcement layer's policy rules, including the regulatory requirements each rule satisfies, the logic of each rule, and the test cases used to validate each rule. This is the conceptual soundness documentation for the governance architecture.
- Policy test coverage report. Documentation of the test harness used to validate policy behavior, including the test cases, the expected results, the actual results, and the coverage metrics. Regulators will ask for evidence that policy rules were tested before deployment.
- Behavioral monitoring plan. A specification of the metrics monitored to detect behavioral drift in the foundation model, the thresholds at which alerts are generated, and the escalation process when thresholds are crossed. This is the substitute for traditional model stability monitoring.
- Vendor due diligence package. Documentation of the institution's assessment of the foundation model provider's practices, including training data practices, safety evaluation procedures, change notification protocols, and BAA or equivalent agreements as applicable.
- Limitations and restrictions. A clear statement of what the LLM may and may not do within the institution's workflows, including the specific functions governed by the enforcement layer and any functions excluded from the governance scope.
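The behavioral monitoring plan item can be illustrated with a short sketch. It assumes the monitored metric is the share of decisions in a window that triggered any policy rule, and it uses a one-sample z-test for a proportion as the alert threshold. Both choices, along with the baseline rate and window sizes, are assumptions made for illustration; an institution would set its own metrics and thresholds.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TriggerRateCheck:
    observed_rate: float
    z_score: float
    alert: bool


def check_trigger_rate(baseline_rate, triggers, decisions, z_threshold=3.0):
    """Flag a window whose policy trigger rate departs from the baseline.

    Uses a one-sample z-test for a proportion (normal approximation).
    """
    if decisions <= 0:
        raise ValueError("decisions must be positive")
    if not 0.0 < baseline_rate < 1.0:
        raise ValueError("baseline_rate must be strictly between 0 and 1")
    observed = triggers / decisions
    std_err = math.sqrt(baseline_rate * (1.0 - baseline_rate) / decisions)
    z = (observed - baseline_rate) / std_err
    return TriggerRateCheck(observed, z, abs(z) >= z_threshold)


# Hypothetical numbers: a 2% baseline and two windows of 5,000 decisions each.
quiet_week = check_trigger_rate(0.02, triggers=100, decisions=5000)
noisy_week = check_trigger_rate(0.02, triggers=180, decisions=5000)
print(quiet_week.alert, noisy_week.alert)  # False True
```

A jump like the second window does not show what changed inside the foundation model. It only indicates that behavior at the governance boundary has shifted enough to justify escalation, which is why the summary table treats this as a substitute for parameter monitoring and not an equivalent.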
The Examination-Ready Posture
Federal Reserve and OCC examiners reviewing model risk management programs at institutions that use LLMs will look for evidence that the institution understands the limitations of SR 11-7 as applied to LLMs and has developed compensating controls where traditional methods fall short.
The compensating control argument that resonates with examiners is the enforcement layer architecture: because we cannot fully validate the foundation model's internal behavior, we have built a governance layer that constrains what the model can do before its outputs are used in any consequential decision. The governance layer is fully deterministic, fully testable, and fully documented. Every consequential decision informed by the LLM carries a signed certificate documenting the governance evaluation.
This is not a claim that LLMs have been validated in the traditional sense. It is a claim that the institution has implemented risk management controls that satisfy SR 11-7's substantive requirements — risk reduction, outcome documentation, ongoing monitoring, and audit trails — through mechanisms appropriate to LLM technology.
Technically sophisticated examiners are likely to recognize the architectural soundness of this approach. Others are more likely to accept it when it is accompanied by comprehensive documentation and evidence of actual enforcement. The enforcement layer's signed certificate archive is that evidence: a concrete demonstration that governance was applied to every decision, not just to those reviewed by a compliance analyst.
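As a rough illustration of how a signed certificate archive could be checked, the sketch below signs a per-decision record with HMAC-SHA256 and then scans an archive for records that fail verification. The record fields, the rule ID, and the key handling are assumptions made for this example and do not describe the CoreGuard certificate format.

```python
import hashlib
import hmac
import json


def _canonical(record):
    """Serialize deterministically so the same record always signs the same way."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def issue_certificate(record, key):
    """Attach an HMAC-SHA256 signature to a governance decision record."""
    signature = hmac.new(key, _canonical(record), hashlib.sha256).hexdigest()
    return {"record": record, "signature": signature}


def verify_certificate(cert, key):
    """Recompute the signature and compare it in constant time."""
    expected = hmac.new(key, _canonical(cert["record"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, cert["signature"])


def failed_certificates(archive, key):
    """Return the positions of archive entries that do not verify."""
    return [i for i, cert in enumerate(archive) if not verify_certificate(cert, key)]


KEY = b"demo-key-not-for-production"  # placeholder; a real key belongs in a KMS or HSM

good = issue_certificate(
    {
        "decision_id": "d-001",        # hypothetical identifier
        "policy_pack": "lending_v1",
        "allowed": False,
        "rules_triggered": ["ADV-001"],  # hypothetical rule ID
    },
    KEY,
)
# Simulate an after-the-fact edit to the stored record.
tampered = {"record": {**good["record"], "allowed": True}, "signature": good["signature"]}

print(failed_certificates([good, tampered], KEY))  # [1]
```

HMAC is symmetric, so anyone who can verify a certificate can also forge one. Whether that is acceptable, or whether an asymmetric signature is needed for third-party review, is a design decision for the institution.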
For related regulatory guidance, see our coverage of CFPB AI lending requirements, EU AI Act Article 9 enforcement requirements, and the full SR 11-7 enforcement layer analysis. For technical integration details of the CoreGuard enforcement platform, including the lending_v1 policy pack that addresses ECOA, Reg B, and HMDA requirements, see the documentation.