The first question engineers ask when we describe our governance gate is: how does it stay under a millisecond? The second question is: why does that matter? The answer to the second question explains the answer to the first. Governance that adds meaningful latency to every request will be removed from the hot path. Governance that lives outside the hot path is not governance — it is auditing after the fact. If you want enforcement, you need enforcement latency that production systems can absorb. Sub-millisecond is the target because it makes avoidance economically irrational.
The Architecture Decision That Changes Everything
The most important design decision in building a deterministic governance runtime is also the most constraining: no LLM calls in the governance plane. This is not a performance optimization. It is an architectural requirement. The moment you introduce an LLM call into the governance evaluation path, you have introduced three fundamental problems.
- Non-reproducibility. LLM outputs vary across versions, batching configurations, numerical precision modes, and infrastructure environments. A governance verdict produced by an LLM call cannot be deterministically replayed.
- Latency unpredictability. LLM inference time varies with input length, system load, provider availability, and model version. You cannot bound it reliably for a hot-path gate.
- Adversarial surface. An LLM that evaluates inputs can be prompted. Any component of your governance system that interprets natural language can be manipulated through natural language. The governance plane must be immune to this class of attack.
Eliminating LLM calls from the governance plane means the entire evaluation must run on compiled logic, pattern matching, arithmetic comparison, and cryptographic primitives. This is the constraint that drives every subsequent design decision.
Precompiled Rule Sets
The governance rules — charter principles, policy thresholds, domain-specific constraints — are not evaluated at request time by interpreting a rule document. They are compiled at deployment time into evaluation structures that the runtime can execute directly.
Concretely: a rule like "deny any request where the role claim is VIEWER and the requested action class is WRITE" becomes a pair of field extractors and a comparison, not a natural language string that gets semantically evaluated. The evaluation is O(1) per rule. A set of 14 charter rules plus domain-specific policy thresholds evaluates in microseconds, not milliseconds.
The compilation step also serves as a validation step. A rule that cannot be compiled into a deterministic evaluator is a rule that cannot be enforced deterministically. The compilation boundary forces governance engineers to express rules in terms that can be mechanically applied, not terms that require interpretation.
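As a minimal sketch of what this compilation boundary might look like, the Python below turns a declarative rule into a tuple of field comparisons and refuses anything it cannot reduce to that form. The spec layout (`id`, `verdict`, `when`), the `ALLOWED_FIELDS` set, and the names `compile_rule` and `CompiledRule` are assumptions made for illustration, not the actual rule format.

```python
from dataclasses import dataclass

# Hypothetical field and verdict vocabularies; a real deployment would define its own.
ALLOWED_FIELDS = frozenset({"role", "action_class"})
ALLOWED_VERDICTS = frozenset({"ALLOW", "MODIFY", "BLOCK"})


class RuleCompileError(ValueError):
    """Raised when a rule cannot be reduced to exact field comparisons."""


@dataclass(frozen=True)
class CompiledRule:
    rule_id: str
    verdict: str
    conditions: tuple  # ((field, expected_value), ...), fixed at compile time

    def matches(self, request):
        # One dictionary lookup and one equality test per condition.
        return all(request.get(field) == expected
                   for field, expected in self.conditions)


def compile_rule(spec):
    """Validate a rule spec and compile it; reject anything needing interpretation."""
    if not isinstance(spec.get("id"), str):
        raise RuleCompileError("rule needs a string id")
    if spec.get("verdict") not in ALLOWED_VERDICTS:
        raise RuleCompileError("unknown verdict")
    when = spec.get("when")
    if not isinstance(when, dict) or not when:
        raise RuleCompileError("'when' must be a non-empty field -> value mapping")
    for field, expected in when.items():
        if field not in ALLOWED_FIELDS or not isinstance(expected, str):
            raise RuleCompileError(f"cannot compile condition on {field!r}")
    return CompiledRule(spec["id"], spec["verdict"], tuple(sorted(when.items())))


deny_viewer_write = compile_rule({
    "id": "deny-viewer-write",
    "verdict": "BLOCK",
    "when": {"role": "VIEWER", "action_class": "WRITE"},
})
```

A rule whose condition is free text, such as `"when": "requests that seem risky"`, fails at compile time in this sketch, which is the validation behavior described above.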
Pattern Matching at the Hot Path
For certain rule classes — injection detection, prohibited content patterns, authority claim extraction — we use precompiled finite automata. The patterns are compiled from their specification into DFA form at deployment time. A DFA can scan a request of several thousand tokens in a single pass without backtracking, with CPU-cache-friendly memory access patterns.
DFA compilation has a well-understood tradeoff: large rule sets can produce large automata with significant state space. We manage this through rule stratification — the highest-priority rules form the innermost DFA and are always evaluated first.
The compilation artifacts are deterministic. Given the same rule set and compiler version, the output DFA is always identical. This means the governance behavior is reproducible from the specification alone, independently of the running system.
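To make the single-pass and reproducibility claims concrete, here is a small sketch under simplifying assumptions: it builds a DFA for one literal byte pattern using the textbook KMP construction, scans input with one table lookup per byte, and fingerprints the transition table. A production rule set would compile many patterns (for example via regex-to-DFA or Aho-Corasick construction); the function names `compile_dfa`, `scan`, and `fingerprint` are hypothetical.

```python
import hashlib


def compile_dfa(pattern):
    """Build a KMP-style DFA: state i means the last i bytes equal pattern[:i].

    table[state] maps a byte to the next state; bytes not listed go to state 0.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m = len(pattern)
    table = [dict() for _ in range(m + 1)]
    table[0][pattern[0]] = 1
    fallback = 0  # state reached after feeding pattern[1:state] to the DFA
    for state in range(1, m + 1):
        # On a mismatch, behave as the fallback state would.
        table[state] = dict(table[fallback])
        if state < m:
            table[state][pattern[state]] = state + 1
            fallback = table[fallback].get(pattern[state], 0)
    return table


def scan(table, data):
    """Single left-to-right pass, one lookup per byte, no backtracking."""
    state, accept = 0, len(table) - 1
    for byte in data:
        state = table[state].get(byte, 0)
        if state == accept:
            return True
    return False


def fingerprint(table):
    """Hash the transition table in a fixed order so artifacts can be compared."""
    digest = hashlib.sha256()
    for state, row in enumerate(table):
        for byte in sorted(row):
            digest.update(state.to_bytes(4, "big"))
            digest.update(bytes([byte]))
            digest.update(row[byte].to_bytes(4, "big"))
    return digest.hexdigest()


dfa = compile_dfa(b"ignore previous")
```

Compiling the same pattern twice yields the same fingerprint in this sketch, which is the property that lets the artifact be checked against the specification without access to the running system.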
Replay Determinism
Every evaluation in the governance plane produces a signed record. The record contains: a hash of the input, taken after canonicalizing it with JCS (RFC 8785, the JSON Canonicalization Scheme); the rule set version and hash; the verdict (ALLOW / MODIFY / BLOCK); the specific rules that triggered, if any; a timestamp; and an HMAC-SHA256 signature over all of the above fields.
A second system with the same rule set and signing key can take any of these records and verify the verdict. Because a hash cannot be evaluated directly, the verifier works from the retained canonicalized input: it confirms that the input matches the stored hash, checks the HMAC, and re-runs the evaluation. It does not need access to the original system. It does not need to call any API. The replay is purely local computation.
This property — offline replay without system dependency — is what makes the governance record legally meaningful rather than just operationally useful. An auditor can verify a decision from two years ago without trusting that the infrastructure is in the same state it was when the decision was made.
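The record and replay steps described in this section can be sketched as follows. Several things here are assumptions for illustration: `canonicalize` only approximates JCS (sorted keys, compact separators, UTF-8), whereas full RFC 8785 also fixes number serialization and key ordering by UTF-16 code units; SHA-256 is assumed for the input hash; `evaluate` is a stand-in for the compiled rule set; and the verifier is assumed to hold the canonicalized input alongside the record.

```python
import hashlib
import hmac
import json


def canonicalize(value):
    # Approximation of JCS only; see the caveats in the text above.
    return json.dumps(value, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def evaluate(request):
    """Hypothetical stand-in for the compiled rule set."""
    if request.get("role") == "VIEWER" and request.get("action_class") == "WRITE":
        return "BLOCK", ["deny-viewer-write"]
    return "ALLOW", []


def make_record(request, ruleset_version, ruleset_hash, timestamp, key):
    verdict, triggered = evaluate(request)
    body = {
        "input_hash": hashlib.sha256(canonicalize(request)).hexdigest(),
        "ruleset_version": ruleset_version,
        "ruleset_hash": ruleset_hash,
        "verdict": verdict,
        "triggered_rules": triggered,
        "timestamp": timestamp,
    }
    signature = hmac.new(key, canonicalize(body), hashlib.sha256).hexdigest()
    return {**body, "signature": signature}


def replay(record, request, key):
    """Offline check: signature, input binding, then re-evaluation."""
    body = {k: v for k, v in record.items() if k != "signature"}
    expected = hmac.new(key, canonicalize(body), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, record.get("signature", "")):
        return False  # record was altered or signed with a different key
    if hashlib.sha256(canonicalize(request)).hexdigest() != record["input_hash"]:
        return False  # supplied input is not the one the record refers to
    verdict, triggered = evaluate(request)
    return verdict == record["verdict"] and triggered == record["triggered_rules"]
```

A fuller verifier would also confirm that the rule set it has loaded matches `ruleset_hash` before re-evaluating; that check is omitted here to keep the sketch short.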
Latency Budget
In production, our governance gate adds between 0.3ms and 0.8ms to the request latency. The variance is primarily driven by request size (more tokens to scan) and cache state (cold-start on the first request after a deployment).
To put this in context: a call to an external LLM for evaluation purposes adds between 200ms and 2000ms depending on the provider, model, and queue depth. Even at its 0.8ms upper bound, our gate is 250x to 2500x faster, without the reproducibility or adversarial surface problems. The latency budget also means we can afford to run the governance evaluation synchronously in the request path, before the LLM call. There is no need for an async governance check that races with the model call. The governance verdict is available before the model ever sees the input.
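The synchronous ordering can be illustrated with a short sketch. The `gate` function is a hypothetical stand-in for the compiled evaluation, and `call_model` is whatever client function the application uses; the sketch handles only ALLOW and BLOCK, since how MODIFY rewrites a request is not specified here.

```python
def gate(request):
    """Hypothetical stand-in for the compiled rule evaluation."""
    if request.get("role") == "VIEWER" and request.get("action_class") == "WRITE":
        return "BLOCK"
    return "ALLOW"


def handle_request(request, call_model):
    # The gate runs to completion first; there is no concurrent model call to race.
    verdict = gate(request)
    if verdict != "ALLOW":
        # The model is never invoked, so it never sees the prohibited input.
        return {"verdict": verdict, "response": None}
    return {"verdict": verdict, "response": call_model(request)}
```

Because the model call sits behind the verdict, a blocked request costs only the gate's own latency, and nothing has to be cancelled or discarded afterwards.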
What This Enables That Probabilistic Approaches Cannot
A governance system that runs deterministically, before the LLM, at sub-millisecond latency enables properties that probabilistic post-hoc approaches cannot provide:
- True pre-execution veto. The model never sees prohibited input. Post-hoc filtering can suppress the output; pre-execution enforcement prevents the computation entirely.
- Consistent enforcement under load. The verdict of a deterministic rule evaluation does not depend on system load, model queue depth, or provider availability; load can change how long the evaluation takes, not what it returns. The governance verdict for a given input is always the same, independent of what else the system is doing.
- Auditable gate, not auditable output. The record of governance enforcement is the record of what the gate evaluated — the input and the verdict — not a record of what the model produced. Gate records are more legally meaningful than output records because they capture the enforcement point.
- Adversarial immunity in the governance plane. An attacker who can manipulate the model's output cannot manipulate the governance verdict, because the governance verdict was produced by a system that never read the model's output.
The engineering challenge of building a deterministic governance runtime is real. The compilation pipeline, the DFA construction, the canonicalization scheme, the signing infrastructure — each of these requires careful implementation and ongoing maintenance. But the result is a governance component with properties that LLM-based approaches cannot match: deterministic, replayable, sub-millisecond, adversarially immune. That is what enforcement looks like as infrastructure, not as a feature.