Engineering · Performance · Scale

Deterministic Governance at Scale: The Engineering of Zero-LLM Enforcement

EVE Engineering May 19, 2026 8 min read

The performance objection to rigorous AI governance is common and often decisive: “If we run every request through a comprehensive governance evaluation, the added latency will make the system unusable.” This objection is based on a mistaken assumption — that comprehensive governance evaluation requires calling an LLM. It does not. The engineering of zero-LLM deterministic enforcement achieves sub-millisecond governance evaluation at arbitrary request throughput. The architecture is well-understood; the implementation is not novel; the performance characteristics are predictable. The objection dissolves when the architecture is specified.

0.85ms Max per-request governance latency
Up to 2000ms LLM-based governance latency
0 LLM calls in enforcement path

Why LLM-Based Governance Is Slow

A governance system that evaluates compliance by calling an LLM adds 200–2000ms of latency per evaluated request, depending on the model, the infrastructure, and the evaluation prompt complexity. This latency comes from several sources:

  • Network round-trip: API calls to inference endpoints add 50–200ms of network latency even under optimal conditions.
  • Token generation: LLM evaluation requires generating tokens to express the governance verdict. Generation latency scales with output length and model size — each token adds latency.
  • Non-determinism overhead: Because LLM outputs vary, LLM-based governance systems often run multiple evaluation passes and aggregate results to reduce variance. Two or three evaluation passes multiply the base latency.
  • Context construction: The governance prompt must include the policy specification, the input to be evaluated, and potentially conversation history. Constructing and tokenizing this context adds overhead before inference begins.

At 200ms per evaluation, a service handling 1,000 requests per second must run 1,000 governance evaluations per second, which means roughly 200 evaluations in flight at any moment (1,000/s × 0.2s). The governance layer becomes the bottleneck, and the latency added to each request degrades user experience across the entire request population.

The Zero-LLM Architecture

Deterministic governance at scale requires that no LLM is called in the governance evaluation path. Every component of the evaluation must be computable by a deterministic function with predictable, bounded execution time. The evaluation itself has three stages; verdict signing and record assembly follow and are counted in the latency totals:

Stage 1: Input Normalization (0.04–0.1ms)

Before evaluation, every input is normalized to a canonical form: Unicode NFKC normalization, whitespace canonicalization, case normalization for pattern matching, and encoding normalization to UTF-8. Normalization ensures that inputs differing only in surface encoding produce identical evaluations. Without it, adversarial inputs can bypass pattern matching by using homoglyphs, zero-width characters, or alternate encodings. NFKC folds compatibility variants such as fullwidth letters and ligatures; zero-width characters and cross-script homoglyphs (for example, Cyrillic “а” for Latin “a”) are not changed by NFKC and need explicit stripping and a confusables mapping in the same pass. With those steps in place, these surface variations collapse to the same canonical form before evaluation. Normalization is O(n) in input length; for typical AI request lengths (100–2,000 characters), it completes in 0.04–0.1ms.
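A minimal sketch of such a normalization pass, using only the Python standard library. The function name `normalize` and the particular set of invisible characters are assumptions made for illustration; the article does not specify them, and a production list would be longer.

```python
import re
import unicodedata

# Zero-width and invisible format characters that NFKC leaves untouched.
# This set is an illustrative assumption, not an exhaustive list.
_INVISIBLE = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))
_WHITESPACE = re.compile(r"\s+")


def normalize(raw: bytes) -> str:
    """Return a canonical form of the request text for pattern matching."""
    # Encoding normalization: decode as UTF-8, replacing invalid sequences.
    text = raw.decode("utf-8", errors="replace")
    # Compatibility normalization folds fullwidth forms, ligatures, and similar.
    text = unicodedata.normalize("NFKC", text)
    # Delete invisible characters that could split a keyword.
    text = text.translate(_INVISIBLE)
    # Case normalization for matching.
    text = text.casefold()
    # Whitespace canonicalization: collapse runs, trim both ends.
    return _WHITESPACE.sub(" ", text).strip()
```

Every step is a single pass over the text, so the whole function stays linear in input length. Cross-script homoglyphs are not handled here; covering them would need an additional confusables table applied in the same pass.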

Stage 2: Action Classification (0.1–0.3ms)

The normalized input is classified into action types using a precompiled deterministic finite automaton (DFA). The DFA is compiled at rule set load time from the action type specifications. During evaluation, the DFA processes the input in a single left-to-right pass and outputs the set of matching action types. DFA execution is O(n) in input length and independent of the number of rules: the automaton makes exactly one transition per input character. A DFA compiled from 100 action type patterns therefore takes the same number of steps as one compiled from 3 patterns. Compilation is more complex and the transition table is larger, but the per-character work is identical.

DFA compilation happens at startup (50–500ms for large rule sets). The compiled DFA is an in-memory data structure. During request processing, the DFA lookup is a table traversal — no string comparison, no regex backtracking, no LLM calls.
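The compile-once, traverse-per-request split can be sketched with an Aho-Corasick style construction. This is an illustration under stated assumptions, not the engine described here: it handles literal patterns only, the pattern strings and action type names are hypothetical, and it uses dictionaries where a production DFA would use dense arrays indexed by byte.

```python
from collections import deque


def compile_dfa(patterns: dict[str, str]):
    """Compile {literal pattern: action type} into (transition table, outputs)."""
    goto = [{}]    # trie edges per state
    out = [set()]  # action types recognized on entering each state
    for pattern, action in patterns.items():
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(action)

    alphabet = {ch for pattern in patterns for ch in pattern}
    fail = [0] * len(goto)
    delta = [{} for _ in goto]  # full DFA transitions, failure links resolved
    pending = deque()
    for ch in alphabet:
        nxt = goto[0].get(ch, 0)
        delta[0][ch] = nxt
        if nxt:
            pending.append(nxt)
    while pending:  # breadth-first, so shallower states are complete first
        s = pending.popleft()
        out[s] |= out[fail[s]]  # inherit matches that end at a proper suffix
        for ch in alphabet:
            if ch in goto[s]:
                nxt = goto[s][ch]
                fail[nxt] = delta[fail[s]][ch]
                delta[s][ch] = nxt
                pending.append(nxt)
            else:
                delta[s][ch] = delta[fail[s]][ch]
    return delta, out


def classify(dfa, text: str) -> set:
    """Single left-to-right pass: one table lookup per input character."""
    delta, out = dfa
    state, found = 0, set()
    for ch in text:
        state = delta[state].get(ch, 0)  # characters outside the alphabet reset
        found |= out[state]
    return found


# Hypothetical rule set, compiled once at load time.
RULES = compile_dfa({
    "delete file": "FILE_DELETE",
    "send email": "EMAIL_SEND",
    "file": "FILE_ACCESS",
})
```

`classify(RULES, "please delete file now")` reports both `FILE_DELETE` and `FILE_ACCESS`, because the second pattern is a suffix of the first. Adding more patterns grows the table built by `compile_dfa` but leaves the loop in `classify` at one transition per character.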

Stage 3: Policy Evaluation (0.05–0.2ms)

The classified action types are evaluated against the policy function. The policy function is a precompiled lookup table: action type → verdict. The lookup is O(1): constant-time table access, no computation beyond the lookup. When multiple action types are classified, the combiner function applies the precedence hierarchy (BLOCK > MODIFY > ALLOW) across the classified action types’ verdicts.
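As a sketch, the lookup table and the combiner fit in a few lines. The action type names, the fail-closed verdict for unknown action types, and the ALLOW verdict when nothing is classified are assumptions for illustration; the article specifies only the BLOCK > MODIFY > ALLOW precedence.

```python
from enum import IntEnum


class Verdict(IntEnum):
    # Numeric order encodes the precedence BLOCK > MODIFY > ALLOW.
    ALLOW = 0
    MODIFY = 1
    BLOCK = 2


# Hypothetical precompiled policy table: action type -> verdict.
POLICY = {
    "FILE_ACCESS": Verdict.ALLOW,
    "EMAIL_SEND": Verdict.MODIFY,
    "FILE_DELETE": Verdict.BLOCK,
}
UNKNOWN_ACTION_VERDICT = Verdict.BLOCK  # assumption: fail closed
NO_MATCH_VERDICT = Verdict.ALLOW        # assumption: nothing classified


def evaluate(action_types) -> Verdict:
    """Look up each action type and keep the highest-precedence verdict."""
    verdicts = (POLICY.get(a, UNKNOWN_ACTION_VERDICT) for a in action_types)
    return max(verdicts, default=NO_MATCH_VERDICT)
```

Encoding precedence as integer order lets the combiner be a plain `max`, so the cost is one constant-time lookup per classified action type.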

Total Evaluation Latency

Summing the stages:

Stage                                Latency Range
Input normalization                  0.04–0.10ms
Action classification (DFA)          0.10–0.30ms
Policy evaluation (lookup)           0.05–0.20ms
Verdict signing (HMAC-SHA256)        0.10–0.15ms
Record assembly and chain linking    0.05–0.10ms
Total                                0.34–0.85ms

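Even at the 0.85ms worst case, governance evaluation is under 0.5% of LLM inference time at 200–2000ms, and when it runs concurrently with inference it adds effectively no latency to the request. At 1,000 requests per second, the worst case consumes 0.85 seconds of CPU time per second of wall clock, less than one core. The governance layer is not the bottleneck.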
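The arithmetic behind these figures can be checked directly. The stage ranges are copied from the table; the variable names are illustrative, and the single-core estimate assumes evaluations run back to back with no other overhead.

```python
# Per-stage latency ranges in milliseconds, as listed in the table.
STAGES_MS = {
    "input normalization": (0.04, 0.10),
    "action classification": (0.10, 0.30),
    "policy evaluation": (0.05, 0.20),
    "verdict signing": (0.10, 0.15),
    "record assembly and chain linking": (0.05, 0.10),
}

total_low_ms = sum(low for low, _ in STAGES_MS.values())
total_high_ms = sum(high for _, high in STAGES_MS.values())

# Worst-case governance time relative to the fastest LLM call in the range.
FASTEST_LLM_MS = 200
relative_overhead = total_high_ms / FASTEST_LLM_MS

# CPU-seconds of governance work per wall-clock second at 1,000 requests/s.
REQUESTS_PER_SECOND = 1_000
cpu_seconds_per_second = REQUESTS_PER_SECOND * total_high_ms / 1_000

print(f"total: {total_low_ms:.2f}-{total_high_ms:.2f} ms")
print(f"overhead vs fastest LLM call: {relative_overhead:.3%}")
print(f"CPU-seconds per second: {cpu_seconds_per_second:.2f}")
```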

Scaling to Arbitrary Throughput

The zero-LLM governance architecture scales horizontally without architectural changes. Because each evaluation is stateless — given the same input and the same compiled rule set, the evaluation produces the same output — governance evaluators can be replicated across any number of nodes. No coordination is required between evaluators for individual request processing.

Coordination is required only for two events:

  • Rule set updates: When the rule set changes, all evaluators must atomically switch to the new compiled DFA and policy table. This is achieved by distributing the new compiled rule set to all nodes and signaling a synchronized switch.
  • Chain linking: Each governance record must link to the previous record. In a distributed setting, this requires a coordination mechanism to ensure correct ordering. Single-writer-per-tenant with durable tail hash is the recommended approach: each tenant’s governance chain has one designated writer at a time, and the writer persists the tail hash before each append.

Neither coordination requirement imposes latency on individual request evaluations. Rule set updates happen at plan change events (rare). Chain linking uses a local tail hash that is read and written by a single writer without network coordination.
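A single-writer chain with a local tail hash might look like the following in-memory sketch. The class name, the record layout, and the all-zero genesis value are assumptions; the durable persistence of the tail hash is marked with a comment rather than implemented.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed tail value for an empty chain


def _digest(prev_hash: str, payload: dict) -> str:
    # Canonical JSON so the same payload always hashes to the same value.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class TenantChain:
    """The one designated writer for a single tenant's governance chain."""

    def __init__(self):
        self.tail = GENESIS
        self.records = []

    def append(self, payload: dict) -> dict:
        record = {
            "prev": self.tail,
            "payload": payload,
            "hash": _digest(self.tail, payload),
        }
        # A durable writer would persist the tail hash at this point
        # (for example, write plus fsync); omitted in this sketch.
        self.records.append(record)
        self.tail = record["hash"]
        return record


def verify(records) -> bool:
    """Recompute every link offline; any edit or reordering breaks the chain."""
    prev = GENESIS
    for record in records:
        if record["prev"] != prev:
            return False
        if record["hash"] != _digest(prev, record["payload"]):
            return False
        prev = record["hash"]
    return True
```

Because only one writer touches `tail`, no lock or network round-trip is needed on the append path; handing the writer role to another node is the only step that needs coordination.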

Memory Mapping for Large Rule Sets

For governance deployments with large rule sets (100+ action type patterns, complex multi-dimensional policy tables), the compiled DFA and policy tables may be large enough that memory management matters. The recommended approach is memory-mapped file storage: the compiled rule set is stored as a memory-mapped file, and the operating system manages paging. For governance workloads where a small number of action types account for the majority of request classifications, the hot portion of the DFA fits in L3 cache, and evaluations for the common case complete in the low end of the latency range (roughly 0.34–0.4ms).

Memory-mapping also enables atomic rule set updates: the new compiled rule set is written to a new file, and the mapping is switched atomically. No request processes against a partially updated rule set.
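One way to sketch the write-new-file-then-switch pattern with the standard library. The function names are hypothetical, and the sketch assumes a POSIX filesystem, where `os.replace` is an atomic rename and existing mappings of the old file remain valid; the compiled rule set must be non-empty, since an empty file cannot be mapped.

```python
import mmap
import os
import tempfile


def publish_rule_set(path: str, compiled: bytes) -> None:
    """Write a new compiled rule set, then atomically swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for an atomic rename.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(compiled)
            f.flush()
            os.fsync(f.fileno())  # make the bytes durable before the switch
        os.replace(tmp_path, path)  # readers see the old file or the new one
    except BaseException:
        os.unlink(tmp_path)
        raise


def load_rule_set(path: str) -> mmap.mmap:
    """Map the current rule set read-only; the OS pages it in on demand."""
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
```

An evaluator that is mid-request keeps reading its existing mapping of the old file; it picks up the new rule set the next time it calls `load_rule_set`, so no request sees a mix of old and new tables.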

The Signing Overhead

Cryptographic signing of governance records adds overhead. HMAC-SHA256 over a typical governance record requires approximately 0.1–0.15ms on modern hardware. For high-throughput deployments, signing can be batched: individual governance records are assembled without signatures and queued; a signing thread processes the queue in batches, adding signatures and chain links asynchronously. The signed records are available for audit purposes within 5–10ms of the original evaluation. That delay is adequate for audit and compliance use cases that do not need the signature before the response is returned, and batching removes signing from the synchronous request path.
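The queue-and-signing-thread arrangement can be sketched as follows. The record layout, the `None` shutdown sentinel, the batch size, and the choice to chain each signature over the previous one are assumptions for illustration; key management is out of scope.

```python
import hashlib
import hmac
import json
import queue
import threading

GENESIS_SIG = "0" * 64  # assumed starting link for an empty chain


def sign_record(key: bytes, record: dict, prev_sig: str) -> dict:
    """Return a copy of the record with its chain link and HMAC-SHA256 tag."""
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    message = prev_sig.encode("ascii") + body.encode("utf-8")
    sig = hmac.new(key, message, hashlib.sha256).hexdigest()
    return {**record, "prev": prev_sig, "sig": sig}


def signing_worker(key: bytes, pending: queue.Queue, signed: list,
                   batch_size: int = 64) -> None:
    """Drain the queue in batches, signing and linking records in order."""
    prev = GENESIS_SIG
    while True:
        batch = [pending.get()]  # block until at least one record arrives
        while len(batch) < batch_size:
            try:
                batch.append(pending.get_nowait())
            except queue.Empty:
                break
        for record in batch:
            if record is None:  # sentinel: stop the worker
                return
            out = sign_record(key, record, prev)
            signed.append(out)
            prev = out["sig"]


def start_signer(key: bytes, pending: queue.Queue, signed: list) -> threading.Thread:
    worker = threading.Thread(target=signing_worker, args=(key, pending, signed))
    worker.start()
    return worker
```

The request path only calls `pending.put(record)`, which returns immediately; the signature and chain link are attached later on the worker thread, and a single worker keeps the chain order equal to the queue order.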

The Performance Case for Deterministic Governance

Deterministic zero-LLM governance is not a performance compromise. It is a performance improvement over LLM-based governance, combined with determinism, replay capability, and cryptographic auditability that LLM-based governance cannot provide.

Metric                        LLM-Based Governance               Zero-LLM Deterministic
Per-request latency           200–2,000ms                        0.34–0.85ms
Throughput scaling            LLM inference bottleneck           Linear with CPU cores
Non-determinism               Present (temperature, batching)    None
Replay capability             Not guaranteed                     Guaranteed
Audit record verifiability    Requires live system               Offline, independent

The engineering investment to build and maintain a compiled rule set, action type taxonomy, and DFA-based evaluation engine is real. It is offset by: eliminated LLM API costs in the governance path, eliminated latency overhead in the enforcement path, and the compliance and audit capabilities that only deterministic governance provides.
