A different kind of question

Most regulatory checks are precise, and that precision is what makes them safe to automate. Did the number exceed the limit? Did the filing land inside the window? Was the party on the prohibited list? Each of these has an answer that does not depend on interpretation. You can compute it, show your work, and defend it line by line.

Contract review is not like that. It asks a question with a soft edge. Does this clause say what it should? Is there a limitation-of-liability provision in here at all, and if there is, does its cap sit below the position we are willing to accept? You cannot answer that with a comparison operator. You have to read the clause, understand what it means, and judge it against a standard. That is interpretation, and it is the one place in this whole engine where a language model is not a convenience layered on top but the actual core of the work.

This is where most AI products go wrong, and they go wrong in a predictable way. The instinct is to let the model run the entire show. Feed it the contract, let it read every clause, let it form a view, and return that view as the answer. It demos beautifully. It is also exactly the failure the rest of this engine exists to prevent. A model's confident "this indemnity looks fine" is worth precisely nothing in front of a court, because the model cannot be cross-examined, cannot cite its reasoning to a standard, and is wrong often enough that "the AI said so" is not a defense anyone wants to mount.

So legal becomes the proving ground for a specific and unfashionable discipline. Use the model for the part that is genuinely semantic, and for nothing else. Draw a hard line around the model's job, and do not let it across that line no matter how capable it seems.

What is data, and what is a rule

The reference layer in legal is the playbook. For each kind of clause, it holds the standard positions: the wording you prefer, the fallback you will accept if pushed, and the point past which you walk away from the deal. A mature playbook can run to thousands of positions across dozens of clause types, and in a real deployment it is the customer's own, which makes it the most valuable and least commoditized asset in the entire domain. Two firms reviewing the same contract reach different verdicts because their playbooks differ, and that difference is their actual legal judgment, encoded. It is maintained reference data, and a lawyer signs off on every position in it.

The rules are the obligations that sit on top of the playbook. A clause of a required type must be present. A clause must not fall below the playbook's fallback position. A context that triggers a mandatory term must actually contain that term. Some of these obligations are mechanical and run deterministically with no model involved at all: whether a defined term is used consistently throughout the document, and whether every internal cross-reference resolves to a section that exists. Those are precise checks, and precise checks do not need a language model.
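To make the mechanical side concrete, here is a minimal sketch of the cross-reference check. It is an illustration, not the engine's code: the function name, the two regular expressions, and the assumption that headings look like "2.1 Cap" while references look like "Section 2.1" or "Clause 2.1" are all hypothetical, and a real contract would need a more careful parser.

```python
import re

# Assumed conventions (hypothetical): a heading is a line that starts with a
# dotted section number, and a reference is "Section N" or "Clause N" in prose.
HEADING = re.compile(r"^(\d+(?:\.\d+)*)[ \t]+\S", re.MULTILINE)
REFERENCE = re.compile(r"\b(?:Section|Clause)\s+(\d+(?:\.\d+)*)")


def unresolved_cross_references(text: str) -> list[str]:
    """Return referenced section numbers that have no matching heading."""
    defined = set(HEADING.findall(text))
    unresolved: dict[str, None] = {}  # dict keeps first-seen order, drops repeats
    for number in REFERENCE.findall(text):
        if number not in defined:
            unresolved.setdefault(number)
    return list(unresolved)


contract = """1 Definitions
Terms are defined in this Section 1.
2 Liability
Liability is capped as set out in Section 2.1 and Section 9.
2.1 Cap
The cap equals the fees paid.
"""

print(unresolved_cross_references(contract))  # ['9']
```

Nothing in this check involves judgment. The same input always produces the same list, which is why it can run without a model.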

But the central obligation, the one that makes legal a hard domain, is whether a clause deviates materially from the standard. That is irreducibly semantic, and so it is handed to the model. Explicitly. On the record. With a flag that says so.

{
  "rule_id": "LEG-MAND-EU-003",
  "title": "Processor contract missing GDPR Article 28(3) terms",
  "jurisdiction": "eu",
  "source": "GDPR Article 28(3)",
  "severity": "block",
  "expected_outcome": {
    "action": "review",
    "message": "The agreement involves personal-data processing but lacks the mandatory Article 28(3) processor terms. Add a compliant data-processing clause before execution."
  },
  "conditions": [
    { "type": "context_flag", "path": "contract.involves_personal_data", "equals": true },
    { "type": "clause_present", "clause_type": "art28_processor_terms", "expect": "absent", "requires_llm": true }
  ],
  "deterministic": false,
  "requires_llm": true,
  "validation_status": "expert_reviewed"
}

Read the structure carefully, because the honesty is built into it. A deterministic gate comes first: does this contract touch personal data at all? That is a precise check, true or false, no interpretation. Only if that gate passes does the semantic check run, and that semantic check is openly marked requires_llm. The whole rule is flagged deterministic: false. Nothing here pretends to a certainty it does not have. The rule announces, in its own fields, exactly which part of itself is a judgment call.
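The ordering described here, gate first and model second, can be sketched in a few lines. This is an assumption-laden illustration rather than the engine's evaluator: `rule_fires`, `lookup`, and the `ClauseReader` callable are hypothetical names, the reader is a stub standing in for a model call, and model confidence is ignored for simplicity.

```python
from typing import Any, Callable

# Hypothetical interface: given a clause type, report whether the clause is
# present in the contract. A real reader would wrap a language-model call.
ClauseReader = Callable[[str], bool]


def lookup(context: dict[str, Any], path: str) -> Any:
    """Resolve a dotted path such as 'contract.involves_personal_data'."""
    node: Any = context
    for key in path.split("."):
        node = node[key]
    return node


def rule_fires(rule: dict[str, Any], context: dict[str, Any],
               read_clause: ClauseReader) -> bool:
    """Return True when all conditions hold, running deterministic gates first."""
    conditions = rule["conditions"]
    gates = [c for c in conditions if not c.get("requires_llm", False)]
    semantic = [c for c in conditions if c.get("requires_llm", False)]

    for cond in gates:
        if cond["type"] != "context_flag":
            raise ValueError(f"unsupported deterministic condition: {cond['type']}")
        if lookup(context, cond["path"]) != cond["equals"]:
            return False  # gate failed, so the model is never consulted

    for cond in semantic:
        if cond["type"] != "clause_present":
            raise ValueError(f"unsupported semantic condition: {cond['type']}")
        is_present = read_clause(cond["clause_type"])
        wants_present = cond.get("expect", "present") == "present"
        if is_present != wants_present:
            return False
    return True


RULE = {
    "rule_id": "LEG-MAND-EU-003",
    "conditions": [
        {"type": "context_flag", "path": "contract.involves_personal_data", "equals": True},
        {"type": "clause_present", "clause_type": "art28_processor_terms",
         "expect": "absent", "requires_llm": True},
    ],
}

calls: list[str] = []


def stub_reader(clause_type: str) -> bool:
    calls.append(clause_type)  # record every simulated model call
    return False               # pretend the clause was not found


no_personal_data = {"contract": {"involves_personal_data": False}}
personal_data = {"contract": {"involves_personal_data": True}}

print(rule_fires(RULE, no_personal_data, stub_reader), calls)  # False []
print(rule_fires(RULE, personal_data, stub_reader), calls)     # True ['art28_processor_terms']
```

The empty call list in the first case is the point: when the deterministic gate fails, no semantic judgment is ever requested.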

The leash

Here is the single rule that keeps the model honest, and it is worth stating plainly because everything depends on it: a low-confidence semantic judgment is a preview, never a verdict.

When the model reads a clause and is confident, fine. When it reads a clause and is uncertain, that uncertainty does not get rounded up into a decision. It gets surfaced to a human, marked as unconfirmed, and held there until a person rules on it. The model is allowed to raise its hand and say "I think this clause might be missing, but I am not sure." It is not allowed to quietly file that hunch as a finding.

This is the same fail-closed discipline that governs the rest of the engine, applied to a new kind of uncertainty. Elsewhere, a missing reference row makes the engine say "I could not verify this." Here, an uncertain clause read makes it say the same thing. When the engine is not sure whether a clause is present or compliant, it does not guess in the optimistic direction or the pessimistic one. It declares the uncertainty and routes the case to a person. The model reads. The human decides what the reading means.
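A minimal sketch of that routing, under stated assumptions: the `ClauseRead` and `Routed` types, the `route` function, and the 0.85 threshold are all hypothetical, chosen only to show the shape of the discipline. The one property that matters is that anything short of a clear, confident read ends up in front of a person.

```python
from dataclasses import dataclass
from typing import Optional

CONFIDENCE_FLOOR = 0.85  # hypothetical threshold; a real value would be tuned and reviewed


@dataclass(frozen=True)
class ClauseRead:
    clause_type: str
    present: bool
    confidence: Optional[float]  # None when the model returned no usable score


@dataclass(frozen=True)
class Routed:
    read: ClauseRead
    status: str        # "finding" or "preview"
    needs_human: bool


def route(read: ClauseRead, floor: float = CONFIDENCE_FLOOR) -> Routed:
    """Fail closed: only a confident read is recorded as a finding."""
    # A missing score, or a NaN score (for which >= is False), counts as uncertain.
    confident = read.confidence is not None and read.confidence >= floor
    if confident:
        return Routed(read, status="finding", needs_human=False)
    return Routed(read, status="preview", needs_human=True)


sure = ClauseRead("art28_processor_terms", present=False, confidence=0.97)
unsure = ClauseRead("art28_processor_terms", present=False, confidence=0.55)
unscored = ClauseRead("art28_processor_terms", present=False, confidence=None)

for read in (sure, unsure, unscored):
    result = route(read)
    print(result.status, result.needs_human)
# finding False
# preview True
# preview True
```

Note that the uncertain branch does not flip the model's answer in either direction. The read is preserved as it was given, labeled as a preview, and held for a human.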

The point

Legal is the domain people assume an engine built on deterministic rules simply cannot handle, because its core is interpretation and interpretation is the thing rules are supposed to avoid. The assumption is wrong, and it is wrong for an instructive reason. The engine handles legal precisely because it refuses to let interpretation become the verdict. The model reads the clause. The playbook says what the standard is. The rule says what the obligation is. And a human, looking at all three, decides. The model's job is to read, and only to read.

That line, between reading and deciding, is the whole game. Every other domain draws it too, but legal is where it is drawn most sharply, because legal is where the temptation to erase it is strongest. Resist that temptation and the hardest domain becomes tractable. Give in to it and you have built something that demos well and cannot be trusted with a single real contract.