
ContractKen started building contract AI before ChatGPT launched. The system has gone through four distinct phases, each adding a layer of capability on top of the last.
A single contract review triggers a coordinated sequence of specialized models and processing steps. Here is what happens under the hood.
Party names become [PARTY_A], monetary values become [AMOUNT], and proprietary terms become [TERM_1]. A mapping table is maintained so originals can be restored in the output. The raw text never leaves the client environment unprotected.

ContractKen uses DeBERTa-based Natural Language Inference (NLI) to determine whether a contract clause complies with, deviates from, or contradicts a playbook standard. This is fundamentally different from keyword matching.
Natural Language Inference classifies the relationship between two text segments as entailment (A supports B), contradiction (A conflicts with B), or neutral (no clear relationship).
For contract review, the premise is the playbook standard and the hypothesis is the contract clause. The model determines whether the clause satisfies, violates, or partially addresses the standard.
This is critical because contracts express the same concepts in vastly different language. A limitation of liability might say "aggregate liability shall not exceed" or "total exposure is capped at" or "cumulative damages are limited to" - all expressing the same idea. Keyword matching fails here. NLI understands the semantic relationship.
ContractKen uses DeBERTa (Decoding-enhanced BERT with disentangled attention) for NLI because its disentangled attention mechanism handles long, complex legal sentences more effectively than standard BERT. The models are fine-tuned on legal corpora and evaluated using domain-specific benchmarks.
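The premise/hypothesis framing can be made concrete. The sketch below (pure Python, with the model call stubbed out) shows how raw NLI logits might be mapped to a compliance verdict; the label order, threshold, and verdict names are illustrative assumptions, not ContractKen's actual implementation, and real checkpoints differ in how they order the three labels.

```python
import math

# Assumed label order; DeBERTa MNLI heads vary, so check the checkpoint config.
LABELS = ("contradiction", "neutral", "entailment")

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def nli_verdict(logits, threshold=0.5):
    """Map raw NLI logits to a playbook verdict.

    premise = playbook standard, hypothesis = contract clause:
      entailment    -> clause satisfies the standard
      contradiction -> clause violates the standard
      neutral       -> clause does not clearly address it
    """
    probs = dict(zip(LABELS, softmax(logits)))
    label = max(probs, key=probs.get)
    if probs[label] < threshold:
        return "review", probs  # low confidence: route to a lawyer
    return {"entailment": "satisfies",
            "contradiction": "violates",
            "neutral": "not_addressed"}[label], probs

verdict, probs = nli_verdict([0.2, 0.1, 3.5])  # strong entailment signal
```

The low-confidence branch matters in practice: rather than forcing a verdict on a borderline score, the clause is flagged for human review.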
NLI drives three checks in the pipeline:

- **Playbook compliance:** Checking each clause against preferred, fallback, and walkaway positions using entailment/contradiction scoring.
- **Clause classification:** Identifying clause types across 100+ categories, even when the language is non-standard or jurisdiction-specific.
- **Missing clause detection:** Determining which required clause types are absent from a contract by checking the full document against the playbook's required provisions.
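Scoring a clause against the preferred, fallback, and walkaway positions yields a severity ladder. This minimal sketch assumes each position produces an entailment probability and uses an illustrative 0.5 threshold; the level names follow the severity levels mentioned later in this document (above preferred, below preferred, below fallback, below walkaway), but the exact scoring rules are an assumption.

```python
def compliance_severity(entails_preferred, entails_fallback, entails_walkaway,
                        threshold=0.5):
    """Map entailment scores against the three playbook positions to a
    severity level. Inputs are probabilities in [0, 1]; the threshold
    is illustrative, not ContractKen's actual cutoff."""
    if entails_preferred >= threshold:
        return "above_preferred"   # clause meets the preferred position
    if entails_fallback >= threshold:
        return "below_preferred"   # acceptable fallback language
    if entails_walkaway >= threshold:
        return "below_fallback"    # at the edge of acceptability
    return "below_walkaway"        # escalate: outside the negotiable range
```

Because each position is scored independently, a clause that fails the preferred standard can still be recognized as valid fallback language rather than flagged as an outright violation.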
Standard Retrieval-Augmented Generation fails for contracts because it ignores document structure, cross-references, and the dependency relationships between clauses. ContractKen's RAG is built for legal documents specifically.
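One way to see the difference is dependency expansion at retrieval time: a retrieved clause drags in the clauses it cross-references, so the generator never sees a liability cap without the section that defines it. The sketch below is a toy illustration of that idea (clause IDs, the regex, and the retrieval interface are all assumptions, not ContractKen's retriever).

```python
import re

# Matches cross-references like "Section 9.2" inside clause text.
CROSS_REF = re.compile(r"Section\s+(\d+(?:\.\d+)*)")

def build_index(clauses):
    """clauses: {section_id: text}. Record which sections each clause cites."""
    return {sid: set(CROSS_REF.findall(text)) for sid, text in clauses.items()}

def retrieve_with_dependencies(hits, refs, clauses):
    """Expand the retrieved sections with the clauses they cross-reference,
    so the generator sees the full dependency context."""
    expanded = list(hits)
    for sid in hits:
        for dep in sorted(refs.get(sid, ())):
            if dep in clauses and dep not in expanded:
                expanded.append(dep)
    return expanded

clauses = {
    "4":   "Liability is capped as set out in Section 9.2.",
    "9.2": "The cap equals fees paid in the preceding 12 months.",
    "12":  "Governing law is New York.",
}
refs = build_index(clauses)
context = retrieve_with_dependencies(["4"], refs, clauses)  # pulls in 9.2
```

A flat-chunk RAG would happily return Section 4 alone, stranding the model without the definition it depends on.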
`preferred = breach + IP + misconduct + fees`, `fallback = breach + misconduct`, `walkaway = negligence + misconduct`

Confidential contract text is masked before it reaches any external AI model. This is an architectural control, enforced at the system level.
The Moderation Layer sits between the contract text and the AI processing layer. It intercepts outbound text, identifies confidential entities using the NER models from the extraction pipeline, applies configurable masking rules, and maintains a mapping table for de-masking the output.
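The mask-and-restore mechanics can be sketched with the regex layer alone (the NER-based party masking is omitted here). Token names and the currency pattern are illustrative, not ContractKen's actual rules.

```python
import re

def mask_text(text):
    """Mask monetary amounts before text leaves the client environment.
    Returns masked text plus a mapping table for de-masking the output."""
    mapping = {}

    def repl(match):
        token = f"[AMOUNT_{len(mapping) + 1}]"
        mapping[token] = match.group(0)
        return token

    masked = re.sub(r"\$[\d,]+(?:\.\d{2})?", repl, text)
    return masked, mapping

def unmask_text(masked, mapping):
    """Restore originals in the model's output using the mapping table."""
    for token, original in mapping.items():
        masked = masked.replace(token, original)
    return masked

masked, table = mask_text("Liability is capped at $1,000,000.")
restored = unmask_text(masked, table)
```

Because the mapping table never leaves the client environment, the external model only ever sees placeholder tokens.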
Different tasks have different requirements. Clause classification needs precision. Entity extraction needs speed. Risk analysis needs reasoning. ContractKen routes each task to the model best suited for it.
| Task | Model Type | Optimized For | Why This Model |
|---|---|---|---|
| Clause Classification | DeBERTa (NLI) | Precision | Clause type identification requires high-precision classification across 100+ categories. DeBERTa's disentangled attention handles long legal sentences where standard BERT struggles. |
| Entity Extraction | spaCy + Custom NER | Speed + Coverage | Entity extraction runs on every sentence in the document. It needs to be fast and comprehensive. spaCy's pipeline architecture with custom legal entity types provides both. |
| Playbook Compliance | DeBERTa (NLI) + Scoring Rules | Accuracy | NLI determines entailment/contradiction against each playbook position. Scoring rules map NLI output to severity levels (above preferred, below preferred, below fallback, below walkaway). |
| Risk Analysis & Explanation | LLM (Extended Reasoning) | Reasoning Depth | Explaining why a clause is risky and how to mitigate it requires multi-step reasoning. Extended-reasoning LLMs handle the nuance of "this clause creates risk because of its interaction with Section 4 and the definition of 'Material Adverse Change' in Section 1.2." |
| Redline Generation | LLM with RAG | Quality + Source Fidelity | Redline text is generated by the LLM but constrained by RAG-retrieved clause library language. The model selects from existing approved language rather than inventing new phrasing. |
| Formatting & Proofing | Rule-based + ML Hybrid | Determinism | Defined term consistency, cross-reference validation, and numbering checks require deterministic correctness. Rule-based checks handle structural validation; ML handles semantic checks (is this term being used consistently across contexts?). |
| Document Summarization | LLM (Fast) | Speed | Contract summaries need to be generated quickly for preview and triage. Faster LLMs handle summarization while heavier models are reserved for detailed analysis. |
A single LLM cannot be simultaneously optimized for speed (entity extraction on thousands of sentences), precision (clause classification across 100+ types), reasoning depth (multi-clause risk analysis), and determinism (formatting validation). Routing tasks to specialized models means each component operates at peak performance for its specific job.
The routing architecture is model-agnostic at each decision point. When a new model outperforms the current one on a specific task, it can be swapped in without rebuilding the pipeline. This is how ContractKen has evolved through four generations of AI in three years - the architecture stays stable while individual components improve.
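The swap-without-rebuild property falls out of keeping routes behind a stable interface. This is a generic sketch of that pattern (class names, task names, and the lambda "models" are all hypothetical stand-ins).

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Route:
    model_name: str
    run: Callable[[str], str]

class TaskRouter:
    """Model-agnostic routing: each task maps to whichever model currently
    performs best, and a route can be swapped without touching the pipeline."""

    def __init__(self):
        self._routes = {}

    def register(self, task, model_name, run):
        self._routes[task] = Route(model_name, run)

    def dispatch(self, task, text):
        return self._routes[task].run(text)

router = TaskRouter()
router.register("summarization", "fast-llm-v1", lambda t: t[:40] + "...")
# When a better model arrives, swapping it in is a one-line change:
router.register("summarization", "fast-llm-v2", lambda t: t[:60] + "...")
```

Callers only ever invoke `dispatch`, so upgrading a route changes nothing downstream.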
ContractKen's models improve over time through domain-specific fine-tuning on legal corpora and a structured feedback loop that incorporates lawyer corrections into future model behavior.
Pre-trained transformer models (BERT, DeBERTa) have broad language understanding but lack the precision needed for legal clause classification out of the box. Fine-tuning retrains the model's upper layers on labeled legal data - thousands of annotated clause examples across 100+ categories - while preserving the foundational language understanding in the lower layers.
The result: a model that understands general language structure AND recognizes that "the aggregate exposure of the service provider" means "vendor liability cap" in a contract context.
For organization-specific adaptation, fine-tuning can incorporate a firm's own contract corpus. A firm that negotiates IP-heavy technology agreements will have different clause patterns than one focused on commercial real estate leases. The fine-tuned model reflects these domain-specific patterns.
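The layer-freezing idea behind this kind of fine-tuning can be sketched in PyTorch. The toy model below stands in for a pretrained encoder; in real fine-tuning the lower layers of a pretrained DeBERTa/BERT are frozen, not freshly initialized linear layers, and the split point is a tuning decision, not the fixed value shown here.

```python
import torch.nn as nn

# Toy stand-in for a transformer encoder: lower layers keep general
# language understanding, upper layers adapt to legal clause labels.
model = nn.Sequential(*[nn.Linear(16, 16) for _ in range(6)],
                      nn.Linear(16, 100))  # 100+ clause categories

def freeze_lower_layers(model, n_frozen):
    """Freeze the first n_frozen layers so only upper layers are updated
    during fine-tuning, preserving foundational language understanding."""
    for i, layer in enumerate(model):
        for p in layer.parameters():
            p.requires_grad = i >= n_frozen

freeze_lower_layers(model, n_frozen=4)
trainable = [p for p in model.parameters() if p.requires_grad]
```

Only the `trainable` parameters are handed to the optimizer, so gradient updates never touch the frozen foundation.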
Every time a lawyer accepts, modifies, or rejects a ContractKen suggestion, that decision becomes a training signal. Accepted suggestions validate the model's judgment. Modified suggestions show where the model was directionally correct but needed refinement. Rejected suggestions identify areas where the model's reasoning diverged from the lawyer's expertise.
These signals are aggregated, reviewed, and periodically incorporated into model updates through supervised fine-tuning. The system does not retrain in real-time on individual interactions - it accumulates feedback and retrains in controlled cycles with human review of the training data.
The feedback loop runs in five stages:

1. **Generate:** Clause classifications, risk assessments, and redline suggestions are produced for a contract review.
2. **Review:** The lawyer accepts, modifies, or rejects each suggestion. These actions are logged as feedback signals.
3. **Aggregate:** Signals are collected across reviews and patterns identified: which clause types have high accept rates? Where does the model consistently need correction?
4. **Retrain:** Curated training data (with human review) is used to fine-tune models in controlled cycles. No automatic retraining on raw user data.
5. **Evaluate:** Updated models are evaluated against held-out test sets (F1, precision, recall) before deployment. Performance must improve or remain stable on all benchmarks.
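The aggregation step can be sketched as a per-clause-type accept-rate rollup that flags categories for retraining review. The thresholds and signal format here are illustrative assumptions, not ContractKen's actual pipeline.

```python
from collections import Counter

def aggregate_feedback(signals, min_samples=50, accept_floor=0.8):
    """Aggregate accept/modify/reject signals per clause type and flag
    categories whose accept rate falls below a review threshold.

    signals: list of (clause_type, action) pairs, where action is one of
    "accept", "modify", "reject". Thresholds are illustrative.
    """
    counts = {}
    for clause_type, action in signals:
        counts.setdefault(clause_type, Counter())[action] += 1

    flagged = []
    for clause_type, c in counts.items():
        total = sum(c.values())
        accept_rate = c["accept"] / total
        if total >= min_samples and accept_rate < accept_floor:
            flagged.append(clause_type)  # candidate for retraining review
    return flagged

flagged = aggregate_feedback(
    [("indemnification", "accept")] * 30 + [("indemnification", "reject")] * 30
)  # accept rate 0.5 over 60 samples -> flagged
```

The `min_samples` gate keeps a handful of noisy rejections from triggering a retraining cycle; only consistent patterns across many reviews reach human curation.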
Legal AI handles some of the most sensitive documents in an organization. The architecture reflects that responsibility at every level.
- Confidential information is protected by architectural controls at the system level.
- Enterprise-grade security infrastructure with certification and compliance.
- Designed for regulated industries and attorney-client privilege requirements.
ContractKen generates suggestions. The lawyer accepts, modifies, or rejects each one. Nothing is applied to the document without explicit human approval. Every tracked change and comment can be reviewed before the contract goes to the counterparty. AI assists the judgment - it does not replace it.
ContractKen uses a compound AI system for contract review and drafting inside Microsoft Word. The pipeline includes DeBERTa-based Natural Language Inference for clause classification and playbook compliance, spaCy NER for entity extraction, Retrieval-Augmented Generation grounded against clause libraries and playbooks, and large language model integration for analysis and redline generation. The Moderation Layer masks confidential information before AI processing using NER, regex patterns, and customer-configurable dictionaries. The system has evolved through four phases since 2022, progressing from fine-tuned BERT models to a multi-model orchestration architecture with task-specific routing.