AI for Contract Drafting and Review

36+ Months of Development

How ContractKen's AI Has Evolved

ContractKen started building contract AI before ChatGPT launched. The system has gone through four distinct phases, each adding a layer of capability on top of the last.

Phase 1: 2022 - 2023

Foundation - Pattern Recognition & Classification

The initial system focused on teaching machines to identify and classify contract clauses. We fine-tuned BERT-based models using a SQuAD-style question-answering formulation: "Where is the arbitration clause?" and "How similar is this indemnification language to our benchmark?" K-Nearest Neighbors (KNN) similarity matching handled standard clause recognition across large contract sets. Named Entity Recognition (spaCy + custom models) extracted parties, dates, monetary values, and defined terms.
Key techniques: fine-tuned BERT, SQuAD-style Q&A, KNN similarity matching, NER (spaCy), entity extraction.
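The nearest-neighbor matching from this phase can be sketched in a few lines. The bag-of-words cosine scoring below is a toy stand-in (the production system used learned representations), and the clause library here is invented for illustration:

```python
import math
import re
from collections import Counter

def bow_vector(text):
    """Toy bag-of-words vector; the production system used learned embeddings."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def nearest_clause_types(query, library, k=1):
    """Return the clause types of the k most similar library clauses."""
    qv = bow_vector(query)
    ranked = sorted(library, key=lambda c: cosine(qv, bow_vector(c["text"])), reverse=True)
    return [c["type"] for c in ranked[:k]]

# Invented mini-library for demonstration
library = [
    {"type": "arbitration", "text": "Any dispute shall be settled by binding arbitration."},
    {"type": "indemnification", "text": "Vendor shall indemnify Client against all losses."},
    {"type": "termination", "text": "Either party may terminate this agreement upon notice."},
]

print(nearest_clause_types("Vendor shall indemnify Client for losses arising from breach.", library))
# → ['indemnification']
```

Nearest-neighbor lookup like this is cheap to run at scale, which is why it suited the early pattern-recognition phase.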
Phase 2: 2023 - 2024

Intelligence - NLI, Fine-Tuning & Multi-Model Architecture

The system moved from pattern matching to reasoning. We adopted DeBERTa for Natural Language Inference (NLI) - the ability to determine whether a contract clause entails, contradicts, or is neutral relative to a playbook standard. This became the backbone of playbook compliance checking. The architecture evolved into a multi-model system with task-specific routing: clause classification routed to DeBERTa, entity extraction to the NER pipeline, risk scoring to specialized classifiers. Each model was fine-tuned on legal corpora, evaluated on domain-specific benchmarks (F1, precision, recall).
Key techniques: DeBERTa (NLI), fine-tuning, model routing, task-specific classifiers, domain evaluation.
Phase 3: 2024 - 2025

Generation - LLM Integration with RAG & the Moderation Layer

Large language models added the ability to explain issues, suggest mitigations, and generate redline text. But feeding raw contract text to LLMs was a non-starter for legal confidentiality. We built the Moderation Layer - an architectural privacy control that masks confidential information (party names, deal values, proprietary terms) before any text reaches an LLM. Retrieval-Augmented Generation (RAG) grounded every LLM output against the organization's clause library, playbooks, and precedents. The AI stopped hallucinating because it was forced to cite its sources.
Key techniques: LLM integration, RAG, Moderation Layer, semantic chunking, source attribution.
Phase 4: 2025 - Present

Orchestration - The Compound AI System

Today, a single contract review triggers a coordinated pipeline of specialized models. The system parses the document structure, segments clauses semantically, classifies each clause via NLI, extracts entities, scores risks against playbook positions, retrieves relevant knowledge (clauses, precedents, standards), generates analysis and redlines via LLM, and post-processes everything into clean Word tracked changes. Each step uses the right model for the job. This is a compound AI system with multiple specialized components - the opposite of a single LLM call.
Key capabilities: compound AI system, pipeline orchestration, playbook enforcement, precedent-based drafting, analytics & benchmarking.
What Happens When You Click "Review"

The Contract Review Pipeline

A single contract review triggers a coordinated sequence of specialized models and processing steps. Here is what happens under the hood.

Why this matters: A production-grade contract review system is a compound AI system with multiple specialized components working in sequence. A single LLM call cannot parse document structure, classify 100+ clause types, extract entities, check playbook compliance, retrieve relevant precedents, AND generate accurate redlines. Each step requires a different model optimized for a different task.
Step 1: Document Parsing & Structure Extraction (Rule-based + ML)

The contract is parsed into its structural components: recitals, definitions, substantive provisions, general provisions, schedules, and signature blocks. Section numbering, heading hierarchy, and cross-reference targets are identified. This structural understanding is critical because a limitation of liability clause may depend on terms defined elsewhere in the document.
Handles DOCX, PDF, RTF, and image formats. OCR applied where needed. The parser respects document hierarchy rather than treating the contract as flat text.
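The structural pass described above can be illustrated with a minimal, hypothetical parser that recognizes decimal section numbering and infers nesting depth (real parsing also handles headings, schedules, and OCR output):

```python
import re

# Hypothetical minimal parser: detect decimal section numbering and infer depth.
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\s+(.+)$")

def parse_structure(lines):
    """Return (section_number, depth, title) for each numbered heading line."""
    sections = []
    for line in lines:
        m = HEADING.match(line.strip())
        if m:
            number, title = m.groups()
            depth = number.count(".") + 1   # "4.2" -> depth 2
            sections.append((number, depth, title))
    return sections

doc = [
    "1 DEFINITIONS",
    '1.3 "Confidential Information" means ...',
    "4 LIMITATION OF LIABILITY",
    "4.2 Cap on Damages",
]
print(parse_structure(doc))
```

The inferred depth is what lets later steps know that 4.2 is subordinate to 4, and that a reference to "Section 1.3" points into the definitions.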
Step 2: Clause Segmentation & Classification (DeBERTa NLI)

Each provision is segmented into semantic clause units and classified using DeBERTa-based Natural Language Inference. The model determines the clause type (indemnification, limitation of liability, termination, IP assignment, etc.) across 100+ categories. Classification uses NLI rather than keyword matching - the model understands that "neither party shall be liable for incidental damages" is a consequential damages exclusion even though it never uses that phrase.
Fine-tuned on legal corpora. Evaluated on domain-specific benchmarks (F1, precision, recall). See the NLI Deep Dive below for how this works.
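The NLI-as-classifier framing works roughly like this sketch: each candidate clause type becomes a hypothesis, and the type with the highest entailment score wins. The `toy_score` function is a purely illustrative stand-in for the fine-tuned DeBERTa cross-encoder:

```python
CLAUSE_TYPES = ["indemnification", "limitation of liability", "termination"]

def classify_clause(clause, entail_score):
    """Pick the clause type whose hypothesis scores highest for entailment.
    entail_score(premise, hypothesis) stands in for a DeBERTa NLI cross-encoder."""
    return max(CLAUSE_TYPES, key=lambda t: entail_score(clause, f"This clause is about {t}."))

def toy_score(premise, hypothesis):
    # Purely illustrative stand-in; the real model scores semantics, not keywords.
    cues = {"liable": "limitation of liability",
            "indemnify": "indemnification",
            "terminate": "termination"}
    return max((1.0 for cue, label in cues.items()
                if cue in premise.lower() and label in hypothesis), default=0.0)

print(classify_clause("Neither party shall be liable for incidental damages.", toy_score))
# → limitation of liability
```

Swapping `toy_score` for a real NLI model keeps the same control flow while gaining the semantic understanding the paragraph above describes.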
Step 3: Entity Extraction (spaCy + Custom NER Models)

Named Entity Recognition identifies and extracts structured data from unstructured text: party names, dates, monetary values, defined terms, jurisdiction references, and regulatory citations. Custom entity types extend standard NER categories for legal-specific patterns (e.g., notice periods, renewal terms, cap multipliers).
Extracted entities feed into multiple downstream processes: the Moderation Layer uses them for anonymization, the risk assessment uses them for quantitative checks (e.g., "is this cap below our minimum?"), and the formatting checker uses them for consistency validation.
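A regex-only sketch of the kinds of legal entity patterns described here (the actual pipeline uses spaCy with trained custom components; these patterns and labels are illustrative):

```python
import re

# Regex stand-ins for a few legal entity types; production uses spaCy + custom models.
PATTERNS = {
    "MONEY": re.compile(r"\$[\d,]+(?:\.\d{2})?"),
    "NOTICE_PERIOD": re.compile(r"\b\d+\s+days?'?\s+(?:prior\s+)?(?:written\s+)?notice\b", re.I),
    "DEFINED_TERM": re.compile(r'\("([^"]+)"\)'),
}

def extract_entities(text):
    """Return all matches per entity type."""
    return {label: pat.findall(text) for label, pat in PATTERNS.items()}

clause = ('Acme Corp ("Buyer") may terminate upon 30 days\' prior written notice '
          "and shall pay a fee of $4,750,000.")
print(extract_entities(clause))
```

In the real system these extractions feed the Moderation Layer (for masking), the risk checks (for quantitative comparisons), and the formatting checker, as noted above.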
Step 4: Risk Assessment & Playbook Scoring (NLI + Scoring Models)

Each classified clause is scored against the organization's playbook positions. The NLI model determines whether the clause language entails, contradicts, or is neutral relative to each playbook position (preferred, fallback, walkaway). Clauses below walkaway are flagged as high risk. Clauses between fallback and walkaway are medium risk. Missing clause types required by the playbook are identified through gap analysis.
Severity ranking is configurable per organization. A clause at "fallback" level may be acceptable for routine vendor agreements but flagged as high risk for high-value M&A transactions.
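The position-based scoring logic might be sketched like this, assuming the NLI model has already returned a label per playbook position (severity names follow the description above; the exact rules are configurable per organization):

```python
# Hypothetical scoring rules: map NLI verdicts against the three playbook
# positions to a severity level, following the thresholds described above.
def severity(nli_labels):
    """nli_labels: dict of position -> NLI label from the model,
    where 'entailment' means the clause satisfies that position."""
    if nli_labels.get("preferred") == "entailment":
        return "ok"
    if nli_labels.get("fallback") == "entailment":
        return "low"
    if nli_labels.get("walkaway") == "entailment":
        return "medium"       # between fallback and walkaway
    return "high"             # below walkaway

print(severity({"preferred": "contradiction", "fallback": "contradiction",
                "walkaway": "entailment"}))     # → medium
print(severity({"preferred": "contradiction", "fallback": "contradiction",
                "walkaway": "contradiction"}))  # → high
```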
Step 5: Moderation Layer (Privacy Gate; NER + Regex + Custom Rules)

Before any text reaches an external LLM, the Moderation Layer intercepts it. Using the entities extracted in Step 3 plus configurable regex patterns and customer-defined dictionaries, confidential information is replaced with opaque tokens: party names become [PARTY_A], monetary values become [AMOUNT], proprietary terms become [TERM_1]. A mapping table is maintained so originals can be restored in the output. The raw text never leaves the client environment unprotected.
Organizations can configure which entity types to mask. The system supports custom dictionaries for trade names, project codes, and internal terminology. Full technical details on the Moderation Layer page.
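A minimal sketch of the mask / de-mask round trip, using hand-supplied dictionaries in place of NER output (the token format and entity names here are illustrative, not ContractKen's actual scheme):

```python
def mask(text, dictionaries):
    """Replace known confidential terms with opaque tokens and keep a mapping
    table for later de-masking. `dictionaries` maps entity type -> list of
    terms (supplied by hand here; production values come from NER, regex
    patterns, and customer dictionaries)."""
    mapping, counters = {}, {}
    for etype, terms in dictionaries.items():
        for term in terms:
            n = counters[etype] = counters.get(etype, 0) + 1
            token = f"[{etype}_{n}]"
            mapping[token] = term
            text = text.replace(term, token)
    return text, mapping

def unmask(text, mapping):
    """Restore original values in the AI output."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text

masked, table = mask("Acme Corp shall pay $4,750,000 to GlobalTech.",
                     {"PARTY": ["Acme Corp", "GlobalTech"], "AMOUNT": ["$4,750,000"]})
print(masked)                 # → [PARTY_1] shall pay [AMOUNT_1] to [PARTY_2].
print(unmask(masked, table))  # round-trips to the original text
```

The key property is the round trip: the LLM only ever sees the masked string, while the mapping table stays inside the client environment.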
Step 6: Retrieval-Augmented Generation (RAG; Embeddings + Vector Search)

The system retrieves relevant context from multiple knowledge sources before generating any analysis. For a flagged indemnification clause, RAG pulls: the organization's preferred indemnification language from the clause library (all 3 positions), the playbook guidance note for this clause type, relevant precedent language from prior deals, and industry benchmark data. This context is injected into the LLM prompt so every output is grounded in the organization's own standards.
ContractKen uses semantic section-aware chunking rather than fixed-size text splits. Each chunk carries metadata about its position in the contract hierarchy, related definitions, and cross-references. See the RAG Architecture section below.
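The chunk metadata described here might look like the following sketch (field names are assumptions for illustration, not ContractKen's actual schema):

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    """One retrieval unit plus the hierarchy metadata described above."""
    text: str
    section: str                                      # e.g. "9.2"
    parent: str                                       # e.g. "9 Limitation of Liability"
    definitions: list = field(default_factory=list)   # defined terms this clause uses
    cross_refs: list = field(default_factory=list)    # sections this clause cites

chunk = Chunk(
    text="Vendor's aggregate liability shall not exceed the Fees (Section 1.4).",
    section="9.2",
    parent="9 Limitation of Liability",
    definitions=["Fees"],
    cross_refs=["1.4"],
)
print(chunk.cross_refs)   # → ['1.4']
```

Carrying `definitions` and `cross_refs` on each chunk is what lets retrieval pull in the definition of "Fees" alongside the liability cap, rather than returning the cap in isolation.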
Step 7: Analysis & Redline Generation (LLM with RAG Context)

With the anonymized clause text, risk scores, playbook positions, and retrieved context assembled, the LLM generates three outputs: (1) an explanation of why the clause was flagged, (2) a mitigation strategy referencing the playbook, and (3) specific redline text using language from the clause library. The model is constrained to cite its sources - every suggestion links back to a playbook position, a clause library entry, or a precedent.
Model routing directs different tasks to different LLMs based on the requirements. Extended-reasoning models handle complex multi-clause analysis. Faster models handle straightforward substitutions. The routing layer selects the optimal model per task.
Step 8: Post-Processing & Word Integration (Office.js + Formatting)

The Moderation Layer's mapping table restores original party names and values in the output. Redlines are formatted as standard Word tracked changes using the Office.js API. Explanatory comments are inserted alongside each redline. The output is indistinguishable from manual edits - the counterparty sees normal tracked changes with no formatting artifacts or AI indicators.
The post-processor also handles defined term consistency, cross-reference validation, and numbering checks. Results appear in the ContractKen sidebar organized by severity (high risk first), with one-click navigation to each clause location in the document.
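One of the deterministic checks, cross-reference validation, can be sketched as follows (a simplified stand-in that only handles decimal section numbering):

```python
import re

def check_cross_references(text):
    """Flag 'Section X.Y' citations that don't correspond to any heading in
    the document. A simplified stand-in for the post-processor's check."""
    headings = set(re.findall(r"^(\d+(?:\.\d+)*)\s", text, flags=re.M))
    cited = set(re.findall(r"Section\s+(\d+(?:\.\d+)*)", text))
    return sorted(cited - headings)

doc = """1 Definitions
4 Limitation of Liability
4.2 The cap in Section 4.2 is subject to Section 7.1."""
print(check_cross_references(doc))   # → ['7.1'] - cited but no such heading exists
```

Checks like this must be rule-based precisely because a wrong answer is unacceptable: either Section 7.1 exists or it does not.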
The Core Technique

Natural Language Inference for Clause Compliance

ContractKen uses DeBERTa-based Natural Language Inference to determine whether a contract clause complies with, deviates from, or contradicts a playbook standard. This is fundamentally different from keyword matching.

How NLI Works in Practice

Natural Language Inference classifies the relationship between two text segments as entailment (A supports B), contradiction (A conflicts with B), or neutral (no clear relationship).

For contract review, the premise is the playbook standard and the hypothesis is the contract clause. The model determines whether the clause satisfies, violates, or partially addresses the standard.

This is critical because contracts express the same concepts in vastly different language. A limitation of liability might say "aggregate liability shall not exceed" or "total exposure is capped at" or "cumulative damages are limited to" - all expressing the same idea. Keyword matching fails here. NLI understands the semantic relationship.

ContractKen uses DeBERTa (Decoding-enhanced BERT with disentangled attention) for NLI because its disentangled attention mechanism handles long, complex legal sentences more effectively than standard BERT. The models are fine-tuned on legal corpora and evaluated using domain-specific benchmarks.

Example: Indemnification Compliance Check
Premise (Playbook)
"Vendor shall indemnify Client for breach, IP infringement, and willful misconduct, including reasonable attorneys' fees."
Hypothesis (Contract)
"Vendor shall indemnify Client against all losses arising from Vendor's negligence."
CONTRADICTION - Scope limited to negligence only. Missing: breach, IP infringement, willful misconduct, fee recovery.
Example: IP Ownership Check
Premise (Playbook)
"All work product and deliverables shall be owned by Client."
Hypothesis (Contract)
"All intellectual property created in the performance of Services shall be the sole and exclusive property of Client, including all copyrights, patents, and trade secrets therein."
ENTAILMENT - Contract clause meets and exceeds playbook standard. No action needed.
Example: Force Majeure Check
Premise (Playbook)
"Force majeure clause must include pandemic, epidemic, and government-mandated lockdowns as qualifying events."
Hypothesis (Contract)
"Neither party shall be liable for delays caused by acts of God, war, terrorism, or natural disasters."
NEUTRAL - Traditional force majeure language present but does not address pandemic/epidemic events. Recommend expanding.

Why NLI Over Keyword Matching?

Keyword-based contract analysis looks for specific words ("indemnify", "limitation", "terminate"). It breaks when contracts use synonyms, passive constructions, or nested references. NLI understands meaning at the sentence level. It can determine that "the aggregate exposure of the service provider under this instrument shall be constrained to a sum equal to the consideration received" means the same thing as "vendor liability is capped at fees paid" - even though the two sentences share almost no keywords.

Playbook Compliance

Each clause checked against preferred, fallback, and walkaway positions using entailment/contradiction scoring.

Clause Classification

Identifying clause types across 100+ categories, even when the language is non-standard or jurisdiction-specific.

Gap Detection

Determining which required clause types are absent from a contract by checking the full document against the playbook's required provisions.
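Once classification has run, gap detection reduces to a set difference; a minimal sketch with invented clause types:

```python
def find_gaps(classified_clause_types, required_types):
    """Required clause types the playbook expects but the contract lacks."""
    return sorted(set(required_types) - set(classified_clause_types))

found = ["indemnification", "termination", "governing law"]
required = ["indemnification", "limitation of liability", "termination", "force majeure"]
print(find_gaps(found, required))   # → ['force majeure', 'limitation of liability']
```

The hard part, of course, is the NLI classification that produces `found` reliably; the gap analysis itself is deliberately simple and auditable.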

Grounding AI in Your Standards

RAG Architecture for Legal Documents

Standard Retrieval-Augmented Generation fails for contracts because it ignores document structure, cross-references, and the dependency relationships between clauses. ContractKen's RAG is built for legal documents specifically.

Standard RAG

Where Generic RAG Breaks Down

  • Fixed-size chunking splits clauses mid-sentence or separates a provision from its carve-outs
  • No awareness that "as defined in Section 1.3" creates a dependency on another part of the document
  • Retrieves text by cosine similarity alone, missing structurally related provisions
  • No distinction between recitals, definitions, operative clauses, and schedules
  • Embedding models trained on general text miss legal-specific semantic relationships
ContractKen RAG

How ContractKen Handles It

  • Semantic section-aware chunking that respects clause boundaries and contract hierarchy
  • Cross-reference resolution: when a clause references "Section 4.2", that section is pulled automatically
  • Metadata enrichment: each chunk carries its parent section, related definitions, and document position
  • Multiple retrieval sources activated per task (clause library + playbook + precedents)
  • Every AI output cites which source document or playbook position it drew from
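The cross-reference expansion behavior can be sketched as follows: rank chunks by a similarity score (term overlap here as a toy stand-in for embedding similarity), then pull in any sections the top hits cite:

```python
def retrieve(query_terms, chunks, k=1):
    """Score chunks by term overlap, then expand the result set with any
    sections the top chunks cross-reference. A sketch of the behavior
    described above, not ContractKen's actual retriever."""
    def score(c):
        return len(set(query_terms) & set(c["text"].lower().split()))
    by_section = {c["section"]: c for c in chunks}
    top = sorted(chunks, key=score, reverse=True)[:k]
    result, seen = [], set()
    for c in top:
        for sec in [c["section"]] + c.get("cross_refs", []):
            if sec in by_section and sec not in seen:
                seen.add(sec)
                result.append(by_section[sec])
    return result

chunks = [
    {"section": "1.4", "text": "Fees means the amounts payable under Schedule A.", "cross_refs": []},
    {"section": "9.2", "text": "aggregate liability shall not exceed the fees", "cross_refs": ["1.4"]},
]
hits = retrieve(["liability", "fees"], chunks)
print([c["section"] for c in hits])   # → ['9.2', '1.4']
```

Pure cosine retrieval would return only 9.2; the metadata-driven expansion is what also surfaces the definition of Fees in 1.4.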

Knowledge Layers Retrieved Per Review

Clause Library
700+ pre-drafted clauses in 3 negotiation positions. When a deviation is flagged, the system retrieves the appropriate position's language as a suggested replacement.
Playbooks
The organization's defined positions (preferred, fallback, walkaway) for each clause type, along with guidance notes and negotiation reasoning.
Precedents
Prior contracts from the organization's deal history. When drafting, the system retrieves structurally similar precedents to inform clause language and deal terms.
Industry Standards
Benchmark data on market-standard positions by clause type, contract category, and jurisdiction. Used for Comprehensive Review when no playbook is configured.

Example: RAG in Action for an Indemnification Clause

Flagged Clause: "Vendor shall indemnify Client against losses arising from negligence." Classified as indemnification, scored as below fallback.
Retrieval (Playbook): Fetches the playbook rule for indemnification: preferred = breach + IP + misconduct + fees; fallback = breach + misconduct; walkaway = negligence + misconduct.
Retrieval (Clause Library): Fetches the preferred indemnification clause text (full language with IP carve-out, fee recovery, and survival provision).
Retrieval (Guidance Note): "Always push for IP carve-out in software deals. Concede fee recovery before conceding IP. Reference 2023 CloudTech precedent."
LLM Output (Grounded): Generates an explanation citing the specific deviation, suggests redline text from the clause library's preferred position, and includes a comment referencing the playbook guidance note.
Privacy by Architecture

The Moderation Layer

Confidential contract text is masked before it reaches any external AI model. This is an architectural control, enforced at the system level.

How It Works (Summary)

The Moderation Layer sits between the contract text and the AI processing layer. It intercepts outbound text, identifies confidential entities using the NER models from the extraction pipeline, applies configurable masking rules, and maintains a mapping table for de-masking the output.

  1. NER-based entity detection identifies party names, monetary values, dates, proprietary terms, and custom entity types
  2. Regex pattern matching catches structured data (email addresses, phone numbers, account numbers) that NER may miss
  3. Custom dictionaries allow organizations to define their own sensitive terms (trade names, project codes, internal product names)
  4. A mapping table maintains the relationship between masked tokens and original values for de-masking on return
  5. Configurable per organization - each team controls which entity types are masked and which custom terms are protected

A full technical deep dive is available on the Moderation Layer page.

Example: What the AI Sees

Original Contract Text
"Acme Corporation ("Buyer") shall pay GlobalTech Solutions ("Seller") the sum of $4,750,000 upon completion of the Phase 2 deliverables described in Schedule B of the Master Services Agreement dated January 15, 2026."
After Moderation Layer
"[PARTY_A] ("Buyer") shall pay [PARTY_B] ("Seller") the sum of [AMOUNT_1] upon completion of the [PROJECT_REF] deliverables described in Schedule B of the Master Services Agreement dated [DATE_1]."
AI Analysis (on masked text)
The AI analyzes contract structure, clause compliance, and risk using the masked version. It never sees "Acme Corporation", "$4,750,000", or "Phase 2". When findings are returned, the mapping table restores the original values in the output.
The Right Model for Each Task

Model Routing & Orchestration

Different tasks have different requirements. Clause classification needs precision. Entity extraction needs speed. Risk analysis needs reasoning. ContractKen routes each task to the model best suited for it.

Task | Model Type | Optimized For | Why This Model
Clause Classification | DeBERTa (NLI) | Precision | Clause type identification requires high-precision classification across 100+ categories. DeBERTa's disentangled attention handles long legal sentences where standard BERT struggles.
Entity Extraction | spaCy + Custom NER | Speed + Coverage | Entity extraction runs on every sentence in the document. It needs to be fast and comprehensive. spaCy's pipeline architecture with custom legal entity types provides both.
Playbook Compliance | DeBERTa (NLI) + Scoring Rules | Accuracy | NLI determines entailment/contradiction against each playbook position. Scoring rules map NLI output to severity levels (above preferred, below preferred, below fallback, below walkaway).
Risk Analysis & Explanation | LLM (Extended Reasoning) | Reasoning Depth | Explaining why a clause is risky and how to mitigate it requires multi-step reasoning. Extended-reasoning LLMs handle the nuance of "this clause creates risk because of its interaction with Section 4 and the definition of 'Material Adverse Change' in Section 1.2."
Redline Generation | LLM with RAG | Quality + Source Fidelity | Redline text is generated by the LLM but constrained by RAG-retrieved clause library language. The model selects from existing approved language rather than inventing new phrasing.
Formatting & Proofing | Rule-based / ML Hybrid | Determinism | Defined term consistency, cross-reference validation, and numbering checks require deterministic correctness. Rule-based checks handle structural validation; ML handles semantic checks (is this term being used consistently across contexts?).
Document Summarization | LLM (Fast) | Speed | Contract summaries need to be generated quickly for preview and triage. Faster LLMs handle summarization while heavier models are reserved for detailed analysis.

Why Multiple Models?

A single LLM cannot be simultaneously optimized for speed (entity extraction on thousands of sentences), precision (clause classification across 100+ types), reasoning depth (multi-clause risk analysis), and determinism (formatting validation). Routing tasks to specialized models means each component operates at peak performance for its specific job.

Model Swappability

The routing architecture is model-agnostic at each decision point. When a new model outperforms the current one on a specific task, it can be swapped in without rebuilding the pipeline. This is how ContractKen has evolved through four generations of AI in three years - the architecture stays stable while individual components improve.
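A swappable routing registry of this kind can be sketched in a few lines (the task names and toy "models" below are illustrative):

```python
# A model-agnostic routing registry: each task maps to a callable, and a
# better model can be swapped in without touching the rest of the pipeline.
class ModelRouter:
    def __init__(self):
        self._models = {}

    def register(self, task, model_fn):
        """Registering a task again replaces the previous model for it."""
        self._models[task] = model_fn

    def run(self, task, *args):
        return self._models[task](*args)

router = ModelRouter()
router.register("summarize", lambda text: text[:40] + "...")        # fast baseline
router.register("summarize", lambda text: "SUMMARY: " + text[:20])  # swapped-in upgrade

print(router.run("summarize", "This Agreement is made between the parties."))
```

The pipeline only ever calls `router.run("summarize", ...)`, so replacing the underlying model is invisible to every other component.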

Continuous Improvement

Fine-Tuning & the Human Feedback Loop

ContractKen's models improve over time through domain-specific fine-tuning on legal corpora and a structured feedback loop that incorporates lawyer corrections into future model behavior.

What Fine-Tuning Means in Practice

Pre-trained transformer models (BERT, DeBERTa) have broad language understanding but lack the precision needed for legal clause classification out of the box. Fine-tuning retrains the model's upper layers on labeled legal data - thousands of annotated clause examples across 100+ categories - while preserving the foundational language understanding in the lower layers.

The result: a model that understands general language structure AND recognizes that "the aggregate exposure of the service provider" means "vendor liability cap" in a contract context.

For organization-specific adaptation, fine-tuning can incorporate a firm's own contract corpus. A firm that negotiates IP-heavy technology agreements will have different clause patterns than one focused on commercial real estate leases. The fine-tuned model reflects these domain-specific patterns.

How Lawyer Feedback Improves the System

Every time a lawyer accepts, modifies, or rejects a ContractKen suggestion, that decision becomes a training signal. Accepted suggestions validate the model's judgment. Modified suggestions show where the model was directionally correct but needed refinement. Rejected suggestions identify areas where the model's reasoning diverged from the lawyer's expertise.

These signals are aggregated, reviewed, and periodically incorporated into model updates through supervised fine-tuning. The system does not retrain in real-time on individual interactions - it accumulates feedback and retrains in controlled cycles with human review of the training data.
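Aggregating feedback signals into per-clause-type accept rates, one of the patterns mentioned above, can be sketched as follows (signal format invented for illustration):

```python
from collections import Counter

def accept_rates(feedback):
    """feedback: list of (clause_type, action) pairs with action in
    {'accepted', 'modified', 'rejected'}. Returns accept rate per type."""
    totals, accepted = Counter(), Counter()
    for clause_type, action in feedback:
        totals[clause_type] += 1
        if action == "accepted":
            accepted[clause_type] += 1
    return {t: accepted[t] / totals[t] for t in totals}

signals = [("indemnification", "accepted"), ("indemnification", "modified"),
           ("termination", "accepted"), ("termination", "accepted")]
print(accept_rates(signals))   # → {'indemnification': 0.5, 'termination': 1.0}
```

Low accept rates for a clause type are exactly the pattern that flags it as a candidate for the next fine-tuning cycle.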

The Improvement Cycle
Step 1: Model Generates Output

Clause classifications, risk assessments, and redline suggestions produced for a contract review.

Step 2: Lawyer Reviews & Acts

Accepts, modifies, or rejects each suggestion. These actions are logged as feedback signals.

Step 3: Feedback Aggregated

Signals collected across reviews. Patterns identified: which clause types have high accept rates? Where does the model consistently need correction?

Step 4: Supervised Retraining

Curated training data (with human review) used to fine-tune models in controlled cycles. No automatic retraining on raw user data.

Step 5: Evaluation & Deployment

Updated models evaluated against held-out test sets (F1, precision, recall) before deployment. Performance must improve or remain stable on all benchmarks.

How We Measure Model Performance

F1 Score
Harmonic mean of precision and recall. The primary metric for clause classification accuracy across all 100+ types.
Precision
Of all clauses the model flagged as a specific type, what percentage were correct? High precision reduces false positives.
Recall
Of all actual instances of a clause type, what percentage did the model identify? High recall ensures nothing is missed.
Accept Rate
Percentage of AI suggestions accepted by lawyers without modification. A practical, production-level quality indicator.
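The three model metrics compute directly from per-clause-type counts; a quick sketch with invented counts:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true positive / false positive /
    false negative counts for one clause type."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1

# e.g. 90 correctly flagged indemnification clauses, 10 false flags, 30 missed
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(round(p, 2), round(r, 2), round(f1, 2))   # → 0.9 0.75 0.82
```

Because F1 is the harmonic mean, a model cannot buy a high score by over-flagging (hurting precision) or under-flagging (hurting recall).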
Security & Trust

Built for Enterprise Legal Teams

Legal AI handles some of the most sensitive documents in an organization. The architecture reflects that responsibility at every level.

Data Privacy

Confidential information is protected by architectural controls at the system level.

  • Moderation Layer masks entities before AI processing
  • No customer data used for model training
  • Data isolation per organization
  • Customer-configurable anonymization rules
  • Mapping tables discarded after processing

Infrastructure Security

Enterprise-grade security infrastructure with certification and compliance.

  • SOC 2 Type II certification in progress
  • AES-256 encryption at rest
  • TLS 1.2+ encryption in transit
  • AWS hosting with regional data residency options
  • 99.5% uptime SLA

Compliance & Governance

Designed for regulated industries and attorney-client privilege requirements.

  • GDPR and CCPA compliant
  • ISO 27001/27701 aligned
  • Customers own all AI outputs
  • Audit logging for all AI interactions
  • Role-based access controls

The Lawyer Remains the Final Decision Maker

ContractKen generates suggestions. The lawyer accepts, modifies, or rejects each one. Nothing is applied to the document without explicit human approval. Every tracked change and comment can be reviewed before the contract goes to the counterparty. AI assists the judgment - it does not replace it.

Technical FAQ

How is ContractKen different from using ChatGPT or Claude directly?

Claude / ChatGPT is a single general-purpose LLM. ContractKen is a compound AI system with multiple specialized models working in a coordinated pipeline. Clause classification uses fine-tuned DeBERTa (NLI). Entity extraction uses spaCy NER. Risk scoring uses playbook-specific classifiers. The LLM is one component in an 8-step pipeline, constrained by RAG context and the Moderation Layer. A single LLM call cannot reliably parse document structure, classify 100+ clause types, check playbook compliance, AND generate accurate redlines. Each task requires a model optimized for that specific job.
Which AI models does ContractKen use?

ContractKen uses multiple models, each selected for a specific task. DeBERTa (fine-tuned on legal corpora) handles clause classification and playbook compliance via Natural Language Inference. spaCy with custom entity types handles Named Entity Recognition. Large language models handle analysis, explanation, and redline generation, grounded by RAG context. Rule-based systems handle formatting validation and cross-reference checking. The routing layer selects the optimal model for each task in the pipeline.
How does ContractKen prevent LLM hallucinations?

Three controls. First, the LLM is constrained by RAG - it retrieves specific clause library language, playbook positions, and precedent text before generating any output, and is required to cite its sources. Second, upstream NLI classification provides an independent check on clause identification - the LLM does not decide what type of clause it's looking at; DeBERTa already classified it. Third, the human-in-the-loop design means every suggestion is reviewed by the lawyer before it is applied to the document. The LLM generates proposals, not final outputs.
Can ContractKen be customized to our organization?

Yes, at multiple levels. Playbooks codify your organization's specific negotiation positions and risk thresholds. The clause library can include your proprietary clause language alongside the 700+ pre-built clauses. The RAG knowledge layer retrieves from your precedents and standards. The Moderation Layer's anonymization rules are configurable per organization. For enterprise deployments, fine-tuning on your contract corpus is available to improve classification accuracy for your specific clause patterns and drafting conventions.
How is confidential contract data protected?

Contract text passes through the Moderation Layer before reaching any external AI model. The Moderation Layer masks confidential entities (party names, amounts, proprietary terms) using NER and configurable rules. The masked text is processed by the AI pipeline. The mapping table restores original values in the output. No customer data is used for model training. Data is encrypted at rest (AES-256) and in transit (TLS 1.2+).
How accurate are the models?

ContractKen evaluates clause classification using F1 score, precision, and recall on held-out legal test sets. The fine-tuned DeBERTa models are trained on thousands of annotated clause examples across 100+ categories. Published research on similar architectures (Legal-BERT on construction contracts) has achieved F1 scores above 0.93. ContractKen's production models are evaluated against domain-specific benchmarks before every deployment, and performance must remain stable or improve to proceed.
What is Natural Language Inference, and why does it matter here?

Natural Language Inference (NLI) determines the logical relationship between two text segments: entailment (one supports the other), contradiction (they conflict), or neutral (no clear relationship). In contract review, NLI compares each clause against the playbook standard to determine compliance. This is more reliable than keyword matching because contracts express the same concepts in vastly different language across jurisdictions, drafting styles, and industries. DeBERTa's disentangled attention mechanism handles the long, complex sentence structures common in legal drafting.
How long has ContractKen been building this technology?

ContractKen has been building contract AI since 2022, through four phases. Phase 1 (2022-2023) built the NLP foundation: fine-tuned BERT for clause detection, KNN-based matching for standard clause recognition, and NER for entity extraction. Phase 2 (2023-2024) added intelligence: DeBERTa for NLI-based compliance checking, multi-model routing, and domain-specific fine-tuning. Phase 3 (2024-2025) integrated LLMs with RAG grounding and the Moderation Layer for privacy. Phase 4 (2025-present) orchestrates the full compound AI system with playbook enforcement, precedent-based drafting, and analytics. The architecture has remained stable while individual components have been upgraded through four generations.

ContractKen uses a compound AI system for contract review and drafting inside Microsoft Word. The pipeline includes DeBERTa-based Natural Language Inference for clause classification and playbook compliance, spaCy NER for entity extraction, Retrieval-Augmented Generation grounded against clause libraries and playbooks, and large language model integration for analysis and redline generation. The Moderation Layer masks confidential information before AI processing using NER, regex patterns, and customer-configurable dictionaries. The system has evolved through four phases since 2022, progressing from fine-tuned BERT models to a multi-model orchestration architecture with task-specific routing.