
Modern Python library for LLM-powered contract intelligence and legal document analysis


ContractEx: Modern Contract Intelligence for Python

LLM-powered contract analysis | CUAD taxonomy | Risk detection | Privacy-first

ContractEx is a production-ready Python library that uses large language models to analyze contracts. It extracts clauses, identifies parties, flags risks, and pulls out financial terms from legal documents through a compact, fully typed API.

Python 3.9+ | Apache 2.0 License


Features

  • Simple API: Extract a contract with a single function call
  • Multi-LLM Support: OpenAI (GPT-4o), Anthropic (Claude), Google (Gemini), and local models (Llama via Ollama)
  • CUAD Taxonomy: 41 standard clause types from the Contract Understanding Atticus Dataset
  • Risk Analysis: Automatic detection of unfavorable terms and potential risks
  • Financial Extraction: Payment terms, amounts, and conditions
  • Privacy-First: Local LLM support keeps sensitive documents on your own machine
  • Named Entity Recognition: Parties, dates, and legal entities via spaCy/Blackstone
  • Dataset Loaders: Built-in access to the ACORD, CUAD, and LePaRD benchmarks
  • Extensible: Compatible with LangChain and spaCy
  • Export: JSON, Excel, and CSV output formats
  • Batch Processing: Parallel execution across multiple contracts
  • Type-Safe: Full type hints and Pydantic models

Installation

Quick Install

# Clone repository
git clone https://github.com/aahepburn/Contract-Clause-Extractor.git
cd Contract-Clause-Extractor

# Install all dependencies (single requirements file)
pip install -r requirements.txt

# Or install as an editable package
pip install -e .

# Or install the released package from PyPI (no clone needed)
pip install contractex

Using pyproject.toml (Optional Feature Groups)

# Install specific feature groups
pip install -e ".[ocr]"        # OCR support for scanned PDFs
pip install -e ".[spacy]"      # Named Entity Recognition
pip install -e ".[langchain]"  # LangChain integration
pip install -e ".[local]"      # Local LLM support (Ollama)
pip install -e ".[storage]"    # PostgreSQL storage
pip install -e ".[datasets]"   # Dataset loaders (ACORD, CUAD, LePaRD)
pip install -e ".[all]"        # All features
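To check which optional feature groups are usable in the current environment, a minimal sketch can probe for one representative module per extra with importlib.util.find_spec. The extra-to-module mapping below is an assumption (it is not read from ContractEx's pyproject.toml), and the helper name available_extras is hypothetical.

```python
import importlib.util
from typing import Dict, Mapping

# ASSUMPTION: one representative import name per extra. These are guesses,
# not taken from ContractEx's pyproject.toml; adjust them to the real deps.
ASSUMED_EXTRA_MODULES: Dict[str, str] = {
    "spacy": "spacy",
    "langchain": "langchain",
    "local": "ollama",
}


def available_extras(mapping: Mapping[str, str] = ASSUMED_EXTRA_MODULES) -> Dict[str, bool]:
    """Hypothetical helper: report whether each extra's module is importable.

    find_spec returns None for a top-level module that is not installed,
    so nothing is actually imported here.
    """
    return {extra: importlib.util.find_spec(module) is not None
            for extra, module in mapping.items()}


if __name__ == "__main__":
    for extra, ok in available_extras().items():
        print(f"{extra:10} {'installed' if ok else 'missing'}")
```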

Configuration

# Create a .env file containing your API keys
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=your-google-api-key
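The README does not say whether ContractEx loads .env on its own. If it does not, python-dotenv's load_dotenv() or exporting the variables in your shell are common options. Either way, a small pre-flight check can catch a missing key before any document is processed. The helper below is a hypothetical sketch that only inspects the environment variable names listed above; it does not call the library.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Environment variable names as listed in the Configuration section above.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
}


def missing_keys(providers: Iterable[str],
                 env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Hypothetical helper: return the env var names that are unset or empty.

    Defaults to the real process environment; pass a dict to test it.
    """
    env = os.environ if env is None else env
    return [PROVIDER_KEYS[p] for p in providers if not env.get(PROVIDER_KEYS[p])]


if __name__ == "__main__":
    absent = missing_keys(["openai"])
    if absent:
        raise SystemExit(f"Missing API keys: {', '.join(absent)}")
```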

Quick Start

Basic Usage (< 10 lines)

from contractex import extract_contract

# Extract a contract with one call
contract = extract_contract("contract.pdf")

# Access results
print(f"Parties: {', '.join([p.name for p in contract.parties])}")
print(f"Clauses: {len(contract.clauses)}")
print(f"Risks: {len(contract.risks)} ({len(contract.critical_risks)} critical)")

# Export
contract.to_json("output.json")
contract.to_excel("output.xlsx")
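The one-line call above processes a single file. For a folder of contracts, one option is to fan out calls with a thread pool, since LLM requests are mostly network-bound. This is a minimal sketch, assuming extract_contract is safe to call from several threads (not confirmed here); extract_many is a hypothetical helper, not part of ContractEx, and the repository's batch_processing.py example may take a different approach.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterable


def extract_many(paths: Iterable[str],
                 extract_fn: Callable[[str], Any],
                 max_workers: int = 4) -> Dict[str, Any]:
    """Hypothetical helper: run extract_fn on each path concurrently.

    Returns a dict mapping each path to its result, or to the exception
    raised while processing it, so one bad file does not abort the batch.
    """
    paths = list(paths)
    results: Dict[str, Any] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every file first so the calls overlap in time.
        futures = {pool.submit(extract_fn, p): p for p in paths}
        for future, path in futures.items():
            try:
                results[path] = future.result()
            except Exception as exc:  # record the failure and keep going
                results[path] = exc
    return results
```

For example, extract_many(["a.pdf", "b.pdf"], extract_contract) returns a dict keyed by path; checking isinstance(value, Exception) separates failed contracts from successful ones.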

Advanced Usage

from contractex import ContractExtractor
from contractex.llm import OpenAIProvider
from contractex.loaders import PDFLoader
from contractex.chunking import ClauseAwareChunker

# Configure custom components
llm = OpenAIProvider(model="gpt-4o", temperature=0.0)
loader = PDFLoader(ocr_enabled=True, preserve_layout=True)
chunker = ClauseAwareChunker(max_chunk_size=4000, overlap=200)

# Create extractor
extractor = ContractExtractor(
    llm_provider=llm,
    document_loader=loader,
    chunking_strategy=chunker,
    confidence_threshold=0.8
)

# Extract with options
contract = extractor.extract(
    "complex_contract.pdf",
    analyze_risks=True,
    extract_financial=True
)
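ClauseAwareChunker(max_chunk_size=4000, overlap=200) splits long documents before they are sent to the LLM. To illustrate what the two parameters mean, here is a plain sliding-window chunker. It is not clause-aware (the real chunker presumably aligns splits with clause boundaries), and it assumes sizes are measured in characters, which the README does not specify; chunk_with_overlap is a hypothetical name.

```python
from typing import List


def chunk_with_overlap(text: str, max_chunk_size: int = 4000, overlap: int = 200) -> List[str]:
    """Hypothetical sliding-window chunker (character-based, NOT clause-aware).

    Each chunk holds at most max_chunk_size characters and repeats the last
    `overlap` characters of the previous chunk.
    """
    if max_chunk_size <= 0:
        raise ValueError("max_chunk_size must be positive")
    if not 0 <= overlap < max_chunk_size:
        raise ValueError("overlap must be in [0, max_chunk_size)")
    step = max_chunk_size - overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chunk_size])
        if start + max_chunk_size >= len(text):  # last window reached the end
            break
    return chunks
```

With max_chunk_size=4000 and overlap=200, a clause shorter than 200 characters that straddles a split still appears whole in the following chunk, so the LLM sees it intact at least once.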

Dataset Loading

Load public legal benchmark datasets for training and evaluation:

from contractex.data import load_cuad, load_acord, load_lepard

# Load CUAD (Contract Understanding Atticus Dataset)
cuad_df = load_cuad(split='train')
print(f"Loaded {len(cuad_df)} contracts with 41 clause types")

# Load ACORD (clause retrieval benchmark)
acord_df = load_acord(split='train')

# Load LePaRD (legal passage retrieval)
lepard_df = load_lepard()

See contractex/data/README.md for full documentation.
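Because these datasets are meant for evaluation, a natural check is to compare the clause types ContractEx finds in a contract against the gold CUAD labels. Below is a minimal set-based precision/recall/F1 sketch, assuming both the predictions and the annotations have already been reduced to sets of CUAD category names (how to do that from contract.clauses and the loaded DataFrame is not shown in this README); clause_type_scores is a hypothetical helper.

```python
from typing import Dict, Set


def clause_type_scores(predicted: Set[str], gold: Set[str]) -> Dict[str, float]:
    """Hypothetical helper: set-based precision, recall, and F1 for one contract.

    Empty inputs score 0.0 by convention here; other evaluations may treat
    "nothing predicted, nothing expected" as a perfect score instead.
    """
    tp = len(predicted & gold)  # clause types found and actually present
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, predicting {"Governing Law", "Anti-Assignment", "Cap On Liability"} against gold labels {"Governing Law", "Cap On Liability", "Termination For Convenience", "Exclusivity"} gives precision 2/3, recall 1/2, and F1 4/7 (about 0.57).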


Use Cases

Legal Teams

  • Contract Review & Due Diligence
  • Risk Assessment & Compliance
  • M&A Document Analysis

Procurement Teams

  • Vendor Agreement Review
  • Payment Terms Verification
  • SLA Analysis

Sales & Business Development

  • Deal Analysis & Redlining Support
  • Contract Comparison
  • Archive Search

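LLM Providers

Main strength and approximate cost per contract for each provider:

  • OpenAI (GPT-4o): Highest accuracy (~$0.025 per contract)
  • Anthropic (Claude): Suited to long documents (~$0.030 per contract)
  • Google (Gemini): Fast and low-cost (~$0.002 per contract)
  • Local (Llama via Ollama): Documents never leave your machine; no API fees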
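The per-contract figures above make rough budgeting a matter of multiplication. A sketch using those approximate rates; real costs vary with contract length, model version, and current provider pricing, and estimate_batch_cost is a hypothetical helper.

```python
from typing import Dict

# Approximate USD cost per contract, copied from the provider list above.
# Real costs depend on document length, model version, and current pricing.
APPROX_COST_PER_CONTRACT: Dict[str, float] = {
    "openai": 0.025,
    "anthropic": 0.030,
    "google": 0.002,
    "local": 0.0,  # no API fees; local compute cost is not modeled
}


def estimate_batch_cost(num_contracts: int) -> Dict[str, float]:
    """Hypothetical helper: rough total cost per provider, rounded to cents."""
    if num_contracts < 0:
        raise ValueError("num_contracts must be non-negative")
    return {provider: round(rate * num_contracts, 2)
            for provider, rate in APPROX_COST_PER_CONTRACT.items()}
```

For 1,000 contracts this gives roughly $25 with OpenAI, $30 with Anthropic, $2 with Gemini, and no API cost for a local model.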

Documentation & Examples

  • CHANGELOG.md - Version history and release notes
  • Examples Directory - Ready-to-run examples:
    • basic_extraction.py - Simple usage
    • advanced_extraction.py - Custom configuration
    • batch_processing.py - Processing multiple contracts
    • langchain_integration.py - Using ContractEx with LangChain
    • local_llm_example.py - Privacy-first extraction with a local model
    • fastapi_service.py - Serving extraction through a REST API
    • dataset_loading.py - Working with legal datasets
    • ner_example.py - Named entity recognition
    • storage_example.py - PostgreSQL persistence

Run an example from the repository root: python examples/basic_extraction.py


Testing & Development

# Run all tests
pytest

# With coverage
pytest --cov=contractex --cov-report=html

# Code quality
black contractex/           # Format code
ruff check contractex/ --fix  # Lint
mypy contractex/             # Type check

Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.


License

Apache 2.0 License - see LICENSE for details.




Download files


Source Distribution

contractex-0.1.1.tar.gz (140.6 kB)

Uploaded Source

Built Distribution


contractex-0.1.1-py3-none-any.whl (117.4 kB)

Uploaded Python 3

File details

Details for the file contractex-0.1.1.tar.gz.

File metadata

  • Download URL: contractex-0.1.1.tar.gz
  • Upload date:
  • Size: 140.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.8

File hashes

Hashes for contractex-0.1.1.tar.gz
Algorithm     Hash digest
SHA256        ba827b232e8b5033169342f4fd93da43df3fdd984c962d3c46a90e24851cd36e
MD5           4ee5b2fde5a77c37a6d6200c91c2f201
BLAKE2b-256   3e7136d586503bfcfd83407ae73fe5c0f1572c3ad4aba1701965e2d8cb4d0918


File details

Details for the file contractex-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: contractex-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 117.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.8

File hashes

Hashes for contractex-0.1.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        22ec132077230600517786360b30e8de8365ffda87047711df14bf62dcccc3af
MD5           7fc2d88be7a2fa58e352107dd117ca50
BLAKE2b-256   dce0b8c06d9a3b1beac7c8754beb3c3b4ac4dd8fb262a3ae12a53b4e61854666

