
Modern Python library for LLM-powered contract intelligence and legal document analysis

Project description

ContractEx: Modern Contract Intelligence for Python

🔥 LLM-powered contract analysis | 📋 CUAD taxonomy | 🛡️ Risk detection | 🔒 Privacy-first

ContractEx is a production-ready Python library for contract analysis with large language models. It extracts clauses, identifies parties, flags risks, and pulls financial terms out of legal documents through a clean, intuitive API.

Requires Python 3.9 or later.


✨ Features

  • 🚀 Simple API: Extract contracts with a single line of code
  • 🧠 Multi-LLM Support: OpenAI (GPT-4o), Anthropic (Claude), Google (Gemini), local models (Llama via Ollama)
  • 📋 CUAD Taxonomy: 41 standard clause types from the Contract Understanding Atticus Dataset
  • 🛡️ Risk Analysis: Automatic detection of unfavorable terms and potential risks
  • 💰 Financial Extraction: Extract payment terms, amounts, and conditions
  • 🔒 Privacy-First: Local LLM support for sensitive documents
  • 🧑‍⚖️ Named Entity Recognition: Extract parties, dates, and legal entities using spaCy/Blackstone
  • 📚 Dataset Loaders: Built-in access to ACORD, CUAD, and LePaRD benchmarks
  • 🔗 Extensible: LangChain and spaCy compatibility
  • 📊 Export: JSON, Excel, CSV output formats
  • ⚡ Fast: Batch processing with parallel execution
  • ✅ Type-Safe: Full type hints and Pydantic models

📦 Installation

Quick Install

# Clone repository
git clone https://github.com/aahepburn/Contract-Clause-Extractor.git
cd Contract-Clause-Extractor

# Install all dependencies (single requirements file)
pip install -r requirements.txt

# Or install as editable package
pip install -e .

Using pyproject.toml (Optional Feature Groups)

# Install specific feature groups
pip install -e ".[ocr]"        # OCR support for scanned PDFs
pip install -e ".[spacy]"      # Named Entity Recognition
pip install -e ".[langchain]"  # LangChain integration
pip install -e ".[local]"      # Local LLM support (Ollama)
pip install -e ".[storage]"    # PostgreSQL storage
pip install -e ".[datasets]"   # Dataset loaders (ACORD, CUAD, LePaRD)
pip install -e ".[all]"        # All features

Configuration

# Create .env file with your API keys
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=your-google-api-key
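
The README does not say whether ContractEx reads the .env file on its own. If it does not, the keys need to be in the process environment before the library is imported. The following is a minimal, standard-library-only sketch of that step; `load_env_file`, `configured_providers`, and `PROVIDER_KEYS` are hypothetical helper names, not part of ContractEx.

```python
import os
from pathlib import Path

# Environment variable names taken from the Configuration section above.
PROVIDER_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY")


def load_env_file(path=".env"):
    """Copy KEY=VALUE pairs from a .env file into os.environ.

    Hypothetical helper: values already present in the environment are
    left untouched. Returns the pairs that were parsed from the file.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.exists():
        return loaded
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        os.environ.setdefault(key, value)
        loaded[key] = value
    return loaded


def configured_providers():
    """Return the provider key names that currently have a non-empty value."""
    return [name for name in PROVIDER_KEYS if os.environ.get(name)]
```

Calling `load_env_file()` once at program start and then checking `configured_providers()` gives an early, readable failure if a key is missing.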

🚀 Quick Start

Basic Usage (< 10 lines)

from contractex import extract_contract

# Extract contract with one line
contract = extract_contract("contract.pdf")

# Access results
print(f"Parties: {', '.join([p.name for p in contract.parties])}")
print(f"Clauses: {len(contract.clauses)}")
print(f"Risks: {len(contract.risks)} ({len(contract.critical_risks)} critical)")

# Export
contract.to_json("output.json")
contract.to_excel("output.xlsx")

Advanced Usage

from contractex import ContractExtractor
from contractex.llm import OpenAIProvider
from contractex.loaders import PDFLoader
from contractex.chunking import ClauseAwareChunker

# Configure custom components
llm = OpenAIProvider(model="gpt-4o", temperature=0.0)
loader = PDFLoader(ocr_enabled=True, preserve_layout=True)
chunker = ClauseAwareChunker(max_chunk_size=4000, overlap=200)

# Create extractor
extractor = ContractExtractor(
    llm_provider=llm,
    document_loader=loader,
    chunking_strategy=chunker,
    confidence_threshold=0.8
)

# Extract with options
contract = extractor.extract(
    "complex_contract.pdf",
    analyze_risks=True,
    extract_financial=True
)
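
To make the `max_chunk_size` and `overlap` parameters above concrete, here is a plain sliding-window chunker. It is an illustration only, not the algorithm behind `ClauseAwareChunker`: it assumes both values are character counts and ignores clause boundaries, which the real chunker presumably respects. `sliding_window_chunks` is a hypothetical name.

```python
def sliding_window_chunks(text, max_chunk_size=4000, overlap=200):
    """Split text into fixed-size chunks whose edges overlap.

    Illustrative stand-in: sizes are in characters (an assumption) and
    clause boundaries are not considered.
    """
    if not 0 <= overlap < max_chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < max_chunk_size")
    step = max_chunk_size - overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chunk_size])
        if start + max_chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks
```

With the values used above, each chunk repeats the last 200 characters of the previous one, so a clause that straddles a boundary still appears whole in at least one chunk as long as it is shorter than the overlap.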

📊 Dataset Loading

Load popular legal contract datasets for training and evaluation:

from contractex.data import load_cuad, load_acord, load_lepard

# Load CUAD (Contract Understanding Atticus Dataset)
cuad_df = load_cuad(split='train')
print(f"Loaded {len(cuad_df)} contracts with 41 clause types")

# Load ACORD (clause retrieval benchmark)
acord_df = load_acord(split='train')

# Load LePaRD (legal passage retrieval)
lepard_df = load_lepard()

See contractex/data/README.md for full documentation.


🎯 Use Cases

Legal Teams

  • Contract Review & Due Diligence
  • Risk Assessment & Compliance
  • M&A Document Analysis

Procurement Teams

  • Vendor Agreement Review
  • Payment Terms Verification
  • SLA Analysis

Sales & Business Development

  • Deal Analysis & Redlining Support
  • Contract Comparison
  • Archive Search

🧠 LLM Providers

  • OpenAI (GPT-4o): Best accuracy (~$0.025/contract)
  • Anthropic (Claude): Large documents (~$0.030/contract)
  • Google (Gemini): Fast and cost-effective (~$0.002/contract)
  • Local (Llama via Ollama): Privacy-first, no per-contract API cost
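
The per-contract figures above can be turned into a rough batch budget. The sketch below multiplies them out; the dictionary keys are hypothetical labels, the figures are the approximate ones quoted in this list, and real costs will vary with document length and current provider pricing.

```python
# Approximate USD cost per contract, copied from the list above.
# Keys are illustrative labels, not ContractEx identifiers.
COST_PER_CONTRACT_USD = {
    "openai": 0.025,
    "anthropic": 0.030,
    "google": 0.002,
    "local": 0.0,  # no API fee; hardware and electricity are not counted
}


def estimate_batch_cost(n_contracts, provider):
    """Return a rough USD estimate for processing n_contracts with one provider."""
    if n_contracts < 0:
        raise ValueError("n_contracts must be non-negative")
    return round(n_contracts * COST_PER_CONTRACT_USD[provider], 2)


if __name__ == "__main__":
    for name in COST_PER_CONTRACT_USD:
        print(f"{name}: ~${estimate_batch_cost(1000, name):.2f} per 1,000 contracts")
```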

📚 Documentation & Examples

  • CHANGELOG.md - Version history and release notes
  • Examples Directory - Ready-to-run examples:
    • basic_extraction.py - Simple usage
    • advanced_extraction.py - Custom configuration
    • batch_processing.py - Multiple contracts
    • langchain_integration.py - LangChain usage
    • local_llm_example.py - Privacy-first local inference
    • fastapi_service.py - REST API
    • dataset_loading.py - Working with legal datasets
    • ner_example.py - Named entity recognition
    • storage_example.py - PostgreSQL persistence

Run examples: python examples/basic_extraction.py
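
For the batch-processing case listed above, one common pattern is to fan single-document calls out over a thread pool. The sketch below is an assumption about how that could look, not the contents of batch_processing.py. `extract_many` is a hypothetical helper that takes the extraction function as an argument, so it could wrap `extract_contract` or `extractor.extract`. Whether those calls are thread-safe is not stated in this README.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def extract_many(paths, extract_fn, max_workers=4):
    """Run extract_fn over many paths in parallel.

    Returns (results, errors): two dicts keyed by path, so one failing
    document does not abort the rest of the batch.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the path it was submitted for.
        futures = {pool.submit(extract_fn, path): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[path] = exc
    return results, errors


# Possible usage, assuming contractex is installed and configured:
#   from contractex import extract_contract
#   results, errors = extract_many(["a.pdf", "b.pdf"], extract_contract)
```

Threads fit here because LLM API calls spend most of their time waiting on the network. For a local model, a lower `max_workers` may be the safer default.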


🧪 Testing & Development

# Run all tests
pytest

# With coverage
pytest --cov=contractex --cov-report=html

# Code quality
black contractex/           # Format code
ruff check contractex/ --fix  # Lint
mypy contractex/             # Type check

🤝 Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.


📄 License

Apache 2.0 License - see LICENSE for details.


Built with ❤️ for the legal tech community
