Modern Python library for LLM-powered contract intelligence and legal document analysis
ContractEx: Modern Contract Intelligence for Python
🔥 LLM-powered contract analysis | 📋 CUAD taxonomy | 🛡️ Risk detection | 🔒 Privacy-first
ContractEx is a production-ready Python library for intelligent contract analysis using large language models. Extract clauses, identify parties, flag risks, and capture financial terms from legal documents through a clean, intuitive API.
✨ Features
- 🚀 Simple API: Analyze a contract with a single line of code
- 🧠 Multi-LLM Support: OpenAI (GPT-4o), Anthropic (Claude), Google (Gemini), local models (Llama via Ollama)
- 📋 CUAD Taxonomy: 41 standard clause types from the Contract Understanding Atticus Dataset
- 🛡️ Risk Analysis: Automatic detection of unfavorable terms and potential risks
- 💰 Financial Extraction: Extract payment terms, amounts, and conditions
- 🔒 Privacy-First: Local LLM support for sensitive documents
- 🧑‍⚖️ Named Entity Recognition: Extract parties, dates, and legal entities using spaCy/Blackstone
- Dataset Loaders: Built-in access to ACORD, CUAD, and LePaRD benchmarks
- 🔗 Extensible: LangChain and spaCy compatibility
- 📊 Export: JSON, Excel, CSV output formats
- ⚡ Fast: Batch processing with parallel execution
- ✅ Type-Safe: Full type hints and Pydantic models
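The library's own batch API is not shown on this page. As an illustration of the "batch processing with parallel execution" pattern, here is a minimal sketch assuming a hypothetical `extract_fn` that takes a file path (for example a wrapper you write around `extract_contract`). Threads suit this workload because each extraction mostly waits on LLM network calls.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

T = TypeVar("T")


def process_batch(
    paths: list[str],
    extract_fn: Callable[[str], T],  # hypothetical per-file extraction function
    max_workers: int = 4,
) -> dict[str, T | Exception]:
    """Run extract_fn over every path in parallel, keeping per-file errors."""
    results: dict[str, T | Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything up front; dict order follows the input order.
        futures = {pool.submit(extract_fn, p): p for p in paths}
        for future, path in futures.items():
            try:
                results[path] = future.result()
            except Exception as exc:  # one unreadable file should not abort the batch
                results[path] = exc
    return results
```

Returning exceptions alongside results lets you retry or report failed files after the batch finishes instead of losing completed work.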
📦 Installation
Quick Install
# Clone repository
git clone https://github.com/aahepburn/Contract-Clause-Extractor.git
cd Contract-Clause-Extractor
# Install all dependencies (single requirements file)
pip install -r requirements.txt
# Or install as editable package
pip install -e .
Using pyproject.toml (Optional Feature Groups)
# Install specific feature groups
pip install -e ".[ocr]" # OCR support for scanned PDFs
pip install -e ".[spacy]" # Named Entity Recognition
pip install -e ".[langchain]" # LangChain integration
pip install -e ".[local]" # Local LLM support (Ollama)
pip install -e ".[storage]" # PostgreSQL storage
pip install -e ".[datasets]" # Dataset loaders (ACORD, CUAD, LePaRD)
pip install -e ".[all]" # All features
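To check which optional groups are usable in the current environment, a minimal sketch with `importlib.util.find_spec`. The extra-to-module mapping below is an assumption for illustration only; the real dependency lists live in `pyproject.toml`.

```python
import importlib.util

# Hypothetical mapping: one module each extra is *assumed* to install.
EXTRA_MODULES = {
    "ocr": "pytesseract",
    "spacy": "spacy",
    "langchain": "langchain",
    "local": "ollama",
    "storage": "psycopg2",
    "datasets": "datasets",
}


def installed_extras(mapping: dict[str, str] = EXTRA_MODULES) -> dict[str, bool]:
    """Report which feature groups look importable, without importing them."""
    # find_spec returns None for top-level modules that are not installed.
    return {
        extra: importlib.util.find_spec(module) is not None
        for extra, module in mapping.items()
    }


if __name__ == "__main__":
    for extra, ok in installed_extras().items():
        print(f"{extra:<10} {'installed' if ok else 'missing'}")
```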
Configuration
# Create a .env file with your API keys
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=your-google-api-key
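This page does not say how ContractEx reads `.env` (a package such as python-dotenv is a common choice). A minimal stdlib sketch of loading the file into the environment, assuming plain `KEY=VALUE` lines; it is not a full dotenv parser (no escapes, `export` prefixes, or multi-line values).

```python
from __future__ import annotations

import os
from pathlib import Path


def load_env_file(path: str | Path = ".env", override: bool = False) -> dict[str, str]:
    """Parse simple KEY=VALUE lines into os.environ and return the parsed pairs."""
    parsed: dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and one layer of quotes.
        key, value = key.strip(), value.strip().strip('"').strip("'")
        parsed[key] = value
        # By default, keep values already exported in the shell.
        if override or key not in os.environ:
            os.environ[key] = value
    return parsed
```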
🚀 Quick Start
Basic Usage (< 10 lines)
from contractex import extract_contract
# Extract contract with one line
contract = extract_contract("contract.pdf")
# Access results
print(f"Parties: {', '.join([p.name for p in contract.parties])}")
print(f"Clauses: {len(contract.clauses)}")
print(f"Risks: {len(contract.risks)} ({len(contract.critical_risks)} critical)")
# Export
contract.to_json("output.json")
contract.to_excel("output.xlsx")
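The Quick Start uses `contract.clauses`, `contract.risks`, and `contract.critical_risks`, but their field names are not documented here, and the real models are Pydantic. A minimal dataclass sketch of how such a result could be shaped, assuming hypothetical `Clause` and `Risk` types with a `severity` field, showing `critical_risks` as a filtered view and a CSV export built on the standard `csv` module.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass, field


@dataclass
class Risk:  # hypothetical model
    description: str
    severity: str  # assumed levels: "low", "medium", "high", "critical"


@dataclass
class Clause:  # hypothetical model
    clause_type: str  # e.g. a CUAD category such as "Governing Law"
    text: str
    confidence: float


@dataclass
class ContractResult:  # hypothetical stand-in for the extracted contract
    clauses: list[Clause] = field(default_factory=list)
    risks: list[Risk] = field(default_factory=list)

    @property
    def critical_risks(self) -> list[Risk]:
        """Subset of risks whose severity is "critical"."""
        return [r for r in self.risks if r.severity == "critical"]

    def clauses_to_csv(self) -> str:
        """Serialize clauses as CSV text, one row per clause."""
        buf = io.StringIO()
        writer = csv.writer(buf)  # quotes fields containing commas or quotes
        writer.writerow(["clause_type", "confidence", "text"])
        for c in self.clauses:
            writer.writerow([c.clause_type, f"{c.confidence:.2f}", c.text])
        return buf.getvalue()
```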
Advanced Usage
from contractex import ContractExtractor
from contractex.llm import OpenAIProvider
from contractex.loaders import PDFLoader
from contractex.chunking import ClauseAwareChunker
# Configure custom components
llm = OpenAIProvider(model="gpt-4o", temperature=0.0)
loader = PDFLoader(ocr_enabled=True, preserve_layout=True)
chunker = ClauseAwareChunker(max_chunk_size=4000, overlap=200)
# Create extractor
extractor = ContractExtractor(
    llm_provider=llm,
    document_loader=loader,
    chunking_strategy=chunker,
    confidence_threshold=0.8
)
# Extract with options
contract = extractor.extract(
    "complex_contract.pdf",
    analyze_risks=True,
    extract_financial=True
)
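How `ClauseAwareChunker` splits text is not described on this page. A minimal sketch of the general idea behind `max_chunk_size` and `overlap`, assuming both are counted in characters (the real class may count tokens) and treating blank-line-separated blocks as clause boundaries. Each chunk after the first starts with the tail of the previous one, so a clause cut at a boundary is seen in full by at least one LLM call.

```python
import re


def clause_aware_chunks(text: str, max_chunk_size: int = 4000, overlap: int = 200) -> list[str]:
    """Pack blank-line-separated blocks into chunks of at most max_chunk_size chars."""
    limit = max_chunk_size - overlap - 2  # leave room for the overlap prefix and "\n\n"
    if overlap < 0 or limit <= 0:
        raise ValueError("need 0 <= overlap < max_chunk_size - 2")

    # Treat blank lines as clause boundaries.
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]

    # Hard-split blocks that cannot fit on their own; a real clause-aware
    # splitter would prefer sentence boundaries here.
    pieces: list[str] = []
    for block in blocks:
        pieces.extend(block[i:i + limit] for i in range(0, len(block), limit))

    # Greedily pack whole pieces into chunk bodies of at most `limit` chars.
    bodies: list[str] = []
    current = ""
    for piece in pieces:
        candidate = f"{current}\n\n{piece}" if current else piece
        if len(candidate) <= limit:
            current = candidate
        else:
            bodies.append(current)
            current = piece
    if current:
        bodies.append(current)

    # Prefix each body (after the first) with the tail of the previous body.
    chunks = bodies[:1]
    for prev, body in zip(bodies, bodies[1:]):
        tail = prev[-overlap:] if overlap else ""
        chunks.append(f"{tail}\n\n{body}" if tail else body)
    return chunks
```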
📊 Dataset Loading
Load popular legal contract datasets for training and evaluation:
from contractex.data import load_cuad, load_acord, load_lepard
# Load CUAD (Contract Understanding Atticus Dataset)
cuad_df = load_cuad(split='train')
print(f"Loaded {len(cuad_df)} contracts with 41 clause types")
# Load ACORD (clause retrieval benchmark)
acord_df = load_acord(split='train')
# Load LePaRD (legal passage retrieval)
lepard_df = load_lepard()
See contractex/data/README.md for full documentation.
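This page does not describe the loaders' column schema or how ContractEx scores its own output. For the evaluation use case, a minimal sketch of micro-averaged precision, recall, and F1 for clause-type detection, assuming gold and predicted labels have already been reduced to a set of CUAD clause types per contract ID (hypothetical inputs, not the loaders' actual format).

```python
from __future__ import annotations


def clause_type_scores(
    gold: dict[str, set[str]],
    predicted: dict[str, set[str]],
) -> dict[str, float]:
    """Micro-averaged precision/recall/F1 over (contract_id, clause_type) pairs."""
    gold_pairs = {(cid, t) for cid, types in gold.items() for t in types}
    pred_pairs = {(cid, t) for cid, types in predicted.items() for t in types}
    true_positives = len(gold_pairs & pred_pairs)

    precision = true_positives / len(pred_pairs) if pred_pairs else 0.0
    recall = true_positives / len(gold_pairs) if gold_pairs else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, if a contract's gold labels are {"Governing Law", "Cap On Liability"} and the model finds only "Governing Law", that contract contributes one true positive and one false negative.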
🎯 Use Cases
Legal Teams
- Contract Review & Due Diligence
- Risk Assessment & Compliance
- M&A Document Analysis
Procurement Teams
- Vendor Agreement Review
- Payment Terms Verification
- SLA Analysis
Sales & Business Development
- Deal Analysis & Redlining Support
- Contract Comparison
- Archive Search
🧠 LLM Providers
- OpenAI (GPT-4o): Best accuracy (approx. $0.025 per contract)
- Anthropic (Claude): Large documents (approx. $0.030 per contract)
- Google (Gemini): Fast and cost-effective (approx. $0.002 per contract)
- Local (Llama): Privacy-first, no API fees
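To budget a batch job, multiply the approximate per-contract figures listed above by the number of contracts. A minimal sketch; the figures are rough, and real cost depends on document length, model version, and current provider pricing (local runs still cost hardware and electricity).

```python
# Approximate per-contract API costs in USD, taken from the provider list above.
APPROX_COST_PER_CONTRACT = {
    "openai": 0.025,
    "anthropic": 0.030,
    "google": 0.002,
    "local": 0.0,
}


def estimate_batch_cost(provider: str, n_contracts: int) -> float:
    """Rough API spend for a batch using the per-contract estimates."""
    if n_contracts < 0:
        raise ValueError("n_contracts must be non-negative")
    return APPROX_COST_PER_CONTRACT[provider] * n_contracts


if __name__ == "__main__":
    for name in APPROX_COST_PER_CONTRACT:
        print(f"{name:>9}: ${estimate_batch_cost(name, 1_000):,.2f} per 1,000 contracts")
```

For 1,000 contracts this works out to roughly $25 (OpenAI), $30 (Anthropic), $2 (Gemini), and $0 in API fees for a local model.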
📚 Documentation & Examples
- CHANGELOG.md - Version history and release notes
- Examples Directory - Ready-to-run examples:
  - basic_extraction.py - Simple usage
  - advanced_extraction.py - Custom configuration
  - batch_processing.py - Multiple contracts
  - langchain_integration.py - LangChain usage
  - local_llm_example.py - Privacy-first local
  - fastapi_service.py - REST API
  - dataset_loading.py - Working with legal datasets
  - ner_example.py - Named entity recognition
  - storage_example.py - PostgreSQL persistence
Run examples: python examples/basic_extraction.py
🧪 Testing & Development
# Run all tests
pytest
# With coverage
pytest --cov=contractex --cov-report=html
# Code quality
black contractex/ # Format code
ruff check contractex/ --fix # Lint
mypy contractex/ # Type check
🤝 Contributing
Contributions welcome! See CONTRIBUTING.md for guidelines.
📄 License
Apache 2.0 License - see LICENSE for details.
Built with ❤️ for the legal tech community
Download files
File details
Details for the file contractex-0.1.0.tar.gz.
File metadata
- Download URL: contractex-0.1.0.tar.gz
- Upload date:
- Size: 140.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4b0013ea478deeb6e3eeaf2ab37657b2f0cda6d837329403ffaf1f5fa6394cae |
| MD5 | bc662c0d2872974cc0d255de34e60151 |
| BLAKE2b-256 | cdcc56f0bd1c8686fe8a8a845c754bcf00b148ad4cda856adea7adbab72f70c5 |
File details
Details for the file contractex-0.1.0-py3-none-any.whl.
File metadata
- Download URL: contractex-0.1.0-py3-none-any.whl
- Upload date:
- Size: 117.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8ee70ea35b49587d1a932c969b9368bf556e4baf9d3007c9f5fd97b4a85312c6 |
| MD5 | f0b425a415cda4a550d3c4db533a6668 |
| BLAKE2b-256 | 6554ad3a087ef01b586a31a5d5707cbcd11a70ad6217fa168af57ce192588a01 |