Python-native clinical NLP framework mirroring Apache cTAKES functionality
PyCTAKES
Open Source Python-native Clinical NLP Framework
A modern, open source clinical NLP framework that mirrors and extends Apache cTAKES functionality in pure Python. It is designed as a drop-in replacement with a focus on usability, extensibility, and performance.
PyCTAKES transforms clinical text processing by providing a 100% open source, Python-native alternative to Apache cTAKES. Built by the community, for the community - no vendor lock-in, no licensing fees, just powerful clinical NLP tools that anyone can use, modify, and contribute to.
Why Choose PyCTAKES?
- Fully Open Source
- Modern & Fast
- Clinical-First Design
- Developer Friendly
Quick Start
Installation
pip install pyctakes
30-Second Demo
import pytakes
# Create the default (full-featured) pipeline
pipeline = pytakes.create_default_pipeline()
# Process clinical text
clinical_note = """
Patient is a 65-year-old male with diabetes and hypertension.
He denies chest pain but reports shortness of breath.
Current medications: metformin 500mg BID, lisinopril 10mg daily.
"""
result = pipeline.process_text(clinical_note)
# Explore the first three entities and their assertion status
print(f"Found {len(result.entities)} clinical entities:")
for entity in result.entities[:3]:
    assertion = entity.assertion
    print(f" • {entity.text} ({entity.label})")
    print(f"   -> {assertion.polarity}, {assertion.uncertainty}")
Output:
Found 8 clinical entities:
 • diabetes (CONDITION)
   -> POSITIVE, CERTAIN
 • hypertension (CONDITION)
   -> POSITIVE, CERTAIN
 • chest pain (SYMPTOM)
   -> NEGATIVE, CERTAIN
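A common next step is to separate affirmed findings from negated ones. The sketch below groups entities by polarity. The `Entity` and `Assertion` dataclasses are hypothetical stand-ins that expose only the attributes read in the demo above (`text`, `label`, `assertion.polarity`, `assertion.uncertainty`), and it assumes polarity values compare as plain strings, which the package may not guarantee.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Assertion:
    # Hypothetical stand-in for the assertion object shown in the demo.
    polarity: str
    uncertainty: str


@dataclass
class Entity:
    # Hypothetical stand-in exposing the attributes the demo reads.
    text: str
    label: str
    assertion: Assertion


def group_by_polarity(entities):
    """Map each polarity value to the list of entity texts carrying it."""
    groups = defaultdict(list)
    for entity in entities:
        groups[entity.assertion.polarity].append(entity.text)
    return dict(groups)


# Sample data copied from the demo output above.
sample = [
    Entity("diabetes", "CONDITION", Assertion("POSITIVE", "CERTAIN")),
    Entity("hypertension", "CONDITION", Assertion("POSITIVE", "CERTAIN")),
    Entity("chest pain", "SYMPTOM", Assertion("NEGATIVE", "CERTAIN")),
]
groups = group_by_polarity(sample)
print(groups["NEGATIVE"])  # ['chest pain']
```

Because the function relies only on attribute access, it should also accept `result.entities` from a real pipeline run, provided those objects expose the same attributes.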
Performance & Features
Blazing Fast Performance
- Basic Pipeline: 39 annotations in 0.010s
- Fast Pipeline: 36 annotations in 0.001s
- Full Clinical Note: 81 annotations in 0.504s
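The figures above do not state the hardware or the input size, so it is worth reproducing them on your own notes. The following is a minimal timing sketch that uses only the standard library. `time_call` and `annotations_per_second` are illustrative helper names, not part of the package.

```python
import statistics
import time


def time_call(func, *args, repeats=5):
    """Return the median wall-clock seconds over `repeats` calls to func(*args)."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


def annotations_per_second(n_annotations, seconds):
    """Convert an annotation count and a duration into a throughput figure."""
    return n_annotations / seconds


# Throughput implied by the figures quoted above.
basic_rate = round(annotations_per_second(39, 0.010))  # 3900
fast_rate = round(annotations_per_second(36, 0.001))   # 36000
full_rate = round(annotations_per_second(81, 0.504))   # 161
print(basic_rate, fast_rate, full_rate)
```

To time a real run, pass the pipeline method and a note, for example `time_call(pipeline.process_text, clinical_note)`. The median is used because the first call often includes one-off model loading.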
Comprehensive Clinical NLP
| Feature | Description | Status |
|---|---|---|
| Sentence Segmentation | Clinical-aware sentence boundary detection | Available |
| Tokenization | Advanced tokenization with POS tagging | Available |
| Section Detection | Chief Complaint, History, Medications, Assessment, etc. | Available |
| Named Entity Recognition | Medications, conditions, procedures, anatomy | Available |
| Assertion Detection | Negation, uncertainty, temporal, experiencer | Available |
| UMLS Concept Mapping | CUI normalization and semantic types | Available |
| Relation Extraction | Temporal and dosage relationships | Planned (v1.1) |
| REST API Service | FastAPI deployment wrapper | Planned (v1.1) |
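To make the negation part of assertion detection concrete, here is a deliberately simplified, NegEx-style sketch. It is not the package's implementation. The trigger and scope-breaker lists are tiny illustrative assumptions, and it only checks the first occurrence of the entity within a single sentence.

```python
import re

# Small illustrative trigger list; real systems use far larger lexicons.
NEGATION_TRIGGERS = ("denies", "no", "without", "negative for")
# Words that end the scope of a preceding negation trigger.
SCOPE_BREAKERS = ("but", "however", "although")


def is_negated(sentence, entity_text):
    """Return True if a trigger precedes the entity with no scope breaker between."""
    lowered = sentence.lower()
    entity_pos = lowered.find(entity_text.lower())
    if entity_pos < 0:
        return False
    before = lowered[:entity_pos]
    # Find where the last trigger before the entity ends.
    last_trigger_end = -1
    for trigger in NEGATION_TRIGGERS:
        for match in re.finditer(rf"\b{re.escape(trigger)}\b", before):
            last_trigger_end = max(last_trigger_end, match.end())
    if last_trigger_end < 0:
        return False
    between = before[last_trigger_end:]
    return not any(re.search(rf"\b{word}\b", between) for word in SCOPE_BREAKERS)


sentence = "He denies chest pain but reports shortness of breath."
print(is_negated(sentence, "chest pain"))           # True
print(is_negated(sentence, "shortness of breath"))  # False
```

The scope breaker is what keeps "shortness of breath" affirmed here: "denies" precedes it, but "but" closes the negation scope first.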
Three Pipeline Types
# Full-featured (highest accuracy)
pipeline = pytakes.create_default_pipeline()
# Speed-optimized (fastest processing)
pipeline = pytakes.create_fast_pipeline()
# Minimal (basic entity extraction)
pipeline = pytakes.create_basic_pipeline()
Command Line Interface
# Process single file
pytakes process note.txt --output results.json
# Batch processing
pytakes process notes/*.txt --output-dir results/
# Different pipelines and formats
pytakes process note.txt --pipeline fast --format xml
pytakes process note.txt --config custom_config.json
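The batch command can also be approximated from Python when you need control over the output. The sketch below is an assumption-laden illustration: `process_directory` is a hypothetical helper, and `process_text` is any callable you supply that returns a JSON-serializable object. How to serialize a real pipeline result is not covered by this README, so a word-count stub stands in for it.

```python
import json
import tempfile
from pathlib import Path


def process_directory(input_dir, output_dir, process_text):
    """Run process_text on every .txt file and write one JSON file per note."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for note_path in sorted(Path(input_dir).glob("*.txt")):
        text = note_path.read_text(encoding="utf-8")
        target = out / f"{note_path.stem}.json"
        target.write_text(json.dumps(process_text(text), indent=2), encoding="utf-8")
        written.append(target)
    return written


def word_count_stub(text):
    # Stand-in for a real pipeline call; returns a JSON-serializable dict.
    return {"n_words": len(text.split())}


# Demonstrate on a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    notes = Path(tmp) / "notes"
    notes.mkdir()
    (notes / "note.txt").write_text("Patient denies chest pain.", encoding="utf-8")
    outputs = process_directory(notes, Path(tmp) / "results", word_count_stub)
    output_names = [path.name for path in outputs]
    first_payload = json.loads(outputs[0].read_text(encoding="utf-8"))

print(output_names, first_payload)
```

Passing the processing step in as a callable keeps the file handling independent of any particular pipeline type.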
Open Source Community
Lead Contributors
- Sonish Sivarajkumar - Lead Maintainer & Creator
  - Clinical NLP researcher and software engineer
  - Apache cTAKES community member
  - Python & healthcare technology enthusiast
Join Our Community
We're building the future of clinical NLP together, and there is a role for every background:
- Clinician - Help us understand real-world clinical text challenges
- Developer - Contribute code, fix bugs, or add new features
- Researcher - Share use cases, benchmarks, and domain expertise
- Technical Writer - Improve documentation and tutorials
- Designer - Enhance user experience and visualization
Everyone is welcome! Check out our Contributing Guide to get started.
Community Stats
- Contributors: Growing community of clinical NLP enthusiasts
- Issues: Active issue tracking and feature requests
- Discussions: Technical discussions and use case sharing
- Releases: Regular updates with new features and improvements
Ways to Contribute
- Report Issues
- Share Ideas
- Code Contributions
- Documentation
Documentation & Resources
- Full Documentation - Complete guides and API reference
- Quick Start Guide - Get up and running in minutes
- Examples - Real-world usage examples and configurations
- API Reference - Detailed API documentation
- Performance Guide - Optimization tips and benchmarks
- Contributing - How to contribute to the project
Roadmap
v1.0 (Current) - Foundation
- Core pipeline architecture
- Clinical text processing (tokenization, NER, assertion)
- UMLS concept mapping framework
- CLI and Python APIs
- Comprehensive documentation
v1.1 (Next) - Enhancement
- Enhanced UMLS integration (QuickUMLS)
- Relation extraction (temporal, dosage)
- REST API service wrapper
- Docker containers & deployment guides
- Performance optimizations
v2.0 (Future) - Intelligence
- LLM integration for disambiguation
- Active learning capabilities
- Advanced relation extraction
- Real-time processing pipelines
- Federated learning support
Why Open Source Matters in Healthcare
Healthcare technology should be:
- Transparent - Auditable algorithms for patient safety
- Collaborative - Shared knowledge accelerates progress
- Accessible - No barriers to life-saving technology
- Customizable - Adaptable to diverse clinical environments
- Sustainable - Community-driven long-term maintenance
PyCTAKES embodies these principles by providing enterprise-grade clinical NLP capabilities as a truly open source project. No hidden costs, no vendor dependencies, just powerful tools for advancing healthcare through technology.
License & Citation
License
PyCTAKES is released under the MIT License - see LICENSE for details.
Copyright (c) 2025 Sonish Sivarajkumar and Contributors
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
[Full license text in LICENSE file]
Citation
If you use PyCTAKES in your research, please cite:
@software{pytakes2025,
title={PyTAKES: Open Source Python-native Clinical NLP Framework},
author={Sivarajkumar, Sonish and Contributors},
year={2025},
url={https://github.com/sonishsivarajkumar/PyTAKES},
version={1.0.0}
}
Acknowledgments
PyCTAKES builds upon the excellent work of:
- Apache cTAKES - Pioneering clinical NLP framework
- spaCy & Stanza - Modern NLP processing libraries
- Clinical NLP Community - Researchers and practitioners advancing the field
- Open Source Contributors - Everyone who helps make this project better
Get Started Today!
# Install PyCTAKES
pip install pyctakes
# Clone the repository
git clone https://github.com/sonishsivarajkumar/PyTAKES.git
cd PyTAKES
# Try the examples
python examples/comprehensive_demo.py
Join us in revolutionizing clinical NLP!
Star this repo | Read the docs | Contribute | Discuss