Python-native clinical NLP framework mirroring Apache cTAKES functionality

PyCTAKES 🏥

Open Source Python-native Clinical NLP Framework

License: MIT | Python 3.8+ | Contributions welcome

🚀 A modern, open source clinical NLP framework that mirrors and extends Apache cTAKES functionality in pure Python. It is designed as a drop-in replacement with an emphasis on usability, extensibility, and performance.

PyCTAKES transforms clinical text processing by providing a 100% open source, Python-native alternative to Apache cTAKES. Built by the community, for the community - no vendor lock-in, no licensing fees, just powerful clinical NLP tools that anyone can use, modify, and contribute to.


🌟 Why Choose PyCTAKES?

🔓 Fully Open Source

  • MIT License - Free for commercial & research use
  • Transparent development - All code, issues, and discussions public
  • Community-driven - Shaped by real user needs
  • No vendor lock-in - Own your clinical NLP pipeline

⚡ Modern & Fast

  • Pure Python - No Java dependencies
  • pip installable - Get started in seconds
  • Multiple backends - spaCy, Stanza, rule-based
  • Production ready - Optimized for real-world use

🏥 Clinical-First Design

  • Medical expertise built-in - Clinical abbreviations, sections, terminology
  • cTAKES compatibility - Drop-in replacement for existing workflows
  • Comprehensive NLP - Tokenization → UMLS mapping
  • Assertion detection - Negation, uncertainty, temporal context

🔧 Developer Friendly

  • Clean Python APIs - Intuitive and well-documented
  • Modular architecture - Use only what you need
  • Extensible framework - Easy to add custom annotators
  • Rich ecosystem - Integrates with pandas, spaCy, transformers

🚀 Quick Start

Installation

pip install pyctakes

30-Second Demo

import pytakes

# Create pipeline
pipeline = pytakes.create_default_pipeline()

# Process clinical text
clinical_note = """
Patient is a 65-year-old male with diabetes and hypertension.
He denies chest pain but reports shortness of breath.
Current medications: metformin 500mg BID, lisinopril 10mg daily.
"""

result = pipeline.process_text(clinical_note)

# Explore results
print(f"Found {len(result.entities)} clinical entities:")
for entity in result.entities[:3]:
    assertion = entity.assertion
    print(f"  • {entity.text} ({entity.label})")
    print(f"    → {assertion.polarity}, {assertion.uncertainty}")

Output:

Found 8 clinical entities:
  • diabetes (CONDITION)
    → POSITIVE, CERTAIN
  • hypertension (CONDITION)
    → POSITIVE, CERTAIN
  • chest pain (SYMPTOM)
    → NEGATIVE, CERTAIN
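
A common next step is to keep only the findings asserted as present. The sketch below shows one way to do that. It uses small stand-in dataclasses that mimic just the attributes seen in the demo (`text`, `label`, `assertion.polarity`, `assertion.uncertainty`); they are not the library's real classes, and it assumes those attributes compare equal to the strings printed in the output.

```python
from dataclasses import dataclass
from typing import List


# Stand-in types for illustration only; they mirror the attribute names
# used in the demo and are NOT the library's real classes.
@dataclass
class Assertion:
    polarity: str
    uncertainty: str


@dataclass
class Entity:
    text: str
    label: str
    assertion: Assertion


def affirmed(entities: List[Entity]) -> List[Entity]:
    """Keep entities asserted as present and certain."""
    return [
        e for e in entities
        if e.assertion.polarity == "POSITIVE"
        and e.assertion.uncertainty == "CERTAIN"
    ]


# Sample data copied from the demo output.
sample = [
    Entity("diabetes", "CONDITION", Assertion("POSITIVE", "CERTAIN")),
    Entity("hypertension", "CONDITION", Assertion("POSITIVE", "CERTAIN")),
    Entity("chest pain", "SYMPTOM", Assertion("NEGATIVE", "CERTAIN")),
]

affirmed_texts = [e.text for e in affirmed(sample)]
print(affirmed_texts)
```

With a real pipeline result, the same filter would presumably be applied to `result.entities`, provided the attribute values behave like the strings assumed here.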

📊 Performance & Features

⚡ Blazing Fast Performance

  • Basic Pipeline: 39 annotations in 0.010s
  • Fast Pipeline: 36 annotations in 0.001s
  • Full Clinical Note: 81 annotations in 0.504s
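
The figures above do not state the hardware or input used, so it may be worth measuring on your own notes. Here is a minimal timing sketch, assuming only that a pipeline can be wrapped in a callable that takes text and returns a list of annotations. `time_pipeline` and `fake_process` are hypothetical names introduced for this example.

```python
import time
from typing import Callable, List, Tuple


def time_pipeline(
    process: Callable[[str], List],
    text: str,
    repeats: int = 5,
) -> Tuple[int, float]:
    """Run `process` several times; return (annotation count, best seconds)."""
    best = float("inf")
    count = 0
    for _ in range(repeats):
        start = time.perf_counter()
        annotations = process(text)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
        count = len(annotations)
    return count, best


# Hypothetical stand-in so the sketch runs without the library:
# whitespace tokens play the role of annotations.
def fake_process(text: str) -> List[str]:
    return text.split()


count, seconds = time_pipeline(fake_process, "Patient denies chest pain.")
print(f"{count} annotations in {seconds:.6f}s")
```

To time an actual pipeline, something like `lambda t: pipeline.process_text(t).entities` could be passed as `process`, using the names shown in the demo.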

🎯 Comprehensive Clinical NLP

| Feature | Description | Status |
|---|---|---|
| Sentence Segmentation | Clinical-aware sentence boundary detection | ✅ |
| Tokenization | Advanced tokenization with POS tagging | ✅ |
| Section Detection | Chief Complaint, History, Medications, Assessment, etc. | ✅ |
| Named Entity Recognition | Medications, conditions, procedures, anatomy | ✅ |
| Assertion Detection | Negation, uncertainty, temporal, experiencer | ✅ |
| UMLS Concept Mapping | CUI normalization and semantic types | ✅ |
| Relation Extraction | Temporal and dosage relationships | 🔄 v1.1 |
| REST API Service | FastAPI deployment wrapper | 🔄 v1.1 |
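
To give a feel for what negation detection involves, here is a deliberately small NegEx-style sketch. It is a simplification written for this README, not PyCTAKES's actual implementation: the trigger and scope-breaker lists are tiny illustrative assumptions, and `is_negated` is a hypothetical helper.

```python
import re

# Tiny illustrative lexicons; real systems use much larger ones.
NEGATION_TRIGGERS = ["denies", "no", "without", "negative for"]
# Words that end the scope of an earlier negation trigger.
SCOPE_BREAKERS = ["but", "however", "although"]


def is_negated(sentence: str, entity: str) -> bool:
    """Return True if `entity` follows a negation trigger within its scope."""
    lowered = sentence.lower()
    pos = lowered.find(entity.lower())
    if pos == -1:
        return False
    before = lowered[:pos]
    # Only text after the last scope breaker can negate the entity.
    for breaker in SCOPE_BREAKERS:
        before = re.split(rf"\b{re.escape(breaker)}\b", before)[-1]
    return any(
        re.search(rf"\b{re.escape(trigger)}\b", before)
        for trigger in NEGATION_TRIGGERS
    )


sentence = "He denies chest pain but reports shortness of breath."
print(is_negated(sentence, "chest pain"))           # True
print(is_negated(sentence, "shortness of breath"))  # False
```

The scope breaker is what keeps "shortness of breath" from inheriting the negation introduced by "denies" earlier in the sentence.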

🔧 Three Pipeline Types

# Full-featured (highest accuracy)
pipeline = pytakes.create_default_pipeline()

# Speed-optimized (fastest processing)  
pipeline = pytakes.create_fast_pipeline()

# Minimal (basic entity extraction)
pipeline = pytakes.create_basic_pipeline()
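
When the pipeline type comes from configuration, a small lookup can keep the choice in one place. The sketch below assumes the three factory function names shown above. The short keys (`"default"`, `"fast"`, `"basic"`) and the `make_pipeline` helper are hypothetical, and a stand-in object replaces the real module so the example runs without the library installed.

```python
from types import SimpleNamespace

# Factory names are taken from the snippet above; the short keys are
# hypothetical labels chosen for this sketch.
FACTORY_BY_NAME = {
    "default": "create_default_pipeline",
    "fast": "create_fast_pipeline",
    "basic": "create_basic_pipeline",
}


def make_pipeline(module, name: str = "default"):
    """Look up the factory for `name` on `module` and call it."""
    try:
        factory_name = FACTORY_BY_NAME[name]
    except KeyError:
        choices = sorted(FACTORY_BY_NAME)
        raise ValueError(f"unknown pipeline {name!r}; choose from {choices}") from None
    return getattr(module, factory_name)()


# Stand-in for the imported module (illustration only).
fake_module = SimpleNamespace(
    create_default_pipeline=lambda: "default-pipeline",
    create_fast_pipeline=lambda: "fast-pipeline",
    create_basic_pipeline=lambda: "basic-pipeline",
)

chosen = make_pipeline(fake_module, "fast")
print(chosen)
```

In real use the imported package would be passed in place of `fake_module`.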

💻 Command Line Interface

# Process single file
pytakes process note.txt --output results.json

# Batch processing
pytakes process notes/*.txt --output-dir results/

# Different pipelines and formats
pytakes process note.txt --pipeline fast --format xml
pytakes process note.txt --config custom_config.json
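
The batch case can also be driven from Python when more control is needed. This is a rough sketch, not a description of what the CLI does internally: `process_directory` and `fake_process` are hypothetical, the one-JSON-file-per-note layout is an assumption, and the stand-in processor only counts characters.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict


def process_directory(
    in_dir: Path,
    out_dir: Path,
    process: Callable[[str], Dict],
) -> int:
    """Process every .txt file in `in_dir`; write one JSON file per note."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = 0
    for note in sorted(in_dir.glob("*.txt")):
        result = process(note.read_text(encoding="utf-8"))
        out_path = out_dir / f"{note.stem}.json"
        out_path.write_text(json.dumps(result, indent=2), encoding="utf-8")
        written += 1
    return written


# Hypothetical stand-in for a real pipeline call.
def fake_process(text: str) -> Dict:
    return {"n_chars": len(text)}


with tempfile.TemporaryDirectory() as tmp:
    notes = Path(tmp) / "notes"
    notes.mkdir()
    (notes / "a.txt").write_text("Patient denies chest pain.", encoding="utf-8")
    (notes / "b.txt").write_text("No fever.", encoding="utf-8")
    n_written = process_directory(notes, Path(tmp) / "results", fake_process)
    outputs = sorted(p.name for p in (Path(tmp) / "results").iterdir())

print(n_written, outputs)
```

A real `process` callable would need to return something JSON-serializable, which may require converting the pipeline's result object first.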

๐Ÿค Open Source Community

๐Ÿ‘ฅ Lead Contributors

  • Sonish Sivarajkumar - Lead Maintainer & Creator
    • Clinical NLP researcher and software engineer
    • Apache cTAKES community member
    • Python & healthcare technology enthusiast

๐ŸŒ Join Our Community

We're building the future of clinical NLP together! Whether you're a:

  • ๐Ÿ‘ฉโ€โš•๏ธ Clinician - Help us understand real-world clinical text challenges
  • ๐Ÿ‘จโ€๐Ÿ’ป Developer - Contribute code, fix bugs, or add new features
  • ๐Ÿ”ฌ Researcher - Share use cases, benchmarks, and domain expertise
  • ๐Ÿ“š Technical Writer - Improve documentation and tutorials
  • ๐ŸŽจ Designer - Enhance user experience and visualization

Everyone is welcome! Check out our Contributing Guide to get started.

📈 Community Stats

  • Contributors: Growing community of clinical NLP enthusiasts
  • Issues: Active issue tracking and feature requests
  • Discussions: Technical discussions and use case sharing
  • Releases: Regular updates with new features and improvements

🎯 Ways to Contribute

🐛 Report Issues

  • Bug reports
  • Feature requests
  • Documentation issues
  • Performance problems

💡 Share Ideas

  • New annotators
  • Pipeline improvements
  • Integration suggestions
  • Use case examples

🔧 Code Contributions

  • Bug fixes
  • New features
  • Performance optimizations
  • Test improvements

📖 Documentation

  • API documentation
  • Tutorials & guides
  • Example notebooks
  • Translation support

📚 Documentation & Resources


🗺️ Roadmap

🎯 v1.0 (Current) - Foundation

  • ✅ Core pipeline architecture
  • ✅ Clinical text processing (tokenization, NER, assertion)
  • ✅ UMLS concept mapping framework
  • ✅ CLI and Python APIs
  • ✅ Comprehensive documentation

⚡ v1.1 (Next) - Enhancement

  • 🔄 Enhanced UMLS integration (QuickUMLS)
  • 🔄 Relation extraction (temporal, dosage)
  • 🔄 REST API service wrapper
  • 🔄 Docker containers & deployment guides
  • 🔄 Performance optimizations

🚀 v2.0 (Future) - Intelligence

  • 🔮 LLM integration for disambiguation
  • 🔮 Active learning capabilities
  • 🔮 Advanced relation extraction
  • 🔮 Real-time processing pipelines
  • 🔮 Federated learning support

🏆 Why Open Source Matters in Healthcare

Healthcare technology should be:

  • 🔍 Transparent - Auditable algorithms for patient safety
  • 🤝 Collaborative - Shared knowledge accelerates progress
  • ♿ Accessible - No barriers to life-saving technology
  • 🔧 Customizable - Adaptable to diverse clinical environments
  • 📈 Sustainable - Community-driven long-term maintenance

PyCTAKES embodies these principles by providing enterprise-grade clinical NLP capabilities as a truly open source project. No hidden costs, no vendor dependencies, just powerful tools for advancing healthcare through technology.


📄 License & Citation

📜 License

PyCTAKES is released under the MIT License - see LICENSE for details.

Copyright (c) 2025 Sonish Sivarajkumar and Contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
[Full license text in LICENSE file]

๐Ÿ“ Citation

If you use PyTAKES in your research, please cite:

@software{pytakes2025,
  title={PyTAKES: Open Source Python-native Clinical NLP Framework},
  author={Sivarajkumar, Sonish and Contributors},
  year={2025},
  url={https://github.com/sonishsivarajkumar/PyTAKES},
  version={1.0.0}
}

๐Ÿ™ Acknowledgments

PyTAKES builds upon the excellent work of:

  • Apache cTAKES - Pioneering clinical NLP framework
  • spaCy & Stanza - Modern NLP processing libraries
  • Clinical NLP Community - Researchers and practitioners advancing the field
  • Open Source Contributors - Everyone who helps make this project better

🚀 Get Started Today!

# Install PyCTAKES
pip install pyctakes

# Clone the repository
git clone https://github.com/sonishsivarajkumar/PyTAKES.git
cd PyTAKES

# Try the examples
python examples/comprehensive_demo.py

Join us in revolutionizing clinical NLP! 🎉

⭐ Star this repo | 📚 Read the docs | 🤝 Contribute | 💬 Discuss
