Extract Rett Syndrome mutations from genetic diagnosis report
RettX Mutation Analysis Library
A Python library for extracting and analyzing MECP2 mutations from genetic documents using Azure AI services. Designed for medical genetics research, clinical applications, and bioinformatics workflows focused on Rett Syndrome.
🚀 Quick Start
Installation
```bash
pip install rettxmutation
```
Basic Usage
```python
from rettxmutation import RettxServices, DefaultConfig

# Initialize configuration (loads from environment variables)
config = DefaultConfig()

# Create services
services = RettxServices(config)

# Process a genetic document
with open("genetic_report.pdf", "rb") as file_stream:
    # Extract text using OCR
    document = services.ocr_service.extract_and_process_text(file_stream)

# Detect MECP2-related keywords and mutations
keyword_collection = services.keyword_detector_service.detect_keywords(document)

# Get structured mutations with confidence scores
mutations = keyword_collection.get_gene_mutations()

for mutation in mutations:
    print(f"Gene: {mutation.primary_transcript.gene_id}")
    print(f"HGVS: {mutation.primary_transcript.hgvs_transcript_variant}")
    print(f"Protein: {mutation.primary_transcript.protein_consequence_tlr}")
```
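The `protein_consequence_tlr` field printed above appears to hold the protein change written with three-letter amino-acid codes (for example `p.(Arg306Cys)`); reading `tlr` as "three-letter representation" is an assumption. If you need the compact one-letter form common in the Rett literature (`p.(R306C)`), a small standalone helper such as the hypothetical `to_one_letter` below can convert it. It is a sketch, not part of rettxmutation, and covers only the 20 standard amino acids plus `Ter`.

```python
import re

# Standard three-letter to one-letter amino-acid codes, plus Ter (stop) -> '*'.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",
}

# Title-case codes never collide with upper-case RefSeq prefixes such as "NP_".
_AA3 = re.compile("|".join(THREE_TO_ONE))


def to_one_letter(protein_hgvs: str) -> str:
    """Rewrite three-letter amino-acid codes in an HGVS protein string as one-letter codes."""
    return _AA3.sub(lambda m: THREE_TO_ONE[m.group(0)], protein_hgvs)
```

For example, `to_one_letter("NP_004983.1:p.(Arg306Cys)")` yields `NP_004983.1:p.(R306C)`, and a nonsense change such as `p.(Arg168Ter)` becomes `p.(R168*)`.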
✨ Key Features
- 🔍 Multi-Format Document Processing: PDF, PNG, and JPG input, with OCR via Azure Form Recognizer (Document Intelligence)
- 🧬 HGVS Validation: Built-in Mutalyzer integration for mutation validation
- 🤖 AI-Powered Extraction: Azure OpenAI and Semantic Kernel for LLM-based text analysis
- 📊 Confidence Scoring: All results include confidence metrics for quality assessment
- 🎯 MECP2 Specialization: Optimized for Rett Syndrome genetic analysis
- ⚡ Production Ready: Type-safe Pydantic v2 models with comprehensive error handling
- 🔄 External API Integration: Ensembl.org enrichment with retry mechanisms
- 🏗️ Modular Architecture: Clean separation of concerns with dependency injection
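The Ensembl.org enrichment listed above is not documented further on this page, so which Ensembl endpoints the library calls is unknown. As an illustration only, the public Ensembl REST API exposes a VEP lookup by HGVS notation (`/vep/:species/hgvs/:hgvs_notation`); the hypothetical helper below builds such a URL without sending any request. The `>` in a substitution must be percent-encoded in a URL path.

```python
from urllib.parse import quote

ENSEMBL_REST = "https://rest.ensembl.org"


def vep_hgvs_url(hgvs: str, species: str = "human") -> str:
    """Build an Ensembl REST VEP-by-HGVS URL (no network request is made)."""
    # Keep ':' literal as in Ensembl's examples; encode '>' and any other reserved characters.
    encoded = quote(hgvs, safe=":")
    return f"{ENSEMBL_REST}/vep/{species}/hgvs/{encoded}?content-type=application/json"
```

The resulting URL can be fetched with any HTTP client; wrapping that call in a retry policy matters because public REST services rate-limit clients.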
🛠️ Requirements
Python Version
- Python 3.8 or higher
Azure Services
You'll need access to these Azure services:
- Azure Form Recognizer (Document Intelligence) - for OCR
- Azure OpenAI - for mutation extraction and text analysis
- Azure Cognitive Services (Text Analytics) - for medical text processing
- Azure AI Search (optional) - for enhanced keyword detection
Environment Variables
```bash
# Required
RETTX_DOCUMENT_ANALYSIS_ENDPOINT=https://your-form-recognizer.cognitiveservices.azure.com/
RETTX_DOCUMENT_ANALYSIS_KEY=your-form-recognizer-key
RETTX_OPENAI_ENDPOINT=https://your-openai.openai.azure.com/
RETTX_OPENAI_KEY=your-openai-key
RETTX_OPENAI_MODEL_NAME=gpt-4
RETTX_OPENAI_MODEL_VERSION=2024-02-01
RETTX_COGNITIVE_SERVICES_ENDPOINT=https://your-cognitive-services.cognitiveservices.azure.com/
RETTX_COGNITIVE_SERVICES_KEY=your-cognitive-services-key

# Optional (for enhanced features)
RETTX_AI_SEARCH_SERVICE=your-search-service
RETTX_AI_SEARCH_API_KEY=your-search-key
RETTX_AI_SEARCH_INDEX_NAME=your-index-name
```
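Because `DefaultConfig` reads these variables from the environment, a missing key typically surfaces later as an authentication or connection error. A fail-fast check before creating `RettxServices` can make that easier to diagnose. The sketch below is hypothetical (`REQUIRED_VARS` and `missing_required_vars` are not part of the library) and simply mirrors the variables marked Required above, treating empty values as missing.

```python
import os
from typing import List, Mapping, Optional

# The variables listed as "Required" in this README.
REQUIRED_VARS = (
    "RETTX_DOCUMENT_ANALYSIS_ENDPOINT",
    "RETTX_DOCUMENT_ANALYSIS_KEY",
    "RETTX_OPENAI_ENDPOINT",
    "RETTX_OPENAI_KEY",
    "RETTX_OPENAI_MODEL_NAME",
    "RETTX_OPENAI_MODEL_VERSION",
    "RETTX_COGNITIVE_SERVICES_ENDPOINT",
    "RETTX_COGNITIVE_SERVICES_KEY",
)


def missing_required_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required RETTX_* variables that are unset or empty (defaults to os.environ)."""
    source = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not source.get(name)]
```

A startup script could call `missing_required_vars()` and exit with the list of names if it is non-empty, before any Azure client is created.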
📋 Processing Workflow
- Document Input: Accept PDF or image files
- OCR Processing: Extract text using Azure Form Recognizer
- Text Normalization: Clean and standardize extracted text
- Keyword Detection: Multi-layer detection (regex + AI + search)
- Mutation Extraction: AI-powered identification using Semantic Kernel
- HGVS Validation: Validate mutations using mutalyzer parser
- Data Enrichment: Query Ensembl.org for additional mutation data
- Structured Output: Return validated mutations with confidence scores
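The Text Normalization and regex part of Keyword Detection can be pictured with the minimal sketch below. It is illustrative only: the library's actual normalization rules and patterns are not shown on this page, and its AI and search layers are not modelled at all. The hypothetical pattern deliberately matches only simple coding-DNA substitutions such as `c.916C>T`, optionally prefixed by a RefSeq transcript.

```python
import re
import unicodedata
from typing import List

# Simple coding-DNA substitutions, optionally with a RefSeq transcript prefix,
# e.g. "c.916C>T" or "NM_004992.4:c.916C>T". Deletions, insertions and intronic
# offsets are intentionally out of scope for this sketch.
CDNA_SUBSTITUTION = re.compile(r"(?:NM_\d+\.\d+:)?c\.\d+[ACGT]>[ACGT]")


def normalize_text(text: str) -> str:
    """NFKC-normalize OCR text, map a common look-alike arrow to '>', and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)  # e.g. fullwidth '＞' becomes '>'
    text = text.replace("\u2192", ">")  # '→' is sometimes written instead of '>'
    return re.sub(r"\s+", " ", text).strip()


def find_cdna_substitutions(text: str) -> List[str]:
    """Return candidate HGVS substitutions found in normalized text, in order of appearance."""
    return CDNA_SUBSTITUTION.findall(normalize_text(text))
```

Candidates found this way are only a first pass; in the workflow above they would still go through AI-based extraction and HGVS validation.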
💻 Advanced Usage
Async Mutation Validation
```python
import asyncio

from rettxmutation import DefaultConfig
from rettxmutation.models.gene_models import GeneMutation, TranscriptMutation
from rettxmutation.services.mutation_validator import MutationValidator


async def validate_mutations():
    config = DefaultConfig()
    validator = MutationValidator(config)

    # Create a mutation object: MECP2 c.916C>T, p.(Arg306Cys)
    mutation = GeneMutation(
        variant_type="SNV",
        primary_transcript=TranscriptMutation(
            gene_id="MECP2",
            transcript_id="NM_004992.4",
            hgvs_transcript_variant="NM_004992.4:c.916C>T",
            protein_consequence_tlr="NP_004983.1:p.(Arg306Cys)",
        ),
    )

    # Validate with external services
    validation_result = await validator.validate_mutations([mutation])
    return validation_result


# Run the async function
result = asyncio.run(validate_mutations())
```
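External validation costs a network round trip per mutation, so cheap local consistency checks on hand-built mutations can catch obvious mistakes first, such as an HGVS prefix that does not match `transcript_id`. The hypothetical `precheck_mutation` below takes the same string fields used in the `TranscriptMutation` example; it is not part of the library and does not replace `validate_mutations` or Mutalyzer.

```python
from typing import List


def precheck_mutation(transcript_id: str, hgvs_transcript_variant: str, protein_consequence: str) -> List[str]:
    """Return problems found by cheap local checks; an empty list means nothing obviously wrong."""
    problems: List[str] = []
    reference, sep, description = hgvs_transcript_variant.partition(":")
    if not sep:
        problems.append("HGVS variant has no reference-sequence prefix")
    else:
        if reference != transcript_id:
            problems.append(f"HGVS reference {reference!r} does not match transcript_id {transcript_id!r}")
        if not description.startswith("c."):
            problems.append("expected a coding-DNA ('c.') description")
    if ":p." not in protein_consequence:
        problems.append("protein consequence is not a referenced 'p.' description")
    return problems
```

With the values from the example above, `precheck_mutation("NM_004992.4", "NM_004992.4:c.916C>T", "NP_004983.1:p.(Arg306Cys)")` returns an empty list.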
Custom Configuration
```python
from rettxmutation import RettxServices
from rettxmutation.config import RettxConfig


class CustomConfig:
    """Custom configuration implementation"""
    LOG_LEVEL = "DEBUG"
    ENVIRONMENT = "production"
    RETTX_OPENAI_MODEL_NAME = "gpt-4-turbo"
    # ... other config values


services = RettxServices(CustomConfig())
```
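`CustomConfig` above does not inherit from the imported `RettxConfig`, which suggests the library relies on duck or structural typing. Assuming `RettxConfig` is (or behaves like) a `typing.Protocol`, which this page does not confirm, the mechanism works as in the sketch below. `ConfigLike` and its three fields are hypothetical stand-ins, not the library's real interface.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ConfigLike(Protocol):
    """Hypothetical configuration interface; the real RettxConfig fields may differ."""
    LOG_LEVEL: str
    ENVIRONMENT: str
    RETTX_OPENAI_MODEL_NAME: str


class CustomConfig:
    # No inheritance: having the right attribute names is what satisfies the protocol.
    LOG_LEVEL = "DEBUG"
    ENVIRONMENT = "production"
    RETTX_OPENAI_MODEL_NAME = "gpt-4-turbo"


class Incomplete:
    # Lacks ENVIRONMENT and RETTX_OPENAI_MODEL_NAME, so it does not satisfy ConfigLike.
    LOG_LEVEL = "INFO"
```

`isinstance(CustomConfig(), ConfigLike)` is `True` because `@runtime_checkable` checks only that the attributes exist, not their types; a static type checker such as mypy applies the stricter, type-aware check.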
🎯 Use Cases
- 🏥 Clinical Genetics: Process genetic reports for patient diagnosis
- 🔬 Research: Analyze genetic data for Rett Syndrome studies
- 📊 Patient Registries: Populate genetic databases systematically
- 🤖 Bioinformatics Pipelines: Integrate with existing analysis workflows
- 📱 Clinical Applications: Build genetic analysis tools and dashboards
🔧 Error Handling & Reliability
- Exponential Backoff: Automatic retry for external API calls (up to 5 attempts)
- Graceful Degradation: Continue processing when optional services are unavailable
- Comprehensive Logging: Detailed logging for debugging and monitoring
- Type Safety: Pydantic v2 models ensure data validation at runtime
- Connection Pooling: Efficient Azure service client management
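The README states that external API calls are retried with exponential backoff for up to 5 attempts, but not which delays or which errors trigger a retry. The sketch below shows the general technique under assumed parameters (0.5 s base delay, 30 s cap, "full jitter", retrying only `ConnectionError` and `TimeoutError`); `retry_with_backoff` is a hypothetical name, not the library's implementation.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying transient failures with exponentially growing, jittered delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: re-raise the last error
            # Upper bound doubles per attempt (0.5, 1, 2, 4 s, ...) and is capped at max_delay;
            # the actual sleep is drawn uniformly below it ("full jitter").
            bound = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, bound))
    raise AssertionError("unreachable")
```

With 5 attempts there are at most 4 sleeps between calls. Jitter keeps many clients that failed at the same moment from retrying in lockstep, and the injectable `sleep` makes the policy testable without real waiting.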
🤝 Contributing
We welcome contributions! Please see our GitHub repository for:
- Issue reporting
- Feature requests
- Pull request guidelines
- Development setup instructions
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🆘 Support
- Issues: GitHub Issues
- Documentation: API Documentation
- Contact: procha@rettsyndrome.eu
🔮 Roadmap
- Multi-Gene Support: Extend beyond MECP2 to other genetic conditions
- Enhanced Image Processing: Advanced OCR for handwritten documents
- Multilingual Support: Process documents in multiple languages
- Real-time Processing: WebSocket support for live document analysis
- Cloud Deployment: Docker containers and Azure deployment templates
Download files
Source Distribution: rettxmutation-0.2.0.tar.gz
Built Distribution: rettxmutation-0.2.0-py3-none-any.whl
File details
Details for the file rettxmutation-0.2.0.tar.gz.
File metadata
- Download URL: rettxmutation-0.2.0.tar.gz
- Upload date:
- Size: 73.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b0f24fd251ff379a177af53945d84138d581e44d9831ab4b7554435592b8e09c |
| MD5 | e170f3f347b1f820fb23f7a465fa1bc0 |
| BLAKE2b-256 | 881bb2f804c1285b939a24ff2755fb78bc46522dd196f9cb223ffed72d382438 |
Provenance
The following attestation bundles were made for rettxmutation-0.2.0.tar.gz:
Publisher: publish_pypi.yml on rett-europe/rettxmutation
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: rettxmutation-0.2.0.tar.gz
- Subject digest: b0f24fd251ff379a177af53945d84138d581e44d9831ab4b7554435592b8e09c
- Sigstore transparency entry: 928393348
- Sigstore integration time:
- Permalink: rett-europe/rettxmutation@4f3138f85044f5e8bcc2d0cab11a2e2a08387378
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/rett-europe
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish_pypi.yml@4f3138f85044f5e8bcc2d0cab11a2e2a08387378
- Trigger Event: push
File details
Details for the file rettxmutation-0.2.0-py3-none-any.whl.
File metadata
- Download URL: rettxmutation-0.2.0-py3-none-any.whl
- Upload date:
- Size: 79.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | efa11aa03092b1aa23bafd1dbd50e36d5f259438e3ca74af2f214be76eba589d |
| MD5 | 6d1ffe597e6d3f71d501ff096669ae32 |
| BLAKE2b-256 | 45d3dee2cd0e207b6bac2ccd17463280325db00911df947b43858a93a6f1b2af |
Provenance
The following attestation bundles were made for rettxmutation-0.2.0-py3-none-any.whl:
Publisher: publish_pypi.yml on rett-europe/rettxmutation
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: rettxmutation-0.2.0-py3-none-any.whl
- Subject digest: efa11aa03092b1aa23bafd1dbd50e36d5f259438e3ca74af2f214be76eba589d
- Sigstore transparency entry: 928393352
- Sigstore integration time:
- Permalink: rett-europe/rettxmutation@4f3138f85044f5e8bcc2d0cab11a2e2a08387378
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/rett-europe
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish_pypi.yml@4f3138f85044f5e8bcc2d0cab11a2e2a08387378
- Trigger Event: push