Extract Rett Syndrome mutations from genetic diagnosis reports

RettX Mutation Analysis Library

A Python library for extracting and analyzing MECP2 mutations from genetic documents using Azure AI services. Designed for medical genetics research, clinical applications, and bioinformatics workflows focused on Rett Syndrome.

🚀 Quick Start

Installation

pip install rettxmutation

Basic Usage

from rettxmutation import RettxServices, DefaultConfig

# Initialize configuration (loads from environment variables)
config = DefaultConfig()

# Create services
services = RettxServices(config)

# Process a genetic document
with open('genetic_report.pdf', 'rb') as file_stream:
    # Extract text using OCR
    document = services.ocr_service.extract_and_process_text(file_stream)
    
    # Detect MECP2-related keywords and mutations
    keyword_collection = services.keyword_detector_service.detect_keywords(document)
    
    # Get structured mutations with confidence scores
    mutations = keyword_collection.get_gene_mutations()
    
    for mutation in mutations:
        print(f"Gene: {mutation.primary_transcript.gene_id}")
        print(f"HGVS: {mutation.primary_transcript.hgvs_transcript_variant}")
        print(f"Protein: {mutation.primary_transcript.protein_consequence_tlr}")

✨ Key Features

  • 🔍 Multi-Format Document Processing: PDF, PNG, JPG support with intelligent OCR
  • 🧬 HGVS Validation: Built-in Mutalyzer integration for mutation validation
  • 🤖 AI-Powered Extraction: Azure OpenAI and Semantic Kernel for intelligent text analysis
  • 📊 Confidence Scoring: All results include confidence metrics for quality assessment
  • 🎯 MECP2 Specialization: Optimized for Rett Syndrome genetic analysis
  • ⚡ Production Ready: Type-safe Pydantic v2 models with comprehensive error handling
  • 🔄 External API Integration: Ensembl.org enrichment with retry mechanisms
  • 🏗️ Modular Architecture: Clean separation of concerns with dependency injection

🛠️ Requirements

Python Version

  • Python 3.8 or higher

Azure Services

You'll need access to these Azure services:

  • Azure Form Recognizer (Document Intelligence) - for OCR
  • Azure OpenAI - for mutation extraction and text analysis
  • Azure Cognitive Services (Text Analytics) - for medical text processing
  • Azure AI Search (optional) - for enhanced keyword detection

Environment Variables

# Required
RETTX_DOCUMENT_ANALYSIS_ENDPOINT=https://your-form-recognizer.cognitiveservices.azure.com/
RETTX_DOCUMENT_ANALYSIS_KEY=your-form-recognizer-key

RETTX_OPENAI_ENDPOINT=https://your-openai.openai.azure.com/
RETTX_OPENAI_KEY=your-openai-key
RETTX_OPENAI_MODEL_NAME=gpt-4
RETTX_OPENAI_MODEL_VERSION=2024-02-01

RETTX_COGNITIVE_SERVICES_ENDPOINT=https://your-cognitive-services.cognitiveservices.azure.com/
RETTX_COGNITIVE_SERVICES_KEY=your-cognitive-services-key

# Optional (for enhanced features)
RETTX_AI_SEARCH_SERVICE=your-search-service
RETTX_AI_SEARCH_API_KEY=your-search-key
RETTX_AI_SEARCH_INDEX_NAME=your-index-name
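
A missing variable is easier to diagnose before any Azure client is created. The following is a minimal sketch, not part of the library: `missing_required_vars` is a hypothetical helper, and the list of names is copied from the "Required" block above. How DefaultConfig itself reacts to a missing variable is not documented here, so this check is only a convenience.

```python
import os

# Names copied from the "Required" block above.
REQUIRED_VARS = (
    "RETTX_DOCUMENT_ANALYSIS_ENDPOINT",
    "RETTX_DOCUMENT_ANALYSIS_KEY",
    "RETTX_OPENAI_ENDPOINT",
    "RETTX_OPENAI_KEY",
    "RETTX_OPENAI_MODEL_NAME",
    "RETTX_OPENAI_MODEL_VERSION",
    "RETTX_COGNITIVE_SERVICES_ENDPOINT",
    "RETTX_COGNITIVE_SERVICES_KEY",
)


def missing_required_vars(env=None):
    """Return the required variable names that are unset or blank in env.

    Hypothetical helper for illustration; defaults to the process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


# Example with an explicit mapping instead of the real environment.
example_env = {name: "placeholder" for name in REQUIRED_VARS}
del example_env["RETTX_OPENAI_KEY"]
print(missing_required_vars(example_env))  # ['RETTX_OPENAI_KEY']
```

Calling `missing_required_vars()` with no argument checks the real environment, which may be useful right before constructing the configuration.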

📋 Processing Workflow

  1. Document Input: Accept PDF or image files
  2. OCR Processing: Extract text using Azure Form Recognizer
  3. Text Normalization: Clean and standardize extracted text
  4. Keyword Detection: Multi-layer detection (regex + AI + search)
  5. Mutation Extraction: AI-powered identification using Semantic Kernel
  6. HGVS Validation: Validate mutations using the Mutalyzer parser
  7. Data Enrichment: Query Ensembl.org for additional mutation data
  8. Structured Output: Return validated mutations with confidence scores
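
To give a feel for the regex layer mentioned in step 4, here is a small standalone sketch. It does not reproduce the library's own patterns, which are not shown in this document. The pattern and the `find_substitutions` helper are assumptions for illustration, and they only cover simple coding-DNA substitutions such as c.916C>T, not deletions, duplications, or protein-level notation.

```python
import re

# Hypothetical pattern: an optional RefSeq transcript (e.g. NM_004992.4)
# followed by a coding-DNA substitution such as c.916C>T.
SUBSTITUTION = re.compile(
    r"(?:(?P<transcript>NM_\d+(?:\.\d+)?):)?"
    r"(?P<variant>c\.(?P<position>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT]))"
)


def find_substitutions(text):
    """Return (transcript or None, variant) pairs found in text."""
    return [
        (m.group("transcript"), m.group("variant"))
        for m in SUBSTITUTION.finditer(text)
    ]


sample = "MECP2 NM_004992.4:c.916C>T p.(Arg306Cys); also reported as c.916C>T."
print(find_substitutions(sample))
# [('NM_004992.4', 'c.916C>T'), (None, 'c.916C>T')]
```

A pattern like this can only propose candidates. In the workflow above, the AI extraction and HGVS validation steps decide what is finally reported.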

💻 Advanced Usage

Async Mutation Validation

import asyncio
from rettxmutation import DefaultConfig
from rettxmutation.models.gene_models import GeneMutation, TranscriptMutation
from rettxmutation.services.mutation_validator import MutationValidator

async def validate_mutations():
    config = DefaultConfig()
    validator = MutationValidator(config)
    
    # Create mutation object
    mutation = GeneMutation(
        variant_type="SNV",
        primary_transcript=TranscriptMutation(
            gene_id="MECP2",
            transcript_id="NM_004992.4",
            hgvs_transcript_variant="NM_004992.4:c.916C>T",
            protein_consequence_tlr="NP_004983.1:p.(Arg306Cys)"
        )
    )
    
    # Validate with external services
    validation_result = await validator.validate_mutations([mutation])
    return validation_result

# Run async function
result = asyncio.run(validate_mutations())

Custom Configuration

from rettxmutation import RettxServices
from rettxmutation.config import RettxConfig

class CustomConfig:
    """Custom configuration implementation"""
    LOG_LEVEL = "DEBUG"
    ENVIRONMENT = "production"
    RETTX_OPENAI_MODEL_NAME = "gpt-4-turbo"
    # ... other config values

services = RettxServices(CustomConfig())

🎯 Use Cases

  • 🏥 Clinical Genetics: Process genetic reports for patient diagnosis
  • 🔬 Research: Analyze genetic data for Rett Syndrome studies
  • 📊 Patient Registries: Populate genetic databases systematically
  • 🤖 Bioinformatics Pipelines: Integrate with existing analysis workflows
  • 📱 Clinical Applications: Build genetic analysis tools and dashboards

🔧 Error Handling & Reliability

  • Exponential Backoff: Automatic retry for external API calls (up to 5 attempts)
  • Graceful Degradation: Continue processing when optional services are unavailable
  • Comprehensive Logging: Detailed logging for debugging and monitoring
  • Type Safety: Pydantic v2 models ensure data validation at runtime
  • Connection Pooling: Efficient Azure service client management
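
The retry behaviour listed above can be pictured with a generic exponential backoff loop. This is a sketch of the general technique, not the library's implementation: `retry_with_backoff`, the 0.5 second base delay, and the doubling factor are assumptions, and only the limit of 5 attempts comes from the list above.

```python
import time


def retry_with_backoff(call, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call call() until it succeeds or max_attempts is reached.

    The delay doubles after each failure: base_delay, 2 * base_delay, ...
    The last exception is re-raised once the attempts are exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


# Example: a stand-in for an external API call that fails twice, then succeeds.
calls = {"count": 0}


def flaky_lookup():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


delays = []  # record the requested delays instead of actually sleeping
result = retry_with_backoff(flaky_lookup, sleep=delays.append)
print(result, delays)  # ok [0.5, 1.0]
```

Passing `sleep` as a parameter keeps the example fast and testable. Real code would usually also add jitter and retry only on errors known to be transient.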

🤝 Contributing

We welcome contributions! Please see the GitHub repository (rett-europe/rettxmutation) for:

  • Issue reporting
  • Feature requests
  • Pull request guidelines
  • Development setup instructions

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🔮 Roadmap

  • Multi-Gene Support: Extend beyond MECP2 to other genetic conditions
  • Enhanced Image Processing: Advanced OCR for handwritten documents
  • Multilingual Support: Process documents in multiple languages
  • Real-time Processing: WebSocket support for live document analysis
  • Cloud Deployment: Docker containers and Azure deployment templates


