
Extract Rett Syndrome mutations from genetic diagnosis reports


RettX Mutation Analysis Library

PyPI version Python 3.8+ License: MIT

A Python library for extracting and analyzing MECP2 mutations from genetic documents using Azure AI services. Designed for medical genetics research, clinical applications, and bioinformatics workflows focused on Rett Syndrome.

🚀 Quick Start

Installation

pip install rettxmutation

Basic Usage

from rettxmutation import RettxServices, DefaultConfig

# Initialize configuration (loads from environment variables)
config = DefaultConfig()

# Create services
services = RettxServices(config)

# Process a genetic document
with open('genetic_report.pdf', 'rb') as file_stream:
    # Extract text using OCR
    document = services.ocr_service.extract_and_process_text(file_stream)
    
    # Detect MECP2-related keywords and mutations
    keyword_collection = services.keyword_detector_service.detect_keywords(document)
    
    # Get structured mutations with confidence scores
    mutations = keyword_collection.get_gene_mutations()
    
    for mutation in mutations:
        print(f"Gene: {mutation.primary_transcript.gene_id}")
        print(f"HGVS: {mutation.primary_transcript.hgvs_transcript_variant}")
        print(f"Protein: {mutation.primary_transcript.protein_consequence_tlr}")
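For downstream use such as a registry export, the three fields printed above can be collected into plain rows. The following is a minimal sketch: `mutations_to_rows` and `rows_to_csv` are hypothetical helpers, not part of rettxmutation, and they assume only the attribute names used in the loop above. A stand-in object replaces a real extraction result so the example runs without Azure credentials.

```python
import csv
import io
from types import SimpleNamespace

FIELDNAMES = ["gene", "hgvs", "protein"]


def mutations_to_rows(mutations):
    """Flatten mutation objects into dicts (hypothetical helper, not library API)."""
    rows = []
    for mutation in mutations:
        transcript = mutation.primary_transcript
        rows.append({
            "gene": transcript.gene_id,
            "hgvs": transcript.hgvs_transcript_variant,
            "protein": transcript.protein_consequence_tlr,
        })
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Stand-in with the same attribute names as above, for illustration only
example = SimpleNamespace(primary_transcript=SimpleNamespace(
    gene_id="MECP2",
    hgvs_transcript_variant="NM_004992.4:c.916C>T",
    protein_consequence_tlr="NP_004983.1:p.(Arg306Cys)",
))
print(rows_to_csv(mutations_to_rows([example])))
```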

✨ Key Features

  • 🔍 Multi-Format Document Processing: PDF, PNG, JPG support with intelligent OCR
  • 🧬 HGVS Validation: Built-in mutalyzer integration for mutation validation
  • 🤖 AI-Powered Extraction: Azure OpenAI and Semantic Kernel for intelligent text analysis
  • 📊 Confidence Scoring: All results include confidence metrics for quality assessment
  • 🎯 MECP2 Specialization: Optimized for Rett Syndrome genetic analysis
  • ⚡ Production Ready: Type-safe Pydantic v2 models with comprehensive error handling
  • 🔄 External API Integration: Ensembl.org enrichment with retry mechanisms
  • 🏗️ Modular Architecture: Clean separation of concerns with dependency injection
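Since only PDF, PNG, and JPG inputs are listed as supported, it can help to check a file's type before sending it to OCR. The sketch below is an assumption about how a caller might do that, not something the library is documented to provide: `sniff_format` and `detect_stream_format` are hypothetical names, and the check relies on the standard leading bytes of each format.

```python
import io

# Standard file signatures ("magic bytes") for the three supported formats
PDF_MAGIC = b"%PDF-"
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
JPEG_MAGIC = b"\xff\xd8\xff"


def sniff_format(header):
    """Return 'pdf', 'png', 'jpg', or None based on the leading bytes."""
    if header.startswith(PDF_MAGIC):
        return "pdf"
    if header.startswith(PNG_MAGIC):
        return "png"
    if header.startswith(JPEG_MAGIC):
        return "jpg"
    return None


def detect_stream_format(stream):
    """Peek at a seekable binary stream, then rewind so it can still be read in full."""
    position = stream.tell()
    header = stream.read(8)
    stream.seek(position)
    return sniff_format(header)


# Illustration with an in-memory stream instead of a real report
fake_pdf = io.BytesIO(b"%PDF-1.7\n...")
print(detect_stream_format(fake_pdf))
```

Because the stream is rewound after the peek, the same `file_stream` can be handed to the OCR step afterwards.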

🛠️ Requirements

Python Version

  • Python 3.8 or higher

Azure Services

You'll need access to these Azure services:

  • Azure Form Recognizer (Document Intelligence) - for OCR
  • Azure OpenAI - for mutation extraction and text analysis
  • Azure Cognitive Services (Text Analytics) - for medical text processing
  • Azure AI Search (optional) - for enhanced keyword detection

Environment Variables

# Required
RETTX_DOCUMENT_ANALYSIS_ENDPOINT=https://your-form-recognizer.cognitiveservices.azure.com/
RETTX_DOCUMENT_ANALYSIS_KEY=your-form-recognizer-key

RETTX_OPENAI_ENDPOINT=https://your-openai.openai.azure.com/
RETTX_OPENAI_KEY=your-openai-key
RETTX_OPENAI_MODEL_NAME=gpt-4
RETTX_OPENAI_MODEL_VERSION=2024-02-01

RETTX_COGNITIVE_SERVICES_ENDPOINT=https://your-cognitive-services.cognitiveservices.azure.com/
RETTX_COGNITIVE_SERVICES_KEY=your-cognitive-services-key

# Optional (for enhanced features)
RETTX_AI_SEARCH_SERVICE=your-search-service
RETTX_AI_SEARCH_API_KEY=your-search-key
RETTX_AI_SEARCH_INDEX_NAME=your-index-name
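Before constructing `DefaultConfig()`, it may be useful to check that the variables above are actually set. This is a small sketch using only the standard library: `missing_variables` is a hypothetical helper, and the library's own behavior when a variable is absent is not described here.

```python
import os

# Names copied from the list above
REQUIRED_VARS = (
    "RETTX_DOCUMENT_ANALYSIS_ENDPOINT",
    "RETTX_DOCUMENT_ANALYSIS_KEY",
    "RETTX_OPENAI_ENDPOINT",
    "RETTX_OPENAI_KEY",
    "RETTX_OPENAI_MODEL_NAME",
    "RETTX_OPENAI_MODEL_VERSION",
    "RETTX_COGNITIVE_SERVICES_ENDPOINT",
    "RETTX_COGNITIVE_SERVICES_KEY",
)
OPTIONAL_VARS = (
    "RETTX_AI_SEARCH_SERVICE",
    "RETTX_AI_SEARCH_API_KEY",
    "RETTX_AI_SEARCH_INDEX_NAME",
)


def missing_variables(names, environ=None):
    """Return the names that are unset or empty (hypothetical helper)."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_variables(REQUIRED_VARS)
    if missing:
        raise SystemExit("Missing required settings: " + ", ".join(missing))
    for name in missing_variables(OPTIONAL_VARS):
        print(f"Optional setting not configured: {name}")
```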

📋 Processing Workflow

  1. Document Input: Accept PDF or image files
  2. OCR Processing: Extract text using Azure Form Recognizer
  3. Text Normalization: Clean and standardize extracted text
  4. Keyword Detection: Multi-layer detection (regex + AI + search)
  5. Mutation Extraction: AI-powered identification using Semantic Kernel
  6. HGVS Validation: Validate mutations using the mutalyzer parser
  7. Data Enrichment: Query Ensembl.org for additional mutation data
  8. Structured Output: Return validated mutations with confidence scores
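Steps 3 and 4 can be illustrated with a small standard-library sketch. It is not the library's implementation: the function names and the regular expression are assumptions, and the pattern only covers simple coding-DNA substitutions such as `c.916C>T`, whereas the workflow above also uses AI and search layers.

```python
import re
import unicodedata

# Simple coding-DNA substitution, e.g. "c.916C>T" (illustrative pattern only)
CDNA_SUBSTITUTION = re.compile(r"c\.\d+[ACGT]>[ACGT]")


def normalize_text(text):
    """Repair common OCR spacing around HGVS punctuation and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    # "c. 916" -> "c.916" (only when a digit follows)
    text = re.sub(r"\bc\s*\.\s*(?=\d)", "c.", text)
    # "C > T" -> "C>T" (only between nucleotide letters)
    text = re.sub(r"(?<=[ACGT])\s*>\s*(?=[ACGT])", ">", text)
    return re.sub(r"\s+", " ", text).strip()


def find_cdna_substitutions(text):
    """Return every substitution-style variant found in the normalized text."""
    return CDNA_SUBSTITUTION.findall(normalize_text(text))


raw = "MECP2  c. 916C > T\n(p.Arg306Cys)"
print(normalize_text(raw))
print(find_cdna_substitutions(raw))
```

A regex layer like this is cheap and predictable, which is one reason to run it before more expensive AI-based extraction.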

💻 Advanced Usage

Async Mutation Validation

import asyncio
from rettxmutation import DefaultConfig
from rettxmutation.models.gene_models import GeneMutation, TranscriptMutation
from rettxmutation.services.mutation_validator import MutationValidator

async def validate_mutations():
    # Load settings from environment variables
    config = DefaultConfig()
    validator = MutationValidator(config)

    # Create mutation object
    mutation = GeneMutation(
        variant_type="SNV",
        primary_transcript=TranscriptMutation(
            gene_id="MECP2",
            transcript_id="NM_004992.4",
            hgvs_transcript_variant="NM_004992.4:c.916C>T",
            protein_consequence_tlr="NP_004983.1:p.(Arg306Cys)"
        )
    )

    # Validate with external services
    validation_result = await validator.validate_mutations([mutation])
    return validation_result

# Run async function
result = asyncio.run(validate_mutations())

Custom Configuration

from rettxmutation import RettxServices
from rettxmutation.config import RettxConfig

class CustomConfig:
    """Custom configuration implementation"""
    LOG_LEVEL = "DEBUG"
    ENVIRONMENT = "production"
    RETTX_OPENAI_MODEL_NAME = "gpt-4-turbo"
    # ... other config values

services = RettxServices(CustomConfig())

🎯 Use Cases

  • 🏥 Clinical Genetics: Process genetic reports for patient diagnosis
  • 🔬 Research: Analyze genetic data for Rett Syndrome studies
  • 📊 Patient Registries: Populate genetic databases systematically
  • 🤖 Bioinformatics Pipelines: Integrate with existing analysis workflows
  • 📱 Clinical Applications: Build genetic analysis tools and dashboards

🔧 Error Handling & Reliability

  • Exponential Backoff: Automatic retry for external API calls (up to 5 attempts)
  • Graceful Degradation: Continue processing when optional services are unavailable
  • Comprehensive Logging: Detailed logging for debugging and monitoring
  • Type Safety: Pydantic v2 models ensure data validation at runtime
  • Connection Pooling: Efficient Azure service client management
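The retry behavior in the first bullet can be pictured with a generic exponential backoff loop. This is a sketch of the general technique, not the library's code: `retry_with_backoff`, the base delay, and the choice of retryable exceptions are assumptions, and only the limit of 5 attempts comes from the list above.

```python
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.5,
                       retry_on=(ConnectionError, TimeoutError),
                       sleep=time.sleep):
    """Call operation(), retrying on the given errors with doubling delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # attempts exhausted: surface the last error
            # Delays grow as base_delay * 1, 2, 4, 8, ...
            sleep(base_delay * 2 ** (attempt - 1))


# Illustration: an operation that fails twice before succeeding
calls = {"count": 0}

def flaky_lookup():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"

# sleep is replaced with a no-op so the example finishes immediately
print(retry_with_backoff(flaky_lookup, sleep=lambda seconds: None))
```

Passing `sleep` as a parameter keeps the function easy to test, since delays can be recorded instead of waited for.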

🤝 Contributing

We welcome contributions! Please see our GitHub repository for:

  • Issue reporting
  • Feature requests
  • Pull request guidelines
  • Development setup instructions

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🆘 Support

🔮 Roadmap

  • Multi-Gene Support: Extend beyond MECP2 to other genetic conditions
  • Enhanced Image Processing: Advanced OCR for handwritten documents
  • Multilingual Support: Process documents in multiple languages
  • Real-time Processing: WebSocket support for live document analysis
  • Cloud Deployment: Docker containers and Azure deployment templates

This version

0.2.2

