Extract Rett Syndrome mutations from genetic diagnosis reports

RettX Mutation Analysis Library

PyPI version Python 3.8+ License: MIT

A Python library for extracting and analyzing MECP2 mutations from genetic documents using Azure AI services. Designed for medical genetics research, clinical applications, and bioinformatics workflows focused on Rett Syndrome.

🚀 Quick Start

Installation

pip install rettxmutation

Basic Usage

from rettxmutation import RettxServices, DefaultConfig

# Initialize configuration (loads from environment variables)
config = DefaultConfig()

# Create services
services = RettxServices(config)

# Process a genetic document
with open('genetic_report.pdf', 'rb') as file_stream:
    # Extract text using OCR
    document = services.ocr_service.extract_and_process_text(file_stream)
    
    # Detect MECP2-related keywords and mutations
    keyword_collection = services.keyword_detector_service.detect_keywords(document)
    
    # Get structured mutations with confidence scores
    mutations = keyword_collection.get_gene_mutations()
    
    for mutation in mutations:
        print(f"Gene: {mutation.primary_transcript.gene_id}")
        print(f"HGVS: {mutation.primary_transcript.hgvs_transcript_variant}")
        print(f"Protein: {mutation.primary_transcript.protein_consequence_tlr}")

✨ Key Features

  • 🔍 Multi-Format Document Processing: PDF, PNG, JPG support with intelligent OCR
  • 🧬 HGVS Validation: Built-in Mutalyzer integration for mutation validation
  • 🤖 AI-Powered Extraction: Azure OpenAI and Semantic Kernel for intelligent text analysis
  • 📊 Confidence Scoring: All results include confidence metrics for quality assessment
  • 🎯 MECP2 Specialization: Optimized for Rett Syndrome genetic analysis
  • ⚡ Production Ready: Type-safe Pydantic v2 models with comprehensive error handling
  • 🔄 External API Integration: Ensembl.org enrichment with retry mechanisms
  • 🏗️ Modular Architecture: Clean separation of concerns with dependency injection

🛠️ Requirements

Python Version

  • Python 3.8 or higher

Azure Services

You'll need access to these Azure services:

  • Azure Form Recognizer (Document Intelligence) - for OCR
  • Azure OpenAI - for mutation extraction and text analysis
  • Azure Cognitive Services (Text Analytics) - for medical text processing
  • Azure AI Search (optional) - for enhanced keyword detection

Environment Variables

# Required
RETTX_DOCUMENT_ANALYSIS_ENDPOINT=https://your-form-recognizer.cognitiveservices.azure.com/
RETTX_DOCUMENT_ANALYSIS_KEY=your-form-recognizer-key

RETTX_OPENAI_ENDPOINT=https://your-openai.openai.azure.com/
RETTX_OPENAI_KEY=your-openai-key
RETTX_OPENAI_MODEL_NAME=gpt-4
RETTX_OPENAI_MODEL_VERSION=2024-02-01

RETTX_COGNITIVE_SERVICES_ENDPOINT=https://your-cognitive-services.cognitiveservices.azure.com/
RETTX_COGNITIVE_SERVICES_KEY=your-cognitive-services-key

# Optional (for enhanced features)
RETTX_AI_SEARCH_SERVICE=your-search-service
RETTX_AI_SEARCH_API_KEY=your-search-key
RETTX_AI_SEARCH_INDEX_NAME=your-index-name
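
Since the configuration is loaded from environment variables, it can help to check them before creating any services. The following is a minimal sketch using only the standard library; the helper name `missing_required_vars` is hypothetical and not part of rettxmutation, and the variable names are copied from the required list above.

```python
import os
from typing import List, Mapping

# Names copied from the "Required" list above.
REQUIRED_VARS = [
    "RETTX_DOCUMENT_ANALYSIS_ENDPOINT",
    "RETTX_DOCUMENT_ANALYSIS_KEY",
    "RETTX_OPENAI_ENDPOINT",
    "RETTX_OPENAI_KEY",
    "RETTX_OPENAI_MODEL_NAME",
    "RETTX_OPENAI_MODEL_VERSION",
    "RETTX_COGNITIVE_SERVICES_ENDPOINT",
    "RETTX_COGNITIVE_SERVICES_KEY",
]


def missing_required_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or blank in `env`.

    Hypothetical helper for illustration; not part of the rettxmutation API.
    """
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_required_vars()
    if missing:
        print("Missing required settings: " + ", ".join(missing))
    else:
        print("All required settings are present.")
```

Running this before `DefaultConfig()` gives a readable list of what is missing instead of a failure deeper in the pipeline.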

📋 Processing Workflow

  1. Document Input: Accept PDF or image files
  2. OCR Processing: Extract text using Azure Form Recognizer
  3. Text Normalization: Clean and standardize extracted text
  4. Keyword Detection: Multi-layer detection (regex + AI + search)
  5. Mutation Extraction: AI-powered identification using Semantic Kernel
  6. HGVS Validation: Validate mutations using the Mutalyzer parser
  7. Data Enrichment: Query Ensembl.org for additional mutation data
  8. Structured Output: Return validated mutations with confidence scores
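
To illustrate the regex layer of step 4, here is a rough sketch of pattern-based detection of transcript-level HGVS substitutions. It is not the library's implementation: the pattern and the function name `find_substitutions` are assumptions for illustration, and the pattern covers only simple single-nucleotide substitutions on RefSeq `NM_` transcripts (no deletions, insertions, or duplications).

```python
import re
from typing import List

# Simplified pattern: RefSeq transcript accession, then a coding-DNA substitution.
# Example of a string it matches: NM_004992.4:c.916C>T
HGVS_SUBSTITUTION = re.compile(r"\b(NM_\d+\.\d+):c\.(\d+)([ACGT])>([ACGT])\b")


def find_substitutions(text: str) -> List[str]:
    """Return each transcript-level substitution found in `text`, in order.

    Hypothetical helper for illustration; not part of the rettxmutation API.
    """
    return [match.group(0) for match in HGVS_SUBSTITUTION.finditer(text)]


report = "Pathogenic variant NM_004992.4:c.916C>T detected in MECP2."
print(find_substitutions(report))  # ['NM_004992.4:c.916C>T']
```

A regex pass like this is cheap and deterministic, which is presumably why it is combined with AI and search layers that handle less regular phrasing.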

💻 Advanced Usage

Async Mutation Validation

import asyncio

from rettxmutation import DefaultConfig
from rettxmutation.models.gene_models import GeneMutation, TranscriptMutation
from rettxmutation.services.mutation_validator import MutationValidator

async def validate_mutations():
    config = DefaultConfig()
    validator = MutationValidator(config)
    
    # Create mutation object
    mutation = GeneMutation(
        variant_type="SNV",
        primary_transcript=TranscriptMutation(
            gene_id="MECP2",
            transcript_id="NM_004992.4",
            hgvs_transcript_variant="NM_004992.4:c.916C>T",
            protein_consequence_tlr="NP_004983.1:p.(Arg306Cys)"
        )
    )
    
    # Validate with external services
    validation_result = await validator.validate_mutations([mutation])
    return validation_result

# Run async function
result = asyncio.run(validate_mutations())

Custom Configuration

from rettxmutation import RettxServices
from rettxmutation.config import RettxConfig

class CustomConfig:
    """Custom configuration implementation"""
    LOG_LEVEL = "DEBUG"
    ENVIRONMENT = "production"
    RETTX_OPENAI_MODEL_NAME = "gpt-4-turbo"
    # ... other config values

services = RettxServices(CustomConfig())

🎯 Use Cases

  • 🏥 Clinical Genetics: Process genetic reports for patient diagnosis
  • 🔬 Research: Analyze genetic data for Rett Syndrome studies
  • 📊 Patient Registries: Populate genetic databases systematically
  • 🤖 Bioinformatics Pipelines: Integrate with existing analysis workflows
  • 📱 Clinical Applications: Build genetic analysis tools and dashboards

🔧 Error Handling & Reliability

  • Exponential Backoff: Automatic retry for external API calls (up to 5 attempts)
  • Graceful Degradation: Continue processing when optional services are unavailable
  • Comprehensive Logging: Detailed logging for debugging and monitoring
  • Type Safety: Pydantic v2 models ensure data validation at runtime
  • Connection Pooling: Efficient Azure service client management
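
The exponential backoff behaviour can be sketched generically as follows. This is an illustration of the technique under assumed parameters, not the library's own retry code: the function name `call_with_backoff`, the 0.5 second base delay, and the choice to retry on any exception are all assumptions; only the limit of 5 attempts comes from the list above.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func`, retrying on any exception.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
    Hypothetical helper for illustration; not part of the rettxmutation API.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays; production code would usually also add jitter and retry only on errors known to be transient.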

🤝 Contributing

We welcome contributions! Please see our GitHub repository for:

  • Issue reporting
  • Feature requests
  • Pull request guidelines
  • Development setup instructions

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🔮 Roadmap

  • Multi-Gene Support: Extend beyond MECP2 to other genetic conditions
  • Enhanced Image Processing: Advanced OCR for handwritten documents
  • Multilingual Support: Process documents in multiple languages
  • Real-time Processing: WebSocket support for live document analysis
  • Cloud Deployment: Docker containers and Azure deployment templates
