A genomic analysis tool for sickle cell disease variants

SickleScope: Python Genomics Analysis Package

SickleScope is a Python package for sickle cell disease variant analysis that provides instant genetic risk assessment with visualisations. It is built to simplify genetic variant analysis without the need to navigate complex pipelines.

Installation

Prerequisites

  • Python 3.9 or higher
  • pip package manager
  • Virtual environment (recommended)

Option 1: Install from PyPI (Recommended)

pip install sickle-scope

Option 2: Development Installation

# Clone the repository
git clone https://github.com/talhahzubayer/sickle-scope.git
cd sickle-scope

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode
pip install -e .

Option 3: Direct Installation from Source

# Download and install directly
pip install git+https://github.com/talhahzubayer/sickle-scope.git

Verify Installation

# Check if installation was successful
python -c "import sickle_scope; print('SickleScope installed successfully!')"

# View package information
python -m sickle_scope.cli info

Troubleshooting Installation

Common Issues

Python Version Error

# Check Python version (requires 3.9+)
python --version

# On some systems, use python3
python3 --version

Permission Errors

# Install for current user only
pip install --user sickle-scope

# Or use virtual environment (recommended)
python -m venv sickle_env
source sickle_env/bin/activate  # Windows: sickle_env\Scripts\activate
pip install sickle-scope

Dependency Conflicts

# Create clean environment
python -m venv clean_env
source clean_env/bin/activate
pip install --upgrade pip
pip install sickle-scope

Missing System Dependencies

# Ubuntu/Debian
sudo apt-get update
sudo apt-get install python3-dev python3-pip

# macOS with Homebrew
brew install python

# Windows: Download Python from python.org

Quick Start

# Analyse variants
sickle-analyse input.csv --output results/

# Generate comprehensive report
sickle-analyse input.csv --report --plot

# Quick validation
sickle-analyse validate input.csv

Architecture

Package Structure

sickle-scope/
├── sickle_scope/
│   ├── __init__.py
│   ├── cli.py                 # Command-line interface
│   ├── analyser.py            # Core analysis engine
│   ├── visualiser.py          # Plotting functions
│   ├── ml_models.py           # Machine learning models
│   ├── utils.py               # Performance optimisation utilities
│   └── data/                  # Reference databases
│       └── hbb_variants.json  # Curated HBB pathogenic variants and modifiers
├── notebooks/
│   ├── tutorial.ipynb      # Step-by-step guide
│   ├── examples.ipynb      # Sample analyses
│   └── advanced.ipynb      # Deep-dive analysis
├── tests/
│   ├── __init__.py
│   ├── test_analyser.py    # Core analysis tests
│   ├── test_cli.py         # CLI interface tests
│   ├── test_integration.py # Integration tests
│   ├── test_visualiser.py  # Visualisation tests
│   └── sample_data/        # Test input files
│       ├── test_variants.csv
│       ├── hbb_variants.csv
│       └── invalid_data.csv
├── results/                # Analysis outputs (created automatically)
│   ├── sickle_analysis.csv # Analysis results
│   ├── sickle_report.html  # HTML report
│   └── plots/              # Visualisation plots
├── run_tests.py            # Test runner script
├── setup.py
├── requirements.txt
└── README.md

Core Dependencies

  • Data Processing: pandas, numpy, pysam (optional)
  • Machine Learning: scikit-learn, scipy
  • Visualisation: matplotlib, seaborn, plotly
  • CLI Framework: click, rich

Features

Dual Interface Design

  • CLI: Perfect for automation and batch processing
  • Python API: Seamless integration into existing workflows
  • Jupyter Notebooks: Interactive exploration and learning

Analysis Pipeline

  • Genetic variant detection and classification
  • Risk scoring with weighted algorithms
  • Modifier gene analysis
  • Severity prediction using machine learning
  • Population comparison and statistics

Visualisation Suite

  • Risk score dashboards with gauge-style displays
  • Chromosomal variant position mapping
  • Genotype distribution charts
  • Severity prediction with confidence intervals
  • Interactive Plotly visualisations for Jupyter

Usage Examples

Command Line Interface

# Basic analysis (saves to current directory)
python -m sickle_scope.cli analyse variants.csv

# Organised output with reports and plots
python -m sickle_scope.cli analyse variants.csv \
  --output results/ \
  --report \
  --plot \
  --verbose

# Validate input data before analysis
python -m sickle_scope.cli validate variants.csv

# Get package information
python -m sickle_scope.cli info

# Full workflow with ML severity prediction
python -m sickle_scope.cli analyse tests/sample_data/hbb_variants.csv \
  --output my_analysis/ \
  --report \
  --plot \
  --predict-severity \
  --verbose \
  --config custom_params.json

# Interactive visualisation mode
python -m sickle_scope.cli analyse variants.csv \
  --output results/ \
  --interactive-plots \
  --population-compare \
  --manhattan-plot

Output Directory Structure

When using --output results/, SickleScope creates an organised directory structure:

results/
├── sickle_analysis.csv        # Main results file with variant classifications
├── severity_predictions.csv   # ML severity predictions
├── sickle_report.html         # Comprehensive HTML report (--report flag)
├── interactive_dashboard.html # Interactive Plotly dashboard
└── plots/                     # Visualisation directory (--plot flag)
    ├── risk_score_plot.png
    ├── variant_distribution.png
    ├── severity_prediction.png      # ML model outputs
    ├── population_comparison.png    # Population analysis plots
    ├── manhattan_plot.html          # Interactive Manhattan-style plot
    └── interactive/                 # Interactive Plotly visualisations
        ├── risk_dashboard.html
        ├── variant_explorer.html
        └── severity_heatmap.html
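
After a run it can be handy to confirm that the expected files were written. The following is a minimal sketch, not part of SickleScope: the helper name missing_outputs is hypothetical, the file names are taken from the tree above, and which files actually appear depends on the flags passed.

```python
from pathlib import Path

# File names copied from the tree above; adjust the list to the flags you used.
EXPECTED_OUTPUTS = [
    "sickle_analysis.csv",
    "sickle_report.html",
    "plots/risk_score_plot.png",
]


def missing_outputs(output_dir, expected=EXPECTED_OUTPUTS):
    """Return the expected relative paths that do not exist under output_dir."""
    root = Path(output_dir)
    return [rel for rel in expected if not (root / rel).exists()]


# Lists every expected file that is absent from results/
print(missing_outputs("results"))
```

An empty list means every listed file is present.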

Python API

from sickle_scope import SickleAnalyser
from sickle_scope.ml_models import SeverityPredictor
from sickle_scope.visualiser import InteractiveVisualiser

# Initialise analyser
analyser = SickleAnalyser()

# Load and analyse data
results = analyser.analyse_csv('variants.csv')

# Generate basic visualisations
analyser.plot_risk_score(results)
analyser.plot_variant_distribution(results)

# Machine Learning - Severity Prediction
predictor = SeverityPredictor()
severity_predictions = predictor.predict_severity(results)
predictor.plot_severity_prediction(severity_predictions)

# Advanced Interactive Visualisations
interactive_visual = InteractiveVisualiser()
interactive_visual.create_plotly_dashboard(results, severity_predictions)
interactive_visual.plot_population_comparison(results)
interactive_visual.create_manhattan_style_plot(results)

# Export comprehensive results
results.to_csv('sickle_analysis.csv')
severity_predictions.to_csv('severity_predictions.csv')

Risk Scoring Algorithm

def calculate_risk_score(variants):
    """
    Weighted risk scoring based on:
    - HBB gene variants (60% weight)
    - BCL11A modifiers (20% weight)
    - Other modifiers (20% weight)
    """
    hbb_score = assess_hbb_variants(variants)
    modifier_score = assess_modifiers(variants)
    return (hbb_score * 0.6) + (modifier_score * 0.4)
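
The docstring lists three weights while the return statement combines two scores. As a rough illustration of the three-way weighting written out term by term, here is a self-contained sketch. The function name weighted_risk_score and the assumption that each sub-score lies between 0 and 1 are hypothetical and not taken from the package.

```python
# Illustrative only: the weights come from the docstring above; the function
# name and the [0, 1] range of the sub-scores are assumptions.
HBB_WEIGHT = 0.6
BCL11A_WEIGHT = 0.2
OTHER_MODIFIER_WEIGHT = 0.2


def weighted_risk_score(hbb_score, bcl11a_score, other_modifier_score):
    """Combine three sub-scores, each assumed to lie in [0, 1]."""
    for name, value in [
        ("hbb_score", hbb_score),
        ("bcl11a_score", bcl11a_score),
        ("other_modifier_score", other_modifier_score),
    ]:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return (
        hbb_score * HBB_WEIGHT
        + bcl11a_score * BCL11A_WEIGHT
        + other_modifier_score * OTHER_MODIFIER_WEIGHT
    )


# 1.0 * 0.6 + 0.5 * 0.2 + 0.0 * 0.2
score = weighted_risk_score(1.0, 0.5, 0.0)
print(round(score, 2))  # 0.7
```

Because the weights sum to 1, the combined score stays in the same 0 to 1 range as its inputs.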

Data Input Requirements

Supported Formats

  • CSV with variant data
  • TSV (tab-separated)
  • Excel files (.xlsx)
  • VCF file support (optional)

Required Columns

required_columns = [
    'chromosome',    # e.g., '11', 'chr11'
    'position',      # genomic position
    'ref_allele',    # reference nucleotide
    'alt_allele',    # alternate nucleotide
    'genotype'       # 0/0, 0/1, 1/1
]
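
Before running an analysis, an input file can be checked against the column list above. This is a minimal standard-library sketch rather than the package's own validate command: the helper name check_variant_rows is hypothetical, the sample rows are illustrative placeholders, and only the three unphased genotype codes listed above are accepted.

```python
import csv
import io

REQUIRED_COLUMNS = ["chromosome", "position", "ref_allele", "alt_allele", "genotype"]
VALID_GENOTYPES = {"0/0", "0/1", "1/1"}


def check_variant_rows(csv_text):
    """Return a list of human-readable problems found in the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]
    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if not (row["position"] or "").isdigit():
            problems.append(f"line {line_no}: position is not a whole number")
        if row["genotype"] not in VALID_GENOTYPES:
            problems.append(f"line {line_no}: unexpected genotype {row['genotype']!r}")
    return problems


# Placeholder rows for illustration: the second data row is deliberately invalid.
sample = (
    "chromosome,position,ref_allele,alt_allele,genotype\n"
    "chr11,5227002,T,A,0/1\n"
    "11,not_a_number,T,A,1/2\n"
)
problems = check_variant_rows(sample)
for problem in problems:
    print(problem)
```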

Machine Learning Components

Severity Prediction Model

  • Algorithm: Random Forest Classifier
  • Features: Genetic variants + population data
  • Training Data: Literature-derived phenotype correlations
  • Output: Severity categories (Mild, Moderate, Severe)

from sklearn.ensemble import RandomForestClassifier

def train_severity_model(training_data):
    features = extract_features(training_data)
    labels = training_data['severity_category']
    model = RandomForestClassifier(n_estimators=100)
    model.fit(features, labels)
    return model
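
The extract_features helper used above is not shown. One common way to turn genotype strings into a numeric feature is allele dosage, the count of alternate alleles. The sketch below assumes that encoding; it is not necessarily what SickleScope does, and the function name genotype_to_dosage is hypothetical.

```python
def genotype_to_dosage(genotype):
    """Count alternate alleles in a diploid genotype string such as '0/1'."""
    # Treat phased ('0|1') and unphased ('0/1') separators the same way.
    alleles = genotype.replace("|", "/").split("/")
    if len(alleles) != 2 or any(a not in {"0", "1"} for a in alleles):
        raise ValueError(f"unsupported genotype: {genotype!r}")
    return sum(int(a) for a in alleles)


dosages = [genotype_to_dosage(g) for g in ["0/0", "0/1", "1/1"]]
print(dosages)  # [0, 1, 2]
```

A column of such dosages, one per variant, is the kind of numeric matrix a Random Forest classifier can be fitted on.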

Built-in Reference Databases

reference_data = {
    'hbb_variants': 'data/hbb_variants.json'  # Primary reference database
}

The hbb_variants.json file contains a comprehensive collection of:

  • Pathogenic HBB variants: Including HbS (rs334), HbC, HbE and other clinically significant variants
  • Protective modifiers: BCL11A, KLF1, and other genetic factors that modify disease severity
  • Population frequencies: Allele frequencies across different populations (gnomAD, 1000 Genomes)
  • Clinical annotations: HGVS nomenclature, amino acid changes, pathogenicity scores
  • Metadata: Reference genome (GRCh38), data sources (ClinVar, OMIM, dbSNP), last updated on 8 September 2025

This database enables the package to work offline and provides standardised variant classification without requiring external API calls.
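
As a rough picture of what an offline lookup could look like, the sketch below queries a small in-memory JSON document by rsID. The schema (a top-level "variants" object keyed by rsID) and the helper name lookup_variant are assumptions made for illustration; the real layout of hbb_variants.json is not documented here.

```python
import json

# Hypothetical schema: the actual structure of hbb_variants.json may differ.
EXAMPLE_DB = json.loads("""
{
  "variants": {
    "rs334": {"name": "HbS", "gene": "HBB"}
  }
}
""")


def lookup_variant(db, rsid):
    """Return the annotation for an rsID, or None when it is not in the database."""
    return db.get("variants", {}).get(rsid)


print(lookup_variant(EXAMPLE_DB, "rs334"))  # {'name': 'HbS', 'gene': 'HBB'}
print(lookup_variant(EXAMPLE_DB, "rs0"))    # None
```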

Interactive Notebooks

Tutorial.ipynb

Step-by-step learning guide with:

  • Data loading and preprocessing
  • Variant analysis workflow
  • Visualisation creation
  • Results interpretation

Examples.ipynb

Pre-loaded sample datasets demonstrating:

  • Multiple analysis scenarios
  • Different data formats
  • Advanced visualisation techniques

Advanced.ipynb

Complex analysis workflows including:

  • Custom risk algorithms
  • Population comparison studies
  • Machine learning model training and validation
  • Interactive Plotly dashboard creation
  • Severity prediction model optimisation
  • Advanced statistical analysis with interactive visualisations

Development Setup

# Clone repository
git clone https://github.com/talhahzubayer/sickle-scope.git
cd sickle-scope

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode
pip install -e .

# Run tests
python -m pytest tests/

Testing

# Run unit tests
python -m pytest tests/test_analyser.py

# Run integration tests with sample data
python -m pytest tests/ --integration

# Check code coverage
pytest --cov=sickle_scope tests/
