A machine learning pipeline for variant prioritisation using Exomiser output.

Exomiser ML

Exomiser-ML is a Python machine learning pipeline that aims to improve variant prioritisation by combining Exomiser scores. It supports classification with Logistic Regression, Random Forest, and XGBoost, and includes utilities for feature extraction, data splitting, model training, and post-processing.

🚀 Features

  • Extracts features from Exomiser variant TSV result files and Phenopackets.
  • Supports multiple classifiers: Logistic Regression, Random Forest, and XGBoost.
  • Provides CLI commands for training and for full pipeline execution.
  • Generates metadata and post-processed results compatible with PhEval for benchmarking.
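
As a rough illustration of the feature extraction step listed above, the sketch below reads a tab-separated variant table and pulls numeric score columns out as feature vectors. It does not use Exomiser-ML's own code: the function name extract_features, the sample rows, and the column names (GENE_SYMBOL, EXOMISER_GENE_PHENO_SCORE, EXOMISER_VARIANT_SCORE) are assumptions made for the example, so check them against your own Exomiser output.

```python
import csv
import io

# Hypothetical excerpt of an Exomiser variants TSV.
# The column names and values are assumptions for illustration only.
SAMPLE_TSV = (
    "GENE_SYMBOL\tEXOMISER_GENE_PHENO_SCORE\tEXOMISER_VARIANT_SCORE\n"
    "FGFR2\t0.95\t1.00\n"
    "TTN\t0.10\t0.60\n"
)


def extract_features(tsv_text, feature_columns):
    """Return one list of floats per row, ordered like feature_columns."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [[float(row[col]) for col in feature_columns] for row in reader]


rows = extract_features(
    SAMPLE_TSV,
    ["EXOMISER_GENE_PHENO_SCORE", "EXOMISER_VARIANT_SCORE"],
)
print(rows)
```

Each inner list is one variant's feature vector, which is the shape most classifiers expect as input.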

📦 Installation

Ensure you have Python 3.12 or higher installed.

pip install exomiser-ml

🧪 Usage

The package provides five CLI commands:

  1. run-model

Trains a model on the provided training data and evaluates it on the files in a test directory.

run-model \
  --training-data path/to/train.tsv \
  --test-dir path/to/test_dir \
  --features FEATURE1 FEATURE2 ... \
  --output-dir path/to/output \
  --phenopacket-dir path/to/phenopackets \
  --model MODEL_TYPE

Parameters:

  • --training-data: Path to the training data TSV file.
  • --test-dir: Directory containing test data files.
  • --features: List of features to extract.
  • --output-dir: Directory to save outputs.
  • --phenopacket-dir: Directory containing Phenopacket JSON files.
  • --model: Model type to use. Choices: LOGISTIC_REGRESSION, RANDOM_FOREST, XGBOOST_CLASSIFIER.
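
If you drive run-model from a script, it can help to assemble and validate the arguments before launching anything. The sketch below is one way to do that, assuming only the flags documented above. The helper build_run_model_command is hypothetical and not part of Exomiser-ML, and FEATURE1 and FEATURE2 are the same placeholders used in the command above rather than real feature names.

```python
import shlex

# Model names copied from the documented --model choices.
MODEL_CHOICES = ("LOGISTIC_REGRESSION", "RANDOM_FOREST", "XGBOOST_CLASSIFIER")


def build_run_model_command(training_data, test_dir, features,
                            output_dir, phenopacket_dir, model):
    """Build the run-model argument list, rejecting undocumented model names."""
    if model not in MODEL_CHOICES:
        raise ValueError(f"model must be one of {MODEL_CHOICES}, got {model!r}")
    if not features:
        raise ValueError("at least one feature is required")
    return [
        "run-model",
        "--training-data", str(training_data),
        "--test-dir", str(test_dir),
        "--features", *features,
        "--output-dir", str(output_dir),
        "--phenopacket-dir", str(phenopacket_dir),
        "--model", model,
    ]


command = build_run_model_command(
    "path/to/train.tsv",
    "path/to/test_dir",
    ["FEATURE1", "FEATURE2"],  # placeholders, not real feature names
    "path/to/output",
    "path/to/phenopackets",
    "RANDOM_FOREST",
)
print(shlex.join(command))
```

To actually run the command, pass the list to subprocess.run(command, check=True) in an environment where exomiser-ml is installed.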

  2. run-pipeline

Executes the full pipeline: feature extraction, data splitting, training, evaluation, and post-processing.

run-pipeline \
  --phenopacket-dir path/to/phenopackets \
  --result-dir path/to/exomiser_results \
  --output-dir path/to/output \
  --features FEATURE1 FEATURE2 ... \
  --test-size 0.2 \
  --model MODEL_TYPE

Parameters:

  • --phenopacket-dir: Directory containing Phenopacket JSON files.
  • --result-dir: Directory containing Exomiser result TSV files.
  • --output-dir: Directory to save outputs.
  • --features: List of features to extract.
  • --test-size: Proportion of data to use for testing (e.g., 0.2 for 20%).
  • --model: Model type to use. Choices: LOGISTIC_REGRESSION, RANDOM_FOREST, XGBOOST_CLASSIFIER.

  3. add-features

Adds features to Exomiser results.

add-features \
  --phenopacket-dir path/to/phenopackets \
  --result-dir path/to/exomiser_results \
  --output-dir path/to/output
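
How add-features uses the Phenopackets is not described here. As background only, the sketch below shows how the diagnosed variants recorded in a Phenopacket (GA4GH v2 JSON layout) can be read with the standard library, which is the kind of information a pipeline could pair with Exomiser results. The function causative_variants and the example case, including its coordinates, are hypothetical.

```python
import json

# Hypothetical, heavily trimmed Phenopacket (v2 layout); values are illustrative.
PHENOPACKET_JSON = """
{
  "id": "example-case",
  "interpretations": [
    {
      "id": "example-interpretation",
      "diagnosis": {
        "genomicInterpretations": [
          {
            "interpretationStatus": "CAUSATIVE",
            "variantInterpretation": {
              "variationDescriptor": {
                "vcfRecord": {"chrom": "10", "pos": "123256215", "ref": "T", "alt": "G"}
              }
            }
          }
        ]
      }
    }
  ]
}
"""


def causative_variants(phenopacket):
    """Collect (chrom, pos, ref, alt) for interpretations marked CAUSATIVE."""
    variants = set()
    for interpretation in phenopacket.get("interpretations", []):
        diagnosis = interpretation.get("diagnosis", {})
        for gi in diagnosis.get("genomicInterpretations", []):
            if gi.get("interpretationStatus") != "CAUSATIVE":
                continue
            record = (
                gi.get("variantInterpretation", {})
                .get("variationDescriptor", {})
                .get("vcfRecord")
            )
            if record:
                # pos may be serialised as a string, so normalise it to int.
                variants.add(
                    (record["chrom"], int(record["pos"]), record["ref"], record["alt"])
                )
    return variants


known = causative_variants(json.loads(PHENOPACKET_JSON))
print(known)
```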

  4. split-data

Splits data (Exomiser TSV results) into training and testing sets.

split-data \
  --input-dir path/to/input_data \
  --test-size 0.2 \
  --output-dir path/to/output
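
To make the effect of --test-size concrete, here is a minimal sketch of a seeded train/test split over result files. It illustrates the general technique, not Exomiser-ML's implementation: the function split_cases, the rounding rule, the fixed seed, and the file names are all assumptions.

```python
import random


def split_cases(case_ids, test_size, seed=0):
    """Shuffle case IDs reproducibly and return (train, test) lists."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be strictly between 0 and 1")
    shuffled = sorted(case_ids)            # fixed starting order
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    # Assumed rounding rule: nearest whole number, at least one test case.
    n_test = max(1, round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical result file names.
cases = [f"case_{i}.tsv" for i in range(10)]
train, test = split_cases(cases, test_size=0.2)
print(len(train), len(test))
```

With ten input files and a test size of 0.2, two files end up in the test set and eight in the training set. Fixing the seed keeps the split stable between runs.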

  5. post-process

Post-processes test results for downstream benchmarking with PhEval.

post-process \
  --test-dir path/to/test_results \
  --phenopacket-dir path/to/phenopackets \
  --output-dir path/to/output \
  --score NEW_SCORE
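
Benchmarking tools generally compare ranked results, so a post-processing step of this kind usually comes down to ordering variants by the chosen score. The sketch below shows that idea in isolation. It is an assumption about the general approach, not a description of what post-process does internally: the function rank_by_score, the variant labels, and the score values are hypothetical, and NEW_SCORE is the placeholder from the command above.

```python
def rank_by_score(variants, score_key):
    """Return copies of the variants sorted by score (highest first) with a 1-based rank."""
    ordered = sorted(variants, key=lambda v: v[score_key], reverse=True)
    return [{**v, "rank": i} for i, v in enumerate(ordered, start=1)]


# Hypothetical model outputs for three variants.
variants = [
    {"variant": "var_a", "NEW_SCORE": 0.20},
    {"variant": "var_b", "NEW_SCORE": 0.91},
    {"variant": "var_c", "NEW_SCORE": 0.55},
]
ranked = rank_by_score(variants, "NEW_SCORE")
for row in ranked:
    print(row["rank"], row["variant"], row["NEW_SCORE"])
```

Tied scores keep their input order and receive consecutive ranks in this sketch. A real benchmark may handle ties differently.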
