
Migenpro

Getting started

Clone the git repo:

git clone git@gitlab.com:pig-paradigm/migenpro.git
cd migenpro

Installing the needed dependencies

A requirements.txt file is located in the installation directory. The following command creates a conda environment named migenpro and installs the listed dependencies into it.

conda create -n migenpro python=3.12.5 pip --file installation/requirements.txt

Annotating genomes using SAPP

To annotate genomes we use a cwltool workflow with SAPP that outputs the desired genome annotations as HDT files.

cwltool --no-warnings --outdir ./data https://gitlab.com/m-unlock/cwl/-/raw/dev/workflows/workflow_microbial_annotation.cwl --genome_fasta https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/005/845/GCA_000005845.2_ASM584v2/GCA_000005845.2_ASM584v2_genomic.fna.gz

This process is also automated within the Python package.
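
When annotating several genomes by hand rather than through the package, the cwltool call above can be assembled programmatically. The following is a minimal sketch, not the package's own automation: the helper name build_annotation_command is hypothetical, and only the flags and workflow URL shown in the command above are used.

```python
import shlex

# Workflow URL copied from the cwltool command in this README.
WORKFLOW_URL = (
    "https://gitlab.com/m-unlock/cwl/-/raw/dev/workflows/"
    "workflow_microbial_annotation.cwl"
)


def build_annotation_command(genome_fasta, outdir="./data"):
    """Return the cwltool argument list for annotating one genome FASTA.

    Hypothetical helper: it only mirrors the flags shown in the README.
    """
    return [
        "cwltool",
        "--no-warnings",
        "--outdir", outdir,
        WORKFLOW_URL,
        "--genome_fasta", genome_fasta,
    ]


if __name__ == "__main__":
    genomes = [
        "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/005/845/"
        "GCA_000005845.2_ASM584v2/GCA_000005845.2_ASM584v2_genomic.fna.gz",
    ]
    for genome in genomes:
        command = build_annotation_command(genome)
        # Print the shell-quoted command; pass `command` to
        # subprocess.run(command, check=True) to execute it instead.
        print(shlex.join(command))
```

Building the command as a list avoids shell quoting problems when a genome path contains spaces.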

Training machine learning models

python3 src/main/resources/python/machineLearning.py \
    --featureMatrix ./output/protein_domain_matrix.tsv \
    --phenotypeMatrix ./output/phenotype_matrix.tsv \
    --model_load [Location_of_model] \
    --train \
    --predict
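
Training pairs each genome's feature row with its phenotype label, so the two matrices have to be aligned on genome identifier first. The sketch below illustrates that alignment only. It assumes both TSV files have a header row, the genome identifier in the first column, and integer feature counts; these layout details and the function names are assumptions, not taken from machineLearning.py.

```python
import csv
import io


def read_matrix(handle):
    """Read a TSV with a header row; return (column_names, {genome_id: values})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    rows = {row[0]: row[1:] for row in reader if row}
    return header[1:], rows


def align(features, phenotypes):
    """Keep only genomes present in both matrices, in sorted order."""
    shared = sorted(features.keys() & phenotypes.keys())
    x = [[int(value) for value in features[genome]] for genome in shared]
    y = [phenotypes[genome][0] for genome in shared]  # first phenotype column
    return shared, x, y


# Tiny in-memory example (made-up data, assumed layout).
feature_tsv = "genome\tPF00001\tPF00002\nGCA1\t1\t0\nGCA2\t0\t2\nGCA3\t1\t1\n"
phenotype_tsv = "genome\ttemperature\nGCA2\tthermophilic\nGCA1\tmesophilic\n"

domains, features = read_matrix(io.StringIO(feature_tsv))
_, phenotypes = read_matrix(io.StringIO(phenotype_tsv))
genomes, x, y = align(features, phenotypes)
print(genomes, x, y)
```

GCA3 is dropped in this example because it has no phenotype label.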

Predicting phenotypes with existing models

You can do this through the Docker container or from the source code.

  1. Obtain a protein domain matrix for the desired genomes. You can do this using the Java code.
  2. For ease of use, run the Python script with the following command. The default output directory is "output/mloutput"; to change it, pass --output [output_directory_location].
python3 src/main/resources/python/machineLearning.py \
    --featureMatrix ./output/protein_domain_matrix.tsv \
    --model_load [Location_of_model] \
    --predict
  3. Wait for the script to finish and retrieve the results of your prediction from the output directory. There the predictions are given in the following format:
##################################################
# Genome # Phenotype   # Prediction # Confidence #
# GCA123 # Temperature # mesophilic # 0.96       #
##################################################
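
For downstream analysis, the prediction table above can be read into Python records. This is a minimal sketch that assumes the output file uses exactly the '#'-delimited layout shown, with a header row and a numeric Confidence column; parse_predictions is a hypothetical helper, not part of the package.

```python
def parse_predictions(text):
    """Parse the '#'-delimited prediction table into a list of dicts."""
    rows = []
    for line in text.splitlines():
        # Drop the outer '#' characters, then split the remaining cells.
        cells = [cell.strip() for cell in line.strip().strip("#").split("#")]
        if not any(cells):
            continue  # border line made only of '#', or a blank line
        rows.append(cells)
    if not rows:
        return []
    header, *records = rows
    parsed = []
    for record in records:
        entry = dict(zip(header, record))
        if "Confidence" in entry:
            entry["Confidence"] = float(entry["Confidence"])
        parsed.append(entry)
    return parsed


sample = """\
##################################################
# Genome # Phenotype   # Prediction # Confidence #
# GCA123 # Temperature # mesophilic # 0.96       #
##################################################
"""

for row in parse_predictions(sample):
    print(row["Genome"], row["Prediction"], row["Confidence"])
```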

Recreating the results from the study

The files needed to recreate our results are located in the ./data/phenotype_output folder. We use the previously created protein_domain.tsv and phenotype.tsv files. Run the create_graphs.sh bash script:

./recreate.sh

Download files

Source Distribution

migenpro-0.1.0.tar.gz (38.2 kB)

Uploaded Source

Built Distribution

migenpro-0.1.0-py3-none-any.whl (50.3 kB)

Uploaded Python 3

File details

Details for the file migenpro-0.1.0.tar.gz.

File metadata

  • Download URL: migenpro-0.1.0.tar.gz
  • Upload date:
  • Size: 38.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for migenpro-0.1.0.tar.gz
Algorithm Hash digest
SHA256 eb65863ca4d03afa2fb7d94ef1b4c4659584156cd7666c932f1ef7ed98a3c39b
MD5 4355068f363a133904ac3b0eae462925
BLAKE2b-256 dc2c432075624827d1be5f7b89c6c8131d2b459e142c33d8b8d9fd2ff89e3067

File details

Details for the file migenpro-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: migenpro-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 50.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for migenpro-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 401e321c9db5a22aefbc0e63987eb1d7a9901c1ed79e3f72bf5c073d9a1fa83b
MD5 f9925bfccfd4f7c084b93fd5c05945ef
BLAKE2b-256 5f416e0556fe4149109c5846d0962295a7f8bf91f4dc8c077bdd01af728fb4de
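
A downloaded file can be checked against the SHA256 digests listed above. The sketch below uses only Python's standard hashlib module; the helper names and the local file path are assumptions for illustration.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True if the file's SHA256 digest matches the expected value."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Digest copied from the listing for migenpro-0.1.0.tar.gz;
    # the local path is an assumption about where the file was saved.
    expected = "eb65863ca4d03afa2fb7d94ef1b4c4659584156cd7666c932f1ef7ed98a3c39b"
    try:
        print(verify("migenpro-0.1.0.tar.gz", expected))
    except FileNotFoundError:
        print("file not found; download it first")
```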
