Migenpro
Getting started
Clone the git repo:
git clone git@gitlab.com:pig-paradigm/migenpro.git
cd migenpro
Installing the needed dependencies
A requirements.txt file is located in the installation directory. The following command creates a conda environment named migenpro and installs the listed dependencies into it.
conda create -n migenpro python=3.12.5 pip --file installation/requirements.txt
Annotating genomes using SAPP
To annotate genomes we use a cwltool workflow with SAPP that outputs the desired genome annotations as HDT files.
cwltool --no-warnings --outdir ./data https://gitlab.com/m-unlock/cwl/-/raw/dev/workflows/workflow_microbial_annotation.cwl --genome_fasta https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/005/845/GCA_000005845.2_ASM584v2/GCA_000005845.2_ASM584v2_genomic.fna.gz
This process is also automated within the Python package.
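The genome URL passed to --genome_fasta above follows the directory layout of the NCBI genomes FTP site, where the nine digits of the assembly accession are split into three groups of three. A minimal sketch of building such a URL from an accession and an assembly name is shown here. The function name ncbi_genome_url is hypothetical and is not part of migenpro.

```python
def ncbi_genome_url(accession: str, assembly_name: str) -> str:
    """Build the genomic FASTA URL for an assembly on the NCBI genomes FTP site.

    Hypothetical helper for illustration only; not part of migenpro.
    """
    prefix, rest = accession.split("_", 1)   # e.g. "GCA" and "000005845.2"
    digits = rest.split(".", 1)[0]           # drop the version suffix
    if len(digits) != 9 or not digits.isdigit():
        raise ValueError(f"unexpected accession format: {accession!r}")
    # The FTP site nests directories by groups of three digits.
    groups = [digits[i:i + 3] for i in range(0, 9, 3)]
    name = f"{accession}_{assembly_name}"
    return (
        "https://ftp.ncbi.nlm.nih.gov/genomes/all/"
        f"{prefix}/{'/'.join(groups)}/{name}/{name}_genomic.fna.gz"
    )


print(ncbi_genome_url("GCA_000005845.2", "ASM584v2"))
```

For the accession used above, this reproduces the URL given in the cwltool command.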
Training machine learning models
python3 src/main/resources/python/machineLearning.py \
--featureMatrix ./output/protein_domain_matrix.tsv \
--phenotypeMatrix ./output/phenotype_matrix.tsv \
--model_load [Location_of_model] \
--train \
--predict
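When the script is called from other Python code, it can help to assemble the argument list programmatically so that no line continuation is lost. The sketch below uses only the flags that appear in this README. The function name build_command and the example paths are assumptions for illustration, with the protein domain matrix assumed to be the feature matrix.

```python
import shlex
import sys

# Script path as given in this README.
SCRIPT = "src/main/resources/python/machineLearning.py"


def build_command(feature_matrix, phenotype_matrix=None, model_load=None,
                  train=False, predict=False, output=None):
    """Assemble the argument list for machineLearning.py.

    Hypothetical helper; it only strings together flags shown in the README.
    """
    cmd = [sys.executable, SCRIPT, "--featureMatrix", str(feature_matrix)]
    if phenotype_matrix is not None:
        cmd += ["--phenotypeMatrix", str(phenotype_matrix)]
    if model_load is not None:
        cmd += ["--model_load", str(model_load)]
    if output is not None:
        cmd += ["--output", str(output)]
    if train:
        cmd.append("--train")
    if predict:
        cmd.append("--predict")
    return cmd


# Example paths are placeholders, not files shipped with the project.
command = build_command(
    "output/protein_domain_matrix.tsv",
    phenotype_matrix="output/phenotype_matrix.tsv",
    train=True,
    predict=True,
)
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) from the root of the repository.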
Predicting phenotypes with existing models
You can do this through the Docker container or from the source code.
- You will need a protein domain matrix of the desired genomes. You can create it using the Java code.
- For ease of use, run the Python script with the following command. The default output directory is "output/mloutput"; to change it, pass --output [output_directory_location].
python3 src/main/resources/python/machineLearning.py \
--featureMatrix ./output/protein_domain_matrix.tsv \
--model_load [Location_of_model] \
--predict
- Wait for the script to finish and retrieve the results of your prediction from the output directory. There the predictions are given in the following format:
##################################################
# Genome # Phenotype   # Prediction # Confidence #
# GCA123 # Temperature # mesophilic # 0.96       #
##################################################
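The prediction table can be read back into Python for further analysis. The sketch below assumes the output looks exactly like the example above: border lines made of '#' characters, a header row, and one '#'-separated row per prediction, with no '#' inside the values. The function name parse_predictions is hypothetical and not part of migenpro.

```python
SAMPLE = """\
##################################################
# Genome # Phenotype   # Prediction # Confidence #
# GCA123 # Temperature # mesophilic # 0.96       #
##################################################
"""


def parse_predictions(text):
    """Parse the '#'-delimited prediction table into a list of dicts.

    Hypothetical helper; assumes the layout shown in the README example.
    """
    header = None
    rows = []
    for line in text.splitlines():
        # Drop the outer '#' characters, then split on the inner ones.
        cells = [cell.strip() for cell in line.strip().strip("#").split("#")]
        if not any(cells):
            continue  # border line or blank line
        if header is None:
            header = cells  # first content line holds the column names
            continue
        row = dict(zip(header, cells))
        if "Confidence" in row:
            row["Confidence"] = float(row["Confidence"])
        rows.append(row)
    return rows


for row in parse_predictions(SAMPLE):
    print(row)
```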
Recreating the results from the study
The files needed to recreate our results are located in the ./data/phenotype_output folder. We use the previously created protein_domain.tsv and phenotype.tsv files.
Run the bash script that recreates the graphs:
./recreate.sh
Download files
File details
Details for the file migenpro-0.1.0.tar.gz.
File metadata
- Download URL: migenpro-0.1.0.tar.gz
- Upload date:
- Size: 38.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | eb65863ca4d03afa2fb7d94ef1b4c4659584156cd7666c932f1ef7ed98a3c39b |
| MD5 | 4355068f363a133904ac3b0eae462925 |
| BLAKE2b-256 | dc2c432075624827d1be5f7b89c6c8131d2b459e142c33d8b8d9fd2ff89e3067 |
File details
Details for the file migenpro-0.1.0-py3-none-any.whl.
File metadata
- Download URL: migenpro-0.1.0-py3-none-any.whl
- Upload date:
- Size: 50.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 401e321c9db5a22aefbc0e63987eb1d7a9901c1ed79e3f72bf5c073d9a1fa83b |
| MD5 | f9925bfccfd4f7c084b93fd5c05945ef |
| BLAKE2b-256 | 5f416e0556fe4149109c5846d0962295a7f8bf91f4dc8c077bdd01af728fb4de |
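A downloaded file can be checked against the SHA256 digests listed above. The sketch below uses the standard-library hashlib module. The function names and the assumption that the files sit in the current directory are illustrative only.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details above.
EXPECTED_SHA256 = {
    "migenpro-0.1.0.tar.gz":
        "eb65863ca4d03afa2fb7d94ef1b4c4659584156cd7666c932f1ef7ed98a3c39b",
    "migenpro-0.1.0-py3-none-any.whl":
        "401e321c9db5a22aefbc0e63987eb1d7a9901c1ed79e3f72bf5c073d9a1fa83b",
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path):
    """Compare a downloaded file with the digest listed for its file name."""
    path = Path(path)
    return sha256_of(path) == EXPECTED_SHA256[path.name]


# Check whichever of the two files is present in the current directory.
for name in EXPECTED_SHA256:
    if Path(name).is_file():
        status = "OK" if matches_published_digest(name) else "MISMATCH"
        print(name, status)
```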