Lineage prediction from SARS-CoV-2 sequences
Armadillin
This is an experimental tool under development. The recommended method for calling lineages remains standard Pangolin: https://github.com/cov-lineages/pangolin
A Re-engineered Method Allowing DetermInation of viraL LINeages
Armadillin is an experimental alternative approach to training models on lineages designated by the PANGO team.
Because Armadillin assigns lineages with dense neural networks, it does not have to assume that positions containing an N match the reference sequence. It is still very fast, partly because the feature input to the network is sparsified (pruned) during training.
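To make the point about N concrete, here is a minimal sketch, not Armadillin's actual code. It assumes a hypothetical one-hot encoding with four features per aligned position and a single dense layer with softmax over lineages. The key idea is that an N becomes an all-zero vector, i.e. missing data, instead of being filled in with the reference base.

```python
from __future__ import annotations

import math

BASES = "ACGT"


def one_hot(seq: str) -> list[float]:
    """Hypothetical encoding: four features per position (A, C, G, T).

    N, gaps and ambiguity codes become all zeros, so the model sees missing
    data rather than being told the position matches the reference.
    """
    features: list[float] = []
    for base in seq.upper():
        features.extend(1.0 if base == b else 0.0 for b in BASES)
    return features


def dense_softmax(x: list[float], weights: list[list[float]], biases: list[float]) -> list[float]:
    """One dense layer (one weight row per lineage) followed by softmax."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

In a real model the weights would be learned from the PANGO designations; any weights passed to `dense_softmax` here are placeholders.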
Installation (for inference)
conda create --name armadillin python=3.9
conda activate armadillin
pip3 install armadillin
Usage
Your sequences must already be aligned to the reference (doing this automatically is on the backlog).
As a demo, we'll use the COG-UK alignment:
wget https://cog-uk.s3.climb.ac.uk/phylogenetics/latest/cog_alignment.fasta.gz
armadillin https://cog-uk.s3.climb.ac.uk/phylogenetics/latest/cog_alignment.fasta.gz
or, to write the results to a file:
armadillin https://cog-uk.s3.climb.ac.uk/phylogenetics/latest/cog_alignment.fasta.gz > output.tsv
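If you want a quick summary of the calls, the following sketch tallies lineages from the TSV output. It assumes, without having checked the real format, that the file has one header row and one row per sequence with the lineage in the second column. `count_lineages` and its parameters are hypothetical, so adjust `lineage_column` to match the actual header of your output.tsv.

```python
from __future__ import annotations

import csv
from collections import Counter
from typing import Iterable


def count_lineages(lines: Iterable[str], lineage_column: int = 1, skip_header: bool = True) -> Counter:
    """Tally lineage calls from tab-separated lines.

    Assumption: one row per sequence, lineage in column `lineage_column` (0-based).
    """
    reader = csv.reader(lines, delimiter="\t")
    if skip_header:
        next(reader, None)  # drop the (assumed) header row
    counts: Counter = Counter()
    for row in reader:
        if len(row) > lineage_column:  # ignore blank or short rows
            counts[row[lineage_column]] += 1
    return counts
```

For example, `count_lineages(open("output.tsv", newline="")).most_common(10)` would list the ten most frequent calls under these assumptions.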
Training your own models
Dataset generation
python -m armadillin.training_make_input --designations ~/gisaid/pango-designation-1.2.88/ --gisaid_meta_file ~/gisaid/metadata.tsv --gisaid_mmsa ~/gisaid/msa_2021-10-20.tar.xz --output ~/training_set_nov_02
Training
python -m armadillin.train --shard_dir ~/training_set_nov_02 --use_wandb --checkpoint_path ~/nov2check1
Pruning (continue training from the checkpoint with sparsification)
python -m armadillin.train --starting_model ~/nov2check1/checkpoint.h5 --use_wandb --checkpoint_path ~/nov2check1_sparse/ --do_pruning --shard_dir ~/training_set_nov_02
Creating a small model
python -m armadillin.training_create_small_model -i /tmp/model_zeros.h5 -d ~/training_set_nov_02
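The `--do_pruning` run is what makes the network's input sparse. Armadillin's exact pruning procedure isn't documented here, so the following is only an illustration of one common approach, magnitude-based pruning of input features: score each input feature by the L2 norm of its outgoing first-layer weights, keep the highest-scoring fraction, and feed only those features to the network. The function names and the `keep_fraction` parameter are hypothetical.

```python
from __future__ import annotations


def input_feature_scores(first_layer: list[list[float]]) -> list[float]:
    """L2 norm of each input feature's outgoing weights.

    first_layer[i][j] is the weight from input feature i to hidden unit j.
    """
    return [sum(w * w for w in row) ** 0.5 for row in first_layer]


def keep_top_features(first_layer: list[list[float]], keep_fraction: float) -> list[int]:
    """Indices of the highest-scoring input features, in ascending order."""
    scores = input_feature_scores(first_layer)
    n_keep = max(1, round(len(scores) * keep_fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


def sparse_input(features: list[float], kept: list[int]) -> list[float]:
    """Select only the retained features; the rest are never computed on."""
    return [features[i] for i in kept]
```

With roughly 30,000 aligned positions and four features each, keeping only a small fraction of input features shrinks the first layer, and hence inference time, proportionally.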
Related tools
Pangolin is the original tool for assigning lineages.
Download files
Source Distribution: armadillin-0.40.tar.gz
Built Distribution: armadillin-0.40-py3-none-any.whl
File details
Details for the file armadillin-0.40.tar.gz
File metadata
- Download URL: armadillin-0.40.tar.gz
- Upload date:
- Size: 21.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.5.0 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7b7cecbe388e67b07748ddf187eeb4647ce47ee10f6ac807e43040a6ff34fb1c
MD5 | c81ef1e3199615348aa9573567ab741c
BLAKE2b-256 | 28dd6858e13d6b9ad33418b286132ff403f59f4f0289eca3eb63e10729779ab0
File details
Details for the file armadillin-0.40-py3-none-any.whl
File metadata
- Download URL: armadillin-0.40-py3-none-any.whl
- Upload date:
- Size: 21.3 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.5.0 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | a605c43c7667fe30c4a058f029811c043417f57178baed41663696d99ac47d06
MD5 | dd0e5263f0a93d667e930c7714c0ad43
BLAKE2b-256 | 45f79b0584a779fa14bf35a672b42e3a8b37f046c44d3afb3b19dab7ef2cf849