
Lineage prediction from SARS-CoV-2 sequences

Project description

Armadillin

This is an experimental tool under development. The recommended method for calling lineages remains standard Pangolin: https://github.com/cov-lineages/pangolin

A Re-engineered Method Allowing DetermInation of viraL LINeages

Armadillin is an experimental alternative approach to training models on lineages designated by the PANGO team.

Armadillin uses dense neural networks for assignment, so it does not have to assume that positions with an N match the reference sequence. It is nevertheless fast, in part because the feature input to the network is sparsified during training.
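To illustrate why a neural network need not treat N as the reference, here is a minimal sketch of one possible input encoding. It is an assumption for illustration, not Armadillin's actual feature layout: each aligned position becomes a one-hot vector over A, C, G, T and a gap channel, and an N (or any other ambiguity code) becomes an all-zero row, carrying no evidence for either the reference base or a mutation. The names `CHANNELS` and `one_hot_encode` are hypothetical.

```python
import numpy as np

# Hypothetical channel layout for illustration: four bases plus a gap
# channel, so deletions ("-") remain informative. Armadillin's real
# feature layout may differ.
CHANNELS = "ACGT-"
CHANNEL_INDEX = {symbol: i for i, symbol in enumerate(CHANNELS)}


def one_hot_encode(aligned_seq: str) -> np.ndarray:
    """Encode an aligned sequence as a (length, 5) float32 array.

    A, C, G, T and "-" each set a single 1 in their own column. N and any
    other ambiguity code leave the whole row at zero, so an unknown
    position is not treated as evidence for the reference base or for a
    mutation.
    """
    encoded = np.zeros((len(aligned_seq), len(CHANNELS)), dtype=np.float32)
    for pos, symbol in enumerate(aligned_seq.upper()):
        idx = CHANNEL_INDEX.get(symbol)
        if idx is not None:
            encoded[pos, idx] = 1.0
    return encoded


features = one_hot_encode("ACN-T")
# Flattened to a single vector, this is the kind of input a dense network takes.
flat = features.reshape(-1)
```

For "ACN-T" the result is a 5 × 5 array whose third row (the N) is all zeros, and the flattened vector has 25 entries.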

Installation (for inference)

conda create --name armadillin python=3.9
conda activate armadillin
pip3 install armadillin

Usage

Your sequences must already be aligned to the reference (automatic alignment is on the backlog).
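A quick way to sanity-check an input file before running Armadillin is to confirm that every record has the length of the reference genome. The sketch below assumes, as for the COG-UK alignment, that insertions relative to the reference have been removed, so each aligned record should be 29,903 nt long (the length of the Wuhan-Hu-1 reference, GenBank MN908947.3). Whether Armadillin enforces exactly this length is an assumption; the helper names are hypothetical.

```python
import gzip
from typing import Iterable, Iterator, List, Optional, Tuple

# Length of the Wuhan-Hu-1 reference genome (GenBank MN908947.3).
REFERENCE_LENGTH = 29903


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Stream records from a plain or gzip-compressed FASTA file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        yield from parse_fasta(handle)


def find_unaligned(
    records: Iterable[Tuple[str, str]], expected_length: int = REFERENCE_LENGTH
) -> List[str]:
    """Return the names of records whose length differs from the reference."""
    return [name for name, seq in records if len(seq) != expected_length]
```

For example, `find_unaligned(read_fasta("cog_alignment.fasta.gz"))` streams the file and returns an empty list if every record has the reference length.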

We'll use the COG-UK alignment as a demo. Download it, then run Armadillin on the downloaded file, which prints the lineage assignments to standard output:

wget https://cog-uk.s3.climb.ac.uk/phylogenetics/latest/cog_alignment.fasta.gz
armadillin cog_alignment.fasta.gz

To save the results to a file instead, redirect the output:

armadillin cog_alignment.fasta.gz > output.tsv
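Once the results are in a file, a lineage tally is often the first thing to look at. This sketch does not assume Armadillin's exact column names: it assumes only that output.tsv is tab-separated with a header row, and takes the name of the lineage column as a parameter. Check the header first (for example with `head -n 1 output.tsv`) and pass the matching name; `count_lineages` is a hypothetical helper.

```python
import csv
from collections import Counter
from typing import Iterable


def count_lineages(rows: Iterable[str], lineage_column: str) -> Counter:
    """Count lineage calls in tab-separated text that has a header row.

    `lineage_column` must match a header in the file; it is a parameter
    because this sketch does not assume Armadillin's exact column names.
    """
    reader = csv.DictReader(rows, delimiter="\t")
    if reader.fieldnames is None or lineage_column not in reader.fieldnames:
        raise ValueError(f"column {lineage_column!r} not found in header")
    return Counter(row[lineage_column] for row in reader)
```

Usage: open the file with `open("output.tsv", newline="")`, pass the handle and the column name from its header, and call `.most_common(10)` on the result to list the ten most frequent lineages.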

Training your own models

Dataset generation

python -m armadillin.training_make_input --designations ~/gisaid/pango-designation-1.2.88/ --gisaid_meta_file ~/gisaid/metadata.tsv --gisaid_mmsa ~/gisaid/msa_2021-10-20.tar.xz --output ~/training_set_nov_02
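Dataset generation has to pair sequences with their designated lineages and turn lineage names into class labels. The sketch below shows that step in isolation. It assumes the designations are available as CSV with `taxon` and `lineage` columns, the layout of lineages.csv in pango-designation releases (check the release you use), and it matches names exactly; real GISAID sequence names differ from designation names, which is presumably one reason the command also takes the GISAID metadata file. All function names are hypothetical, and this is not the code of `armadillin.training_make_input`.

```python
import csv
from typing import Dict, Iterable, Iterator, List, Tuple


def load_designations(rows: Iterable[str]) -> Dict[str, str]:
    """Map each designated sequence name to its lineage.

    Expects CSV text with a header containing "taxon" and "lineage"
    columns, as in pango-designation's lineages.csv.
    """
    return {row["taxon"]: row["lineage"] for row in csv.DictReader(rows)}


def build_label_index(designations: Dict[str, str]) -> Tuple[List[str], Dict[str, int]]:
    """Assign each lineage a stable integer class (alphabetical order)."""
    lineages = sorted(set(designations.values()))
    return lineages, {lineage: i for i, lineage in enumerate(lineages)}


def label_records(
    records: Iterable[Tuple[str, str]],
    designations: Dict[str, str],
    label_index: Dict[str, int],
) -> Iterator[Tuple[str, str, int]]:
    """Yield (name, sequence, class index) for designated sequences only.

    Names are matched exactly; real GISAID names need normalising first.
    """
    for name, seq in records:
        lineage = designations.get(name)
        if lineage is not None:
            yield name, seq, label_index[lineage]
```

Sorting the lineage names keeps the class indices reproducible across runs on the same designation release.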
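
Training

Train an initial model on the generated shards (--use_wandb enables Weights & Biases logging):

python -m armadillin.train --shard_dir ~/training_set_nov_02 --use_wandb --checkpoint_path ~/nov2check1

Then continue training from that checkpoint with pruning enabled:

python -m armadillin.train --starting_model ~/nov2check1/checkpoint.h5 --use_wandb --checkpoint_path ~/nov2check1_sparse/ --do_pruning --shard_dir ~/training_set_nov_02

Finally, create a small model from a pruned model file (here /tmp/model_zeros.h5):

python -m armadillin.training_create_small_model -i /tmp/model_zeros.h5 -d ~/training_set_nov_02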
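One way pruning the input features can yield a smaller, faster model is sketched below with NumPy. It is an assumption about the general idea, not a description of what `--do_pruning` or `training_create_small_model` actually do: keep only the input rows of the first dense layer's weight matrix with the largest magnitude, zero the rest, then drop the all-zero rows so that only the surviving input positions need to be encoded at inference time. The function names are hypothetical.

```python
import numpy as np


def prune_input_rows(weights: np.ndarray, keep: int) -> np.ndarray:
    """Zero all but the `keep` input rows with the largest L2 norm.

    `weights` is a first-layer matrix of shape (n_features, n_hidden).
    """
    norms = np.linalg.norm(weights, axis=1)
    keep_idx = np.argsort(norms)[::-1][:keep]  # indices of the largest rows
    pruned = np.zeros_like(weights)
    pruned[keep_idx] = weights[keep_idx]
    return pruned


def compact_first_layer(weights: np.ndarray):
    """Drop input rows that are entirely zero.

    Returns (kept_feature_indices, smaller_weights). Because the dropped
    rows contribute nothing, x[:, kept] @ smaller_weights == x @ weights.
    """
    kept = np.flatnonzero(np.any(weights != 0, axis=1))
    return kept, weights[kept]
```

For a 10 × 3 weight matrix pruned with `keep=4`, `compact_first_layer` returns four feature indices and a 4 × 3 matrix that gives the same first-layer output as the pruned full matrix, while the input only needs those four features.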

Related tools

Pangolin is the original tool for assigning lineages.


Download files


Source Distribution

armadillin-0.22.tar.gz (21.3 MB)

Uploaded Source

Built Distribution

armadillin-0.22-py3-none-any.whl (21.3 MB)

Uploaded Python 3

File details

Details for the file armadillin-0.22.tar.gz.

File metadata

  • Download URL: armadillin-0.22.tar.gz
  • Upload date:
  • Size: 21.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.5.0 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for armadillin-0.22.tar.gz
Algorithm Hash digest
SHA256 8d1b9600d42f24fbc22789c2a78cba49d61a76bbae954962a296dc31396bee7f
MD5 56b8ace9ae694b9e9bdc78430f7a4b9e
BLAKE2b-256 096ac212967538252842776544d85a34db9bb3482e4f3aea41c5d1fcfea1aecb


File details

Details for the file armadillin-0.22-py3-none-any.whl.

File metadata

  • Download URL: armadillin-0.22-py3-none-any.whl
  • Upload date:
  • Size: 21.3 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.5.0 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.7

File hashes

Hashes for armadillin-0.22-py3-none-any.whl
Algorithm Hash digest
SHA256 bad5e8c14a95983cbb4c21b74e4c543061e51a09ad957e2959d04a07f40228cf
MD5 6d2bf5d8414d5e5caee4536e48a30b04
BLAKE2b-256 3197af43730db7081e62dcb6aff0145a315065282e3513bfb46e706b1a71522a
