skiml-cluster

Table of Contents

Quickstart
Installation
Troubleshooting
Methodology
License

Quickstart

RIVM users should preferably run skiml-cluster on a compute node of the HPC rather than on the home node.

Step by step:

1. Run apollo-mapping on the samples you want to cluster, or find the apollo-mapping results of these samples.
2. Copy all SNP VCFs (<APOLLO-MAPPING-OUTPUT>/variants/snps/*.vcf) for the relevant samples to a single directory.
3. Run skiml-cluster as follows on the SNP VCF directory, which will submit a job to the HPC:

bsub -M 10G skiml-cluster run --input <INPUT FOLDER> --output <OUTPUT FOLDER>

It is also possible to run skiml-cluster locally (without submitting to the HPC):

skiml-cluster run --input <INPUT FOLDER> --output <OUTPUT FOLDER>

However, do this with care: some steps can require a lot of resources when many samples are processed.
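
The copy step above (gathering SNP VCFs into a single directory) can be sketched in Python. This is a minimal sketch, not part of skiml-cluster: the function name collect_snp_vcfs is hypothetical, and the only thing taken from the instructions is the variants/snps/*.vcf layout of the apollo-mapping output.

```python
import shutil
from pathlib import Path


def collect_snp_vcfs(mapping_output: Path, dest: Path) -> list[Path]:
    """Copy every *.vcf from <mapping_output>/variants/snps into dest.

    Hypothetical helper: the name and behaviour are illustrative only.
    """
    snp_dir = mapping_output / "variants" / "snps"
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for vcf in sorted(snp_dir.glob("*.vcf")):
        target = dest / vcf.name
        if target.exists():
            # Refuse to overwrite: two runs may contain a sample with the same name.
            raise FileExistsError(f"{target} already exists")
        shutil.copy2(vcf, target)
        copied.append(target)
    return copied
```

Calling the function once per apollo-mapping output directory, with the same destination each time, yields a folder that can be passed to --input.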

Installation

pip install --user skiml-cluster

To update skiml-cluster to the most recent version, run:

pip install --user --upgrade skiml-cluster
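
To confirm which version is installed after an install or upgrade, the standard library's importlib.metadata can be queried. The helper name installed_version is an assumption for illustration; only the distribution name skiml-cluster comes from the commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str = "skiml-cluster"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string, or None when skiml-cluster is not installed
# in the current Python environment.
print(installed_version())
```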

Troubleshooting

Q: I get this error:

ValueError in file /home/boas/mambaforge/lib/python3.10/site-packages/skiml_cluster/workflow/Snakefile, line 16:
No VCF files found in input directory.

A: skiml-cluster looks for files ending in .vcf in the input folder. If zero files can be found with this exact extension, the pipeline will not run. Please make sure there are valid VCF files (preferably from apollo-mapping) in the input folder with the extension .vcf.

Q: I get this error:

ValueError in file /home/boas/mambaforge/lib/python3.10/site-packages/skiml_cluster/workflow/Snakefile, line 11:
Input directory does not exist.

A: The input directory cannot be found. Please check the path passed to --input for typos and make sure the directory exists.
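
Both errors above can be caught before a job is submitted. The following is a minimal pre-flight sketch that mirrors the behaviour described in the answers (the directory must exist and contain at least one file ending in exactly .vcf). It is not the code from the skiml-cluster Snakefile; the function name check_input_dir is hypothetical, and the error messages are copied from the output shown above.

```python
from pathlib import Path


def check_input_dir(input_dir: str) -> list[Path]:
    """Return the .vcf files in input_dir, or raise ValueError.

    Illustrative only: reproduces the two checks described above.
    """
    path = Path(input_dir)
    if not path.is_dir():
        raise ValueError("Input directory does not exist.")
    # Only the exact, case-sensitive extension ".vcf" counts;
    # names such as sample.vcf.gz or sample.VCF are not matched.
    vcfs = sorted(
        p for p in path.iterdir() if p.is_file() and p.name.endswith(".vcf")
    )
    if not vcfs:
        raise ValueError("No VCF files found in input directory.")
    return vcfs
```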

Methodology

skiml-cluster performs the following steps, all managed by Snakemake:

1. SNP VCF files are compressed and indexed (required for other analyses).
2. All indexed VCF files are merged into a multi-sample VCF.
3. A UPGMA tree is generated from the multi-sample VCF using vcfkit.
4. A Fasta pseudo-alignment is generated from the multi-sample VCF using vcfkit.
5. SNP distances are calculated from the pseudo-alignment using snp-dists.

The command-line interface for the Snakemake pipeline was generated using snk.
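
To illustrate the last step of the pipeline, the sketch below computes pairwise SNP distances from a Fasta pseudo-alignment in plain Python. It is a simplified illustration of the idea, not snp-dists itself: all function names are hypothetical, and the rule that only positions where both sequences carry A, C, G or T are counted is an assumption made for this example.

```python
from itertools import combinations


def read_fasta(text: str) -> dict[str, str]:
    """Parse Fasta text into {name: uppercase sequence}."""
    records: dict[str, list[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def snp_distance(a: str, b: str) -> int:
    """Count positions where both bases are A/C/G/T and they differ."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    valid = set("ACGT")
    return sum(1 for x, y in zip(a, b) if x != y and x in valid and y in valid)


def distance_matrix(seqs: dict[str, str]) -> dict[str, dict[str, int]]:
    """Build a symmetric matrix of pairwise SNP distances."""
    names = list(seqs)
    matrix = {n: {m: 0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        d = snp_distance(seqs[n], seqs[m])
        matrix[n][m] = matrix[m][n] = d
    return matrix
```

A matrix of this kind is also the usual input for UPGMA clustering, which repeatedly joins the two closest clusters using average pairwise distance.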

License

skiml-cluster is distributed under the terms of the MIT license.

Release history

0.0.6 (Dec 11, 2023, this version)
0.0.5 (Dec 11, 2023)
0.0.4 (Dec 7, 2023)
0.0.3 (Nov 10, 2023)
0.0.2 (Oct 24, 2023)
0.0.1 (Oct 24, 2023)

Download files

Source Distribution: skiml_cluster-0.0.6.tar.gz (5.5 kB), uploaded Dec 11, 2023
Built Distribution: skiml_cluster-0.0.6-py3-none-any.whl (6.7 kB), uploaded Dec 11, 2023, Python 3

Hashes for skiml_cluster-0.0.6.tar.gz

Algorithm	Hash digest
SHA256	`e91dc81572aca300ae0eae548491a1187b93646023faec29e2338daed224a46a`
MD5	`6836ec3d75fe0fde363fb4629b8f5c58`
BLAKE2b-256	`959e3e120c0c208c967bf3e15eee596ac03561c57c13ab3d6a5ba5c48874b395`

Hashes for skiml_cluster-0.0.6-py3-none-any.whl

Algorithm	Hash digest
SHA256	`b081ca50835fe263c54b1cf931d9a88f3b428c0f53de3abcd5f58027d93fd91b`
MD5	`7130ce52af327103e86a369c11ba7689`
BLAKE2b-256	`02903a7d8b1c48bb44352fae5b4e24a119ade9006e6ad72fd69ab4932731ee13`