skiml-cluster

Quickstart

For RIVM users, the best approach is to run skiml-cluster on a computing node of the HPC, instead of on the home node.

Step by step:

  1. Run apollo-mapping on the samples you want to cluster, or find the apollo-mapping results of these samples.
  2. Copy all SNP VCFs (<APOLLO-MAPPING-OUTPUT>/variants/snps/*.vcf) for the relevant samples to a single directory.
  3. Run skiml-cluster as follows on the SNP VCF directory, which will submit a job to the HPC:
bsub -M 10G skiml-cluster run --input <INPUT FOLDER> --output <OUTPUT FOLDER>
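
Step 2 above can be scripted if you have several apollo-mapping output directories. The following is a minimal sketch, not part of skiml-cluster: the function name `collect_snp_vcfs` and the choice to refuse overwriting existing files are assumptions made for illustration. Only the `variants/snps/*.vcf` location comes from the steps above.

```python
from pathlib import Path
import shutil


def collect_snp_vcfs(apollo_output_dirs, target_dir):
    """Copy <APOLLO-MAPPING-OUTPUT>/variants/snps/*.vcf into one directory.

    Hypothetical helper for illustration; not provided by skiml-cluster.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for output_dir in apollo_output_dirs:
        # Location of the SNP VCFs as described in step 2.
        for vcf in sorted(Path(output_dir).glob("variants/snps/*.vcf")):
            destination = target / vcf.name
            if destination.exists():
                # Refuse to overwrite: two samples with the same file name
                # would otherwise silently replace each other.
                raise FileExistsError(f"{destination} already exists")
            shutil.copy2(vcf, destination)
            copied.append(destination)
    return copied
```

The returned list makes it easy to confirm how many VCF files ended up in the input folder before submitting the job.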

It is also possible to run skiml-cluster locally (without submitting to the HPC):

skiml-cluster run --input <INPUT FOLDER> --output <OUTPUT FOLDER>

However, do this with care: some steps can require substantial resources when you cluster a large number of samples.

Installation

pip install --user skiml-cluster

To update skiml-cluster to the most recent version, run:

pip install --user --upgrade skiml-cluster
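
To see which version is currently installed before or after upgrading, you can query the package metadata from Python. This is a small sketch using the standard library; the function name `installed_version` is hypothetical and not part of skiml-cluster.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="skiml-cluster"):
    """Return the installed version string, or None if the package is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed version, or None when skiml-cluster is not installed.
print(installed_version())
```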

Troubleshooting

Q: I get this error:

ValueError in file /home/boas/mambaforge/lib/python3.10/site-packages/skiml_cluster/workflow/Snakefile, line 16:
No VCF files found in input directory.

A: skiml-cluster looks for files ending in .vcf in the input folder. If zero files can be found with this exact extension, the pipeline will not run. Please make sure there are valid VCF files (preferably from apollo-mapping) in the input folder with the extension .vcf.

Q: I get this error:

ValueError in file /home/boas/mambaforge/lib/python3.10/site-packages/skiml_cluster/workflow/Snakefile, line 11:
Input directory does not exist.

A: The input directory cannot be found. Please check the path passed to --input for typos and make sure the directory exists.
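
Both errors above can be caught before submitting a job. The sketch below reproduces the two checks described in the answers (directory exists, at least one file ending in .vcf). It is an illustration only and not the code from the pipeline's Snakefile; the function name `check_input_dir` is hypothetical.

```python
from pathlib import Path


def check_input_dir(input_dir):
    """Return the .vcf files in input_dir, or raise ValueError with the messages shown above.

    Hypothetical pre-flight check; the actual Snakefile logic may differ.
    """
    directory = Path(input_dir)
    if not directory.is_dir():
        raise ValueError("Input directory does not exist.")
    # Match the exact extension: sample.vcf.gz does not end in .vcf.
    vcfs = sorted(
        p for p in directory.iterdir() if p.is_file() and p.name.endswith(".vcf")
    )
    if not vcfs:
        raise ValueError("No VCF files found in input directory.")
    return vcfs
```

Calling `check_input_dir("<INPUT FOLDER>")` on the home node is cheap and avoids waiting for an HPC job that would fail immediately.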

Methodology

skiml-cluster runs the following steps, which are managed by Snakemake:

  1. SNP VCF files are compressed and indexed (required for other analyses).
  2. All indexed VCF files are merged into a multi-sample VCF.
  3. A UPGMA tree is generated from the multi-sample VCF using vcfkit.
  4. A Fasta pseudo-alignment is generated from the multi-sample VCF using vcfkit.
  5. SNP distances are calculated from the pseudo-alignment using snp-dists.
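
To make step 5 concrete, here is a pure-Python sketch of what a pairwise SNP distance over a Fasta pseudo-alignment looks like. It is not how skiml-cluster computes distances (the pipeline calls snp-dists). The sketch assumes that a position counts as a difference only when both sequences have an unambiguous A, C, G or T there, and that case is ignored; these are assumptions about snp-dists' default behaviour that this README does not confirm. The function names are hypothetical.

```python
from itertools import combinations


def read_fasta(text):
    """Parse FASTA text into a {name: sequence} dict (sequences uppercased)."""
    sequences = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sample name.
            name = line[1:].split()[0]
            sequences[name] = ""
        elif name is not None:
            sequences[name] += line.upper()
    return sequences


def snp_distances(sequences):
    """Count differing positions per pair, ignoring anything other than A/C/G/T."""
    valid = set("ACGT")
    distances = {}
    for a, b in combinations(sequences, 2):
        distances[(a, b)] = sum(
            1
            for x, y in zip(sequences[a], sequences[b])
            if x != y and x in valid and y in valid
        )
    return distances


alignment = ">s1\nACGTA\n>s2\nACGAA\n>s3\nTCGNC\n"
print(snp_distances(read_fasta(alignment)))
# {('s1', 's2'): 1, ('s1', 's3'): 2, ('s2', 's3'): 2}
```

In the example, the `N` in `s3` is skipped, so only the first and last positions contribute to its distances.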

The command-line interface for the Snakemake pipeline was generated using snk.

License

skiml-cluster is distributed under the terms of the MIT license.


