Infer a succinct tree sequence from SARS-CoV-2 sequence alignments

sc2ts

Infer a succinct tree sequence from SARS-CoV-2 variation data

This is an early alpha version and is not intended for production use!

If you are interested in helping to develop sc2ts or would like to work with the inferred ARGs, please get in touch.

Installation

To run the downstream analysis utilities, install from pip using

python3 -m pip install sc2ts[analysis]

This installs matplotlib and some other heavyweight dependencies.

To run only the inference tools, use

python3 -m pip install sc2ts

Inference workflow

Command line inference

Inference is primarily intended to be run from the command line, most likely orchestrated by a shell script, a Snakemake file, or similar.

The CLI is split into subcommands. Get help by running the CLI without arguments:

python3 -m sc2ts

Import metadata to local database

Metadata for all samples must be provided in a tab-separated file. This text file is converted to a SQLite database so that the strains collected on a given day can be found quickly, without loading the entire set each time.

python3 -m sc2ts import-metadata data/metadata.tsv data/metadata.db
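
To illustrate why a database helps here, the following minimal sketch loads a small tab-separated table into SQLite, adds an index on the collection date, and looks up a single day. The table name, the column names strain and date, and the index are assumptions made for illustration only; they are not the schema that import-metadata actually creates.

```python
import csv
import io
import sqlite3

# Hypothetical metadata; the real file and its columns may differ.
TSV = "strain\tdate\nA\t2020-01-01\nB\t2020-01-02\nC\t2020-01-02\n"


def load_metadata(conn, tsv_text):
    """Load tab-separated rows into a table indexed by date."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    conn.execute("CREATE TABLE samples (strain TEXT, date TEXT)")
    conn.executemany("INSERT INTO samples VALUES (:strain, :date)", rows)
    # The index lets SQLite find one day's rows without scanning the table.
    conn.execute("CREATE INDEX samples_date ON samples (date)")
    conn.commit()


def strains_on(conn, date):
    """Return the strains collected on the given ISO date, sorted by name."""
    cur = conn.execute(
        "SELECT strain FROM samples WHERE date = ? ORDER BY strain", (date,)
    )
    return [row[0] for row in cur]


conn = sqlite3.connect(":memory:")
load_metadata(conn, TSV)
print(strains_on(conn, "2020-01-02"))  # prints ['B', 'C']
```

With the index in place, each per-day lookup touches only the matching rows, which is the property the day-by-day inference relies on.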

TODO: Document required fields

Import alignments

To provide fast access to the individual alignments, we store them in a local database file. These must be imported before inference can be performed.

The basic approach is to use the import-alignments command, giving the path to an alignments.db file to create, followed by one or more FASTA files to import into it.

python3 -m sc2ts import-alignments data/alignments.db data/alignments/*.fasta

By default the database file is updated each time, so this can be done in stages.
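
Because the database is updated on each call, a large collection of FASTA files can be imported a few at a time. The sketch below only assembles one import-alignments command per batch, using the command form shown above; the helper name, the batch size, and the file names are hypothetical, and nothing is executed unless the commented-out subprocess line is enabled.

```python
def batched_import_commands(db_path, fasta_paths, batch_size):
    """Yield one import-alignments argument list per batch of FASTA files."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(fasta_paths), batch_size):
        batch = fasta_paths[start:start + batch_size]
        yield ["python3", "-m", "sc2ts", "import-alignments", db_path, *batch]


# Hypothetical input files, imported two at a time.
fasta_files = [
    "data/alignments/a.fasta",
    "data/alignments/b.fasta",
    "data/alignments/c.fasta",
]
commands = list(batched_import_commands("data/alignments.db", fasta_files, 2))
for argv in commands:
    print(" ".join(argv))
    # import subprocess; subprocess.run(argv, check=True)  # to actually run it
```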

TODO discuss the storage and time requirements for this step!

Run the inference

The basic approach is to run the daily-extend command, which performs the extension operation day by day using the information in the metadata DB.

python3 -m sc2ts daily-extend data/alignments.db data/metadata.db results/output-prefix

Example run script

Here is a script used to run the inference for the Long ARG in the preprint:

#!/bin/bash
set -e
precision=12
mismatches=3
max_submission_delay=30
max_daily_samples=1000
num_threads=40
datadir=data
run_id=upgma-mds-$max_daily_samples-md-$max_submission_delay-mm-$mismatches
resultsdir=results/$run_id
results_prefix=$resultsdir/$run_id-
logfile=logs/$run_id.log
# Set up the options
options="--num-threads $num_threads -vv -l $logfile "
options+="--max-submission-delay $max_submission_delay "
options+="--max-daily-samples $max_daily_samples "
options+="--precision $precision --num-mismatches $mismatches"
# Create the results dir and data paths
mkdir -p $resultsdir
alignments=$datadir/alignments2.db
metadata=$datadir/metadata.filtered.db
# NOTE: we can also start from a given date with the -b option
# basets="$results_prefix"2022-01-24.ts
# options+=" -b $basets"
python3 -m sc2ts daily-extend $alignments $metadata $results_prefix $options
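
The commented-out lines in the script suggest that a run can be resumed from a file named by the results prefix followed by a date and a .ts extension. Assuming that naming scheme holds (it is inferred from the script, not documented here), a small hypothetical helper could pick the most recent such file to pass to -b:

```python
import re
import tempfile
from pathlib import Path

# Assumed file name suffix: an ISO date followed by ".ts".
DATE_TS = re.compile(r"(\d{4}-\d{2}-\d{2})\.ts")


def latest_base_ts(results_prefix):
    """Return the newest '<prefix><YYYY-MM-DD>.ts' path, or None if none exist."""
    prefix = Path(results_prefix)
    if not prefix.parent.is_dir():
        return None
    dated = []
    for path in prefix.parent.iterdir():
        if path.name.startswith(prefix.name):
            match = DATE_TS.fullmatch(path.name[len(prefix.name):])
            if match:
                dated.append((match.group(1), path))
    if not dated:
        return None
    # ISO dates sort chronologically when compared as strings.
    return max(dated, key=lambda item: item[0])[1]


# Demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_prefix = str(Path(tmp) / "run-")
    for day in ("2022-01-23", "2022-01-24", "2021-12-31"):
        Path(demo_prefix + day + ".ts").touch()
    latest = latest_base_ts(demo_prefix)
    print(latest.name)  # prints run-2022-01-24.ts
```

The returned path plays the role of the basets variable in the script above.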

Licensing

The code is marked as licensed under the MIT license, but because the current implementation uses the matching engine from tsinfer (which is GPL licensed), this code is therefore also GPL.

However, we plan to switch out the matching engine for an implementation provided by tskit, which is MIT licensed. This will be done before the first official release.
