Assign clades to SARS-CoV-2 genome sequences at a point in time.

User Guide

Cladetime is a wrapper around Nextstrain's GenBank-based SARS-CoV-2 genome sequence data and the metadata that describes it. Included with the metadata are the clades (variants) that each sequence is assigned to.

An advanced feature of Cladetime is the ability to perform custom clade assignments using past reference trees. For example, you can use the current set of sequence data and assign clades to it using the reference tree as it existed three months ago.

Cladetime is designed for use with US-based sequences from Homo sapiens.

Installation

Cladetime is written in Python and can be installed using pip:

pip install cladetime

The CladeTime class

Most of Cladetime's features are accessible through the CladeTime class, which accepts two optional parameters:

sequence_as_of: access Nextstrain SARS-CoV-2 sequence data and metadata files as they existed on this date (defaults to the current UTC datetime)
tree_as_of: the date of the reference tree to use for clade assignments (defaults to sequence_as_of)

[!IMPORTANT] Using tree_as_of for custom clade assignments is an advanced feature and requires Docker.

>>> from cladetime import CladeTime

# Create a CladeTime object that references the most recent available sequence
# data and metadata from Nextstrain
>>> ct = CladeTime()
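
The defaulting rules for the two parameters can be pictured with a small stand-alone sketch. It does not call Cladetime: resolve_as_of is a hypothetical helper written only for illustration, and the real class may parse, validate, or normalize dates differently.

```python
from datetime import datetime, timezone


def resolve_as_of(sequence_as_of=None, tree_as_of=None):
    """Hypothetical illustration of the documented defaults (not Cladetime's code)."""

    def to_utc(value):
        # Accept a YYYY-MM-DD string or a datetime; treat naive values as UTC.
        if isinstance(value, str):
            value = datetime.strptime(value, "%Y-%m-%d")
        if value.tzinfo is None:
            value = value.replace(tzinfo=timezone.utc)
        return value.astimezone(timezone.utc)

    # sequence_as_of defaults to the current UTC datetime
    if sequence_as_of is None:
        seq = datetime.now(timezone.utc)
    else:
        seq = to_utc(sequence_as_of)

    # tree_as_of defaults to sequence_as_of
    tree = seq if tree_as_of is None else to_utc(tree_as_of)
    return seq, tree


seq, tree = resolve_as_of(sequence_as_of="2024-11-15", tree_as_of="2024-09-01")
print(seq.date(), tree.date())
```

When only sequence_as_of is supplied, both values come back identical, which mirrors the case where clades are taken directly from the metadata and no custom assignment is needed.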

Accessing sequence data

Each CladeTime object has a link to the full set of Nextstrain's SARS-CoV-2 genomic sequences as they existed on the sequence_as_of date. This data is in .fasta format, and most users won't need to download it directly.

>>> from cladetime import CladeTime
>>> ct = CladeTime()
>>> ct.url_sequence
'https://nextstrain-data.s3.amazonaws.com/files/ncov/open/sequences.fasta.xz?versionId=4Sv2PbA1NoEd.V_LOOQSBPkqBpdoj7s_'
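
The URL carries a versionId query parameter, which is how Amazon S3 addresses one specific stored version of an object. That is presumably what lets the link point at the file as it existed on a given date. The sketch below uses only the Python standard library to pull the object path and version ID out of the example URL; it makes no network request.

```python
from urllib.parse import parse_qs, urlsplit

# URL copied from the example output above
url = (
    "https://nextstrain-data.s3.amazonaws.com/files/ncov/open/"
    "sequences.fasta.xz?versionId=4Sv2PbA1NoEd.V_LOOQSBPkqBpdoj7s_"
)

parts = urlsplit(url)
# parse_qs returns a list of values per key; take the first versionId
version_id = parse_qs(parts.query)["versionId"][0]

print(parts.path)   # /files/ncov/open/sequences.fasta.xz
print(version_id)   # 4Sv2PbA1NoEd.V_LOOQSBPkqBpdoj7s_
```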

More interesting to most users will be the metadata that describes each sequence.

The sequence_metadata attribute of a CladeTime object is a Polars LazyFrame that points to a copy of Nextstrain's sequence metadata.

You can apply your own filters and transformations to the LazyFrame, but it's a good idea to start with the built-in filter_metadata function that removes non-US and non-human sequences from the metadata.

A collect() operation will return the filtered metadata as an in-memory Polars DataFrame.

>>> import polars as pl
>>> from cladetime import CladeTime, sequence

>>> ct = CladeTime()
>>> filtered_metadata = sequence.filter_metadata(ct.sequence_metadata)

# Alternatively, pass a sequence collection date range to the filter
>>> filtered_metadata = sequence.filter_metadata(
...     ct.sequence_metadata,
...     collection_min_date="2024-10-01",
...     collection_max_date="2024-10-31",
... )

>>> metadata_df = filtered_metadata.collect(streaming=True)

# Pandas users can export Polars dataframes
>>> pandas_df = metadata_df.to_pandas()

Past sequence data

Working with past sequence data and metadata is similar to the above examples. Just pass in a sequence_as_of date when creating a CladeTime object.

The clades returned as part of the metadata will reflect the reference tree in use when the sequence metadata file was created.

>>> from cladetime import CladeTime

# Create a CladeTime object for any date after May 2023
>>> ct = CladeTime(sequence_as_of="2024-10-15")

Custom clade assignments

You may want to assign sequence clades using a reference tree from a past date. This feature is helpful when creating "source of truth" data to evaluate models that predict clade proportions. The workflow has three steps:

1. Create a CladeTime object using the tree_as_of parameter.
2. Filter the sequence metadata to include only the sequences you want to assign.
3. Pass the filtered metadata to the assign_clades method.

CladeTime's assign_clades method returns two Polars LazyFrames:

detail: a linefile of each sequence and its assigned clade
summary: clade counts summarized by country, location, date and host

[!WARNING] In addition to requiring Docker, assign_clades is resource-intensive, because the process requires downloading a full copy of SARS-CoV-2 sequence data and then filtering it.

The filtered sequences are then run through Nextclade's CLI for clade assignment, another resource-intensive process. We recommend assigning no more than 30 days' worth of sequence collections at a time.
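
One way to stay within the 30-day recommendation is to split a longer collection period into consecutive windows and run one assignment per window. The helper below is a stand-alone sketch using only the standard library; date_windows is a hypothetical name, not part of Cladetime, and it assumes both ends of each window are inclusive.

```python
from datetime import date, timedelta


def date_windows(start, end, max_days=30):
    """Yield (min_date, max_date) ISO strings covering start..end inclusive,
    with each window spanning at most max_days calendar days."""
    current = start
    while current <= end:
        # A window of max_days days ends max_days - 1 days after it starts
        window_end = min(current + timedelta(days=max_days - 1), end)
        yield current.isoformat(), window_end.isoformat()
        current = window_end + timedelta(days=1)


windows = list(date_windows(date(2024, 8, 1), date(2024, 10, 31)))
for min_date, max_date in windows:
    print(min_date, max_date)
```

Each pair could then be supplied as collection_min_date and collection_max_date to sequence.filter_metadata before calling assign_clades. Whether Cladetime treats those bounds as inclusive is an assumption to check against its documentation.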

>>> import polars as pl
>>> from cladetime import CladeTime, sequence

>>> ct = CladeTime(sequence_as_of="2024-11-15", tree_as_of="2024-09-01")
>>> filtered_metadata = sequence.filter_metadata(
...     ct.sequence_metadata,
...     collection_min_date="2024-10-01",
...     collection_max_date="2024-10-31",
... )
>>> clade_assignments = ct.assign_clades(filtered_metadata)

# Summarized clade assignments
>>> clade_assignments.summary.collect().head()
shape: (5, 6)
┌──────────┬────────────┬──────────────┬──────────────────┬─────────┬───────┐
│ location ┆ date       ┆ host         ┆ clade_nextstrain ┆ country ┆ count │
│ ---      ┆ ---        ┆ ---          ┆ ---              ┆ ---     ┆ ---   │
│ str      ┆ date       ┆ str          ┆ str              ┆ str     ┆ u32   │
╞══════════╪════════════╪══════════════╪══════════════════╪═════════╪═══════╡
│ IL       ┆ 2024-10-28 ┆ Homo sapiens ┆ 24C              ┆ USA     ┆ 1     │
│ IL       ┆ 2024-10-11 ┆ Homo sapiens ┆ 24C              ┆ USA     ┆ 5     │
│ NY       ┆ 2024-10-08 ┆ Homo sapiens ┆ 24B              ┆ USA     ┆ 2     │
│ AZ       ┆ 2024-10-15 ┆ Homo sapiens ┆ 24C              ┆ USA     ┆ 1     │
│ MN       ┆ 2024-10-06 ┆ Homo sapiens ┆ 24A              ┆ USA     ┆ 2     │
└──────────┴────────────┴──────────────┴──────────────────┴─────────┴───────┘

# Detailed clade assignments
>>> clade_assignments.detail.collect().select(
...     ["country", "location", "date", "strain", "clade_nextstrain"]
... ).head()
shape: (5, 5)
┌─────────┬──────────┬────────────┬─────────────────────┬──────────────────┐
│ country ┆ location ┆ date       ┆ strain              ┆ clade_nextstrain │
│ ---     ┆ ---      ┆ ---        ┆ ---                 ┆ ---              │
│ str     ┆ str      ┆ date       ┆ str                 ┆ str              │
╞═════════╪══════════╪════════════╪═════════════════════╪══════════════════╡
│ USA     ┆ AZ       ┆ 2024-10-01 ┆ USA/2024CV1711/2024 ┆ 24C              │
│ USA     ┆ AZ       ┆ 2024-10-02 ┆ USA/2024CV1718/2024 ┆ 24C              │
│ USA     ┆ AZ       ┆ 2024-10-04 ┆ USA/2024CV1719/2024 ┆ 24C              │
│ USA     ┆ AZ       ┆ 2024-10-05 ┆ USA/2024CV1721/2024 ┆ 24C              │
│ USA     ┆ AZ       ┆ 2024-10-06 ┆ USA/2024CV1722/2024 ┆ recombinant      │
└─────────┴──────────┴────────────┴─────────────────────┴──────────────────┘
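
To make the relationship between the two outputs concrete, the sketch below rolls a handful of detail-style rows up into summary-style counts. It assumes the summary is a plain group-by count over the columns shown in the summary output; the rows and strain names are made up for illustration and are not real Cladetime output.

```python
from collections import Counter

# Made-up detail rows for illustration only (hypothetical strains)
detail_rows = [
    {"country": "USA", "location": "AZ", "date": "2024-10-01",
     "host": "Homo sapiens", "strain": "example-1", "clade_nextstrain": "24C"},
    {"country": "USA", "location": "AZ", "date": "2024-10-01",
     "host": "Homo sapiens", "strain": "example-2", "clade_nextstrain": "24C"},
    {"country": "USA", "location": "AZ", "date": "2024-10-01",
     "host": "Homo sapiens", "strain": "example-3", "clade_nextstrain": "24A"},
    {"country": "USA", "location": "NY", "date": "2024-10-08",
     "host": "Homo sapiens", "strain": "example-4", "clade_nextstrain": "24B"},
]

# Group on every summary column except the count itself
group_keys = ("location", "date", "host", "clade_nextstrain", "country")
summary = Counter(tuple(row[k] for k in group_keys) for row in detail_rows)

for key, count in summary.items():
    print(*key, count)
```

The strain column drops out of the summary because it identifies individual sequences, while the count records how many sequences share each combination of location, date, host, clade, and country.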

Reproducibility

CladeTime objects have an ncov_metadata property with information needed to reproduce the clade assignments in the object's sequence metadata.

In the example below, ncov_metadata shows that the Nextclade dataset used for clade assignment on 2024-09-22 was 2024-07-17--12-57-03Z.

Each version of a SARS-CoV-2 Nextclade dataset contains a reference tree that can be used as an input for clade assignments.

>>> from cladetime import CladeTime
>>> ct = CladeTime(sequence_as_of='2024-09-22')

>>> ct.ncov_metadata.get('nextclade_dataset_name')
'SARS-CoV-2'
>>> ct.ncov_metadata.get('nextclade_dataset_version')
'2024-07-17--12-57-03Z'
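
The dataset version string looks like a UTC timestamp, so it can be parsed to see how old the reference tree was on the sequence_as_of date. The sketch below assumes the fixed layout shown in the example output (YYYY-MM-DD--HH-MM-SSZ); it uses only the standard library and does not call Cladetime.

```python
from datetime import datetime, timezone

# Values copied from the example above
version = "2024-07-17--12-57-03Z"
sequence_as_of = datetime(2024, 9, 22, tzinfo=timezone.utc)

# Assumption: the version string is a UTC timestamp in this fixed layout
released = datetime.strptime(version, "%Y-%m-%d--%H-%M-%SZ").replace(
    tzinfo=timezone.utc
)

age = sequence_as_of - released
print(released.isoformat())  # 2024-07-17T12:57:03+00:00
print(age.days)              # 66
```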

Access to historical copies of ncov_metadata is what allows Cladetime to access past reference trees for custom clade assignments. Cladetime retrieves a separate set of ncov_metadata for the tree_as_of date and uses it to pass the correct reference tree to the assign_clades method.

Release history

0.4.0

Dec 10, 2025

0.3.0

Jan 29, 2025

This version

0.2.4

Jan 14, 2025

Download files

Source Distribution

cladetime-0.2.4.tar.gz (61.2 MB)

Uploaded Jan 14, 2025 Source

Built Distribution

cladetime-0.2.4-py3-none-any.whl (25.0 kB)

Uploaded Jan 14, 2025 Python 3

File details

Details for the file cladetime-0.2.4.tar.gz.

File metadata

Download URL: cladetime-0.2.4.tar.gz
Upload date: Jan 14, 2025
Size: 61.2 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for cladetime-0.2.4.tar.gz
Algorithm	Hash digest
SHA256	`91115333c48395e7384afe89f8c0c712da39e4945b15c13c5f9fe40593de2499`
MD5	`1a2f386f16330a1728735b45784b442a`
BLAKE2b-256	`88b0dc434f3b7ba34f10159cb380884c4332acddb039c3ae93f8b4a78d8ea21c`

Provenance

The following attestation bundles were made for cladetime-0.2.4.tar.gz:

Publisher: publish-pypi.yaml on reichlab/cladetime

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: cladetime-0.2.4.tar.gz
- Subject digest: 91115333c48395e7384afe89f8c0c712da39e4945b15c13c5f9fe40593de2499
- Sigstore transparency entry: 162319419
- Sigstore integration time: Jan 14, 2025
Source repository:
- Permalink: reichlab/cladetime@8edda6b620842bb4122e84f2bd7c9e4309c3b2f3
- Branch / Tag: refs/tags/v0.2.4
- Owner: https://github.com/reichlab
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yaml@8edda6b620842bb4122e84f2bd7c9e4309c3b2f3
- Trigger Event: push

File details

Details for the file cladetime-0.2.4-py3-none-any.whl.

File metadata

Download URL: cladetime-0.2.4-py3-none-any.whl
Upload date: Jan 14, 2025
Size: 25.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for cladetime-0.2.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`89a035e51ebcfe08cdeb490e98395be700ce5639fed7707cbe45190e71b30782`
MD5	`3497ba8f5aa33b53d719d5a7e914c957`
BLAKE2b-256	`d7517e267e5288b69caef89486bc98e93649f8f544cc73e16dd4445aa618eba9`

Provenance

The following attestation bundles were made for cladetime-0.2.4-py3-none-any.whl:

Publisher: publish-pypi.yaml on reichlab/cladetime

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: cladetime-0.2.4-py3-none-any.whl
- Subject digest: 89a035e51ebcfe08cdeb490e98395be700ce5639fed7707cbe45190e71b30782
- Sigstore transparency entry: 162319420
- Sigstore integration time: Jan 14, 2025
Source repository:
- Permalink: reichlab/cladetime@8edda6b620842bb4122e84f2bd7c9e4309c3b2f3
- Branch / Tag: refs/tags/v0.2.4
- Owner: https://github.com/reichlab
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yaml@8edda6b620842bb4122e84f2bd7c9e4309c3b2f3
- Trigger Event: push
