Taxonomy Informed Clustering (TIC) is a tool for clustering bacterial sequences based on their taxonomy and for hypothetically completing missing taxonomy levels.
TIC (Taxonomy Informed Clustering) Pipeline
Overview
The Taxonomy Informed Clustering (TIC) pipeline is a novel approach for processing 16S rRNA amplicon datasets in diversity analyses. TIC leverages classifier-assigned taxonomy to refine clustering, ensuring that only sequences sharing the same taxonomic path are grouped together [1]. This method is particularly useful for analyzing bacterial diversity using 16S rRNA gene amplicon sequencing data.
Inspired by the initial TIC pipeline as described in Taxonomy Informed Clustering, an Optimized Method for Purer and More Informative Clusters in Diversity Analysis and Microbiome Profiling, this project aims to provide a simple and easy-to-use version of the original TIC.
Key Features
- Taxonomy-Driven Clustering: TIC taxonomically classifies each sequence before clustering, using the taxonomic information to guide and constrain the clustering process. This approach divides the dataset into subsets based on assigned taxonomies, preventing the merging of sequences from diverse lineages.
- Modular Design: The pipeline is designed with a modular structure, making it easy to modify and extend.
- Comprehensive Pipeline: TIC offers a complete, automated pipeline for diversity analyses, from raw reads to compositional tables.
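As a rough illustration of the taxonomy-driven partitioning described above, the sketch below groups sequence IDs by their assigned taxonomic path so that only sequences sharing the same path end up in the same subset. The function name `partition_by_taxonomy` and the `(sequence ID, taxonomy levels)` record layout are assumptions made for this example and are not part of the ticlust API.

```python
from collections import defaultdict


def partition_by_taxonomy(records):
    """Group sequence IDs by their full taxonomic path.

    `records` is an iterable of (sequence_id, taxonomy_levels) pairs;
    this layout is hypothetical and only serves the illustration.
    """
    groups = defaultdict(list)
    for seq_id, tax_path in records:
        # A tuple is hashable, so the whole path can serve as the subset key.
        groups[tuple(tax_path)].append(seq_id)
    return dict(groups)


records = [
    ("Zotu1", ["Bacteria", "Proteobacteria"]),
    ("Zotu2", ["Bacteria", "Firmicutes"]),
    ("Zotu3", ["Bacteria", "Proteobacteria"]),
]
subsets = partition_by_taxonomy(records)
```

Each subset could then be clustered independently, which is what keeps sequences from different lineages from being merged.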
Pipeline Steps
The TIC pipeline involves several key steps:
- Filling up to Order Level: All bacterial sequences whose taxonomy is not known up to order level are filled with NA__<level>__<parent>.
- Complete Family Levels: Sequences of orders with unknown classification at family level are clustered at 0.90 sequence similarity using uclust and assigned a family-level taxonomy (e.g. FOTU11).
- Complete Genus Levels: Sequences of families with unknown classification at genus level are clustered at 0.95 sequence similarity using uclust and assigned a genus-level taxonomy (e.g. GOTU11).
- Complete Species Levels: Sequences of genera with unknown classification at species level are clustered at 0.987 sequence similarity using uclust and assigned a species-level taxonomy (e.g. SOTU11).
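The first step above can be sketched as follows. This is a minimal illustration, not the package's implementation: the level names in `LEVELS`, the helper name `fill_to_order`, and the choice of the nearest known ancestor as `<parent>` are all assumptions. Only the `NA__<level>__<parent>` pattern comes from the description above.

```python
# Assumed names for the first four taxonomy levels (hypothetical).
LEVELS = ["kingdom", "phylum", "class", "order"]


def fill_to_order(levels: list[str]) -> list[str]:
    """Pad a taxonomy path with NA__<level>__<parent> up to order level."""
    if not levels:
        raise ValueError("at least the top-level taxon is required")
    # Assumption: <parent> is the deepest level that is actually known.
    known_parent = levels[-1]
    filled = list(levels)
    for level_name in LEVELS[len(filled):]:
        filled.append(f"NA__{level_name}__{known_parent}")
    return filled


filled = fill_to_order(["Bacteria", "Proteobacteria"])
```

Paths that already reach order level or deeper are returned unchanged, since there is nothing left to fill.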
Installation
To install the TIC pipeline, follow these steps:
PIP
pip install ticlust
Run TIC
Currently, a simple version of TIC is available in this repo.
After TIC is installed, you can run it with either of the commands below:
ticlust -f <path/to/taxed_fasta_file.fasta>
or
python3 -m ticlust -f <path/to/taxed_fasta_file.fasta>
Sequences in the input FASTA file should have their taxonomy as the last part of the sequence header, starting with tax=. For example:
>Zotu1 some other header parts tax=Bacteria;Proteobacteria;Gammaproteobacteria;Enterobacterales;Enterobacteriaceae;Escherichia-Shigella.
Note that taxonomy levels are separated by semicolons (;).
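To make the expected header layout concrete, here is a small sketch of how such a header could be split into taxonomy levels. The function `parse_taxonomy` is hypothetical and is not how ticlust necessarily parses its input; it only follows the two rules stated above (the taxonomy follows tax= and the levels are separated by semicolons).

```python
def parse_taxonomy(header: str) -> list[str]:
    """Return the taxonomy levels found after the last 'tax=' in a header."""
    marker = "tax="
    idx = header.rfind(marker)
    if idx == -1:
        # No taxonomy annotation in this header.
        return []
    tax = header[idx + len(marker):].strip()
    # Skip empty entries, such as the one left by a trailing semicolon.
    return [level.strip() for level in tax.split(";") if level.strip()]


header = (
    ">Zotu1 some other header parts "
    "tax=Bacteria;Proteobacteria;Gammaproteobacteria;"
    "Enterobacterales;Enterobacteriaceae;Escherichia-Shigella"
)
levels = parse_taxonomy(header)
```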
Run TIC with accompanying zOTU table
If you have a zOTU table for the sequences in your zOTU FASTA file, you can pass it to TIC to generate a new table for the SOTUs (species-level OTUs) produced by TIC. As a result, you will get a new OTU table file named SOTUs-Table.tab, which contains the SOTUs extracted from the original zOTU table, and a new FASTA file named SOTUs-Seq.fasta, which contains the sequences of the SOTUs generated by TIC. These are the sequences of the centroid zOTU of each SOTU cluster. The centroid of a cluster is the zOTU in that cluster with the highest count in the zOTU table.
ticlust -f <path/to/taxed_fasta_file.fasta> -z <path/to/zotu_table.tsv>
Run TIC with various clustering thresholds
By passing additional arguments to TIC, you can define your own thresholds for family-, genus-, and species-level clustering.
ticlust -f <path/to/taxed_fasta_file.fasta> -z <path/to/zotu_table.tsv> -st 0.99 -gt 0.97 -ft 0.95
or
python3 -m ticlust -f <path/to/taxed_fasta_file.fasta> -z <path/to/zotu_table.tsv> -st 0.99 -gt 0.97 -ft 0.95
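The centroid rule used when a zOTU table is supplied (the centroid of a SOTU cluster is its zOTU with the highest count) can be sketched as below. The function `pick_centroids` and both dictionaries are hypothetical. The sketch also assumes that "count" means the total count of a zOTU across all samples and that ties go to the first member listed; neither detail is stated in this README.

```python
def pick_centroids(sotu_members, zotu_counts):
    """Map each SOTU to its member zOTU with the highest count."""
    centroids = {}
    for sotu, members in sotu_members.items():
        # max() returns the first maximal element, so ties keep list order.
        centroids[sotu] = max(members, key=lambda z: zotu_counts.get(z, 0))
    return centroids


# Hypothetical cluster memberships and per-zOTU total counts.
sotu_members = {"SOTU1": ["Zotu1", "Zotu4"], "SOTU2": ["Zotu2"]}
zotu_counts = {"Zotu1": 120, "Zotu4": 560, "Zotu2": 33}
centroids = pick_centroids(sotu_members, zotu_counts)
```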
Outputs
TIC creates a directory (i.e. TIC-WD) in the same directory as the input FASTA file. This directory contains the files listed below:
- TIC-FullTaxonomy.fasta: bacterial sequences with full-level taxonomy filled by TIC
- Map-FOTU-GOTU.tab: two columns, 'FOTU' and 'GOTU', mapping genus-level clusters (i.e. GOTUs or gOTUs) to their parent family-level clusters (i.e. FOTUs or fOTUs)
- Map-GOTU-SOTU.tab: two columns, 'GOTU' and 'SOTU', mapping species-level clusters (i.e. SOTUs or sOTUs) to their parent genus-level clusters (i.e. GOTUs or gOTUs)
- Map-SOTU-ZOTU.tab: two columns, 'SOTU' and 'ZOTU', mapping zero-radius operational taxonomic units (i.e. ZOTUs or zOTUs) to their parent species-level clusters (i.e. SOTUs or sOTUs)
- Non-Bacteria-Sequences.fasta: all non-bacterial sequences in the input FASTA file
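For downstream use, a mapping file such as Map-SOTU-ZOTU.tab could be loaded into a lookup table along the following lines. This sketch assumes the file is tab-separated and begins with a header row naming the 'SOTU' and 'ZOTU' columns; the exact on-disk layout is not specified here, so check a real output file first. The helper name `read_sotu_zotu_map` and the sample rows are made up for the example.

```python
import csv
import io


def read_sotu_zotu_map(handle):
    """Return a dict mapping each zOTU to its parent SOTU."""
    # Assumption: tab-separated text with a 'SOTU<TAB>ZOTU' header row.
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["ZOTU"]: row["SOTU"] for row in reader}


# In-memory stand-in for an opened Map-SOTU-ZOTU.tab file.
sample = io.StringIO("SOTU\tZOTU\nSOTU1\tZotu1\nSOTU1\tZotu4\nSOTU2\tZotu2\n")
zotu_to_sotu = read_sotu_zotu_map(sample)
```

With a real file, the same function could be called on a handle opened with `open(path, newline="")`.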
Advantages
TIC enhances the clustering process by utilizing taxonomic information acquired beforehand, leading to higher cluster quality and purity. This approach also enables the proper placement of unassigned sequences within the taxonomy.
Availability
TIC as a Python package (i.e. the current repository) is available at SimpleTIC. The initial TIC pipeline, as described in Taxonomy Informed Clustering, an Optimized Method for Purer and More Informative Clusters in Diversity Analysis and Microbiome Profiling, is freely available on GitHub [1].