TAXonomic Profile Aggregation and STAndardisation

About

The main purpose of taxpasta is to standardise taxonomic profiles created by a range of bioinformatics tools, which we call taxonomic profilers. Each profiler comes with its own particular tabular output format. Across the profilers, relative abundances can be reported as read counts, fractions, or percentages, alongside any number of additional columns with extra information. We therefore took the lessons we had learnt from handling these formats to heart and built our own solution to deal with this pasticcio. With taxpasta you can ingest all of those formats and, at a minimum, output taxonomy identifiers and their integer counts. Beyond standardising individual profiles, taxpasta can also merge profiles from the same profiler across samples into a single table.
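The idea of funnelling many output formats into one shape can be sketched in Python as a small registry of per-format readers that all return the same (taxonomy_id, count) records. The formats `profiler_a` and `profiler_b` and every function name below are hypothetical, chosen only for illustration; this is not taxpasta's internal API.

```python
from typing import Callable, Dict, List, Tuple

# A standardised record: (taxonomy_id, integer count). Hypothetical shape for illustration.
Record = Tuple[int, int]


def read_profiler_a(lines: List[str]) -> List[Record]:
    """Hypothetical format A: tab-separated 'taxid<TAB>reads' with a header row."""
    records = []
    for line in lines[1:]:  # skip the header row
        taxid, reads = line.split("\t")
        records.append((int(taxid), int(reads)))
    return records


def read_profiler_b(lines: List[str]) -> List[Record]:
    """Hypothetical format B: 'name,taxid,reads,score' CSV whose extra columns are dropped."""
    records = []
    for line in lines[1:]:  # skip the header row
        _name, taxid, reads, _score = line.split(",")
        records.append((int(taxid), int(reads)))
    return records


# Registry mapping a profiler name to its reader, so callers only choose a key.
READERS: Dict[str, Callable[[List[str]], List[Record]]] = {
    "profiler_a": read_profiler_a,
    "profiler_b": read_profiler_b,
}


def standardise(profiler: str, lines: List[str]) -> List[Record]:
    """Dispatch to the reader registered for the given (hypothetical) profiler name."""
    return READERS[profiler](lines)


if __name__ == "__main__":
    a = ["taxid\treads", "562\t120", "1280\t30"]
    b = ["name,taxid,reads,score", "E. coli,562,120,0.9", "S. aureus,1280,30,0.4"]
    # Both formats reduce to the same records.
    print(standardise("profiler_a", a) == standardise("profiler_b", b))
```

With this design, supporting another format only means registering one more reader; code that consumes the standardised records stays unchanged.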

Diagram of taxpasta functionality: heterogeneous outputs from a range of taxonomic profilers are validated, standardised, and converted by taxpasta into standardised tables in a range of file formats.

Supported Taxonomic Profilers

Taxpasta currently supports standardisation and generation of comparable taxonomic tables for a range of taxonomic profilers. See supported profilers for the full list and more information.

Install

It's as simple as:

pip install taxpasta

Taxpasta is also available from the Bioconda channel:

conda install -c bioconda taxpasta

Because of that, Docker and Singularity images are also generated automatically by BioContainers.

Optional Dependencies

Taxpasta supports a number of extras that you can install for additional features, primarily support for additional output file formats. You can install them by specifying a comma-separated list within square brackets, for example:

pip install 'taxpasta[rich,biom]'
  • rich provides rich-formatted command line output and logging.
  • arrow supports writing output tables in Apache Arrow format.
  • parquet supports writing output tables in Apache Parquet format.
  • biom supports writing output tables in BIOM format.
  • ods supports writing output tables in ODS format.
  • xlsx supports writing output tables in Microsoft Excel format.
  • all includes all of the above.
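To see which optional features are available in an environment, one option is to probe for the modules each extra pulls in without importing them. The extra-to-module mapping below is an assumption made for this sketch, not read from taxpasta's packaging metadata, so check the project's own dependency declarations if it matters.

```python
import importlib.util
from typing import Dict

# ASSUMED mapping from taxpasta extra to the importable module it installs.
# These names are illustrative and not taken from taxpasta's packaging metadata.
EXTRA_MODULES: Dict[str, str] = {
    "rich": "rich",
    "arrow": "pyarrow",
    "parquet": "pyarrow",
    "biom": "biom",
    "ods": "odf",
    "xlsx": "openpyxl",
}


def available_extras() -> Dict[str, bool]:
    """Report, per extra, whether its assumed module can be found (without importing it)."""
    return {
        extra: importlib.util.find_spec(module) is not None
        for extra, module in EXTRA_MODULES.items()
    }


if __name__ == "__main__":
    for extra, found in available_extras().items():
        print(f"{extra:8} {'installed' if found else 'missing'}")
```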
  • dev provides all tools needed for contributing to taxpasta.

Usage

The main entry point for taxpasta is its command-line interface (CLI). You can interactively explore the offered commands through the help system.

taxpasta -h

Taxpasta currently offers two commands corresponding to its main use cases. You can find out more in the commands' documentation.

Standardise

Since each supported profiler produces its own flavour of tabular output, a quick way to normalise such files is to standardise them with taxpasta. You need to tell taxpasta which tool created the file. As an example, let's standardise a MetaPhlAn profile. (You can find an example file in our test data.)

curl -O https://raw.githubusercontent.com/taxprofiler/taxpasta/main/tests/data/metaphlan/MOCK_002_Illumina_Hiseq_3000_se_metaphlan3-db.metaphlan3_profile.txt
taxpasta standardise -p metaphlan -o standardised.tsv MOCK_002_Illumina_Hiseq_3000_se_metaphlan3-db.metaphlan3_profile.txt

With these minimal arguments, taxpasta produces a two-column output with the columns taxonomy_id and count.

You can count on the second column being integers. Having such a simple and tidy table should make your downstream analysis much smoother to start out with. Please have a look at the full getting started tutorial for a more thorough introduction.
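For intuition about what standardising involves, here is a minimal Python sketch that reduces a MetaPhlAn-style profile to taxonomy identifiers and integer counts. It assumes the tab-separated layout of a MetaPhlAn 3 profile (comment lines starting with `#`; columns clade_name, NCBI_tax_id, relative_abundance, additional_species) and converts percentages with a hypothetical scale factor. Taxpasta's actual parsing, validation, and conversion rules may differ, so treat the supported profilers documentation as authoritative.

```python
from typing import Dict, Iterable

# HYPOTHETICAL factor turning a percentage into an integer "count" (parts per million).
# This is an assumption for illustration; taxpasta's own conversion may differ.
PERCENT_TO_PPM = 10_000


def standardise_metaphlan(lines: Iterable[str]) -> Dict[int, int]:
    """Sketch: map a MetaPhlAn-style profile to {taxonomy_id: integer count}.

    Assumes tab-separated rows of clade_name, NCBI_tax_id (a '|'-separated lineage
    of IDs), relative_abundance (in percent), and optional extra columns, with
    comment and header lines starting with '#'.
    """
    result: Dict[int, int] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments, and the column header
        fields = line.rstrip("\n").split("\t")
        lineage, abundance = fields[1], float(fields[2])
        taxonomy_id = int(lineage.split("|")[-1])  # the last ID is the clade itself
        result[taxonomy_id] = round(abundance * PERCENT_TO_PPM)
    return result


if __name__ == "__main__":
    profile = [
        "#clade_name\tNCBI_tax_id\trelative_abundance\tadditional_species",
        "k__Bacteria\t2\t100.0\t",
        "k__Bacteria|p__Firmicutes\t2|1239\t62.5\t",
    ]
    print("taxonomy_id\tcount")
    for taxonomy_id, count in standardise_metaphlan(profile).items():
        print(f"{taxonomy_id}\t{count}")
```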

Merge

Converting single tables is nice, but hopefully you have many shiny samples to analyse. The taxpasta merge command works like standardise, except that you provide multiple profiles as input. You can grab a few more 'MOCK' examples from our test data and try it out.

LOCATION=https://raw.githubusercontent.com/taxprofiler/taxpasta/main/tests/data/metaphlan
curl -O "${LOCATION}/MOCK_001_Illumina_Hiseq_3000_se_metaphlan3-db.metaphlan3_profile.txt"
curl -O "${LOCATION}/MOCK_002_Illumina_Hiseq_3000_se_metaphlan3-db.metaphlan3_profile.txt"
curl -O "${LOCATION}/MOCK_003_Illumina_Hiseq_3000_se_metaphlan3-db.metaphlan3_profile.txt"

taxpasta merge -p metaphlan -o merged.tsv MOCK_*.metaphlan3_profile.txt

The output of the merge command has one column for the taxonomic identifier and one more column for each input profile. Again, have a look at the full getting started tutorial for a more thorough introduction.
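Conceptually, merging is an outer join of the per-sample tables on the taxonomy identifier. The sketch below, in plain Python with hypothetical names, builds such a wide table and fills taxa missing from a sample with 0; that fill value is an assumption of this sketch rather than a documented taxpasta guarantee.

```python
from typing import List, Mapping


def merge_profiles(profiles: Mapping[str, Mapping[int, int]]) -> List[List[object]]:
    """Sketch: combine per-sample {taxonomy_id: count} profiles into one wide table.

    The first row is a header ("taxonomy_id" plus one column per sample). Taxa absent
    from a sample are filled with 0, which is an assumption about the desired semantics.
    """
    samples = list(profiles)
    # Union of identifiers across all samples (outer join), sorted for stable output.
    all_ids = sorted({tid for counts in profiles.values() for tid in counts})
    table: List[List[object]] = [["taxonomy_id", *samples]]
    for tid in all_ids:
        table.append([tid, *(profiles[s].get(tid, 0) for s in samples)])
    return table


if __name__ == "__main__":
    merged = merge_profiles({
        "MOCK_001": {562: 120, 1280: 30},
        "MOCK_002": {562: 80},
    })
    for row in merged:
        print("\t".join(str(cell) for cell in row))
```

Sorting the identifiers keeps the row order deterministic regardless of the order in which samples or taxa were read.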

Citation

If you use TAXPASTA in your academic work, please cite our article in the Journal of Open Source Software.

Beber, M. E., Borry, M., Stamouli, S., & Fellows Yates, J. A. (2023). TAXPASTA: TAXonomic Profile Aggregation and STAndardisation. Journal of Open Source Software, 8(87), 5627. https://doi.org/10.21105/joss.05627

Acknowledgments

Many thanks to:

Copyright

  • Copyright © 2022-2024, Moritz E. Beber, Maxime Borry, James A. Fellows Yates, and Sofia Stamouli.
  • Free software distributed under the Apache Software License 2.0.

Download files

Source Distribution

taxpasta-0.7.0.tar.gz (50.1 kB)

Uploaded Source

Built Distribution

taxpasta-0.7.0-py3-none-any.whl (136.4 kB)

Uploaded Python 3

File details

Details for the file taxpasta-0.7.0.tar.gz.

File metadata

  • Download URL: taxpasta-0.7.0.tar.gz
  • Upload date:
  • Size: 50.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-httpx/0.27.0

File hashes

Hashes for taxpasta-0.7.0.tar.gz
Algorithm Hash digest
SHA256 320d8499124e03a20baec4e46dc3ade855c9aec9113ad2d632d0301d347d0be4
MD5 08606442dec22f0b70bb0c1ca2d804a8
BLAKE2b-256 51145ab052134d8d026c3d341e2cb6c0f4041cc626965131206aee25bdefc42e
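The digests above can be checked locally before installing. A minimal sketch using Python's standard hashlib module, assuming the archive was saved under its published file name in the current directory (or passed as the first command-line argument):

```python
import hashlib
import sys
from pathlib import Path

# SHA256 digest listed above for taxpasta-0.7.0.tar.gz.
EXPECTED_SHA256 = "320d8499124e03a20baec4e46dc3ade855c9aec9113ad2d632d0301d347d0be4"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Compute a file's SHA256 hex digest, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the file's digest with the expected one (case-insensitive hex)."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    archive = Path(sys.argv[1] if len(sys.argv) > 1 else "taxpasta-0.7.0.tar.gz")
    if archive.is_file():
        print("OK" if verify(archive) else "MISMATCH")
    else:
        print(f"{archive} not found")
```

pip can also enforce hashes on its own through its hash-checking mode (`--require-hashes`).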

File details

Details for the file taxpasta-0.7.0-py3-none-any.whl.

File metadata

  • Download URL: taxpasta-0.7.0-py3-none-any.whl
  • Upload date:
  • Size: 136.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-httpx/0.27.0

File hashes

Hashes for taxpasta-0.7.0-py3-none-any.whl
Algorithm Hash digest
SHA256 f14547d965fa73ff82c2b733eee5b1a742961e1d86a21ab376293049ce96eb01
MD5 1161bec9410372ad631fd52741239054
BLAKE2b-256 36e4557b176b0acfd521ad2893a6f49e0e2f8d9af8622542c26e38b8bd9abe82
