taxonomical-utils

A set of Python scripts for taxonomic name resolution and retrieval of upper taxonomic lineages.

Description

This repository contains a set of Python scripts for resolving taxonomic names and retrieving their upper taxonomic lineages. For now, it uses the Open Tree of Life as its source of taxonomic data. The scripts are thin wrappers around the Python opentree package. They include functions for resolving taxonomic names, appending upper taxonomic lineage information, and merging data files.

Installation

To install Taxonomical Utils from source, follow these steps:

Clone the repository:

git clone https://github.com/digital-botanical-gardens-initiative/taxonomical-utils.git

Navigate to the project directory:

cd taxonomical-utils

Install the required dependencies using Poetry:

poetry install

Usage

CLI Commands

Taxonomical Utils provides several command-line interface (CLI) commands to process taxonomic data. Each command can be run individually or as part of a pipeline.

1. Resolve Taxa

This command resolves taxonomic names from an input file and generates a resolved taxa file.

Command:

poetry run taxonomical-utils resolve --input-file <input_file> --output-file <resolved_taxa_file> --org-column-header <org_column_header>

  • <input_file>: Path to the input CSV/TSV file containing taxonomic names.
  • <resolved_taxa_file>: Path to the output file where resolved taxa will be saved.
  • <org_column_header>: Column header in the input file that contains the taxonomic names.

Example:

poetry run taxonomical-utils resolve --input-file ./data/in/example.csv --output-file ./data/out/resolved_taxa.csv --org-column-header idTaxon
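
To make the role of --org-column-header more concrete, here is a minimal Python sketch of reading the names column from an input CSV. It only illustrates the expected input layout. The helper read_taxon_names and the sample rows are hypothetical and are not part of taxonomical-utils.

```python
import csv
import io


def read_taxon_names(handle, org_column_header):
    """Return the unique, non-empty names found in one column, in first-seen order."""
    reader = csv.DictReader(handle)
    if org_column_header not in (reader.fieldnames or []):
        raise KeyError(f"column {org_column_header!r} not found in input")
    seen = {}
    for row in reader:
        name = (row[org_column_header] or "").strip()
        if name:
            # A dict keeps insertion order, so duplicates are dropped stably.
            seen.setdefault(name, None)
    return list(seen)


# Hypothetical input with an "idTaxon" column, as in the example above.
sample = io.StringIO(
    "sample_id,idTaxon\n"
    "s1,Arabidopsis thaliana\n"
    "s2,Quercus robur\n"
    "s3,Arabidopsis thaliana\n"
    "s4,\n"
)
names = read_taxon_names(sample, "idTaxon")
print(names)
```

Deduplicating names before resolution is a design choice of this sketch, not a documented behavior of the resolve command.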

2. Append Upper Taxa Lineage

This command appends upper taxonomic lineage information to the resolved taxa file.

Command:

poetry run taxonomical-utils append-taxonomy --input-file <resolved_taxa_file> --output-file <upper_taxa_lineage_file>

  • <resolved_taxa_file>: Path to the resolved taxa file generated by the resolve command.
  • <upper_taxa_lineage_file>: Path to the output file where the upper taxa lineage information will be saved.

Example:

poetry run taxonomical-utils append-taxonomy --input-file data/out/resolved_taxa.csv --output-file data/out/upper_taxa_lineage.csv

3. Merge Data Files

This command merges the original input file with the resolved taxa file and upper taxa lineage file to produce a fully resolved dataset.

Command:

poetry run taxonomical-utils merge --input-file <input_file> --resolved-taxa-file <resolved_taxa_file> --upper-taxa-lineage-file <upper_taxa_lineage_file> --output-file <final_output_file> --org-column-header <org_column_header>

  • <input_file>: Path to the original input CSV/TSV file.
  • <resolved_taxa_file>: Path to the resolved taxa file generated by the resolve command.
  • <upper_taxa_lineage_file>: Path to the upper taxa lineage file generated by the append-taxonomy command.
  • <final_output_file>: Path to the final output file where the merged data will be saved.
  • <org_column_header>: Column header in the input file that contains the taxonomic names.

Example:

poetry run taxonomical-utils merge --input-file data/example.csv --resolved-taxa-file data/out/resolved_taxa.csv --upper-taxa-lineage-file data/out/upper_taxa_lineage.csv --output-file data/out/final_output.csv --org-column-header idTaxon
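
One way to picture the merge step is as two successive left joins: the original rows are joined to the resolved taxa, and the result is joined to the lineage table. The sketch below illustrates that idea with the standard library only. It is not the package's implementation, and the column names search_string, ott_id and family, as well as the placeholder identifier, are assumptions made for illustration, since the actual output columns are not listed here.

```python
def left_join(rows, lookup_rows, left_key, right_key):
    """Attach the first matching lookup row to each row; unmatched rows are kept as-is."""
    index = {}
    for lookup in lookup_rows:
        index.setdefault(lookup[right_key], lookup)
    merged = []
    for row in rows:
        extra = index.get(row.get(left_key), {})
        combined = dict(row)
        for key, value in extra.items():
            if key != right_key:
                # Existing columns win over columns from the lookup table.
                combined.setdefault(key, value)
        merged.append(combined)
    return merged


# Hypothetical data: every column except "idTaxon" is a placeholder name.
original = [
    {"sample_id": "s1", "idTaxon": "Quercus robur"},
    {"sample_id": "s2", "idTaxon": "Unknown plant"},
]
resolved = [{"search_string": "Quercus robur", "ott_id": "ott-placeholder"}]
lineage = [{"ott_id": "ott-placeholder", "family": "Fagaceae"}]

with_ids = left_join(original, resolved, "idTaxon", "search_string")
final = left_join(with_ids, lineage, "ott_id", "ott_id")
for row in final:
    print(row)
```

Rows whose name could not be resolved stay in the output without the extra columns, which is the usual reason to prefer a left join for this kind of enrichment.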

Running the Full Pipeline

To run the entire pipeline, you can execute the commands sequentially:

Resolve Taxa:

poetry run taxonomical-utils resolve --input-file data/example.csv --output-file data/out/resolved_taxa.csv --org-column-header idTaxon

Append Upper Taxa Lineage:

poetry run taxonomical-utils append-taxonomy --input-file data/out/resolved_taxa.csv --output-file data/out/upper_taxa_lineage.csv

Merge Data Files:

poetry run taxonomical-utils merge --input-file data/example.csv --resolved-taxa-file data/out/resolved_taxa.csv --upper-taxa-lineage-file data/out/upper_taxa_lineage.csv --output-file data/out/final_output.csv --org-column-header idTaxon

Running the Commands as a Pipeline

You can also chain the commands with && so that each command runs only if the previous one succeeds:

poetry run taxonomical-utils resolve --input-file data/example.csv --output-file data/out/resolved_taxa.csv --org-column-header idTaxon && \
poetry run taxonomical-utils append-taxonomy --input-file data/out/resolved_taxa.csv --output-file data/out/upper_taxa_lineage.csv && \
poetry run taxonomical-utils merge --input-file data/example.csv --resolved-taxa-file data/out/resolved_taxa.csv --upper-taxa-lineage-file data/out/upper_taxa_lineage.csv --output-file data/out/final_output.csv --org-column-header idTaxon
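
The same chaining can be scripted from Python. The sketch below builds the three invocations from the flags shown above and runs them in order with subprocess. The helpers pipeline_commands and run_pipeline are hypothetical names introduced for this example, and the output file names simply mirror the ones used in the examples.

```python
import shlex
import subprocess


def pipeline_commands(input_file, out_dir, org_column_header):
    """Build the three CLI invocations shown above as argument lists."""
    resolved = f"{out_dir}/resolved_taxa.csv"
    lineage = f"{out_dir}/upper_taxa_lineage.csv"
    final = f"{out_dir}/final_output.csv"
    base = ["poetry", "run", "taxonomical-utils"]
    return [
        base + ["resolve", "--input-file", input_file,
                "--output-file", resolved,
                "--org-column-header", org_column_header],
        base + ["append-taxonomy", "--input-file", resolved,
                "--output-file", lineage],
        base + ["merge", "--input-file", input_file,
                "--resolved-taxa-file", resolved,
                "--upper-taxa-lineage-file", lineage,
                "--output-file", final,
                "--org-column-header", org_column_header],
    ]


def run_pipeline(commands):
    """Run each command in order; check=True raises on the first failure, like &&."""
    for command in commands:
        subprocess.run(command, check=True)


commands = pipeline_commands("data/example.csv", "data/out", "idTaxon")
for command in commands:
    print(shlex.join(command))

# To actually execute the pipeline, call this from the project directory:
# run_pipeline(commands)
```

Passing argument lists rather than a single shell string avoids quoting problems when file paths contain spaces.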

Testing

To run the tests, use the following command:

make test

This runs the test suite to check that the functions behave as expected.

Contributing

Contributions are welcome! Please submit a pull request or open an issue to discuss any changes.


Repository initiated with fpgmaas/cookiecutter-poetry.
