A tool for parsing and extracting taxon synonyms.

PhyloMatcher

Python modules to query the GBIF or NCBI Taxonomy databases for synonym names of target species.

Installation

Required packages:

  • Biopython
  • pygbif
  • pandas
  • tqdm

The easiest way to install is with a conda environment and pip:

conda create -n tm-env -c conda-forge python
conda activate tm-env
pip install PhyloMatcher

Modes

GBIF

The input CSV file should contain a single column of target species to look up (other columns will be ignored). Names can be separated by either a space or an underscore ("_"), e.g. "Sphenodon_punctatus" is equivalent to "Sphenodon punctatus".

E.g.

Sphenodon_punctatus
Gonyosoma_prasinus
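
If your species list comes from mixed sources, it can help to normalize it before writing the input CSV. The following is a minimal sketch using only the standard library; `normalize_name` and `write_species_csv` are hypothetical helper names and not part of PhyloMatcher. It assumes you want one underscore-separated name per row with duplicates dropped.

```python
import csv
import io


def normalize_name(name: str) -> str:
    """Join the words of a name with underscores, whatever separator was used."""
    # Treat underscores as spaces, then split on any run of whitespace.
    parts = name.replace("_", " ").split()
    return "_".join(parts)


def write_species_csv(names, handle):
    """Write one normalized name per row, skipping blanks and duplicates."""
    writer = csv.writer(handle, lineterminator="\n")
    seen = set()
    for raw in names:
        name = normalize_name(raw)
        if name and name not in seen:
            seen.add(name)
            writer.writerow([name])


# Demonstration with an in-memory buffer; pass an open file handle to write to disk.
buffer = io.StringIO()
write_species_csv(
    ["Sphenodon punctatus", "Gonyosoma_prasinus", "Sphenodon_punctatus"],
    buffer,
)
print(buffer.getvalue(), end="")
```

Since the tool accepts both separators, this step is optional; it mainly keeps the input consistent with the underscore-separated names used elsewhere in this README.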

Usage:

$ PhyloMatcher gbif -h
usage: PhyloMatcher gbif [-h] -i INPUT_CSV -o OUTFILE [-t THREADS]

options:
  -h, --help            show this help message and exit
  -i INPUT_CSV, --csv INPUT_CSV
                        CSV where first column is a list of target species names to look up.
  -o OUTFILE, --outfile OUTFILE
                        Path to output.
  -t THREADS, --threads THREADS

NCBI

Because Entrez results are highly specific and taxonomy data is relatively sparse, a CSV intended for NCBI matching can have multiple columns: the first column is the target species and the remaining columns are previously known synonyms. All names are flattened and searched to increase the chance of a match in the database. A single-column CSV file, identical to the GBIF format, also works.

Note: the multi-Entrez script currently assumes the final column in the CSV is for notes. If people decide this is worth fixing I will, but the GBIF approach seems to be much better across the board.

E.g.

Sphenodon_punctatus,Hatteria_punctata
Gonyosoma_prasinus,Coluber_prasinus,Elaphe_prasina,Rhadinophis_prasinus,Rhadinophis_prasina
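
To illustrate what "flattening" a multi-column CSV could look like, here is a rough sketch using only the standard library. It is not PhyloMatcher's actual implementation: `flatten_names` and its `notes_in_last_column` flag are hypothetical and exist only to show the idea of collecting every name in every row into one search list.

```python
import csv
import io


def flatten_names(handle, notes_in_last_column=False):
    """Collect every name from every row into one de-duplicated list."""
    names = []
    seen = set()
    for row in csv.reader(handle):
        cells = [cell.strip() for cell in row]
        # Hypothetical switch: drop the final cell if it holds free-text notes.
        if notes_in_last_column and len(cells) > 1:
            cells = cells[:-1]
        for cell in cells:
            # Normalize spaces/underscores so both spellings count as one name.
            name = "_".join(cell.replace("_", " ").split())
            if name and name not in seen:
                seen.add(name)
                names.append(name)
    return names


example = io.StringIO(
    "Sphenodon_punctatus,Hatteria_punctata\n"
    "Gonyosoma_prasinus,Coluber_prasinus,Elaphe_prasina\n"
)
flat = flatten_names(example)
print(flat)
```

The list keeps first-seen order, so each target species is followed by its own synonyms.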

Usage:

$ PhyloMatcher ncbi -h
usage: PhyloMatcher ncbi [-h] -i INPUT_CSV -o OUTFILE -e EMAIL

options:
  -h, --help            show this help message and exit
  -i INPUT_CSV, --csv INPUT_CSV
                        CSV where first column is a list of target species names to look up.
  -o OUTFILE, --outfile OUTFILE
                        Path to output.
  -e EMAIL, --email EMAIL

Matching trait files to PhyloMatcher output

Once you have a synonym file from the gbif or ncbi module, you can match it to any phenotypic/trait data you have. As with the other inputs, the first column must contain the species names you're targeting, and they must be formatted exactly as they appear in the PhyloMatcher output (case-sensitive and separated by underscores).

Usage:

$ PhyloMatcher trait -h
usage: PhyloMatcher trait [-h] -t TRAITFILE -s SPECIESFILE -o OUTFILE

options:
  -h, --help            show this help message and exit
  -t TRAITFILE, --traitfile TRAITFILE
                        CSV of trait values, first column must be species names.
  -s SPECIESFILE, --speciesfile SPECIESFILE
                        CSV of species synonyms output by the gbif or ncbi modules.
  -o OUTFILE, --outfile OUTFILE
                        Path to output.
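
The case-sensitive matching requirement can be illustrated with a small standard-library sketch. This is not the trait module's implementation. It assumes, purely for illustration, that each row of the species file lists a target name followed by its synonyms and that neither file has a header row. `build_lookup` and `match_traits` are hypothetical names, and the trait values are made up.

```python
import csv
import io


def build_lookup(species_handle):
    """Map every name in a row (target or synonym) to that row's first name."""
    lookup = {}
    for row in csv.reader(species_handle):
        names = [cell.strip() for cell in row if cell.strip()]
        for name in names:
            # Keep the first mapping if a name appears in more than one row.
            lookup.setdefault(name, names[0])
    return lookup


def match_traits(trait_handle, lookup):
    """Split trait rows into matched rows and unmatched species names."""
    matched, unmatched = [], []
    for row in csv.reader(trait_handle):
        if not row:
            continue
        name = row[0].strip()
        if name in lookup:  # exact, case-sensitive comparison
            matched.append([lookup[name]] + row[1:])
        else:
            unmatched.append(name)
    return matched, unmatched


species = io.StringIO("Sphenodon_punctatus,Hatteria_punctata\n")
traits = io.StringIO("Hatteria_punctata,0.5\nsphenodon_punctatus,0.7\n")
matched, unmatched = match_traits(traits, build_lookup(species))
print(matched)
print(unmatched)
```

In this sketch the lowercase `sphenodon_punctatus` row goes unmatched, which is why trait names should be formatted exactly like the PhyloMatcher output.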

