Generate Latin names for bacterial nomenclature using a combinatorial approach, as described in Pallen et al. (2021), 'The Next Million Names for Archaea and Bacteria'.

GAN: The Great Automatic Nomenclator

The Next Million Names for Archaea and Bacteria, and the nomenclator Python package

Principle

To generate a large number of new names, we apply a combinatorial approach. We start from two or three sets of curated roots and produce every possible combination of them, keeping track of each root's grammatical metadata so that a valid etymology can be drafted for every generated name.
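The principle above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the package's actual implementation: the root lists, the "root" and "meaning" fields, the `combine` function, and the default connector vowel are all hypothetical, and the example roots have not been curated linguistically.

```python
from itertools import product

# Hypothetical root sets; each entry carries the metadata needed for the etymology.
first_roots = [
    {"root": "aqu", "meaning": "water"},
    {"root": "terr", "meaning": "earth"},
]
second_roots = [
    {"root": "bacter", "meaning": "rod"},
    {"root": "coccus", "meaning": "berry"},
    {"root": "monas", "meaning": "unit"},
]


def combine(root_sets, connector="i"):
    """Return one record per combination of roots, with a draft etymology."""
    names = []
    # product() yields every tuple that takes one entry from each root set.
    for parts in product(*root_sets):
        name = connector.join(p["root"] for p in parts).capitalize()
        etymology = "; ".join(f'{p["root"]}: {p["meaning"]}' for p in parts)
        names.append({"name": name, "etymology": etymology})
    return names


names = combine([first_roots, second_roots])
print(len(names))  # 2 x 3 = 6 combinations
print(names[0])    # {'name': 'Aquibacter', 'etymology': 'aqu: water; bacter: rod'}
```

The number of names is the product of the set sizes, which is why a few short curated lists are enough to yield a large pool of candidates. Passing a third list to `combine` extends the same scheme to three-part names.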

Gan flowchart

Installation

GAN is available on PyPI as gan-nomenclature and installs with Python 3.8+:

pip install gan-nomenclature

This command installs the library together with its dependencies, such as pandas and openpyxl.

To work in an isolated environment, you can create one with conda and then install the package from PyPI:

conda create -c conda-forge -n gan python=3.10 pandas pip ipython
conda activate gan
pip install gan-nomenclature

Command-line tools

Installing the package provides a small suite of CLI helpers:

  • gan-genus: generate JSON/HTML/LaTeX outputs from two or three curated root tables.
  • gan-validate: validate the input Excel files for correct format and content.
  • gan-init: scaffold Excel templates (optionally populated with example rows) for use with gan-genus.
  • gan-aidraft: generate draft etymologies using OpenRouter-hosted LLMs starting from a text file used as context (e.g. a draft of a paper describing the biome where the new taxa were isolated).
  • xls2tsv: convert each worksheet of a workbook into a separate TSV file.
  • tsv2xls: convert TSV files back into Excel format.

Each command offers --help for additional options and usage examples.

Genera generator

A set of two (or three) Excel tables formatted as shown below is used to generate the list of combinations in JSON, HTML and LaTeX format.

Excel input format

Synopsis:

usage: gan-genus [-h] -1 FIRST -2 SECOND [-3 THIRD] -o OUTDIR [-p PREFIX] [-c CONNECTOR] [-v]

For full usage and installation instructions, please check the documentation.
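When gan-genus is called from a script, the synopsis above can be turned into an argument list. The helper below is a small sketch: the function name, its parameter names, and the file names in the example are hypothetical, and only the flags shown in the synopsis are used. The synopsis does not say what -v does, so it is passed through as a plain switch.

```python
import shlex


def gan_genus_command(first, second, outdir, third=None,
                      prefix=None, connector=None, v=False):
    """Build the gan-genus argument list from the flags in the synopsis."""
    cmd = ["gan-genus", "-1", str(first), "-2", str(second)]
    if third is not None:
        cmd += ["-3", str(third)]      # optional third root table
    cmd += ["-o", str(outdir)]         # required output directory
    if prefix is not None:
        cmd += ["-p", str(prefix)]
    if connector is not None:
        cmd += ["-c", str(connector)]
    if v:
        cmd.append("-v")               # meaning not stated in the synopsis
    return cmd


# Hypothetical file names, for illustration only.
cmd = gan_genus_command("first.xlsx", "second.xlsx", "out", third="third.xlsx")
print(shlex.join(cmd))
# gan-genus -1 first.xlsx -2 second.xlsx -3 third.xlsx -o out
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once the package is installed. Building a list rather than a single string avoids shell-quoting problems with paths that contain spaces.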

Example output

Using three small files in the input_test directory (8, 11 and 11 words, respectively), GAN produced 968 (8 x 11 x 11) combinations.

Etymology

"The Great Automatic Nomenclator" is a reference to "The Great Automatic Grammatizator", a short story by the British author Roald Dahl.

Citation

Mark J. Pallen et al. The Next Million Names for Archaea and Bacteria, Trends in Microbiology (2020). DOI: 10.1016/j.tim.2020.10.009
