Generate Latin names for bacterial nomenclature using a combinatorial approach, as described in Pallen et al. (2021), 'The Next Million Names for Archaea and Bacteria'.


GAN: The Great Automatic Nomenclator

The Next Million Names for Archaea and Bacteria, and the nomenclator Python package

Principle

To generate a large number of new names, we apply a combinatorial approach: starting from two or three sets of curated roots, GAN produces every possible combination while keeping track of each root's grammatical metadata, which is then used to draft a valid etymology.
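The combinatorial step can be sketched in a few lines of Python. This is a minimal illustration, not the package's implementation: the `Root` fields, the `combine` helper, the example roots, and the layout of the etymology string are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Root:
    """A curated root plus the metadata used to draft an etymology (hypothetical fields)."""
    stem: str      # form used when assembling the name
    origin: str    # source word with its grammatical description
    meaning: str   # English gloss


def combine(*root_sets, connector=""):
    """Yield (name, etymology) for every combination of one root per set."""
    for roots in product(*root_sets):
        name = connector.join(r.stem for r in roots).capitalize()
        etymology = "; ".join(f"{r.origin}, {r.meaning}" for r in roots)
        yield name, etymology


# Illustrative roots only; real input comes from the curated tables.
first = [
    Root("avi", "L. fem. n. avis", "bird"),
    Root("equi", "L. masc. n. equus", "horse"),
]
second = [
    Root("coccus", "N.L. masc. n. coccus", "a coccus"),
    Root("bacter", "N.L. masc. n. bacter", "a rod"),
]

names = [name for name, _ in combine(first, second)]
print(names)  # ['Avicoccus', 'Avibacter', 'Equicoccus', 'Equibacter']
```

Because `itertools.product` takes any number of iterables, the same function covers both the two-set and the three-set case, and the number of names is the product of the set sizes.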

Gan flowchart

Installation

GAN is available on PyPI as gan-nomenclature and installs with Python 3.8+:

pip install gan-nomenclature

This command installs the library together with its dependencies (pandas, openpyxl, ...).

To work in an isolated environment, you can create one with conda and then install the package from PyPI:

conda create -c conda-forge -n gan python=3.10 pandas pip ipython
conda activate gan
pip install gan-nomenclature

Command-line tools

Installing the package provides a small suite of CLI helpers:

  • gan-genus: generate JSON/HTML/LaTeX outputs from two or three curated root tables.
  • gan-validate: validate the input Excel files for correct format and content.
  • gan-init: scaffold Excel templates (optionally populated with example rows) for use with gan-genus.
  • gan-aidraft: generate draft etymologies using OpenRouter-hosted LLMs starting from a text file used as context (e.g. a draft of a paper describing the biome where the new taxa were isolated).
  • xls2tsv: convert each worksheet of a workbook into a separate TSV file.
  • tsv2xls: convert TSV files back into Excel format.

Each command offers --help for additional options and usage examples.

Genera generator

A set of two (or three) Excel tables formatted as shown below is used to generate the list of combinations in JSON, HTML and LaTeX format.

Excel input format
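To give an idea of how a list of generated names could be written in the three output formats, here is a small sketch using only the standard library. The record fields (`name`, `etymology`), the helper names, and the markup chosen for HTML and LaTeX are assumptions for illustration; the files actually written by gan-genus may be structured differently.

```python
import html
import json

# Hypothetical records; the real output schema of gan-genus may differ.
records = [
    {"name": "Avicoccus",
     "etymology": "L. fem. n. avis, bird; N.L. masc. n. coccus, a coccus"},
    {"name": "Equibacter",
     "etymology": "L. masc. n. equus, horse; N.L. masc. n. bacter, a rod"},
]

# Characters that need a backslash in LaTeX text (a deliberately short list).
_LATEX_SPECIALS = str.maketrans(
    {"&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_"}
)


def to_json(recs):
    """Serialise the records as indented JSON, keeping non-ASCII characters."""
    return json.dumps(recs, indent=2, ensure_ascii=False)


def to_html(recs):
    """Render the records as an HTML definition list."""
    rows = [
        f"  <dt><i>{html.escape(r['name'])}</i></dt>"
        f"<dd>{html.escape(r['etymology'])}</dd>"
        for r in recs
    ]
    return "\n".join(["<dl>", *rows, "</dl>"])


def to_latex(recs):
    """Render the records as a LaTeX description list."""
    lines = [r"\begin{description}"]
    for r in recs:
        name = r["name"].translate(_LATEX_SPECIALS)
        etym = r["etymology"].translate(_LATEX_SPECIALS)
        lines.append("  \\item[\\textit{" + name + "}] " + etym)
    lines.append(r"\end{description}")
    return "\n".join(lines)


print(to_latex(records))
```

Keeping one plain list of records and deriving each format from it means the three outputs cannot drift apart.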

Synopsis:

usage: gan-genus [-h] -1 FIRST -2 SECOND [-3 THIRD] -o OUTDIR [-p PREFIX] [-c CONNECTOR] [-v]
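For readers who want to script around the tool, the synopsis above maps onto an `argparse` definition like the following. This is a sketch that mirrors the usage line only: the `dest` names and help strings are assumptions, `-v` is assumed to be a boolean switch because the synopsis shows no argument for it, and the real gan-genus parser may define long options or defaults not shown here.

```python
import argparse


def build_parser():
    """Build a parser that accepts the same short options as the synopsis."""
    parser = argparse.ArgumentParser(prog="gan-genus")
    # argparse allows option strings such as "-1"; dest must be given explicitly.
    parser.add_argument("-1", dest="first", metavar="FIRST", required=True,
                        help="first table of roots")
    parser.add_argument("-2", dest="second", metavar="SECOND", required=True,
                        help="second table of roots")
    parser.add_argument("-3", dest="third", metavar="THIRD",
                        help="optional third table of roots")
    parser.add_argument("-o", dest="outdir", metavar="OUTDIR", required=True,
                        help="output directory")
    parser.add_argument("-p", dest="prefix", metavar="PREFIX",
                        help="prefix (meaning assumed: output file prefix)")
    parser.add_argument("-c", dest="connector", metavar="CONNECTOR",
                        help="connector (meaning assumed: text joining the roots)")
    parser.add_argument("-v", dest="verbose", action="store_true",
                        help="assumed to be a switch; exact meaning not documented here")
    return parser


# Hypothetical file names, used only to show how the options are parsed.
args = build_parser().parse_args(
    ["-1", "first.xlsx", "-2", "second.xlsx", "-o", "out"]
)
print(args.first, args.second, args.third, args.outdir)
```

The two-table call shown here leaves `args.third` as `None`, which is how a script could tell the two-set case from the three-set case.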

For full usage and installation instructions, please check the documentation.

Example output

Using three small files in the input_test directory (8, 11 and 8 words, respectively), GAN produced 704 (8 x 11 x 8) combinations.

Etymology

"The Great Automatic Nomenclator" is a reference to the short story "The Great Automatic Grammatizator" by the British author Roald Dahl.

Citation

Mark J. Pallen et al. The Next Million Names for Archaea and Bacteria, Trends in Microbiology (2020). DOI: 10.1016/j.tim.2020.10.009
