Skip to main content

A library to parse taxon names into elementary components

Project description

Taxon parser

This library is a pure Python adaptation of the GBIF Java name-parser library.

It is used to parse any taxonomic name into its elementary components (genus, species, authors...)

It returns a ParsedName object containing the name parts and the original parsed string.


pip install taxon_parser


This library works for Python version 3.4 or higher

The only external package dependancy is the regex package for handling advanced regular expressions.


basic usage:

from taxon_parser import TaxonParser, UnparsableNameException

parser = TaxonParser("Abies pectinata mill.")
    parsed_name = parser.parse()
except UnparsableNameException as e:
    print("this name does not seem to be a valid taxon name: \n" + e )

For any name that can not be parsed, an UnparsableNameException is thrown.

ParsedName objects

Results of the parsing are ParsedName objects. These have the following attributes and functions:

  • ParsedName.combinationAuthorship: Authorship object with years of the name, but excluding any basionym authorship. For binomials the combination authors.

  • ParsedName.basionymAuthorship: basionym Authorship object with years of the name

  • ParsedName.sanctioningAuthor: The sanctioning author for sanctioned fungal names. Fr. or Pers.

  • ParsedName.rank: Rank of the name from enumeration see the for description of possible ranks

  • ParsedName.uninomial: Represents the monomial for genus, families or names at higher ranks which do not have further epithets.

  • ParsedName.genus: The genus part of an infrageneric, bi- or trinomial name. Not used for standalone genus names which are represented as uninomials.

Infrageneric epithets:

  • ParsedName.infragenericEpithet

  • ParsedName.specificEpithet

  • ParsedName.infraspecificEpithet

  • ParsedName.cultivarEpithet

  • ParsedName.strain

  • ParsedName.candidatus: A bacterial candidate name. Candidatus is a provisional status for incompletely described procaryotes (e.g. that cannot be maintained in a Bacteriology Culture Collection) which was published in the January 1995. The category Candidatus is not covered by the Rules of the Bacteriological Code but is a taxonomic assignment. The names included in the category Candidatus are usually written as follows: Candidatus (in italics), the subsequent name(s) in roman type and the entire name in quotation marks. For example, "Candidatus Phytoplasma", "Candidatus Phytoplasma allocasuarinae". See and

  • ParsedName.notho: The part of the named hybrid which is considered a hybrid

  • ParsedName.taxonomicNotes: Nomenclatural status remarks of the name.

  • ParsedName.remarks: Any informal remarks found in the name

  • ParsedName.unparsed: Any additional unparsed string found at the end of the name. Only ever set when state=PARTIAL

  • ParsedName.type: The kind of name classified in broad catagories based on their syntactical structure

  • ParsedName.doubtful: Indicates some doubts that this is a name of the given type. Usually indicates the existance of unusual characters not normally found in scientific names.

  • ParsedName.state: Indicates if the full name has been parsed

  • ParsedName.getTerminalEpithet(): returns the terminal epithet. Infraspecific epithet if existing, the species epithet or null

  • ParsedName.hasName(): return True if the parsed name has non null name properties or a scientific name. Remarks will not count as a name

  • ParsedName.hasAuthorship(): return True if any kind of authorship exists

  • ParsedName.isBinomial(): return True if the name is a bi- or trinomial with at least a genus and species epithet given.

  • ParsedName.isTrinomial(): return true if the name is a trinomial with at least a genus, species and infraspecific epithet given.

  • ParsedName.isIndetermined(): Checks if a parsed name is missing final epithets compared to what is indicated by its rank. Returns True if the name is not fully determined.

  • ParsedName.isIncomplete(): returns True if some "higher" epithet of a name is missing, e.g. the genus in case of a species.

  • ParsedName.isAbbreviated(): returns True if the name contains an abbreviated genus or uninomial.

  • ParsedName.canonicalName(): A full scientific name with authorship from the individual properties in its canonical form. Autonyms are rendered without authorship and subspecies are using the subsp rank marker unless a name is assigned to the zoological code.

  • ParsedName.canonicalName(): A full scientific name just as canonicalName, but without any authorship.

  • ParsedName.canonicalComplete(): Assembles a full name with all details including non code compliant, informal remarks.

  • ParsedName.nameMinimal(): returns a minimal canonical name with nothing else but the 3 main name parts (genus, species, infraspecific). No rank or hybrid markers and no authorship, cultivar or strain information is rendered. Infrageneric names are represented without a leading genus. Unicode characters will be replaced by their matching ASCII characters. For example:

    • Abies alba
    • Abies alba alpina
    • Bracteata
  • ParsedName.authorshipComplete(): returns the full concatenated authorship for parsed names including the sanctioning author.

Project details

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

taxon_parser-0.2.1.tar.gz (33.4 kB view hashes)

Uploaded source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page