A deep neural model for taxonomic entity recognition

Looking for taxon mentions in text? Ask TaxoNERD

Features

TaxoNERD is a domain-specific tool for recognizing taxon mentions in the biodiversity literature.

  • Based on the en_core_sci_md model from scispaCy, fine-tuned on an ecological corpus
  • Find scientific names, common names and user-defined abbreviations
  • Lightning-fast on CPU (once the model is loaded), and can use a GPU to speed up the recognition process
  • Available as a command-line tool and a Python module

Installation

$ pip install taxonerd
$ pip install https://github.com/nleguillarme/taxonerd/releases/download/v0.1.1/en_ner_eco_md-0.1.1.tar.gz

Usage

Use as a command-line tool

Usage: taxonerd ask [OPTIONS] [INPUT_TEXT]

Options:
  -m, --model TEXT       A spaCy taxonomic NER model
  -i, --input-dir TEXT   Input directory
  -o, --output-dir TEXT  Output directory
  -f, --filename TEXT    Input text file
  -a, --with-abbrev      Add abbreviation detector to the pipeline
  --gpu                  Use GPU if available
  -v, --verbose          Verbose mode
  --help                 Show this message and exit.

Examples

Taxonomic NER from the terminal
$ taxonerd ask "Brown bears (Ursus arctos), which are widely distributed throughout the northern hemisphere, are recognised as opportunistic omnivores"
T0	LIVB 0 11	Brown bears
T1	LIVB 13 25	Ursus arctos
Taxonomic NER from a text file (with abbreviation detection)
$ taxonerd ask --with-abbrev -f test_txt/sample_text1.txt
T0	LIVB 4 21	pinewood nematode
T1	LIVB 23 26	PWN
T2	LIVB 29 55	Bursaphelenchus xylophilus
T3	LIVB 180 188	Serratia
T4	LIVB 326 348	Serratia grimesii BXF1
T5	LIVB 424 428	BXF1
T6	LIVB 241 244	PWN;pinewood nematode
T7	LIVB 371 374	PWN;pinewood nematode
T8	LIVB 23 26	PWN;pinewood nematode
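
With --with-abbrev, the extra rows in the output above (T6 to T8) carry a text field of the form PWN;pinewood nematode. Reading this as "short form;long form" is an assumption drawn from the sample output, not from documented behaviour. Under that assumption, a minimal sketch for splitting the field could look like this (split_abbrev is a hypothetical helper, not part of TaxoNERD):

```python
from typing import Optional, Tuple


def split_abbrev(text_field: str) -> Tuple[str, Optional[str]]:
    """Split a 'short;long' text field into its two parts.

    Assumption: a semicolon separates the abbreviation from its long form.
    Fields without a semicolon are returned unchanged with None as long form.
    """
    short, sep, long_form = text_field.partition(";")
    if not sep:
        return text_field, None
    return short, long_form


abbrev_row = split_abbrev("PWN;pinewood nematode")
plain_row = split_abbrev("Serratia grimesii BXF1")
```
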
Taxonomic NER from a directory containing text files, with results written to the output directory
$ taxonerd ask -i test_txt -o test_ann
$ ls test_ann/
sample_text1.ann  sample_text2.ann
$ cat test_ann/sample_text2.ann
T0	LIVB 700 711	Brown bears
T1	LIVB 713 725	Ursus arctos
T2	LIVB 1906 1912	salmon
T3	LIVB 1974 1980	salmon
T4	LIVB 2123 2129	salmon
T5	LIVB 2392 2401	Sika deer
T6	LIVB 2403 2416	Cervus nippon
T7	LIVB 3135 3141	salmon
T8	LIVB 3146 3150	deer
T9	LIVB 3188 3199	chum salmon
T10	LIVB 3201 3218	Oncorhynchus keta
T11	LIVB 3280 3289	Sika deer
T12	LIVB 3363 3375	O. gorbuscha
T13	LIVB 3381 3392	chum salmon
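
Each line of the .ann output shown above appears to hold three tab-separated fields: an identifier (T0, T1, ...), a label with start and end character offsets (LIVB 700 711), and the matched text. The following is a sketch of how such lines could be loaded for further processing, assuming that layout holds for every line. TaxonMention, parse_ann_line and parse_ann are illustrative names, not part of the TaxoNERD API.

```python
from typing import List, NamedTuple


class TaxonMention(NamedTuple):
    ann_id: str   # e.g. "T0"
    label: str    # e.g. "LIVB"
    start: int    # start character offset
    end: int      # end character offset
    text: str     # matched text


def parse_ann_line(line: str) -> TaxonMention:
    """Parse one 'ID<TAB>LABEL START END<TAB>TEXT' line."""
    ann_id, span, text = line.rstrip("\n").split("\t", 2)
    label, start, end = span.split(" ")
    return TaxonMention(ann_id, label, int(start), int(end), text)


def parse_ann(content: str) -> List[TaxonMention]:
    """Parse the content of an .ann file, skipping blank lines."""
    return [parse_ann_line(line) for line in content.splitlines() if line.strip()]


# Two lines copied from the sample_text2.ann listing above.
sample = "T0\tLIVB 700 711\tBrown bears\nT1\tLIVB 713 725\tUrsus arctos\n"
mentions = parse_ann(sample)
```

To read a real file, pass the result of open("test_ann/sample_text2.ann").read() to parse_ann.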

Use as a Python module

>>> from taxonerd import TaxoNERD
>>> ner = TaxoNERD(model="en_ner_eco_md", with_gpu=False, with_abbrev=False)

Examples

Find taxonomic entities in an input string
>>> ner.find_entities("Brown bears (Ursus arctos), which are widely distributed throughout the northern hemisphere, are recognised as opportunistic omnivores")
       offsets          text
T0   LIVB 0 11   Brown bears
T1  LIVB 13 25  Ursus arctos
Find taxonomic entities in an input file
>>> ner.find_in_file("./test_txt/sample_text1.txt", output_dir=None)
T0	LIVB 4 21	pinewood nematode
T1	LIVB 23 26	PWN
T2	LIVB 29 55	Bursaphelenchus xylophilus
T3	LIVB 180 188	Serratia
T4	LIVB 326 348	Serratia grimesii BXF1
T5	LIVB 424 428	BXF1
Find taxonomic entities in all the files in the input directory, and write the results to the output directory
>>> ner.find_all_files("./test_txt", "./test_ann")
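
The offsets in the examples above behave like zero-based character positions with an exclusive end, so slicing the input string with them should return the reported text. The short check below illustrates this on the brown bear sentence. It does not call TaxoNERD: the spans are copied by hand from the example output, and offsets_match is a hypothetical helper.

```python
from typing import Iterable, Tuple

text = (
    "Brown bears (Ursus arctos), which are widely distributed throughout "
    "the northern hemisphere, are recognised as opportunistic omnivores"
)

# (start, end, text) triples copied from the example output above.
reported = [(0, 11, "Brown bears"), (13, 25, "Ursus arctos")]


def offsets_match(source: str, spans: Iterable[Tuple[int, int, str]]) -> bool:
    """Return True if source[start:end] equals the reported text for every span."""
    return all(source[start:end] == mention for start, end, mention in spans)


all_consistent = offsets_match(text, reported)
```

A check like this can help catch encoding or newline differences between the text that was annotated and the text being sliced.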

License

License: MIT

Authors

TaxoNERD was written by nleguillarme.
