OntoGene's Biomedical Entity Recogniser

OGER: OntoGene's Biomedical Entity Recogniser

A flexible, dictionary-based system for extracting biomedical terms from scientific articles.

Demo

A demo version of OGER is hosted at https://pub.cl.uzh.ch/purl/OGER.

Installation

Install OGER from its repository using pip:

pip install git+https://github.com/OntoGene/OGER.git

Make sure you use Python 3's pip (e.g. pip3). Python 2.x is not supported.
Note: By default, pip installs Python packages at the system level, which typically requires root/admin privileges. To install OGER to a user-owned location, pass the --user flag.

Usage: Command-line Tool

Installation should provide you with an executable named oger, which is the common entry point for a number of command-line tools. Type oger CMD, followed by command-specific options, to run the desired tool:

oger run       # run an annotation job
oger serve     # start a REST API server
oger eval      # evaluate annotation accuracy
oger test      # run software tests
oger version   # print the version number

To see a list of available options for a command, run e.g. oger serve -h.

As an alternative to the oger executable, you may run python3 -m oger CMD.
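
The module invocation above can also be driven from a script. The following is a minimal sketch, not part of OGER itself: the helper names oger_command and run_oger are hypothetical, and only the subcommands listed above are assumed to exist.

```python
import subprocess
import sys


def oger_command(cmd, *options):
    """Build the argument list for `python -m oger CMD [options]`."""
    # sys.executable is the running interpreter, so the same Python 3
    # installation (and its installed packages) is used.
    return [sys.executable, "-m", "oger", cmd, *options]


def run_oger(cmd, *options):
    """Run an OGER subcommand and return the CompletedProcess.

    Requires OGER to be installed; capture_output needs Python 3.7+.
    """
    return subprocess.run(
        oger_command(cmd, *options), capture_output=True, text=True
    )
```

With OGER installed, run_oger("version").stdout should hold whatever oger version prints.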

Usage: Python Library

Create a configuration and start a pipeline server:

>>> from oger.ctrl.router import Router, PipelineServer
>>> conf = Router(termlist_path='testfiles/test_terms.tsv')
>>> pl = PipelineServer(conf)

Note: test files can be downloaded here.

Load a text document from disk (using the included test suite):

>>> doc = pl.load_one('testfiles/txt/13373697.txt', 'txt')
>>> doc
<Article with 1 subelement at 0x7f8b2135d860>
>>> doc.text
'[The kind and the measure of ventilation disorders in tuberculous bronchostenosis in relation to its localization]. \n'

Download a collection of articles from PubMed:

>>> coll = pl.load_one(['21436587', '21436588'], fmt='pubmed')
>>> coll
<Collection with 2 subelements at 0x7f8b215f4cc0>
>>> coll[0]
<Article with 2 subelements at 0x7f8b2156a5f8>
>>> coll[0][0]
<Section with 1 subelement at 0x7f8b2156a358>
>>> coll[0][0].text
'Human prostate cancer metastases target the hematopoietic stem cell niche to establish footholds in mouse bone marrow.\n'
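
The nesting shown above (collection, article, section) can be walked in a loop. This is a sketch under an assumption: it relies on collections and articles being iterable, which the index access above suggests but which is not confirmed here. The function name section_texts is hypothetical.

```python
def section_texts(collection):
    """Yield (article_index, section_index, text) for every section.

    Assumes that iterating a collection yields articles and that
    iterating an article yields sections with a .text attribute.
    """
    for i, article in enumerate(collection):
        for j, section in enumerate(article):
            yield i, j, section.text
```

For the collection loaded above, the first item yielded would correspond to coll[0][0].text.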

Run entity recognition:

>>> pl.process(coll)
>>> entity = next(coll[0].iter_entities())
>>> entity.text, entity.start, entity.end
('Human', 0, 5)
>>> entity.cid
'9606'
>>> entity.info
('organism', 'Homo sapiens', 'NCBI Taxonomy', '9606', 'DC')
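
To get an overview of what was annotated, the entities can be tallied. The sketch below uses only iter_entities() and the info attribute shown above; it assumes that the first field of entity.info is the entity type (as 'organism' is in the example), and the function name count_entity_types is hypothetical.

```python
from collections import Counter


def count_entity_types(article):
    """Count annotated entities per type.

    Assumption: entity.info[0] holds the entity type, as in the
    ('organism', 'Homo sapiens', ...) tuple shown above.
    """
    return Counter(entity.info[0] for entity in article.iter_entities())
```

Calling count_entity_types(coll[0]) after pl.process(coll) would then return a mapping from type to number of mentions.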

Export to disk:

>>> with open('output/collection.json', 'w', encoding='utf8') as f:
...     pl.write(coll, 'bioc_json', f)

The second argument specifies the output format. OGER supports BioC (XML and JSON), PubTator, PubAnnotation JSON, BioNLP stand-off, and CoNLL, among others. A full list of available formats is given here (see the export-format parameter).
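
Note that open() fails if the output directory does not exist yet. A small wrapper can create it first. This is a sketch rather than an OGER feature: export_collection is a hypothetical helper, and it calls pl.write with the same argument order as in the example above.

```python
import os


def export_collection(pipeline, collection, path, fmt="bioc_json"):
    """Write a collection to path, creating parent directories if needed."""
    parent = os.path.dirname(path)
    if parent:
        # exist_ok avoids an error when the directory is already there.
        os.makedirs(parent, exist_ok=True)
    with open(path, "w", encoding="utf8") as f:
        # Same call as above: content, output format, open file.
        pipeline.write(collection, fmt, f)
    return path
```

For example, export_collection(pl, coll, 'output/collection.json') would produce the same file as the snippet above.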

Documentation

Documentation is maintained in the GitHub wiki.

Prerequisites

OGER runs on Python 3.4+.
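
If you are unsure whether an interpreter meets this requirement, a quick check is possible before installing. The sketch below is illustrative only; the names MIN_VERSION and is_supported are not part of OGER.

```python
import sys

# Minimum version stated above.
MIN_VERSION = (3, 4)


def is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version is at least 3.4."""
    return tuple(version_info[:2]) >= MIN_VERSION
```

Tuples compare element by element, so (3, 10) correctly counts as newer than (3, 4).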

The following third-party libraries need to be installed (pip should take care of this):

Publications

If you use OGER in an academic context, please cite us:

Lenz Furrer, Anna Jancso, Nicola Colic, and Fabio Rinaldi (2019): OGER++: hybrid multi-type entity recognition. In: Journal of Cheminformatics 11:7. DOI: 10.1186/s13321-018-0326-3

Lenz Furrer and Fabio Rinaldi (2017): OGER: OntoGene's Entity Recogniser in the BeCalm TIPS Task. In: Proceedings of the BioCreative V.5 Challenge Evaluation Workshop, pp. 175–182.

Marco Basaldella, Lenz Furrer, Carlo Tasso, and Fabio Rinaldi (2017): Entity recognition in the biomedical domain using a hybrid approach. In: Journal of Biomedical Semantics 8:51. DOI: 10.1186/s13326-017-0157-6

License

OGER offers a dual licensing model.

You can redistribute OGER and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

The GNU Affero General Public License is designed to ensure that if a modified version is distributed or made accessible on a server (e.g. in a SaaS offering), the modified source code becomes available to the community.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see http://www.gnu.org/licenses/.

If you wish to use OGER under alternate terms, you may obtain a commercial license to OGER. Please contact us for more information (http://www.ontogene.org).
