OGER: OntoGene's Biomedical Entity Recogniser
A flexible, dictionary-based system for extracting biomedical terms from scientific articles.
Demo
A demo version of OGER is hosted at https://pub.cl.uzh.ch/purl/OGER.
Installation
Install OGER from its repository using pip:
pip install git+https://github.com/OntoGene/OGER.git
Make sure you use Python 3's pip (e.g. pip3).
Python 2.x is not supported.
Note: By default, pip installs Python packages at the system level, which typically requires root/admin privileges.
To install OGER to a user-owned location, set the --user flag.
Usage: Command-line Tool
The installation process should provide you with an executable oger, which is the common starting point for a number of command-line tools.
Type oger CMD, followed by command-specific options, to run the desired tool:
oger run # run an annotation job
oger serve # start a REST API server
oger eval # determine annotation accuracy
oger test # run software tests
oger version # print the version number
To see a list of available options for each command, run e.g. oger serve -h.
As an alternative to the oger executable, you may run python3 -m oger CMD.
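For scripted workflows, the same commands can be launched from Python. The following is a minimal sketch that assumes only the python3 -m oger CMD invocation described above; the helpers build_oger_command and run_oger are hypothetical and not part of OGER.

```python
import subprocess
import sys

# Subcommands listed in this README.
KNOWN_COMMANDS = ('run', 'serve', 'eval', 'test', 'version')


def build_oger_command(cmd, *options):
    """Build the argument list for `python -m oger CMD [options]`.

    Hypothetical helper, not part of OGER itself.
    """
    if cmd not in KNOWN_COMMANDS:
        raise ValueError('unknown OGER command: {!r}'.format(cmd))
    # sys.executable is the running interpreter, so the call uses the
    # same Python environment that OGER was installed into.
    return [sys.executable, '-m', 'oger', cmd] + list(options)


def run_oger(cmd, *options):
    """Run an OGER command in a subprocess (requires an installed OGER)."""
    return subprocess.run(build_oger_command(cmd, *options), check=True)
```

With OGER installed, run_oger('serve', '-h') would print the options of the serve command. Using sys.executable instead of a hard-coded python3 is a design choice that avoids picking up a different interpreter from the PATH.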
Usage: Python Library
Get config and start a pipeline server:
>>> from oger.ctrl.router import Router, PipelineServer
>>> conf = Router(termlist_path='testfiles/test_terms.tsv')
>>> pl = PipelineServer(conf)
Note: test files can be downloaded here.
Load a text document from disk (using the included test suite):
>>> doc = pl.load_one('testfiles/txt/13373697.txt', 'txt')
>>> doc
<Article with 1 subelement at 0x7f8b2135d860>
>>> doc.text
'[The kind and the measure of ventilation disorders in tuberculous bronchostenosis in relation to its localization]. \n'
Download a collection of articles from PubMed:
>>> coll = pl.load_one(['21436587', '21436588'], fmt='pubmed')
>>> coll
<Collection with 2 subelements at 0x7f8b215f4cc0>
>>> coll[0]
<Article with 2 subelements at 0x7f8b2156a5f8>
>>> coll[0][0]
<Section with 1 subelement at 0x7f8b2156a358>
>>> coll[0][0].text
'Human prostate cancer metastases target the hematopoietic stem cell niche to establish footholds in mouse bone marrow.\n'
Run entity recognition:
>>> pl.process(coll)
>>> entity = next(coll[0].iter_entities())
>>> entity.text, entity.start, entity.end
('Human', 0, 5)
>>> entity.cid
'9606'
>>> entity.info
('organism', 'Homo sapiens', 'NCBI Taxonomy', '9606', 'DC')
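The positional info tuple can be easier to work with once its fields have names. The sketch below wraps the tuple shown above in a namedtuple. The field names are assumptions inferred from the example values, not identifiers defined by OGER, and since this README does not explain the last field ('DC'), it gets a neutral name.

```python
from collections import namedtuple

# Field names are guesses based on the example values above;
# they are NOT part of OGER's API.
EntityInfo = namedtuple(
    'EntityInfo',
    ['entity_type', 'preferred_name', 'resource', 'identifier', 'extra'])

# The tuple printed by `entity.info` in the example session.
raw_info = ('organism', 'Homo sapiens', 'NCBI Taxonomy', '9606', 'DC')
info = EntityInfo(*raw_info)

print(info.entity_type)     # organism
print(info.preferred_name)  # Homo sapiens
print(info.identifier)      # 9606
```

Because a namedtuple is still a tuple, info compares equal to the original value and can be passed anywhere the plain tuple is expected.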
Export to disk:
>>> with open('output/collection.json', 'w', encoding='utf8') as f:
...     pl.write(coll, 'bioc_json', f)
The second argument specifies the output format.
OGER supports BioC (XML and JSON), PubTator, PubAnnotation JSON, BioNLP stand-off, and CoNLL, among others.
A full list of available formats is given here (see the export-format parameter).
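To illustrate how a BioC JSON export might be consumed downstream, here is a minimal sketch that walks a BioC JSON structure and lists its annotations. The sample is hand-written following the general BioC JSON layout (documents, passages, annotations, locations); it is not actual OGER output, and the infon keys that OGER writes may differ. The function iter_annotations is a hypothetical helper.

```python
import json

# Hand-written sample in the general BioC JSON layout (not real OGER output).
sample = json.loads('''
{
  "documents": [
    {
      "id": "21436587",
      "passages": [
        {
          "offset": 0,
          "text": "Human prostate cancer metastases",
          "annotations": [
            {
              "id": "1",
              "text": "Human",
              "infons": {"type": "organism"},
              "locations": [{"offset": 0, "length": 5}]
            }
          ]
        }
      ]
    }
  ]
}
''')


def iter_annotations(collection):
    """Yield (doc_id, text, start, end) for every annotation location."""
    for doc in collection.get('documents', []):
        for passage in doc.get('passages', []):
            for anno in passage.get('annotations', []):
                for loc in anno.get('locations', []):
                    start = loc['offset']
                    yield doc['id'], anno['text'], start, start + loc['length']


for row in iter_annotations(sample):
    print(row)  # ('21436587', 'Human', 0, 5)
```

For a file written as in the export example, loading output/collection.json with json.load would take the place of the inline sample.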
Documentation
Documentation is maintained in the GitHub wiki.
Prerequisites
OGER runs on Python 3.4+.
The following third-party libraries need to be installed (pip should take care of this):
Publications
If you use OGER in an academic context, please cite us:
Lenz Furrer, Anna Jancso, Nicola Colic, and Fabio Rinaldi (2019): OGER++: hybrid multi-type entity recognition. In: Journal of Cheminformatics 11:7. DOI: 10.1186/s13321-018-0326-3
Lenz Furrer and Fabio Rinaldi (2017): OGER: OntoGene's Entity Recogniser in the BeCalm TIPS Task. In: Proceedings of the BioCreative V.5 Challenge Evaluation Workshop, pp. 175–182.
Marco Basaldella, Lenz Furrer, Carlo Tasso, and Fabio Rinaldi (2017): Entity recognition in the biomedical domain using a hybrid approach. In: Journal of Biomedical Semantics 8:51. DOI: 10.1186/s13326-017-0157-6
License
OGER offers a dual licensing model.
You can redistribute OGER and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
The GNU Affero General Public License is designed to ensure that if a modified version is distributed or made accessible on a server (e.g. in a SaaS offering), the modified source code becomes available to the community.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License along with this program. If not, see http://www.gnu.org/licenses/.
If you wish to use OGER under alternate terms, you may obtain a commercial license to OGER. Please contact us for more information (http://www.ontogene.org).