Generate and apply coherent biomedical lexica
Biolexica
biolexica helps generate and apply coherent biomedical lexica. It takes care of the following:
- Getting names and synonyms from a diverse set of inputs (ontologies, databases, custom) using pyobo, bioontologies, biosynonyms, and more.
- Merging equivalent terms to best take advantage of different synonyms for the same term from different sources using semra.
- Generating a lexical index and doing named entity recognition (NER) using Gilda.
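The merging step can be pictured as grouping identifiers that are linked by equivalence mappings and pooling their synonyms under one canonical identifier. The following is a minimal standard-library sketch of that idea using union-find; it is not semra's API or algorithm. The choice of the lexicographically smallest CURIE as the canonical one is an arbitrary assumption, and example:0001 and example:0002 are hypothetical identifiers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def merge_equivalent_terms(
    synonyms: Dict[str, Set[str]],
    mappings: Iterable[Tuple[str, str]],
) -> Dict[str, Set[str]]:
    """Pool the synonyms of CURIEs that are linked by equivalence mappings.

    Each group of equivalent CURIEs is keyed by its lexicographically smallest
    member, an arbitrary tie-breaking rule chosen for this sketch.
    """
    parent: Dict[str, str] = {curie: curie for curie in synonyms}

    def find(curie: str) -> str:
        # Walk up to the group's root, halving the path along the way
        parent.setdefault(curie, curie)
        while parent[curie] != curie:
            parent[curie] = parent[parent[curie]]
            curie = parent[curie]
        return curie

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return
        # The smaller root wins, so every root stays the minimum of its group
        if root_b < root_a:
            root_a, root_b = root_b, root_a
        parent[root_b] = root_a

    for a, b in mappings:
        union(a, b)

    # Collect every CURIE's synonyms under its group's canonical CURIE
    merged: Dict[str, Set[str]] = defaultdict(set)
    for curie, names in synonyms.items():
        merged[find(curie)] |= names
    return dict(merged)


merged = merge_equivalent_terms(
    {
        "doid:10652": {"Alzheimer's disease"},
        "example:0001": {"Alzheimer disease", "AD"},  # hypothetical second source
        "example:0002": {"asthma"},  # hypothetical unrelated term
    },
    [("example:0001", "doid:10652")],  # hypothetical equivalence mapping
)
```

Here doid:10652 ends up carrying all three Alzheimer's synonyms, while example:0002 stays on its own, so a lexicon built from the merged result can ground any of those strings to a single identifier.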
Importantly, we pre-define lexica for several entity types that can be readily used with Gilda in the lexica/ folder, including:
- Cells and cell lines
- Diseases, conditions, and other phenotypes
- Anatomical terms, tissues, organ systems, etc.
Getting Started
Load a pre-defined grounder like this:
>>> import biolexica
>>> grounder = biolexica.load_grounder("phenotype")
>>> grounder.get_best_match("Alzheimer's disease")
Match(reference=Reference(prefix='doid', identifier='10652'), name="Alzheimer's disease", score=0.7778)
>>> grounder.annotate("Clinical trials for reducing Aβ levels in Alzheimer's disease have been controversial.")
[Annotation(text="Alzheimer's disease", start=42, end=61, match=Match(reference=Reference(prefix='doid', identifier='10652'), name="Alzheimer's disease", score=0.7339))]
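The start and end values in the annotation are character offsets into the input sentence, with an exclusive end, so the sentence sliced from 42 to 61 is exactly "Alzheimer's disease". The following exact-match annotator is a deliberately simplified, hypothetical sketch (annotate_exact is not part of biolexica or Gilda, which perform their own normalization and scoring); it only shows how such offsets relate to the text.

```python
import re
from typing import Dict, List, Tuple


def annotate_exact(text: str, lexicon: Dict[str, str]) -> List[Tuple[str, int, int, str]]:
    """Find every case-insensitive, whole-word occurrence of a lexicon name in text.

    Returns (matched text, start, end, CURIE) tuples sorted by start offset.
    Offsets are Python string indices with an exclusive end.
    """
    annotations = []
    for name, curie in lexicon.items():
        # \b anchors keep a short synonym from matching inside a longer word
        pattern = re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            annotations.append((match.group(0), match.start(), match.end(), curie))
    return sorted(annotations, key=lambda item: item[1])


sentence = "Clinical trials for reducing Aβ levels in Alzheimer's disease have been controversial."
hits = annotate_exact(sentence, {"alzheimer's disease": "doid:10652"})
for matched, start, end, curie in hits:
    print(matched, start, end, curie)  # Alzheimer's disease 42 61 doid:10652
```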
Note: Biolexica constructs an extended version of gilda.Grounder that adds convenience functions and a simpler match data model encoded with Pydantic.
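To make the shape of that data model concrete, here is a standard-library stand-in built with dataclasses. The field names are copied from the reprs in the example output; the real classes are Pydantic models and may carry additional fields or validation. The curie property and the best_match function are hypothetical helpers, not biolexica API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Reference:
    """A prefix/identifier pair such as doid:10652."""

    prefix: str
    identifier: str

    @property
    def curie(self) -> str:
        # Hypothetical helper: joins the pair into a compact URI
        return f"{self.prefix}:{self.identifier}"


@dataclass(frozen=True)
class Match:
    """A grounding result: which term matched, its name, and a score."""

    reference: Reference
    name: str
    score: float


@dataclass(frozen=True)
class Annotation:
    """A Match located in text; end is exclusive, so text == source[start:end]."""

    text: str
    start: int
    end: int
    match: Match


def best_match(matches: List[Match]) -> Optional[Match]:
    # Hypothetical helper: return the highest-scoring match, or None if empty
    return max(matches, key=lambda m: m.score, default=None)
```

Nesting Reference inside Match and Match inside Annotation mirrors the printed output, which keeps each piece reusable: the same Match type serves both single-string grounding and in-text annotation.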
Search PubMed for abstracts and annotate them using a given grounder with:
import biolexica
from biolexica.literature import annotate_abstracts_from_search
grounder = biolexica.load_grounder("phenotype")
pubmed_query = "alzheimer's disease"
annotations = annotate_abstracts_from_search(pubmed_query, grounder=grounder, limit=30)
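The text above does not say how annotate_abstracts_from_search retrieves abstracts. As a rough, hypothetical sketch of such a pipeline, assume it queries NCBI's E-utilities ESearch endpoint (with limit playing the role of retmax), fetches the matching abstracts (the network step is omitted here), and then runs the grounder's annotate method over each one. Both function names below are illustrative, not biolexica API.

```python
from typing import Callable, Dict, List
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(query: str, limit: int = 30) -> str:
    """Build an ESearch URL that returns up to `limit` PubMed IDs as JSON."""
    params = {"db": "pubmed", "term": query, "retmax": limit, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def annotate_all(abstracts: Dict[str, str], annotate: Callable[[str], List]) -> Dict[str, List]:
    """Apply an annotate function (e.g. grounder.annotate) to each abstract, keyed by PMID."""
    return {pmid: annotate(text) for pmid, text in abstracts.items()}


url = build_pubmed_search_url("alzheimer's disease", limit=30)
```

Passing the annotate function in, rather than a grounder object, keeps the sketch independent of any particular grounder class.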
🚀 Installation
The most recent release can be installed from PyPI with:
pip install biolexica
The most recent code and data can be installed directly from GitHub with:
pip install git+https://github.com/biopragmatics/biolexica.git
👐 Contributing
Contributions, whether filing an issue, making a pull request, or forking, are appreciated. See CONTRIBUTING.md for more information on getting involved.
👋 Attribution
⚖️ License
The code in this package is licensed under the MIT License.
🍪 Cookiecutter
This package was created with @audreyfeldroy's cookiecutter package using @cthoyt's cookiecutter-snekpack template.
🛠️ For Developers
This final section of the README is for those who want to get involved by making a code contribution.
Development Installation
To install in development mode, use the following:
git clone https://github.com/biopragmatics/biolexica.git
cd biolexica
pip install -e .
🥼 Testing
After cloning the repository and installing tox with pip install tox, the unit tests in the tests/ folder can be run reproducibly with:
tox
Additionally, these tests are automatically re-run with each commit in a GitHub Action.
📖 Building the Documentation
The documentation can be built locally using the following:
git clone https://github.com/biopragmatics/biolexica.git
cd biolexica
tox -e docs
open docs/build/html/index.html
The documentation automatically installs the package as well as the docs extra specified in setup.cfg. Sphinx plugins like texext can be added there. Additionally, they need to be added to the extensions list in docs/source/conf.py.
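On the conf.py side, Sphinx loads each module named in the module-level extensions list. A minimal fragment, assuming texext is the plugin being added; the module name texext.math_dollar is an assumption about texext's layout, and the project's actual conf.py will already list its own extensions.

```python
# docs/source/conf.py (fragment): Sphinx imports every module named here
extensions = [
    # ... keep the extensions already listed in the project's conf.py ...
    "sphinx.ext.autodoc",
    # Assumed texext module; texext itself must also be in the docs extra
    "texext.math_dollar",
]
```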
The documentation can be deployed to ReadTheDocs using this guide. The .readthedocs.yml YAML file contains all the configuration you'll need.
You can also set up continuous integration on GitHub to check not only that Sphinx can build the documentation in an isolated environment (i.e., with tox -e docs-test) but also that ReadTheDocs can build it too.
📦 Making a Release
After installing the package in development mode and installing
tox
with pip install tox
, the commands for making a new release are contained within the finish
environment
in tox.ini
. Run the following from the shell:
tox -e finish
This script does the following:
- Uses Bump2Version to switch the version number in setup.cfg, src/biolexica/version.py, and docs/source/conf.py so that it no longer has the -dev suffix
- Packages the code in both a tar archive and a wheel using build
- Uploads to PyPI using twine. Be sure to have a .pypirc file configured to avoid the need for manual input at this step
- Pushes to GitHub. You'll need to make a release going with the commit where the version was bumped.
- Bumps the version to the next patch. If you made big changes and want to bump the version by minor, you can use tox -e bumpversion -- minor afterwards.
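The version arithmetic behind the first and last steps can be sketched in a few lines. This is an illustration only, not how Bump2Version is implemented, and the exact version format (for example 0.0.5-dev) is an assumption based on the -dev suffix described in the list.

```python
import re

# Assumed version format: MAJOR.MINOR.PATCH with an optional -dev suffix
VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(-dev)?$")


def strip_dev(version: str) -> str:
    """Drop the -dev suffix, e.g. "0.0.5-dev" -> "0.0.5" (the release step)."""
    match = VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unexpected version string: {version!r}")
    major, minor, patch, _ = match.groups()
    return f"{major}.{minor}.{patch}"


def bump(version: str, part: str = "patch") -> str:
    """Bump a version and re-add -dev, e.g. "0.0.5" -> "0.0.6-dev"."""
    major, minor, patch = (int(x) for x in strip_dev(version).split("."))
    if part == "major":
        major, minor, patch = major + 1, 0, 0
    elif part == "minor":
        minor, patch = minor + 1, 0
    elif part == "patch":
        patch += 1
    else:
        raise ValueError(f"unknown part: {part!r}")
    return f"{major}.{minor}.{patch}-dev"
```

For instance, releasing 0.0.5-dev publishes 0.0.5 and leaves the repository at 0.0.6-dev; a later minor bump of 0.0.6-dev would give 0.1.0-dev.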
Project details
File details
Details for the file biolexica-0.0.4.tar.gz.
File metadata
- Download URL: biolexica-0.0.4.tar.gz
- Upload date:
- Size: 23.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | ab7995a890df479f043a6eff260ca7b23b17711229b3192311cfbd7526ee6e7b
MD5 | a8b9076967652c690a58f936dda5bb49
BLAKE2b-256 | 1e9d331290208c004fb2c913d8437d22caf63cbb96dd155aa3676ae3563af8dd
File details
Details for the file biolexica-0.0.4-py3-none-any.whl.
File metadata
- Download URL: biolexica-0.0.4-py3-none-any.whl
- Upload date:
- Size: 17.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 900f4c2b4733f5ecea82cf90694474988b2054ae44c095c214ba6d460080b43f
MD5 | 445ce89fe23fdbf3c9f515005dd80543
BLAKE2b-256 | 8c22d8cc893a8f81a2b2a61ea163112c9770da08264b9fb1aff0d30545da7340
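The file hashes make it possible to check a downloaded archive before installing it. A minimal standard-library sketch using hashlib; the file name and digest in the usage comment are the source distribution's entries from the table, and the sketch assumes the file has been downloaded into the current directory.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large archives need not fit in memory."""
    hasher = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_sha256(path: Path, expected: str) -> bool:
    """Compare a file's SHA256 hex digest with the one published on PyPI."""
    return file_digest(path, "sha256") == expected.strip().lower()


# Usage, assuming the sdist has been downloaded into the current directory:
# verify_sha256(
#     Path("biolexica-0.0.4.tar.gz"),
#     "ab7995a890df479f043a6eff260ca7b23b17711229b3192311cfbd7526ee6e7b",
# )
```

For the BLAKE2b-256 column, hashlib.blake2b(digest_size=32) is the matching constructor, since BLAKE2b-256 is BLAKE2b with a 32-byte digest.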