KeyCARE is a Python library for unsupervised keyword extraction from biomedical documents using several algorithms, classification of the extracted keywords according to their semantic nature, and extraction of semantic relations among those keywords and with other terminologies.
KeyCARE
A framework for biomedical Keyword Extraction, term Categorization, and semantic Relation.
1. About The Project
KeyCARE provides a common interface for extracting, categorizing and associating terms from a text:
- Keyword extraction: KeyCARE implements several unsupervised term extraction techniques, such as YAKE, RAKE, TextRank and KeyBERT, to automatically extract key terms from a text.
- Term categorization: KeyCARE supports term clustering techniques to group similar terms, as well as the training and application of supervised techniques, including SetFit, to classify keywords into predefined categories.
- Semantic relation classification: Beyond identifying and categorizing terms, the library supports neural classification models, such as AutoModelForSequenceClassification from the Transformers library, to classify the semantic relation between two terms as EXACT, BROAD, NARROW or NO_RELATION. This interconnects the extracted terms and can be used for terminological enrichment, among other tasks.
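To give a feel for what unsupervised keyword extraction does, here is a minimal, simplified RAKE-style sketch in plain Python. It is not KeyCARE's implementation: the stopword list, the function names and the tie-breaking rule are all assumptions made for illustration. Candidate phrases are cut at punctuation and stopwords, each word is scored as degree divided by frequency, and a phrase scores the sum of its word scores.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption); real RAKE setups use a much larger one.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "by", "for", "from", "in",
    "is", "of", "on", "or", "the", "to", "was", "were", "with",
}


def candidate_phrases(text):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    # Any run of punctuation (hyphens excepted) ends a phrase.
    for fragment in re.split(r"[^\w\s-]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_scores(text):
    """Score each distinct phrase as the sum of degree(w) / frequency(w) over its words."""
    phrases = candidate_phrases(text)
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree counts co-occurring words in the phrase, the word itself included.
            degree[word] += len(phrase)
    scores = {
        " ".join(phrase): sum(degree[w] / freq[w] for w in phrase)
        for phrase in set(phrases)
    }
    # Highest score first; ties are broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Calling `rake_scores` on a short clinical sentence returns a ranked list of `(phrase, score)` pairs, with longer multi-word phrases tending to rank higher.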
2. Getting Started
2.1. Installation
Install the package from PyPI with pip:
pip install keycare
2.2. Usage
The library is built on three main processes: keyword extraction, term categorization and relation extraction. The first two are implemented in a single pipeline in the class TermExtractor, which automatically extracts classified keywords from pieces of text. Relation extraction among term pairs or groups of terms is implemented in the other main class, RelExtractor.
TermExtractor
To use TermExtractor with default parameters, run the following code:
from keycare.TermExtractor import TermExtractor
extractor = TermExtractor()
extractor("...")  # Insert your text here
extractor.keywords
This code calls TermExtractor with default parameters on a piece of text and returns the extracted keywords with their assigned class.
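When several documents need processing, the same two-step pattern (call the extractor, then read `keywords`) can be wrapped in a small helper. The sketch below is only an illustration: `extract_per_document` and `make_extractor` are hypothetical names, and creating a fresh extractor per text is a cautious assumption, since it is not stated here whether one instance can safely be reused.

```python
def extract_per_document(texts, make_extractor):
    """Run a freshly created extractor on each text and collect its keywords.

    `make_extractor` is any zero-argument callable returning an object that
    can be called on a string and then exposes a `keywords` attribute, which
    is the interface shown in the TermExtractor snippet above.
    """
    results = []
    for text in texts:
        extractor = make_extractor()  # fresh instance, so no state is shared between texts
        extractor(text)
        results.append(extractor.keywords)
    return results


# With KeyCARE installed, the call would presumably look like this (untested assumption):
# from keycare.TermExtractor import TermExtractor
# keywords = extract_per_document(["first text", "second text"], TermExtractor)
```

Passing the class itself as `make_extractor` keeps the helper independent of KeyCARE, so it can be exercised with a stand-in object before loading any models.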
RelExtractor
To use RelExtractor with default parameters, run the following code:
from keycare.RelExtractor import RelExtractor
relextractor = RelExtractor()
relextractor("...", "...")  # Insert your term pairs here
relextractor.relations
This code calls RelExtractor with default parameters on pairs of terms and returns the relation found between them.
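Once relations have been predicted, they can be used to interconnect terms. The following sketch assumes the predictions have been converted into `(source, target, label)` triples; that shape, the function name and the example terms are hypothetical, and the actual structure of `relextractor.relations` should be checked in the tutorials. Only the four label names come from the library description.

```python
from collections import defaultdict

# The four relation labels named in the project description.
RELATION_LABELS = {"EXACT", "BROAD", "NARROW", "NO_RELATION"}


def build_relation_index(triples):
    """Index (source, target, label) triples by source term, dropping NO_RELATION."""
    index = defaultdict(list)
    for source, target, label in triples:
        if label not in RELATION_LABELS:
            raise ValueError(f"unknown relation label: {label!r}")
        if label == "NO_RELATION":
            continue  # unrelated pairs add no edge
        index[source].append((target, label))
    return dict(index)


# Made-up example triples, for illustration only.
example = [
    ("myocardial infarction", "heart attack", "EXACT"),
    ("aspirin", "fever", "NO_RELATION"),
]
relation_index = build_relation_index(example)
```

The resulting dictionary maps each source term to the terms it is linked to, which is one simple starting point for terminological enrichment.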
For further information on the functioning of the library and the available parameters refer to the tutorials in the nbs folder.
3. Contributing
This library has been developed with Python 3.10.12.
Any contributions you make are greatly appreciated. To contribute:
- Fork/Clone the Project in your system:
  git clone https://github.com/nlp4bia-bsc/keycare.git
- Create a new virtual environment:
  python3 -m venv .env_keycare
- Activate the new environment:
  source .env_keycare/bin/activate
- Install the requirements:
  pip install -r requirements.txt
- Create your Feature Branch:
  git checkout -b feature/AmazingFeature
- Update requirements file:
  pip freeze > requirements.txt
- Commit your Changes:
  git commit -m 'Add some AmazingFeature'
- Push to the Branch:
  git push origin feature/AmazingFeature
- Open a Pull Request from GitHub.
Follow this tutorial to create a branch.
4. License
5. References
A paper on the library will soon be published. Please cite it if you use the library in scientific work.