Package for Bulgarian Natural Language Processing (NLP)
bgnlp: Model-first approach to Bulgarian NLP
pip install bgnlp
Package functionalities
Part-of-speech (PoS) tagging
from bgnlp import pos
print(pos("Това е библиотека за обработка на естествен език."))
[{
"word": "Това",
"tag": "PDOsn",
"bg_desc": "местоимение",
"en_desc": "pronoun"
}, {
"word": "е",
"tag": "VLINr3s",
"bg_desc": "глагол",
"en_desc": "verb"
}, {
"word": "библиотека",
"tag": "NCFsof",
"bg_desc": "съществително име",
"en_desc": "noun"
}, {
"word": "за",
"tag": "R",
"bg_desc": "предлог",
"en_desc": "preposition"
}, {
"word": "обработка",
"tag": "NCFsof",
"bg_desc": "съществително име",
"en_desc": "noun"
}, {
"word": "на",
"tag": "R",
"bg_desc": "предлог",
"en_desc": "preposition"
}, {
"word": "естествен",
"tag": "Asmo",
"bg_desc": "прилагателно име",
"en_desc": "adjective"
}, {
"word": "език",
"tag": "NCMsom",
"bg_desc": "съществително име",
"en_desc": "noun"
}, {
"word": ".",
"tag": "U",
"bg_desc": "препинателен знак",
"en_desc": "punctuation"
}]
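Because the result is a plain list of dictionaries, it can be post-processed with ordinary Python. The sketch below hard-codes a few entries from the sample output above instead of calling `pos`, so it runs without downloading a model. The helper name `words_with_desc` is hypothetical and not part of bgnlp.

```python
# A few entries copied from the sample output above (not a live call to bgnlp).
tagged = [
    {"word": "Това", "tag": "PDOsn", "bg_desc": "местоимение", "en_desc": "pronoun"},
    {"word": "е", "tag": "VLINr3s", "bg_desc": "глагол", "en_desc": "verb"},
    {"word": "библиотека", "tag": "NCFsof", "bg_desc": "съществително име", "en_desc": "noun"},
    {"word": "език", "tag": "NCMsom", "bg_desc": "съществително име", "en_desc": "noun"},
]


def words_with_desc(tokens, en_desc):
    """Return the words whose English description equals `en_desc` (hypothetical helper)."""
    return [token["word"] for token in tokens if token["en_desc"] == en_desc]


# Keep only the nouns from the tagged sentence.
print(words_with_desc(tagged, "noun"))
```

Filtering on `en_desc` rather than on the fine-grained `tag` keeps the example independent of the tagset details, which are not documented here.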
Lemmatization
from bgnlp import lemmatize
text = "Добре дошли!"
print(lemmatize(text))
[{'word': 'Добре', 'lemma': 'Добре'}, {'word': 'дошли', 'lemma': 'дойда'}, {'word': '!', 'lemma': '!'}]
# Generating a string of lemmas.
print(lemmatize(text, as_string=True))
Добре дойда!
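The list form is convenient when lemmas feed into further processing, for example frequency counts. The following is a minimal sketch that reuses the sample output above as hard-coded data (no call to bgnlp is made); treating tokens without letters or digits as punctuation is an assumption made for illustration.

```python
from collections import Counter

# Copied from the sample output above (not a live call to bgnlp).
lemmas = [
    {"word": "Добре", "lemma": "Добре"},
    {"word": "дошли", "lemma": "дойда"},
    {"word": "!", "lemma": "!"},
]

# Count lemmas, skipping entries with no letters or digits
# (assumed here to be punctuation).
lemma_counts = Counter(
    item["lemma"]
    for item in lemmas
    if any(ch.isalnum() for ch in item["lemma"])
)

print(lemma_counts)
```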
Named Entity Recognition (NER) tagging
Currently, the available NER tags are:
PER - Person
ORG - Organization
LOC - Location
from bgnlp import ner
text = "Барух Спиноза е роден в Амстердам"
print(f"Input: {text}")
print("Result:", ner(text))
Input: Барух Спиноза е роден в Амстердам
Result: [{'word': 'Барух Спиноза', 'entity_group': 'PER'}, {'word': 'Амстердам', 'entity_group': 'LOC'}]
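Each entity in the result carries its text under `word` and its tag under `entity_group`, so entities can be grouped by tag with a few lines of Python. This sketch works on the sample result above as hard-coded data; `group_entities` is a hypothetical helper, not a bgnlp function.

```python
from collections import defaultdict

# Copied from the sample result above (not a live call to bgnlp).
entities = [
    {"word": "Барух Спиноза", "entity_group": "PER"},
    {"word": "Амстердам", "entity_group": "LOC"},
]


def group_entities(items):
    """Map each entity tag to the list of entity strings carrying it (hypothetical helper)."""
    grouped = defaultdict(list)
    for item in items:
        grouped[item["entity_group"]].append(item["word"])
    return dict(grouped)


print(group_entities(entities))
```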
Using a Config object
A tagger's Config object defines the underlying model.
You can change the device on which the model runs inference:
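# Assumption: both classes are importable from the top-level package.
from bgnlp import NerTagger, NerTaggerConfig
# Run inference on the GPU (the default device is "cpu"):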
config = NerTaggerConfig(device="cuda")
ner = NerTagger(config=config)
# ...
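Hard-coding `device="cuda"` will typically fail on a machine without a GPU, so it can help to choose the device at run time. The sketch below assumes the models run on PyTorch (suggested by the `.pt` weight files, but not stated here) and falls back to `"cpu"` when PyTorch or CUDA is unavailable. `pick_device` is a hypothetical helper, not part of bgnlp.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # assumption: PyTorch is the backend behind the taggers
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
```

The returned string could then be passed as the `device` argument of the tagger's Config, as in the example above.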
You can also change the path to the model weights. For NerTagger,
you can pass a Hugging Face Model Hub path directly. All other taggers use weights uploaded to Google Drive.
# Define the path to the model weights. It can be a single .pt file or, for NerTagger only, a Hugging Face Model Hub path.
config = NerTaggerConfig(model_path="path/to/model")
ner = NerTagger(config=config)
# ...
Note that the model must have the same architecture as the one used by the corresponding tagger.