Package for Bulgarian Natural Language Processing (NLP)

bgnlp: Model-first approach to Bulgarian NLP

pip install bgnlp

Package functionalities

Part-of-speech (PoS) tagging

from bgnlp import pos


print(pos("Това е библиотека за обработка на естествен език."))
[{
    "word": "Това",
    "tag": "PDOsn",
    "bg_desc": "местоимение",
    "en_desc": "pronoun"
}, {
    "word": "е",
    "tag": "VLINr3s",
    "bg_desc": "глагол",
    "en_desc": "verb"
}, {
    "word": "библиотека",
    "tag": "NCFsof",
    "bg_desc": "съществително име",
    "en_desc": "noun"
}, {
    "word": "за",
    "tag": "R",
    "bg_desc": "предлог",
    "en_desc": "preposition"
}, {
    "word": "обработка",
    "tag": "NCFsof",
    "bg_desc": "съществително име",
    "en_desc": "noun"
}, {
    "word": "на",
    "tag": "R",
    "bg_desc": "предлог",
    "en_desc": "preposition"
}, {
    "word": "естествен",
    "tag": "Asmo",
    "bg_desc": "прилагателно име",
    "en_desc": "adjective"
}, {
    "word": "език",
    "tag": "NCMsom",
    "bg_desc": "съществително име",
    "en_desc": "noun"
}, {
    "word": ".",
    "tag": "U",
    "bg_desc": "препинателен знак",
    "en_desc": "punctuation"
}]
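Because each token is a plain dictionary, the result is easy to post-process. The following is a minimal sketch, assuming the output format shown above; the helper name words_by_category is hypothetical and not part of bgnlp, and the sample list is copied from the output above so that the snippet runs without downloading a model.

```python
from collections import defaultdict

# Sample copied from the output above; in practice this would be the
# value returned by pos(...).
tagged = [
    {"word": "Това", "tag": "PDOsn", "bg_desc": "местоимение", "en_desc": "pronoun"},
    {"word": "е", "tag": "VLINr3s", "bg_desc": "глагол", "en_desc": "verb"},
    {"word": "библиотека", "tag": "NCFsof", "bg_desc": "съществително име", "en_desc": "noun"},
    {"word": "за", "tag": "R", "bg_desc": "предлог", "en_desc": "preposition"},
    {"word": "обработка", "tag": "NCFsof", "bg_desc": "съществително име", "en_desc": "noun"},
    {"word": "на", "tag": "R", "bg_desc": "предлог", "en_desc": "preposition"},
    {"word": "естествен", "tag": "Asmo", "bg_desc": "прилагателно име", "en_desc": "adjective"},
    {"word": "език", "tag": "NCMsom", "bg_desc": "съществително име", "en_desc": "noun"},
    {"word": ".", "tag": "U", "bg_desc": "препинателен знак", "en_desc": "punctuation"},
]


def words_by_category(tokens):
    """Group words by their English part-of-speech description.

    Hypothetical helper, not part of bgnlp.
    """
    groups = defaultdict(list)
    for token in tokens:
        groups[token["en_desc"]].append(token["word"])
    return dict(groups)


groups = words_by_category(tagged)
print(groups["noun"])  # ['библиотека', 'обработка', 'език']
```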

Lemmatization

from bgnlp import lemmatize


text = "Добре дошли!"
print(lemmatize(text))
[{'word': 'Добре', 'lemma': 'Добре'}, {'word': 'дошли', 'lemma': 'дойда'}, {'word': '!', 'lemma': '!'}]
# Generating a string of lemmas.
print(lemmatize(text, as_string=True))
Добре дойда!
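
The list form is convenient when you need to know which words were actually changed. A minimal sketch, assuming the list-of-dictionaries format shown above; the helper name changed_forms is hypothetical, and the sample data is copied from the output above rather than produced by a live call.

```python
# Sample copied from the output above; in practice this would be the
# value returned by lemmatize(text).
lemmas = [
    {"word": "Добре", "lemma": "Добре"},
    {"word": "дошли", "lemma": "дойда"},
    {"word": "!", "lemma": "!"},
]


def changed_forms(tokens):
    """Map each word to its lemma, keeping only words the lemmatizer altered.

    Hypothetical helper, not part of bgnlp.
    """
    return {t["word"]: t["lemma"] for t in tokens if t["word"] != t["lemma"]}


print(changed_forms(lemmas))  # {'дошли': 'дойда'}
```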

Named Entity Recognition (NER) tagging

Currently, the available NER tags are:

  • PER - Person
  • ORG - Organization
  • LOC - Location

from bgnlp import ner


text = "Барух Спиноза е роден в Амстердам"

print(f"Input: {text}")
print("Result:", ner(text))
Input: Барух Спиноза е роден в Амстердам
Result: [{'word': 'Барух Спиноза', 'entity_group': 'PER'}, {'word': 'Амстердам', 'entity_group': 'LOC'}]
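
To collect entities by type, the result can be grouped on the entity_group key. A minimal sketch, assuming the result format shown above; the helper name group_entities is hypothetical, and the sample data is copied from the output above.

```python
from collections import defaultdict

# Sample copied from the output above; in practice this would be the
# value returned by ner(text).
entities = [
    {"word": "Барух Спиноза", "entity_group": "PER"},
    {"word": "Амстердам", "entity_group": "LOC"},
]


def group_entities(items):
    """Group entity strings by their NER tag (PER, ORG, LOC).

    Hypothetical helper, not part of bgnlp.
    """
    grouped = defaultdict(list)
    for item in items:
        grouped[item["entity_group"]].append(item["word"])
    return dict(grouped)


print(group_entities(entities))
# {'PER': ['Барух Спиноза'], 'LOC': ['Амстердам']}
```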

Using a Config object

A tagger's Config object defines the underlying model.

You can change the device on which the model runs inference:

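# Assumption: the source does not show where these classes are imported from;
# the top-level package is assumed here.
from bgnlp import NerTagger, NerTaggerConfig

# Run inference on the GPU (the default device is "cpu"):
config = NerTaggerConfig(device="cuda")
ner = NerTagger(config=config)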
# ...
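
Hard-coding "cuda" fails on machines without a GPU, so it can help to choose the device at runtime. The following is a minimal sketch, assuming the model runs on PyTorch (suggested by the .pt weight files, but not stated explicitly here); the helper name pick_device is hypothetical and not part of bgnlp.

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" if PyTorch is installed and sees a GPU, else "cpu".

    Hypothetical helper, not part of bgnlp.
    """
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed, so fall back to the CPU default.
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
```

The returned string can then be passed as the device argument, for example NerTaggerConfig(device=device).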

You can also change the path to the model weights. For NerTagger, you can pass a Hugging Face Model Hub path directly. All other taggers use weights uploaded to Google Drive.

# Path to the model weights: either a single .pt file or, for NerTagger only, a Hugging Face Model Hub path.
config = NerTaggerConfig(model_path="path/to/model")
ner = NerTagger(config=config)
# ...

Note that the model must have the same architecture as the one used by the corresponding tagger.

Download files

Source distribution: bgnlp-0.2.0.tar.gz (48.7 kB)

Built distribution: bgnlp-0.2.0-py3-none-any.whl (47.9 kB, Python 3)
