Package for Bulgarian Natural Language Processing (NLP)

bgnlp: Model-first approach to Bulgarian NLP

pip install bgnlp

Package functionalities

Note: the first time you run any of these operations, the corresponding model is downloaded, so the first run may take noticeably longer than later ones.

Part-of-speech (PoS) tagging

from bgnlp import pos


print(pos("Това е библиотека за обработка на естествен език."))

[{
    "word": "Това",
    "tag": "PDOsn",
    "bg_desc": "местоимение",
    "en_desc": "pronoun"
}, {
    "word": "е",
    "tag": "VLINr3s",
    "bg_desc": "глагол",
    "en_desc": "verb"
}, {
    "word": "библиотека",
    "tag": "NCFsof",
    "bg_desc": "съществително име",
    "en_desc": "noun"
}, {
    "word": "за",
    "tag": "R",
    "bg_desc": "предлог",
    "en_desc": "preposition"
}, {
    "word": "обработка",
    "tag": "NCFsof",
    "bg_desc": "съществително име",
    "en_desc": "noun"
}, {
    "word": "на",
    "tag": "R",
    "bg_desc": "предлог",
    "en_desc": "preposition"
}, {
    "word": "естествен",
    "tag": "Asmo",
    "bg_desc": "прилагателно име",
    "en_desc": "adjective"
}, {
    "word": "език",
    "tag": "NCMsom",
    "bg_desc": "съществително име",
    "en_desc": "noun"
}, {
    "word": ".",
    "tag": "U",
    "bg_desc": "препинателен знак",
    "en_desc": "punctuation"
}]
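
Since the result is a plain list of dictionaries, it can be filtered with ordinary Python. The following is a minimal sketch that uses three entries copied from the output above instead of calling the model; the helper `words_with_pos` is hypothetical and not part of bgnlp.

```python
# Three entries copied from the sample output above (not a live model call).
tagged = [
    {"word": "Това", "tag": "PDOsn", "bg_desc": "местоимение", "en_desc": "pronoun"},
    {"word": "библиотека", "tag": "NCFsof", "bg_desc": "съществително име", "en_desc": "noun"},
    {"word": "език", "tag": "NCMsom", "bg_desc": "съществително име", "en_desc": "noun"},
]


def words_with_pos(tokens, en_desc):
    """Hypothetical helper: return the words whose `en_desc` matches."""
    return [token["word"] for token in tokens if token["en_desc"] == en_desc]


nouns = words_with_pos(tagged, "noun")
print(nouns)
```

Filtering on `en_desc` keeps the sketch independent of the fine-grained tag codes such as `NCFsof`, whose exact scheme is not described here.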

Lemmatization

from bgnlp import lemmatize


text = "Добре дошли!"
print(lemmatize(text))

[{'word': 'Добре', 'lemma': 'Добре'}, {'word': 'дошли', 'lemma': 'дойда'}, {'word': '!', 'lemma': '!'}]

# Generating a string of lemmas.
print(lemmatize(text, as_string=True))

Добре дойда!
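
The list form is convenient when you only care about the tokens that lemmatization actually changed. A small sketch, using the sample output above as literal data rather than a live call; the variable names are illustrative only.

```python
# Sample output copied from above (not a live model call).
lemmas = [
    {"word": "Добре", "lemma": "Добре"},
    {"word": "дошли", "lemma": "дойда"},
    {"word": "!", "lemma": "!"},
]

# Map each word to its lemma, keeping only the pairs that differ.
changed = {
    item["word"]: item["lemma"]
    for item in lemmas
    if item["word"] != item["lemma"]
}
print(changed)
```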

Named Entity Recognition (NER) tagging

Currently, the available NER tags are:

  • PER - Person
  • ORG - Organization
  • LOC - Location

from bgnlp import ner


text = "Барух Спиноза е роден в Амстердам"

print(f"Input: {text}")
print("Result:", ner(text))

Input: Барух Спиноза е роден в Амстердам
Result: [{'word': 'Барух Спиноза', 'entity_group': 'PER'}, {'word': 'Амстердам', 'entity_group': 'LOC'}]
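
A common next step is to group the recognized entities by tag. The sketch below does this with the standard library only, using the sample result above as literal data; nothing in it is part of the bgnlp API.

```python
from collections import defaultdict

# Sample result copied from above (not a live model call).
entities = [
    {"word": "Барух Спиноза", "entity_group": "PER"},
    {"word": "Амстердам", "entity_group": "LOC"},
]

# Collect the entity strings under their tag (PER, ORG, LOC).
grouped = defaultdict(list)
for entity in entities:
    grouped[entity["entity_group"]].append(entity["word"])

by_group = dict(grouped)
print(by_group)
```

Tags with no matches (here `ORG`) are simply absent from `by_group`, so use `by_group.get("ORG", [])` when a tag may be missing.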

Keyword Extraction

from pprint import pprint

from bgnlp import extract_keywords


# Read the text from a file, since a full article is too long to inline here.
# The input is the text (no HTML) of this Bulgarian news article:
# https://novini.bg/sviat/eu/781622
with open("input_text.txt", "r", encoding="utf-8") as f:
    text = f.read()

# Extracting keywords with probability of at least 0.5.
keywords = extract_keywords(text, threshold=0.5)
print("Keywords:")
pprint(keywords)

Keywords:
[{'keyword': 'Еманюел Макрон', 'score': 0.8759163320064545},
 {'keyword': 'Г-7', 'score': 0.5938143730163574},
 {'keyword': 'Япония', 'score': 0.607077419757843}]
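
Note that the sample result is not ordered by score. If you want a ranking, sorting the list is straightforward; this sketch uses the sample output above as literal data and standard Python only.

```python
# Sample output copied from above (not a live model call).
keywords = [
    {"keyword": "Еманюел Макрон", "score": 0.8759163320064545},
    {"keyword": "Г-7", "score": 0.5938143730163574},
    {"keyword": "Япония", "score": 0.607077419757843},
]

# Rank keywords from highest to lowest score.
ranked = sorted(keywords, key=lambda item: item["score"], reverse=True)
top = [item["keyword"] for item in ranked]
print(top)
```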

Commatization

from pprint import pprint

from bgnlp import commatize


text = "Човекът искащ безгрижно писане ме помоли да създам този модел."

print("Without metadata:")
print(commatize(text))

print("\nWith metadata:")
pprint(commatize(text, return_metadata=True))

Without metadata:
Човекът, искащ безгрижно писане, ме помоли да създам този модел.

With metadata:
('Човекът, искащ безгрижно писане, ме помоли да създам този модел.',
 [{'end': 12,
   'score': 0.9301406145095825,
   'start': 0,
   'substring': 'Човекът, иск'},
  {'end': 34,
   'score': 0.93571537733078,
   'start': 24,
   'substring': ' писане, м'}])
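
In this sample, `start` and `end` index into the corrected string, and each `substring` is the slice around an inserted comma. That is an observation from this one example rather than a documented guarantee. The sketch below checks it against the sample output, used here as literal data.

```python
# Sample (text, metadata) pair copied from the output above.
fixed_text, spans = (
    "Човекът, искащ безгрижно писане, ме помоли да създам този модел.",
    [
        {"start": 0, "end": 12, "score": 0.9301406145095825,
         "substring": "Човекът, иск"},
        {"start": 24, "end": 34, "score": 0.93571537733078,
         "substring": " писане, м"},
    ],
)

# Check that each span slices the corrected text to its reported substring.
matches = [
    fixed_text[span["start"]:span["end"]] == span["substring"]
    for span in spans
]
print(matches)

# Keep only high-confidence comma insertions (0.9 is an arbitrary cutoff).
confident = [span["substring"] for span in spans if span["score"] >= 0.9]
print(confident)
```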
