Package for Bulgarian Natural Language Processing (NLP)
bgnlp: Model-first approach to Bulgarian NLP
pip install bgnlp
Package functionality
Please note: a model is downloaded only the first time you run each of these operations, so the first run may take noticeably longer.
Part-of-speech (PoS) tagging
from bgnlp import pos
print(pos("Това е библиотека за обработка на естествен език."))
[{
"word": "Това",
"tag": "PDOsn",
"bg_desc": "местоимение",
"en_desc": "pronoun"
}, {
"word": "е",
"tag": "VLINr3s",
"bg_desc": "глагол",
"en_desc": "verb"
}, {
"word": "библиотека",
"tag": "NCFsof",
"bg_desc": "съществително име",
"en_desc": "noun"
}, {
"word": "за",
"tag": "R",
"bg_desc": "предлог",
"en_desc": "preposition"
}, {
"word": "обработка",
"tag": "NCFsof",
"bg_desc": "съществително име",
"en_desc": "noun"
}, {
"word": "на",
"tag": "R",
"bg_desc": "предлог",
"en_desc": "preposition"
}, {
"word": "естествен",
"tag": "Asmo",
"bg_desc": "прилагателно име",
"en_desc": "adjective"
}, {
"word": "език",
"tag": "NCMsom",
"bg_desc": "съществително име",
"en_desc": "noun"
}, {
"word": ".",
"tag": "U",
"bg_desc": "препинателен знак",
"en_desc": "punctuation"
}]
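The `pos` result is a plain list of dictionaries with `word`, `tag`, `bg_desc` and `en_desc` keys, so it can be post-processed with ordinary Python. A minimal sketch, assuming that output shape: the sample list is copied from the output above (with `bg_desc` dropped for brevity), and `group_by_pos` is a hypothetical helper, not part of bgnlp.

```python
from collections import defaultdict

# Sample output copied from the pos() call above (bg_desc omitted).
tagged = [
    {"word": "Това", "tag": "PDOsn", "en_desc": "pronoun"},
    {"word": "е", "tag": "VLINr3s", "en_desc": "verb"},
    {"word": "библиотека", "tag": "NCFsof", "en_desc": "noun"},
    {"word": "за", "tag": "R", "en_desc": "preposition"},
    {"word": "обработка", "tag": "NCFsof", "en_desc": "noun"},
    {"word": "на", "tag": "R", "en_desc": "preposition"},
    {"word": "естествен", "tag": "Asmo", "en_desc": "adjective"},
    {"word": "език", "tag": "NCMsom", "en_desc": "noun"},
    {"word": ".", "tag": "U", "en_desc": "punctuation"},
]


def group_by_pos(tokens):
    """Hypothetical helper: map each English PoS description to its words, in order."""
    groups = defaultdict(list)
    for token in tokens:
        groups[token["en_desc"]].append(token["word"])
    return dict(groups)


groups = group_by_pos(tagged)
print(groups["noun"])  # ['библиотека', 'обработка', 'език']
```

In real use, `tagged` would simply be the value returned by `pos(...)`.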
Lemmatization
from bgnlp import lemmatize
text = "Добре дошли!"
print(lemmatize(text))
[{'word': 'Добре', 'lemma': 'Добре'}, {'word': 'дошли', 'lemma': 'дойда'}, {'word': '!', 'lemma': '!'}]
# Generating a string of lemmas.
print(lemmatize(text, as_string=True))
Добре дойда!
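If you already have the list form and want a lemma string without calling `lemmatize` again, it can be rebuilt by hand. This is a sketch under an assumption: tokens are joined with spaces, except punctuation, which attaches to the previous token. The exact joining rule that `as_string=True` uses is not documented here, and `join_lemmas` is a hypothetical helper.

```python
import string

# List output copied from the lemmatize() call above.
lemmas = [
    {"word": "Добре", "lemma": "Добре"},
    {"word": "дошли", "lemma": "дойда"},
    {"word": "!", "lemma": "!"},
]


def join_lemmas(tokens):
    """Hypothetical helper: join lemmas with spaces, gluing punctuation to the previous token."""
    out = ""
    for token in tokens:
        lemma = token["lemma"]
        # Treat a token as punctuation if every character is ASCII punctuation.
        is_punct = all(ch in string.punctuation for ch in lemma)
        if out and not is_punct:
            out += " "
        out += lemma
    return out


print(join_lemmas(lemmas))  # Добре дойда!
```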
Named Entity Recognition (NER) tagging
Currently, the available NER tags are:
- PER - Person
- ORG - Organization
- LOC - Location
from bgnlp import ner
text = "Барух Спиноза е роден в Амстердам"
print(f"Input: {text}")
print("Result:", ner(text))
Input: Барух Спиноза е роден в Амстердам
Result: [{'word': 'Барух Спиноза', 'entity_group': 'PER'}, {'word': 'Амстердам', 'entity_group': 'LOC'}]
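Each NER result carries the matched span (`word`) and its tag (`entity_group`), so collecting entities by type takes only a few lines. A minimal sketch, assuming the output shape shown above; `entities_by_type` and `TAG_NAMES` are hypothetical names introduced for illustration, with the readable labels taken from the tag list above.

```python
from collections import defaultdict

# Output copied from the ner() call above.
entities = [
    {"word": "Барух Спиноза", "entity_group": "PER"},
    {"word": "Амстердам", "entity_group": "LOC"},
]

# Readable names for the three tags listed above (hypothetical mapping).
TAG_NAMES = {"PER": "Person", "ORG": "Organization", "LOC": "Location"}


def entities_by_type(found):
    """Hypothetical helper: map each readable entity type to its matched spans."""
    grouped = defaultdict(list)
    for ent in found:
        # Fall back to the raw tag if an unknown tag ever appears.
        label = TAG_NAMES.get(ent["entity_group"], ent["entity_group"])
        grouped[label].append(ent["word"])
    return dict(grouped)


print(entities_by_type(entities))
# {'Person': ['Барух Спиноза'], 'Location': ['Амстердам']}
```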
Keyword Extraction
from pprint import pprint
from bgnlp import extract_keywords
# The input text is read from a file because it may be long and would be
# unwieldy to paste inline. Here it is the plain text (no HTML) of this
# Bulgarian news article: https://novini.bg/sviat/eu/781622
with open("input_text.txt", "r", encoding="utf-8") as f:
    text = f.read()
# Extract the keywords whose probability is at least 0.5.
keywords = extract_keywords(text, threshold=0.5)
print("Keywords:")
pprint(keywords)
Keywords:
[{'keyword': 'Еманюел Макрон', 'score': 0.8759163320064545},
{'keyword': 'Г-7', 'score': 0.5938143730163574},
{'keyword': 'Япония', 'score': 0.607077419757843}]
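Note that the keywords above are not listed in descending score order ('Г-7' at about 0.594 comes before 'Япония' at about 0.607). If you need a ranking, sort the result yourself. A minimal sketch, assuming the output shape shown above; `top_keywords` is a hypothetical helper, and its `min_score` filter is applied client-side on top of whatever `threshold` already did.

```python
# Output copied from the extract_keywords() call above.
keywords = [
    {"keyword": "Еманюел Макрон", "score": 0.8759163320064545},
    {"keyword": "Г-7", "score": 0.5938143730163574},
    {"keyword": "Япония", "score": 0.607077419757843},
]


def top_keywords(found, k=None, min_score=0.0):
    """Hypothetical helper: rank keywords by score (highest first), filter, then keep the top k."""
    ranked = sorted(found, key=lambda kw: kw["score"], reverse=True)
    ranked = [kw for kw in ranked if kw["score"] >= min_score]
    # ranked[:None] keeps the whole list when k is not given.
    return [kw["keyword"] for kw in ranked[:k]]


print(top_keywords(keywords))                  # ['Еманюел Макрон', 'Япония', 'Г-7']
print(top_keywords(keywords, k=1))             # ['Еманюел Макрон']
print(top_keywords(keywords, min_score=0.6))   # ['Еманюел Макрон', 'Япония']
```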
Commatization
from pprint import pprint
from bgnlp import commatize
text = "Човекът искащ безгрижно писане ме помоли да създам този модел."
print("Without metadata:")
print(commatize(text))
print("\nWith metadata:")
pprint(commatize(text, return_metadata=True))
Without metadata:
Човекът, искащ безгрижно писане, ме помоли да създам този модел.
With metadata:
('Човекът, искащ безгрижно писане, ме помоли да създам този модел.',
[{'end': 12,
'score': 0.9301406145095825,
'start': 0,
'substring': 'Човекът, иск'},
{'end': 34,
'score': 0.93571537733078,
'start': 24,
'substring': ' писане, м'}])
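In the sample output, `start` and `end` appear to be character offsets into the commatized string: slicing the result with them reproduces each `substring` (for example, `[0:12]` gives 'Човекът, иск'). Under that assumption, which is inferred from this one example rather than documented, the inserted commas can be located precisely. `comma_offsets` is a hypothetical helper.

```python
# Return value copied from commatize(text, return_metadata=True) above.
result, spans = (
    "Човекът, искащ безгрижно писане, ме помоли да създам този модел.",
    [
        {"end": 12, "score": 0.9301406145095825, "start": 0, "substring": "Човекът, иск"},
        {"end": 34, "score": 0.93571537733078, "start": 24, "substring": " писане, м"},
    ],
)


def comma_offsets(text, metadata):
    """Hypothetical helper: absolute index of every comma inside the reported spans.

    Assumes start/end are character offsets into the commatized text.
    """
    offsets = []
    for span in metadata:
        window = text[span["start"]:span["end"]]
        if window != span["substring"]:
            raise ValueError("offsets do not match this text")
        offsets.extend(span["start"] + i for i, ch in enumerate(window) if ch == ",")
    return offsets


print(comma_offsets(result, spans))  # [7, 31]
```

Each span also carries a `score`, so the same loop could skip spans below a confidence level of your choosing.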
Download files
Source Distribution
bgnlp-0.5.3.tar.gz (52.0 kB)
Built Distribution
bgnlp-0.5.3-py3-none-any.whl (50.9 kB)
File details
Details for the file bgnlp-0.5.3.tar.gz.
File metadata
- Download URL: bgnlp-0.5.3.tar.gz
- Size: 52.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.18
File hashes
Algorithm | Hash digest
---|---
SHA256 | 96e67221583538fb013fa7e6ae6f585ce89f4e1b191cc84c015116518b1581db
MD5 | 0e9056a1004147a27bcdf8fba0c183f6
BLAKE2b-256 | f631f50bd56638760e395c5998696190403fcca6370845ee3678f773d755e6eb
File details
Details for the file bgnlp-0.5.3-py3-none-any.whl.
File metadata
- Download URL: bgnlp-0.5.3-py3-none-any.whl
- Size: 50.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.18
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7d9e82108bffe74e1ffa1d5f22f6ea2c5372ef22da32587ff6bb765be3212fc6
MD5 | 9adc7dea55413e353f011d9f80cb78e1
BLAKE2b-256 | 0f1e4da61a314656bceff8d3b348a0e898a1fe1ab50fc21d2ba0a846423485b7
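The SHA256 digests listed for the two release files can be used to check a manually downloaded file before installing it. A minimal sketch using only the standard library `hashlib`; `sha256_of` and `verify` are hypothetical helper names, and the file is assumed to be in the current directory.

```python
import hashlib

# SHA256 digests as listed in the "File hashes" tables for release 0.5.3.
EXPECTED_SHA256 = {
    "bgnlp-0.5.3.tar.gz": "96e67221583538fb013fa7e6ae6f585ce89f4e1b191cc84c015116518b1581db",
    "bgnlp-0.5.3-py3-none-any.whl": "7d9e82108bffe74e1ffa1d5f22f6ea2c5372ef22da32587ff6bb765be3212fc6",
}


def sha256_of(path, chunk_size=1 << 16):
    """Compute a file's SHA256 hex digest, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Return True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()


# Usage (assumes the wheel was downloaded into the current directory):
# name = "bgnlp-0.5.3-py3-none-any.whl"
# print(verify(name, EXPECTED_SHA256[name]))
```

Alternatively, pip can enforce the check itself in hash-checking mode, by listing `bgnlp==0.5.3 --hash=sha256:<digest>` in a requirements file and installing with `--require-hashes`.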