
Extract keywords from Vietnamese text.

Project description

Welcome to Keywords Extractor 🐣

This is a simple library for extracting keywords from text. Keyword scoring is based on the TF-IDF algorithm, and the library also uses YAKE and RapidFuzz.
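TF-IDF gives a candidate phrase a high score when it is frequent in the current text but rare across a reference collection. The sketch below illustrates that idea generically and is not kwextractor's implementation. The function names `candidates` and `tfidf_keywords`, the whitespace tokenization, the stop-word filter and the scoring details are all assumptions. The IDF term uses the smoothed formula `ln((1 + n) / (1 + df)) + 1`, the same one scikit-learn's `TfidfVectorizer` uses by default. It scores two-word n-grams as well as single tokens because many Vietnamese words span two syllables.

```python
import math
from collections import Counter


def candidates(text, max_n=2, stop_words=frozenset()):
    """Return every 1..max_n-gram of whitespace tokens, skipping n-grams
    that start or end with a stop word (a simple, common filter)."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            window = tokens[i:i + n]
            if window[0] in stop_words or window[-1] in stop_words:
                continue
            grams.append(" ".join(window))
    return grams


def tfidf_keywords(doc, corpus, top_k=3, max_n=2, stop_words=frozenset()):
    """Rank the candidate n-grams of `doc` by TF-IDF against `corpus`."""
    doc_sets = [set(candidates(d, max_n, stop_words)) for d in corpus]
    tf = Counter(candidates(doc, max_n, stop_words))
    total = sum(tf.values()) or 1
    scores = {}
    for term, count in tf.items():
        df = sum(1 for s in doc_sets if term in s)
        # Smoothed IDF: terms found in every document get the lowest weight.
        idf = math.log((1 + len(corpus)) / (1 + df)) + 1
        scores[term] = (count / total) * idf
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_k]
```

With `corpus = ["the cat sat", "the dog sat", "the cat ran"]`, `tfidf_keywords("the cat sat", corpus, top_k=1)` returns `["cat sat"]`, because that bigram occurs in only one document while "the" occurs in all three.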

It is fast and easy to use, and in my own testing it has worked better than the other libraries I've tried (●´ω`●).

If you have any complaints, email me (>‘o’)> at trinhhungsss492@gmail.com, or find me on Facebook (。◕‿◕。): https://www.facebook.com/trinhdoduyhungss.

Installation

Download and install with pip (a prebuilt wheel is available):

pip install kwextractor

Usage

from kwextractor.process.extract_keywords import ExtractKeywords
from kwextractor.process.extract_numverse import ExtractNumverse
from kwextractor.process.replacing_w2n import ReplacingWtoN

# Keywords from a sentence ("I like listening to Trịnh Công Sơn's songs")
keywords = ExtractKeywords().extract_keywords("tôi thích nghe các bản nhạc của Trịnh Công Sơn")
print(keywords)  # "bản nhạc,Trịnh Công Sơn"

# A number written as words ("make me a poem of twenty lines, please").
# 20 is the maximum value returned; set it to any integer that fits your needs.
num_verse = ExtractNumverse().extract_numverse("sinh cho tui bài thơ gồm hai chục câu nhé", 20)
print(num_verse)  # 20

# Replace number words with digits ("may I ask how to generate ten poems")
replacing_w2n = ReplacingWtoN().replacing_w2n("cho hỏi làm sao để sinh ra mười bài thơ")
print(replacing_w2n)  # "cho hỏi làm sao để sinh ra 10 bài thơ"

# Keywords from a paragraph (US President Donald Trump asks NATO members to invest more in security)
keywords = ExtractKeywords().extract_keywords("Tổng thống Mỹ Donald Trump đã đề nghị các nước thành viên NATO tăng cường đầu tư trong lĩnh vực an ninh, đặc biệt là trong lĩnh vực phòng chống tấn công từ các quốc gia có thể xâm nhập vào các thành phố của các nước thành viên. Đây là lần đầu tiên tổng thống Mỹ đề nghị các nước thành viên NATO tăng cường đầu tư trong lĩnh vực an ninh.")
print(keywords)  # "cường đầu,quốc gia,xâm nhập,ninh đặc,an ninh,Donald Trump,Tổng thống lĩnh vực phòng chống tấn công,NATO"
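`extract_numverse` and `replacing_w2n` both depend on reading Vietnamese number words. The following is a hypothetical, standard-library-only sketch of that idea, not the library's code, and the real classes may behave differently. The names `extract_number` and `replace_number_words` are invented for illustration, the vocabulary only covers 1 to 99, and capping the result with `min(value, max_value)` is one assumed reading of "the maximum value returned". Some number words are ambiguous (`năm` also means "year"), so a real implementation needs context.

```python
import re
import unicodedata

# Hypothetical, deliberately small vocabulary of Vietnamese number words.
DIGITS = {"một": 1, "mốt": 1, "hai": 2, "ba": 3, "bốn": 4, "năm": 5,
          "lăm": 5, "sáu": 6, "bảy": 7, "tám": 8, "chín": 9}
TENS = {"mươi", "chục"}  # multiply the preceding digit by ten
NUMBER_WORDS = set(DIGITS) | TENS | {"mười"}


def _value(words):
    """Convert a run of number words (up to 99) to an int, e.g. ['hai', 'chục'] -> 20."""
    total = 0
    for w in words:
        if w == "mười":          # "ten" on its own
            total += 10
        elif w in TENS:          # "hai mươi" -> 2 * 10
            total = (total or 1) * 10
        else:                    # units, e.g. "hai mươi lăm" -> 20 + 5
            total += DIGITS[w]
    return total


def _runs(text):
    """Yield lists of regex matches for number words separated only by whitespace."""
    run = []
    for m in re.finditer(r"\w+", text):
        is_num = m.group().lower() in NUMBER_WORDS
        joined = run and text[run[-1].end():m.start()].isspace()
        if is_num and (not run or joined):
            run.append(m)
            continue
        if run:
            yield run
        run = [m] if is_num else []
    if run:
        yield run


def extract_number(text, max_value):
    """Return the first number written in words, capped at max_value, or None."""
    text = unicodedata.normalize("NFC", text)  # compose diacritics so \w matches
    first = next(_runs(text), None)
    if first is None:
        return None
    return min(_value([m.group().lower() for m in first]), max_value)


def replace_number_words(text):
    """Replace every run of number words in the text with its digits."""
    text = unicodedata.normalize("NFC", text)
    pieces, last = [], 0
    for run in _runs(text):
        pieces.append(text[last:run[0].start()])
        pieces.append(str(_value([m.group().lower() for m in run])))
        last = run[-1].end()
    pieces.append(text[last:])
    return "".join(pieces)
```

On the two sentences from the Usage example, `extract_number(..., 20)` gives `20` for "hai chục" and `replace_number_words` turns "mười" into "10".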

🤘 Version v0.0.3: customization is now available 🤘

Customize

from kwextractor.process.extract_keywords import ExtractKeywords
text = "tôi thích nghe các bản nhạc của Trịnh Công Sơn"  # "I like listening to Trịnh Công Sơn's songs"
fake_data = {
    "author": [
        "Trịnh Thăng Bình",
        "Lê Bảo Bình",
        "Phan Mạnh Quỳnh",
        "Karik",
        "Ngô Kiến Huy",
        "Chí Tâm",
        "Trang Yue",
        "B Ray",
        "ERIK",
        "Emcee L (Da LAB)",
        "Badbies",
        "Vũ",
        "Sơn Tùng M-TP"
    ]
}
kw = ExtractKeywords(lan='vi', data_keywords=fake_data, return_group=True)  # parameters include lan, data_keywords, return_group, ngram, stop_words
print(kw.extract_keywords(text))  # {'author': ['bản nhạc', 'Trịnh Công Sơn']}
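With `return_group=True`, keywords come back grouped under the keys of `data_keywords`. This page does not document how a keyword is assigned to a group, so the following is only one hypothetical approach based on fuzzy string matching. It uses the standard library's `difflib.SequenceMatcher` as a dependency-free stand-in for a RapidFuzz similarity score; the names `similarity` and `group_keywords`, the 0.8 threshold and the `"other"` fallback group are assumptions, and the sketch is not meant to reproduce the library's output.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity in [0, 1] (a stand-in for a RapidFuzz ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def group_keywords(keywords, data_keywords, threshold=0.8, fallback="other"):
    """Put each keyword under the group of its most similar known entry.

    Keywords whose best similarity is below `threshold` go to `fallback`.
    """
    groups = {}
    for kw in keywords:
        best_group, best_score = fallback, 0.0
        for group, entries in data_keywords.items():
            for entry in entries:
                score = similarity(kw, entry)
                if score > best_score:
                    best_group, best_score = group, score
        target = best_group if best_score >= threshold else fallback
        groups.setdefault(target, []).append(kw)
    return groups
```

A threshold trades recall for precision: lowering it lets misspelled or partial names join a group, at the cost of more false matches.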

Features

| Feature | Description | Available since |
|---|---|---|
| 🍎 Extract keywords from a sentence | Extracts keywords from a sentence. Multiple keywords are separated by commas; the result is empty if the sentence has no keyword. | ✅ v0.0.1 |
| 🍎 Extract keywords from a paragraph | Extracts keywords from a paragraph and returns a list of keywords. | ✅ v0.0.2 |
| 🍎 Extract num-string from a sentence | Extracts a num-string (a number written as words) from a sentence. Only one number per sentence is returned, as an integer. | ✅ v0.0.1 |
| 🍎 Replace num-string with a number | Replaces num-strings in a sentence with digits. | ✅ v0.0.1 |
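Because sentence-level results come back as one comma-separated string (empty when nothing is found), callers often need a list. A small sketch, assuming the result is a plain `str` like the Usage example's `"bản nhạc,Trịnh Công Sơn"`; the helper name `keywords_to_list` is hypothetical, and a keyword that itself contains a comma would be split.

```python
def keywords_to_list(result):
    """Split a comma-separated keyword string into a list; '' gives []."""
    # strip() removes stray spaces; the filter drops empty pieces.
    return [kw.strip() for kw in result.split(",") if kw.strip()]
```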

Development

Install dependencies

pip install -r requirements.txt

Build

python setup.py bdist_wheel

Test

pytest

Any questions? (ு८ு)

_/﹋\_
(҂`_´)
<,︻╦╤─ ҉ – – 🍎
_/﹋\_



Download files


Source Distribution

kwextractor-0.0.8.tar.gz (81.7 kB)

Uploaded Source

Built Distribution

kwextractor-0.0.8-py3-none-any.whl (81.5 kB)

Uploaded Python 3

File details

Details for the file kwextractor-0.0.8.tar.gz.

File metadata

  • Download URL: kwextractor-0.0.8.tar.gz
  • Upload date:
  • Size: 81.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.2

File hashes

Hashes for kwextractor-0.0.8.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7ecf0fc736e664284eee57b18f6749f4b04c51df9d54ba94c82b0ba1f6d1a191 |
| MD5 | 108c877be3a17952bd13f44335d95abc |
| BLAKE2b-256 | a33bfed130ca6b3751761bbb4b114422304cf2888622f4a64d99164f1b9e3ea0 |
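These digests let you check a downloaded file before installing it. A minimal sketch with Python's standard `hashlib`; the helper name `sha256_of` is hypothetical, and the commented usage assumes the source archive was saved in the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 16):
    """Compute a file's SHA256 hex digest, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


EXPECTED = "7ecf0fc736e664284eee57b18f6749f4b04c51df9d54ba94c82b0ba1f6d1a191"
# Usage (assumes the file is in the current directory):
# assert sha256_of("kwextractor-0.0.8.tar.gz") == EXPECTED
```

pip can also enforce this itself in hash-checking mode: put `kwextractor==0.0.8 --hash=sha256:<digest>` in a requirements file (one `--hash` per file pip may choose) and install with `--require-hashes`.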


File details

Details for the file kwextractor-0.0.8-py3-none-any.whl.

File metadata

  • Download URL: kwextractor-0.0.8-py3-none-any.whl
  • Upload date:
  • Size: 81.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.2

File hashes

Hashes for kwextractor-0.0.8-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1f10c3b6536cf7bfcf8adaab3fcc384b5ed8b07cd0469a8f2fe04029110506bd |
| MD5 | 34216abd136c71f18ef5ce2f8633f9cf |
| BLAKE2b-256 | 11a11aa6e0cdd024c862235f24344c08c08a1a988f8ea6ccdfefcf7e6a4008bd |

