Japanese TF-IDF
Project description
Python 日本語のTF-IDF計算機
日本語ドキュメント: White Paper
Features
TF-IDF from Web
Installation
$ pip install simple_tfidf_japanese
Sample Code
# 文章からtfidfを出力(Get TF-IDF from text)
from simple_tfidf_japanese.tfidf import TFIDF
text = "肉フェスNIIGATAで肉三昧の夜ごはん❤︎ステーキハウスあづまさんの雪室熟成新潟県産牛ステーキおいしい*\(^o^)/*お塩でもワサビでもぴったり!"
tfidf1 = TFIDF.gen(text, enable_one_char=1)
for key, value in tfidf1:
print key, value
>>> 肉 0.0952380952381
>>> ステーキ 0.0952380952381
>>> お 0.047619047619
>>> ごはん 0.047619047619
>>> 雪 0.047619047619
>>> 新潟 0.047619047619
>>> 熟成 0.047619047619
...
# Webからtfidfを出力(Get TF-IDF from Web)
url = "https://ja.wikipedia.org/wiki/%E6%B7%A1%E8%B7%AF%E3%83%93%E3%83%BC%E3%83%95"
tfidf2 = TFIDF.gen_web(url)
for key, value in tfidf2:
print key, value
>>> 淡路 0.0453257790368
>>> ビーフ 0.0396600566572
>>> 但馬 0.0198300283286
>>> 淡路島 0.0169971671388
>>> ページ 0.0169971671388
>>> 表示 0.014164305949
# TF-IDF Cosine Similarityで類似度を計算(calc TF-IDF Cosine Similarity)
tfidf1 = [['Apple', 1], ['Orange', 2], ['Banana', 1], ['Kiwi', 0]]
tfidf2 = [['Apple', 1], ['Orange', 0], ['Banana', 2], ['Kiwi', 1]]
print TFIDF.similarity(tfidf1, tfidf2)
>>> 0.5
...
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Close
Hashes for simple_tfidf_japanese-0.1.0.tar.gz
Algorithm | Hash digest | |
---|---|---|
SHA256 | 79ddf8eeff336fe7e2167f5322136a1f37ed7407f83a6e93441f24e94b9bb5a7 |
|
MD5 | 9f150ebdaba338ab3a9049b798ea7aa2 |
|
BLAKE2b-256 | 249bf6d83083ded54e654228ca86c93b1f5e6144fd3713c30c902e53aab6de12 |