Project description
A natural language processing toolkit.
from duowen_huqie import NLP
nlp = NLP()
text = "Apache Spark 是一个用于大规模数据处理的统一分析引擎。它提供了 Java、Scala、Python 和 R 的高级 API,以及支持通用执行图的优化引擎。它还支持包括 Spark SQL 用于 SQL 和结构化数据处理、Spark 上的 pandas API 用于 pandas 工作负载、MLlib 用于机器学习、GraphX 用于图处理以及 Structured Streaming 用于增量计算和流处理的丰富高级工具集。"
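# Coarse-grained segmentation
print(nlp.content_cut(text))
# Fine-grained segmentation
print(nlp.content_sm_cut(text))
# Add a dictionary entry
nlp.tok_add_word("分析引擎", 1000, "nr")
# Delete a dictionary entry
nlp.tok_del_word("分析引擎")
# Update a dictionary entry
nlp.tok_update_word("分析引擎", 1000, "n")
# Look up a word's part-of-speech tag
print(nlp.tok_tag_word("数据"))
# Compute term weights for a query
print(nlp.term_weight("大数据平台使用的什么数据引擎"))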
query = "什么是混合召回?"
documents = [
    "混合召回是一种结合文本召回和向量召回的方法。",
    "文本召回通过关键词匹配实现,向量召回通过语义相似度实现。",
    "混合召回可以提高搜索的准确性和覆盖率。",
]
query_vector = [...]  # vectors must be computed externally
docs_vector = [[...], [...], [...]]  # vectors must be computed externally
# Text similarity
print(nlp.text_similarity(question=query, docs=documents))
# Query text similarity (stop words removed)
print(nlp.query_text_similarity(question=query, docs=documents))
# Hybrid similarity
print(nlp.hybrid_similarity(question=query, question_vector=query_vector, docs_vector=docs_vector, docs=documents))
# Query hybrid similarity (stop words removed)
print(
    nlp.query_hybrid_similarity(question=query, question_vector=query_vector, docs_vector=docs_vector, docs=documents)
)
# Vector similarity
print(nlp.vector_similarity(question_vector=query_vector, docs_vector=docs_vector))
# New word discovery
from duowen_huqie.new_word_detection import NewWordDetection
nw = NewWordDetection(nlp)
result, new_word = nw.find_word('高祖,沛豐邑中陽裏人也,姓劉氏。母媼嘗息大澤之陂,夢與神遇。是時雷電晦冥 ,父太公往視,則見交龍於上。已而有娠,遂產高祖。高祖為人,隆准而龍顏,美須髯,左股有七十二黑子。寬仁愛人,意豁如也。常有 大度,不事家人生產作業。及壯,試吏,為泗上亭長,延中吏無所不狎侮。好酒及色。 常從王媼、武負貰酒,時飲醉臥,武負、王媼見其上常有怪。', 3, 5)
for k, v in new_word.items():
    print(k, v)
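The similarity calls above expect `query_vector` and `docs_vector` to be supplied by the caller. As a rough illustration of what a vector similarity and a hybrid score typically involve, here is a minimal pure-Python sketch. It is not duowen_huqie's implementation: the `cosine_similarity` and `hybrid_score` helpers, the `vector_weight` parameter, and the toy 3-dimensional vectors are all assumptions made up for this example, and real embeddings would come from an external model.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 when either vector has zero length, to avoid dividing by zero.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_score(text_score: float, vector_score: float, vector_weight: float = 0.5) -> float:
    """Weighted blend of a text score and a vector score (hypothetical weighting)."""
    return (1.0 - vector_weight) * text_score + vector_weight * vector_score


# Toy vectors standing in for real embeddings (assumption for illustration only).
query_vector = [1.0, 0.0, 1.0]
docs_vector = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]

vector_scores = [cosine_similarity(query_vector, doc) for doc in docs_vector]
print(vector_scores)
```

Cosine similarity ignores vector magnitude, which is why it is a common default for comparing embeddings; whether the library uses this measure or this weighting is not stated in the description above.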
Download files
Source Distribution
duowen_huqie-0.1.11.tar.gz (4.2 MB)
Built Distribution
duowen_huqie-0.1.11-py3-none-any.whl (4.3 MB)
File details
Details for the file duowen_huqie-0.1.11.tar.gz.
File metadata
- Download URL: duowen_huqie-0.1.11.tar.gz
- Upload date:
- Size: 4.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0b545e0a9bc1f4f3addf96a1d79784cd48e1abcbc8eeab9527ee96d00d482557 |
| MD5 | f2da7e8808f25eaf01b0c9e3e27a4f45 |
| BLAKE2b-256 | 0122506c1a4e4d686ff6478b7b4ad3de8c75091c249cac33f07bc66e8cd281f9 |
File details
Details for the file duowen_huqie-0.1.11-py3-none-any.whl.
File metadata
- Download URL: duowen_huqie-0.1.11-py3-none-any.whl
- Upload date:
- Size: 4.3 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6c26fc6627df1016ce645e1aecbf4a0e310d47e9a43fc3bfaaa2b3adc60d5678 |
| MD5 | f1865b3b2f4dd71ea4827b1409ca1868 |
| BLAKE2b-256 | 9f68e31c3ae6e7eb89ad8010defeef8bc061825cdf2135564c8968fa24557b79 |
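The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch using Python's standard `hashlib` module; the `sha256_of_file` and `matches_published_digest` helper names and the local file path are assumptions for this example, and the expected value is the SHA256 digest listed above for the source distribution.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumed local download location; adjust to wherever the file was saved.
    sdist = Path("duowen_huqie-0.1.11.tar.gz")
    expected = "0b545e0a9bc1f4f3addf96a1d79784cd48e1abcbc8eeab9527ee96d00d482557"
    if sdist.exists():
        print("digest OK" if matches_published_digest(sdist, expected) else "digest MISMATCH")
    else:
        print(f"{sdist} not found; download it first")
```

Reading in chunks keeps memory use flat regardless of file size, which matters little for a 4 MB archive but makes the helper reusable for larger downloads.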