Project description
Natural language processing toolkit
from duowen_huqie import NLP
nlp = NLP()
text = "Apache Spark 是一个用于大规模数据处理的统一分析引擎。它提供了 Java、Scala、Python 和 R 的高级 API,以及支持通用执行图的优化引擎。它还支持包括 Spark SQL 用于 SQL 和结构化数据处理、Spark 上的 pandas API 用于 pandas 工作负载、MLlib 用于机器学习、GraphX 用于图处理以及 Structured Streaming 用于增量计算和流处理的丰富高级工具集。"
# Coarse-grained segmentation
print(nlp.content_cut(text))
# Fine-grained segmentation
print(nlp.content_sm_cut(text))
# Add a dictionary entry
nlp.tok_add_word("分析引擎", 1000, "nr")
# Delete a dictionary entry
nlp.tok_del_word("分析引擎")
# Update a dictionary entry
nlp.tok_update_word("分析引擎", 1000, "n")
# Query the weight of each term
print(nlp.term_weight("大数据平台使用的什么数据引擎"))
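To illustrate why adding a dictionary entry changes segmentation results, here is a minimal, self-contained sketch using forward maximum matching. This is an assumption chosen for illustration only: the source does not say which algorithm `duowen_huqie` uses internally, and `forward_max_match`, `vocab`, and `max_len` are hypothetical names that are not part of the package.

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedy dictionary segmentation: at each position take the longest
    vocabulary entry that matches, falling back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink the window.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy dictionary, not the package's real one.
vocab = {"统一", "分析", "引擎"}
sample = "统一分析引擎"

before = forward_max_match(sample, vocab)

# Adding a longer entry makes the segmenter keep it as one token.
vocab.add("分析引擎")
after = forward_max_match(sample, vocab)

print(before)
print(after)
```

With the longer entry present, the two shorter tokens are merged into one, which is the kind of effect a call such as `tok_add_word` is meant to produce on a dictionary-based segmenter.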
query = "什么是混合召回?"
documents = [
    "混合召回是一种结合文本召回和向量召回的方法。",
    "文本召回通过关键词匹配实现,向量召回通过语义相似度实现。",
    "混合召回可以提高搜索的准确性和覆盖率。",
]
query_vector = [...]  # vectors must be computed externally
docs_vector = [[...], [...], [...]]  # vectors must be computed externally
# Text similarity
print(nlp.text_similarity(question=query, docs=documents))
# Query text similarity (stop words removed)
print(nlp.query_text_similarity(question=query, docs=documents))
# Hybrid similarity
print(nlp.hybrid_similarity(question=query, question_vector=query_vector, docs_vector=docs_vector, docs=documents))
# Query hybrid similarity (stop words removed)
print(nlp.query_hybrid_similarity(question=query, question_vector=query_vector, docs_vector=docs_vector, docs=documents))
# Vector similarity
print(nlp.vector_similarity(question_vector=query_vector, docs_vector=docs_vector))
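The idea behind hybrid similarity, combining a text-match score with a vector score, can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the source does not document the formulas or weights the package uses, so the token-overlap measure, the cosine similarity, the `vector_weight=0.7` default, and all function names below are hypothetical. The toy vectors stand in for embeddings that would normally come from an external model.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def token_overlap(query_tokens, doc_tokens):
    """Fraction of distinct query tokens that also appear in the document."""
    q = set(query_tokens)
    if not q:
        return 0.0
    return len(q & set(doc_tokens)) / len(q)


def hybrid_scores(query_tokens, docs_tokens, query_vec, docs_vecs, vector_weight=0.7):
    """Weighted blend of vector similarity and token overlap for each document."""
    scores = []
    for toks, vec in zip(docs_tokens, docs_vecs):
        text_score = token_overlap(query_tokens, toks)
        vec_score = cosine(query_vec, vec)
        scores.append(vector_weight * vec_score + (1 - vector_weight) * text_score)
    return scores


# Toy inputs: pre-segmented tokens and 2-dimensional stand-in vectors.
query_tokens = ["混合", "召回"]
docs_tokens = [["混合", "召回", "方法"], ["文本", "召回", "关键词"]]
query_vec = [1.0, 0.0]
docs_vecs = [[1.0, 0.0], [0.0, 1.0]]

scores = hybrid_scores(query_tokens, docs_tokens, query_vec, docs_vecs)
print(scores)
```

The first document matches the query on both signals and ranks higher. The second shares only one token and has an orthogonal vector, so only the text part contributes to its score.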
Download files
Source Distribution
duowen_huqie-0.1.8.tar.gz (29.5 MB)
Built Distribution
File details
Details for the file duowen_huqie-0.1.8.tar.gz.
File metadata
- Download URL: duowen_huqie-0.1.8.tar.gz
- Upload date:
- Size: 29.5 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.11.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 85980f9903585cc73cbb75f1c9fb88b6a7e7941f337aa4ec39116e9039d714d2 |
| MD5 | 835261e80d8f242c6e8c70e402006934 |
| BLAKE2b-256 | 3e71fba2aa034990f75dca1ddf69b7dd3742b486525fadcfce919884fdef36a2 |
File details
Details for the file duowen_huqie-0.1.8-py3-none-any.whl.
File metadata
- Download URL: duowen_huqie-0.1.8-py3-none-any.whl
- Upload date:
- Size: 29.7 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.11.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f8957c8c50d0b3566a248d6426a15dcc0673807d493d38d3c0c7de31f8aabb38 |
| MD5 | 2bd4d1b38678e1ddf1d8dc95dbb3a7dc |
| BLAKE2b-256 | fa701e248912be4d1394675b44a853d81248d828252874eb86c7bd533a997bd3 |