Project description

Natural language processing toolkit

from duowen_huqie import NLP

nlp = NLP()

# Sample input: the Apache Spark overview paragraph in Chinese (kept untranslated as tokenizer input)
text = "Apache Spark 是一个用于大规模数据处理的统一分析引擎。它提供了 Java、Scala、Python 和 R 的高级 API,以及支持通用执行图的优化引擎。它还支持包括 Spark SQL 用于 SQL 和结构化数据处理、Spark 上的 pandas API 用于 pandas 工作负载、MLlib 用于机器学习、GraphX 用于图处理以及 Structured Streaming 用于增量计算和流处理的丰富高级工具集。"

# Coarse-grained segmentation
print(nlp.content_cut(text))

# Fine-grained segmentation
print(nlp.content_sm_cut(text))

# Add a dictionary entry ("分析引擎" means "analytics engine")
nlp.tok_add_word("分析引擎", 1000, "nr")

# Delete a dictionary entry
nlp.tok_del_word("分析引擎")

# Update a dictionary entry
nlp.tok_update_word("分析引擎", 1000, "n")

# Query term weights (the query reads: "What data engine does the big data platform use?")
print(nlp.term_weight("大数据平台使用的什么数据引擎"))

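query = "什么是混合召回?"  # "What is hybrid recall?"

documents = ["混合召回是一种结合文本召回和向量召回的方法。",  # "Hybrid recall combines text recall and vector recall."
             "文本召回通过关键词匹配实现,向量召回通过语义相似度实现。",  # "Text recall uses keyword matching; vector recall uses semantic similarity."
             "混合召回可以提高搜索的准确性和覆盖率。", ]  # "Hybrid recall can improve search accuracy and coverage."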

query_vector = [...]  # vectors must be computed externally
docs_vector = [[...], [...], [...]]  # vectors must be computed externally

# Text similarity
print(nlp.text_similarity(question=query, docs=documents))

# Query text similarity (stop words removed)
print(nlp.query_text_similarity(question=query, docs=documents))

# Hybrid similarity
print(nlp.hybrid_similarity(question=query, question_vector=query_vector, docs_vector=docs_vector, docs=documents))

# Query hybrid similarity (stop words removed)
print(
    nlp.query_hybrid_similarity(question=query, question_vector=query_vector, docs_vector=docs_vector, docs=documents))

# Vector similarity
print(nlp.vector_similarity(question_vector=query_vector, docs_vector=docs_vector))
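
The usage above does not say how `vector_similarity` or `hybrid_similarity` compute their scores. As a rough mental model only, the sketch below assumes cosine similarity for the vector part and a weighted sum of text and vector scores for the hybrid part. The names `cosine_similarity`, `hybrid_scores`, and `text_weight`, the weight value, and the toy scores are all hypothetical and are not part of duowen_huqie.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_scores(text_scores: Sequence[float],
                  vector_scores: Sequence[float],
                  text_weight: float = 0.3) -> List[float]:
    """Per-document weighted sum: text_weight * text + (1 - text_weight) * vector.

    The default weight is an illustrative assumption, not a library default.
    """
    if len(text_scores) != len(vector_scores):
        raise ValueError("score lists must have the same length")
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be between 0 and 1")
    vector_weight = 1.0 - text_weight
    return [text_weight * t + vector_weight * v
            for t, v in zip(text_scores, vector_scores)]


# Toy 3-dimensional vectors standing in for real embeddings.
query_vector = [1.0, 0.0, 0.0]
docs_vector = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
vector_scores = [cosine_similarity(query_vector, d) for d in docs_vector]

# Made-up keyword-overlap scores, one per document.
text_scores = [0.8, 0.1, 0.5]

print(vector_scores)
print(hybrid_scores(text_scores, vector_scores, text_weight=0.3))
```

In practice the vectors would come from whatever embedding model the application already uses, with one vector per entry in `documents` and the same dimensionality as the query vector.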

Download files

Source Distribution

duowen_huqie-0.1.8.tar.gz (29.5 MB)

Uploaded Source

Built Distribution

duowen_huqie-0.1.8-py3-none-any.whl (29.7 MB)

Uploaded Python 3

File details

Details for the file duowen_huqie-0.1.8.tar.gz.

File metadata

  • Download URL: duowen_huqie-0.1.8.tar.gz
  • Upload date:
  • Size: 29.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.11.10

File hashes

Hashes for duowen_huqie-0.1.8.tar.gz
Algorithm    Hash digest
SHA256       85980f9903585cc73cbb75f1c9fb88b6a7e7941f337aa4ec39116e9039d714d2
MD5          835261e80d8f242c6e8c70e402006934
BLAKE2b-256  3e71fba2aa034990f75dca1ddf69b7dd3742b486525fadcfce919884fdef36a2

File details

Details for the file duowen_huqie-0.1.8-py3-none-any.whl.

File metadata

  • Download URL: duowen_huqie-0.1.8-py3-none-any.whl
  • Upload date:
  • Size: 29.7 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.11.10

File hashes

Hashes for duowen_huqie-0.1.8-py3-none-any.whl
Algorithm    Hash digest
SHA256       f8957c8c50d0b3566a248d6426a15dcc0673807d493d38d3c0c7de31f8aabb38
MD5          2bd4d1b38678e1ddf1d8dc95dbb3a7dc
BLAKE2b-256  fa701e248912be4d1394675b44a853d81248d828252874eb86c7bd533a997bd3
