Various BM25 algorithms for document ranking

Project description

Hybrid Search

Installation

pip install hbsearch

Usage

Example

from hbsearch import hybird_search, hybird_search_top_k

# Sample document: Chinese text describing word-segmentation modes,
# followed by one English sentence.
toy_doc_string = (
    '精确模式,试图将句子最精确地切开,适合文本分析;全模式,'
    '把句子中所有的可以成词的词语都扫描出来, 速度非常快,但是不能解决歧义;'
    '搜索引擎模式,在精确模式的基础上,对长词再次切分,提高召回率,适合用于搜索引擎分词。'
    'I am from China, I like math.'
)

query = '精确模式'  # "precise mode"

# Chunk size used to split long text
chunk_size = 10

# Search every chunk of the document
results_and_score, results = hybird_search(query, toy_doc_string, chunk_size)

# Print each (chunk, score) pair on its own line
print(*results_and_score, sep="\n")

# Number of top results to return
top_k = 5

# Search and keep only the top-k results
results_and_score = hybird_search_top_k(query, toy_doc_string, top_k, chunk_size)

# Print each (chunk, score) pair on its own line
print(*results_and_score, sep="\n")

Output

----------using 2*GPUs----------
Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
Loading model cost 0.601 seconds.
Prefix dict has been built successfully.
('精确模式,试图将句子', 0.03278688524590164)
('最精确地切开,适合文', 0.03200204813108039)
('本分析;全模式,把句', 0.03149801587301587)
('决歧义;搜索引擎模式', 0.031009615384615385)
(',在精确模式的基础上', 0.030834914611005692)
('词语都扫描出来, 速', 0.030309988518943745)
('子中所有的可以成词的', 0.03007688828584351)
('度非常快,但是不能解', 0.029857397504456328)
(',对长词再次切分,提', 0.028985507246376812)
('高召回率,适合用于搜', 0.02857142857142857)
('索引擎分词。I am', 0.028169014084507043)
(' from China', 0.027777777777777776)
(', I like math', 0.0273972602739726)
('.', 0.02702702702702703)
----------using 2*GPUs----------
('精确模式,试图将句子', 0.03278688524590164)
('最精确地切开,适合文', 0.031024531024531024)
('高召回率,适合用于搜', 0.0304147465437788)
('本分析;全模式,把句', 0.030330882352941176)
(',在精确模式的基础上', 0.03021353930031804)
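
The scores in the output above are consistent with reciprocal rank fusion (RRF) using a constant of 60. The top score, 0.03278688524590164, equals 1/61 + 1/61, and the lowest score in the first list, 0.02702702702702703, equals 1/74 + 1/74, which is what a chunk ranked 14th of 14 by two rankers would receive. The package does not document its scoring, so the sketch below is an assumption inferred from the numbers. `rrf_score` and `k` are illustrative names, not part of hbsearch.

```python
def rrf_score(ranks, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) over each ranker's 1-based rank."""
    return sum(1.0 / (k + rank) for rank in ranks)


# Assumed ranks, chosen because they reproduce scores from the first list above.
print(rrf_score([1, 1]))    # ranked first by both rankers
print(rrf_score([2, 3]))    # ranked second by one ranker, third by the other
print(rrf_score([14, 14]))  # ranked last (14th) by both rankers
```

If this reading is right, the scores only encode rank positions, so the gaps between them say nothing about how much more relevant one chunk is than another.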


Source Distributions

No source distribution files available for this release.

Built Distribution


hbsearch-0.1.2-py3-none-any.whl (4.0 kB)

Uploaded Python 3

File details

Details for the file hbsearch-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: hbsearch-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 4.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.8.16

File hashes

Hashes for hbsearch-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 b4e5f45453d9ca3258d2c581dc6d29a95f147013e932d3400cee38a2b0db855a
MD5 5b2e1fb97b438a20481166d3598c7640
BLAKE2b-256 d26bd99e7257ed298ecb9c16282bb94c2734b904333a5f42f231c46786e9d74d
