
Project description

tantivy-py

Python bindings for Tantivy, the full-text search engine library written in Rust.

Installation

The bindings can be installed from PyPI using pip:

pip install tantivy

If no binary wheel is available for your operating system, the bindings will be built from source. This means that Rust needs to be installed before the build can succeed.

Note that the bindings use PyO3, which only supports Python 3.
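Before relying on a source build, it can help to check the two prerequisites mentioned above. The following is a minimal sketch, not part of tantivy-py: the helper name is hypothetical, and treating a cargo executable on PATH as evidence that Rust is installed is an assumption.

```python
import shutil
import sys


def source_build_prerequisites():
    """Report whether a source build of the bindings looks feasible.

    Hypothetical helper: it only checks for Python 3 and for a `cargo`
    executable on PATH, which is an assumed proxy for "Rust is installed".
    """
    return {
        "python3": sys.version_info >= (3,),
        "cargo_on_path": shutil.which("cargo") is not None,
    }


if __name__ == "__main__":
    print(source_build_prerequisites())
```

If cargo_on_path is False and no wheel exists for your platform, pip install tantivy is likely to fail during the build step.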

Development

A development environment can be set up either in a virtual environment using nox, or with local packages using the provided Makefile.

For the nox setup, install the virtual environment and build the bindings using:

python3 -m pip install nox
nox

For the Makefile-based setup, run:

make

Run the tests using:

make test

Usage

The Python bindings have a similar API to Tantivy. To create an index, a schema first needs to be built. After that, documents can be added to the index and a reader can be created to search the index.

Building an index and populating it

import tantivy

# Declaring our schema.
schema_builder = tantivy.SchemaBuilder()
schema_builder.add_text_field("title", stored=True)
schema_builder.add_text_field("body", stored=True)
schema_builder.add_integer_field("doc_id", stored=True, indexed=True)
schema = schema_builder.build()

# Creating our index (in memory)
index = tantivy.Index(schema)

To have a persistent index, use the path parameter to store the index on disk, e.g.:

import os
index = tantivy.Index(schema, path=os.path.join(os.getcwd(), "index"))
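The directory given as path is assumed here to need to exist before the index is opened; whether your version creates it automatically is worth verifying. A minimal standard-library sketch for preparing it (the helper name prepare_index_dir is hypothetical):

```python
import os
import tempfile


def prepare_index_dir(base_dir, name="index"):
    """Return base_dir/name, creating the directory if it is missing."""
    index_dir = os.path.join(base_dir, name)
    os.makedirs(index_dir, exist_ok=True)  # no error if it already exists
    return index_dir


# Demonstrated against a temporary directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as base:
    index_path = prepare_index_dir(base)
    print(os.path.isdir(index_path))
```

The returned path could then be passed as the path argument of tantivy.Index.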

By default, tantivy offers the following tokenizers which can be used in tantivy-py:

default: the tokenizer that is used if you do not assign a specific tokenizer to your text field. It splits your text on punctuation and whitespace, removes tokens that are longer than 40 characters, and lowercases your text.
raw: does not actually tokenize your text. It keeps it entirely unprocessed. This can be useful for indexing UUIDs or URLs, for instance.
en_stem: in addition to what default does, the en_stem tokenizer also applies stemming to your tokens. Stemming consists of trimming words to remove their inflection. This tokenizer is slower than the default one, but is recommended to improve recall.
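To make these descriptions concrete, here is a rough pure-Python approximation of what default and en_stem do to a piece of text. It is an illustration only, not Tantivy's implementation: the function names are hypothetical, the exact length boundary is taken from the wording above, and the suffix stripper is a toy stand-in for a real English stemmer.

```python
import re

MAX_TOKEN_CHARS = 40  # limit quoted in the description of `default`


def approx_default_tokens(text):
    """Approximate `default`: split, drop long tokens, lowercase."""
    # Runs of letters/digits are tokens; punctuation and whitespace separate them.
    words = re.findall(r"[^\W_]+", text)
    return [w.lower() for w in words if len(w) <= MAX_TOKEN_CHARS]


def toy_stem(token):
    """Toy suffix stripper; a real English stemmer is far more careful."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


tokens = approx_default_tokens("He fished alone; eighty-four days, no FISH.")
print(tokens)
print([toy_stem(t) for t in tokens])
```

In this sketch "fished" and "FISH" end up as the same token after the stemming step, which is why a stemming tokenizer tends to improve recall: a query for one form can match documents containing the other.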

To use one of the above tokenizers, pass its name as the tokenizer_name parameter to add_text_field, e.g.:

schema_builder.add_text_field("body", stored=True, tokenizer_name='en_stem')

Adding one document.

writer = index.writer()
writer.add_document(tantivy.Document(
    doc_id=1,
    title=["The Old Man and the Sea"],
    body=["""He was an old man who fished alone in a skiff in the Gulf Stream and he had gone eighty-four days now without taking a fish."""],
))
# ... and committing
writer.commit()

Building and Executing Queries

First you need to get a searcher for the index:

# Reload the index to ensure it points to the last commit.
index.reload()
searcher = index.searcher()

Then you need to get a valid query object by parsing your query on the index.

query = index.parse_query("fish days", ["title", "body"])
(best_score, best_doc_address) = searcher.search(query, 3).hits[0]
best_doc = searcher.doc(best_doc_address)
assert best_doc["title"] == ["The Old Man and the Sea"]
print(best_doc)

Valid Query Formats

tantivy-py supports the query language used in Tantivy. Some basic query formats follow.

AND and OR conjunctions.

query = index.parse_query('(Old AND Man) OR Stream', ["title", "body"])
(best_score, best_doc_address) = searcher.search(query, 3).hits[0]
best_doc = searcher.doc(best_doc_address)

+(includes) and -(excludes) operators.

query = index.parse_query('+Old +Man chef -fished', ["title", "body"])
hits = searcher.search(query, 3).hits
# The sample document contains "fished", so -fished excludes it and hits is empty here.

Note: in a query like the one above, a word with no + or - prefix acts like an OR.
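The interaction of required, optional, and excluded terms can be sketched in plain Python. This is a simplified model under stated assumptions (it ignores scoring and is not Tantivy's query engine); the function boolean_match and its parameter names are hypothetical.

```python
def boolean_match(doc_tokens, must=(), should=(), must_not=()):
    """Decide whether a document matches a '+a b -c' style query.

    Simplified model (an assumption; scoring is ignored):
    - every `must` term has to be present,
    - no `must_not` term may be present,
    - `should` terms are optional when `must` terms exist; otherwise at
      least one of them has to be present.
    """
    present = set(doc_tokens)
    if any(term in present for term in must_not):
        return False
    if must:
        return all(term in present for term in must)
    return any(term in present for term in should)


doc = ["the", "old", "man", "and", "the", "sea", "fished", "alone"]

# '+old +man chef': "chef" is optional, so the document still matches.
print(boolean_match(doc, must=["old", "man"], should=["chef"]))
# '+old +man chef -fished': "fished" is present, so the document is excluded.
print(boolean_match(doc, must=["old", "man"], should=["chef"], must_not=["fished"]))
```

In a real search the optional terms would mainly influence ranking: documents that also contain them would be expected to score higher.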

Phrase search.

query = index.parse_query('"eighty-four days"', ["title", "body"])
(best_score, best_doc_address) = searcher.search(query, 3).hits[0]
best_doc = searcher.doc(best_doc_address)
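A phrase query differs from a plain multi-word query in that the terms must appear next to each other and in order. The sketch below illustrates that idea with a simplified tokenizer; both helper functions are hypothetical and only approximate what a search engine does with positional information.

```python
import re


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (simplified)."""
    return [w.lower() for w in re.findall(r"[^\W_]+", text)]


def contains_phrase(text, phrase):
    """True if the phrase tokens occur consecutively and in order in text."""
    doc, wanted = tokenize(text), tokenize(phrase)
    if not wanted:
        return False
    n = len(wanted)
    return any(doc[i:i + n] == wanted for i in range(len(doc) - n + 1))


body = "he had gone eighty-four days now without taking a fish"
print(contains_phrase(body, "eighty-four days"))  # terms adjacent and in order
print(contains_phrase(body, "days eighty-four"))  # same terms, wrong order
```

Note that the hyphen in "eighty-four" is treated as a separator here, so the phrase is compared as the three tokens eighty, four, days.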

Integer search.

query = index.parse_query("1", ["doc_id"])
(best_score, best_doc_address) = searcher.search(query, 3).hits[0]
best_doc = searcher.doc(best_doc_address)

Note: for integer search, the integer field must be indexed (for example, by passing indexed=True to add_integer_field).

For more possible query formats and possible query options, see Tantivy Query Parser Docs.

Project details

Release history

0.25.1: Dec 2, 2025
0.25.0: Sep 9, 2025
0.24.0: May 6, 2025
0.22.2: Mar 20, 2025
0.22.0: May 5, 2024
0.21.0: Nov 21, 2023
0.20.1: Sep 11, 2023 (this version)
0.13.2: Oct 11, 2020
0.13.1-rc.1: Sep 20, 2020 (pre-release)
0.12.0-rc.2: Apr 22, 2020 (pre-release)
0.12.0-rc.1: Apr 19, 2020 (pre-release)
0.11.0-rc.8: Jan 25, 2020 (pre-release)
0.11.0-rc.7: Jan 6, 2020 (pre-release)
