Skip to main content

Python functions for working with SQLite FTS4 search

Project description

sqlite-fts4

PyPI Changelog Tests License

Custom SQLite functions written in Python for ranking documents indexed using the FTS4 extension.

Read Exploring search relevance algorithms with SQLite for further details on this project.

Demo

You can try out these SQL functions using this interactive demo.

Installation

pip install sqlite-fts4

Usage

This module implements several custom SQLite3 functions. You can register them against an existing SQLite connection like so:

import sqlite3
from sqlite_fts4 import register_functions

conn = sqlite3.connect(":memory:")
register_functions(conn)

If you only want a subset of the functions registered you can do so like this:

from sqlite_fts4 import rank_score

conn = sqlite3.connect(":memory:")
conn.create_function("rank_score", 1, rank_score)

if you want to use these functions with Datasette you can enable them by installing the datasette-sqlite-fts4 plugin:

pip install datasette-sqlite-fts4

rank_score()

This is an extremely simple ranking function, based on an example in the SQLite documentation. It generates a score for each document using the sum of the score for each column. The score for each column is calculated as the number of search matches in that column divided by the number of search matches for every column in the index - a classic TF-IDF calculation.

You can use it in a query like this:

select *, rank_score(matchinfo(docs, "pcx")) as score
from docs where docs match "dog"
order by score desc

You must use the "pcx" matchinfo format string here, or you will get incorrect results.

rank_bm25()

An implementation of the Okapi BM25 scoring algorithm. Use it in a query like this:

select *, rank_bm25(matchinfo(docs, "pcnalx")) as score
from docs where docs match "dog"
order by score desc

You must use the "pcnalx" matchinfo format string here, or you will get incorrect results. If you see any math domain errors in your logs it may be because you did not use exactly the right format string here.

decode_matchinfo()

SQLite's built-in matchinfo() function returns results as a binary string. This binary represents a list of 32 bit unsigned integers, but reading the binary results is not particularly human-friendly.

The decode_matchinfo() function decodes the binary string and converts it into a JSON list of integers.

Usage:

select *, decode_matchinfo(matchinfo(docs, "pcx"))
from docs where docs match "dog"

Example output:

hello dog, [1, 1, 1, 1, 1]

annotate_matchinfo()

This function decodes the matchinfo document into a verbose JSON structure that describes exactly what each of the returned integers actually means.

Full documentation for the different format string options can be found here: https://www.sqlite.org/fts3.html#matchinfo

You need to call this function with the same format string as was passed to matchinfo() - for example:

select annotate_matchinfo(matchinfo(docs, "pcxnal"), "pcxnal")
from docs where docs match "dog"

The returned JSON will include a key for each letter in the format string. For example:

{
    "p": {
        "value": 1,
        "title": "Number of matchable phrases in the query"
    },
    "c": {
        "value": 1,
        "title": "Number of user defined columns in the FTS table"
    },
    "x": {
        "value": [
            {
                "column_index": 0,
                "phrase_index": 0,
                "hits_this_column_this_row": 1,
                "hits_this_column_all_rows": 2,
                "docs_with_hits": 2
            }
        ],
        "title": "Details for each phrase/column combination"
    },
    "n": {
        "value": 3,
        "title": "Number of rows in the FTS4 table"
    },
    "a": {
        "title":"Average number of tokens in the text values stored in each column",
        "value": [
            {
                "column_index": 0,
                "average_num_tokens": 2
            }
        ]
    },
    "l": {
        "title": "Length of value stored in current row of the FTS4 table in tokens for each column",
        "value": [
            {
                "column_index": 0,
                "length_of_value": 2
            }
        ]
    }
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sqlite-fts4-1.0.1.tar.gz (6.4 kB view details)

Uploaded Source

Built Distribution

sqlite_fts4-1.0.1-py3-none-any.whl (10.0 kB view details)

Uploaded Python 3

File details

Details for the file sqlite-fts4-1.0.1.tar.gz.

File metadata

  • Download URL: sqlite-fts4-1.0.1.tar.gz
  • Upload date:
  • Size: 6.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.9.0

File hashes

Hashes for sqlite-fts4-1.0.1.tar.gz
Algorithm Hash digest
SHA256 b2d4f536a28181dc4ced293b602282dd982cc04f506cf3fc491d18b824c2f613
MD5 c2d46f2201fb2c7080c0993f22d39ef2
BLAKE2b-256 623063e64b7b8fa69aabf97b14cbc204cb9525eb2132545f82231c04a6d40d5c

See more details on using hashes here.

File details

Details for the file sqlite_fts4-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: sqlite_fts4-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 10.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.9.0

File hashes

Hashes for sqlite_fts4-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 a41b3e510df8bda50e9b1a5162a1e8ff3b92c62a1576ff5c74aaae2cedc9d4f5
MD5 f7696a44b735b36ff5ad4db5413a7822
BLAKE2b-256 dc135637f1dcbb727ba40d0126b5ac21740d54b585d45a25983977a404194852

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page