LanceDB backed datastore and retrievers for Haystack 2.X
LanceDB Haystack Document store
LanceDB-Haystack is an embedded, LanceDB-backed Document Store for Haystack 2.X.
Installation
Currently, the simplest way to get LanceDB-Haystack is to install it from PyPI via pip:
pip install lancedb-haystack
Usage
import pyarrow as pa
from lancedb_haystack import LanceDBDocumentStore
from lancedb_haystack import LanceDBEmbeddingRetriever, LanceDBFTSRetriever
# Declare the schema for the metadata fields; this lets us filter on them.
# See: https://arrow.apache.org/docs/python/api/datatypes.html
metadata_schema = pa.struct([
    ('title', pa.string()),
    ('publication_date', pa.timestamp('s')),
    ('page_number', pa.int32()),
    ('topics', pa.list_(pa.string()))
])
# Create the DocumentStore
document_store = LanceDBDocumentStore(
    database="my_database",
    table_name="documents",
    metadata_schema=metadata_schema,
    embedding_dims=384
)
# Create an embedding retriever
embedding_retriever = LanceDBEmbeddingRetriever(document_store)
# Create a Full Text Search retriever
fts_retriever = LanceDBFTSRetriever(document_store)
See also examples/pipeline-usage.ipynb for a full worked example.
Development
Test
You can use hatch to run the linters:
~$ hatch run lint:all
cmd [1] | ruff .
cmd [2] | black --check --diff .
All done! ✨ 🍰 ✨
6 files would be left unchanged.
cmd [3] | mypy --install-types --non-interactive src/lancedb_haystack tests
Success: no issues found in 6 source files
Similarly, to run the tests:
~$ hatch run cov
cmd [1] | coverage run -m pytest tests
...
Build
To build the package you can use hatch:
~$ hatch build
[sdist]
dist/lancedb_haystack-0.1.0.tar.gz
[wheel]
dist/lancedb_haystack-0.1.0-py3-none-any.whl
Document
To build the API docs, run the following:
~$ cd docs
~$ make clean
~$ make build
Roadmap
In no particular order:
- Figure out if it's possible to have LanceDB work with dynamic metadata.
  Currently, this implementation only supports metadata that is defined in the metadata_schema. It would be nice to be able to infer a schema from the first document added or, even better, to allow arbitrary metadata rather than having to specify it all up front.
- Expand the supported metadata types.
  As noted under Limitations, the metadata requires a pyarrow schema. Not all pyarrow types have been tested, and some may not be supported. It would be good to try a few more to see whether they work, and perhaps add support for those that don't.
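The schema-inference idea from the first roadmap item could be sketched roughly as follows. This is not part of lancedb-haystack: the helpers `infer_type_name` and `infer_metadata_schema` are hypothetical, and the sketch deliberately avoids pyarrow, returning type names as plain strings that loosely mirror pyarrow's type names. Turning those names into real pyarrow types is left as an assumption.

```python
from datetime import datetime


def infer_type_name(value):
    """Return a pyarrow-style type name (a plain string) for one Python value."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int64"
    if isinstance(value, float):
        return "float64"
    if isinstance(value, str):
        return "string"
    if isinstance(value, datetime):
        return "timestamp"
    if isinstance(value, list):
        # Assumption: lists are homogeneous, so only the first element is inspected.
        if not value:
            raise ValueError("cannot infer the element type of an empty list")
        return f"list<{infer_type_name(value[0])}>"
    if isinstance(value, dict):
        inner = ", ".join(f"{k}: {infer_type_name(v)}" for k, v in value.items())
        return f"struct<{inner}>"
    raise TypeError(f"unsupported metadata type: {type(value).__name__}")


def infer_metadata_schema(meta):
    """Map each metadata field name to an inferred type name."""
    return {name: infer_type_name(value) for name, value in meta.items()}


# Hypothetical metadata of the first document to be added.
first_doc_meta = {
    "title": "Intro to LanceDB",
    "publication_date": datetime(2024, 1, 15),
    "page_number": 3,
    "topics": ["vector search", "embedded databases"],
}
inferred = infer_metadata_schema(first_doc_meta)
```

The sketch also shows why inference is not straightforward: a field missing from the first document is never seen, an empty list carries no element type, and a narrower type such as the `int32` used for `page_number` in the Usage example cannot be recovered from a plain Python `int`.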
Limitations
The DocumentStore requires a pyarrow StructType to be specified as the schema for the metadata dict. This schema must cover every metadata field that may appear in any of the documents you want to store.
Currently, the system supports the basic datatypes (ints, floats, bools, strings, etc.) as well as structs and lists. Other types may work, but haven't been tested.
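Because every metadata field has to be declared up front, it can help to check documents before writing them. The following is a minimal sketch of such a check in plain Python. The helper `undeclared_fields` and the `DECLARED_FIELDS` set are hypothetical and not part of lancedb-haystack, and how the store itself reacts to undeclared fields is not covered here.

```python
# Field names mirroring the metadata_schema declared in the Usage section.
DECLARED_FIELDS = {"title", "publication_date", "page_number", "topics"}


def undeclared_fields(meta, declared=DECLARED_FIELDS):
    """Return the sorted metadata keys that the schema does not declare."""
    return sorted(set(meta) - set(declared))


# Hypothetical metadata dicts for two documents.
ok_meta = {"title": "Intro", "page_number": 1}
bad_meta = {"title": "Intro", "author": "A. Writer", "year": 2024}

print(undeclared_fields(ok_meta))   # prints []
print(undeclared_fields(bad_meta))  # prints ['author', 'year']
```

Running a check like this over a batch before writing makes it easy to spot fields that should be added to the schema or dropped from the documents.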