An integration of InterSystems IRIS with Haystack, providing a document store and retrievers
intersystems-iris-haystack
Overview
An integration of the InterSystems IRIS database with Haystack 2.x by deepset. In IRIS, the native VECTOR(DOUBLE, N) type stores document embeddings, and the VECTOR_COSINE function enables high-performance dense retrieval using SIMD operations.
The library allows using InterSystems IRIS as a DocumentStore, implementing the required Protocol methods. You can start working with the implementation by importing it from the package:
from intersystems_iris_haystack.document_stores import IRISDocumentStore
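To make "the required Protocol methods" concrete, here is a minimal in-memory sketch of the four core operations a Haystack DocumentStore exposes (count, write, filter, delete). It is an illustration only and not how IRISDocumentStore is implemented: the Doc and ToyDocumentStore classes are hypothetical, and the overwrite flag and meta_equals argument are simplified stand-ins for Haystack's DuplicatePolicy and filter dictionaries.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a Haystack Document (id, content, meta only)."""
    id: str
    content: str
    meta: dict = field(default_factory=dict)


class ToyDocumentStore:
    """In-memory illustration of the core DocumentStore operations."""

    def __init__(self) -> None:
        self._docs: dict = {}

    def count_documents(self) -> int:
        return len(self._docs)

    def write_documents(self, documents: list, overwrite: bool = True) -> int:
        # Assumption: a boolean replaces Haystack's DuplicatePolicy here.
        written = 0
        for doc in documents:
            if doc.id in self._docs and not overwrite:
                continue  # keep the existing row, skip the duplicate
            self._docs[doc.id] = doc
            written += 1
        return written

    def filter_documents(self, meta_equals: Optional[dict] = None) -> list:
        # Assumption: only exact-match filtering on meta keys is sketched.
        wanted: dict = meta_equals or {}
        return [
            doc for doc in self._docs.values()
            if all(doc.meta.get(key) == value for key, value in wanted.items())
        ]

    def delete_documents(self, document_ids: list) -> None:
        for doc_id in document_ids:
            self._docs.pop(doc_id, None)  # unknown ids are ignored
```

With the real store, the same operations run against an IRIS table instead of a Python dict.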
In addition to the IRISDocumentStore, the library includes the following Haystack components which can be used in a pipeline:
- IRISEmbeddingRetriever - A component used to query the vector store and find semantically related Documents. It uses VECTOR_COSINE natively in the database.
- IRISBm25Retriever - A keyword-based retriever that implements Okapi BM25 over the stored documents.
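For readers unfamiliar with Okapi BM25, the scoring formula can be sketched in plain Python as below. This is a generic textbook version, not the library's implementation: the bm25_scores function, the regex tokenizer, and the k1 and b defaults are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase and split on non-word characters (assumed tokenizer)."""
    return re.findall(r"\w+", text.lower())


def bm25_scores(query: str, docs: list, k1: float = 1.2, b: float = 0.75) -> list:
    """Score every document against the query with the Okapi BM25 formula."""
    if not docs:
        return []
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization penalizes documents longer than average.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Documents that share no term with the query score zero, and rarer terms contribute more through the idf factor.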
The intersystems-iris-haystack library uses the official intersystems-iris Python driver to interact with the database, so you do not have to write any SQL yourself.
                                   +-----------------------------+
                                   |    InterSystems IRIS DB     |
                                   +-----------------------------+
                                   |                             |
                                   |      +----------------+     |
                                   |      | document_table |     |
            write_documents        |      +----------------+     |
          +------------------------+----->| id (VARCHAR)   |     |
          |                        |      | content (CLOB) |     |
+---------+----------+             |      | meta (JSON)    |     |
|                    |             |      | embedding      |     |
| IRISDocumentStore  |             |      +--------+-------+     |
|                    |             |               |             |
+---------+----------+             |               |             |
          |                        |               |             |
          |                        |      +--------+--------+    |
          |                        |      | VECTOR_COSINE   |    |
          +----------------------->|      | SIMD execution  |    |
            query_embeddings       |      +-----------------+    |
                                   |                             |
                                   +-----------------------------+
In the above diagram:
- Documents are stored as rows in a dedicated relational table.
- Meta properties are stored as natively queryable JSON.
- embedding is stored as a VECTOR column type.
- Retrievals are executed by the database engine directly, eliminating the need for an external vector database.
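As a rough picture of what a dense retrieval computes, the sketch below ranks stored embeddings by cosine similarity to a query embedding. In the integration this work is done inside IRIS by VECTOR_COSINE; the Python functions here (cosine_similarity, top_k) are hypothetical helpers written only to show the arithmetic, and returning 0.0 for zero-length vectors is an assumption of this sketch.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query: list, rows: list, k: int = 3) -> list:
    """Return the k (doc_id, similarity) pairs with the highest similarity.

    `rows` is a list of (doc_id, embedding) pairs, standing in for table rows.
    """
    scored = [(doc_id, cosine_similarity(query, emb)) for doc_id, emb in rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


rows = [("doc-a", [0.0, 1.0]), ("doc-b", [1.0, 0.0]), ("doc-c", [1.0, 1.0])]
print(top_k([1.0, 0.0], rows, k=2))
```

Running the same ranking inside the database avoids shipping every embedding to the client.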
Installation
Install the integration via pip:
pip install intersystems-iris-haystack
Note: For the examples below, you will also need an embedder like sentence-transformers.
Requires: Python 3.10+ (Recommended/Tested on 3.12) and a running InterSystems IRIS instance.
Running InterSystems IRIS
Start IRIS locally with Docker:
docker run -d --name iris -p 1972:1972 -p 52773:52773 \
intersystemsdc/iris-community:latest
Start an interactive terminal session in the container:
docker exec -it iris iris session IRIS
Or log in to the Management Portal at http://localhost:52773/csp/sys/%25CSP.Portal.Home.zen
The default username is _SYSTEM and the default password is SYS; you will be prompted to change the password after logging in.
Quick start
Create a .env file from the .env.example template and set the default connection credentials for InterSystems IRIS:
IRIS_CONNECTION_STRING="localhost:1972/USER"
IRIS_USERNAME="_system"
IRIS_PASSWORD="SYS"
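The connection string above appears to follow a host:port/NAMESPACE layout. The sketch below shows how such a value could be read from the environment and split into its parts; it is only an illustration of the format, and parse_connection_string is a hypothetical helper, not part of the library, which presumably handles these variables itself.

```python
import os


def parse_connection_string(value: str) -> tuple:
    """Split 'host:port/NAMESPACE' into (host, port, namespace)."""
    host_port, _, namespace = value.partition("/")
    host, _, port = host_port.partition(":")
    if not (host and port and namespace):
        raise ValueError(f"expected host:port/NAMESPACE, got {value!r}")
    return host, int(port), namespace


# Fall back to the value from the .env example when the variable is unset.
conn = os.environ.get("IRIS_CONNECTION_STRING", "localhost:1972/USER")
host, port, namespace = parse_connection_string(conn)
print(host, port, namespace)
```

Validating the value early gives a clearer error than a failed database connection later on.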
Example (RAG)
from haystack import Document, Pipeline
from haystack.components.embedders import (
    SentenceTransformersDocumentEmbedder,
    SentenceTransformersTextEmbedder,
)
from haystack.components.writers import DocumentWriter
from haystack.document_stores.types import DuplicatePolicy
from intersystems_iris_haystack.document_stores import IRISDocumentStore
from intersystems_iris_haystack.components.retrievers import (
    IRISEmbeddingRetriever,
    IRISBm25Retriever,
)
MODEL = "sentence-transformers/all-MiniLM-L6-v2"
store = IRISDocumentStore(embedding_dim=384)
# Indexing
indexing = Pipeline()
indexing.add_component("embedder", SentenceTransformersDocumentEmbedder(model=MODEL))
indexing.add_component("writer", DocumentWriter(store, policy=DuplicatePolicy.OVERWRITE))
indexing.connect("embedder.documents", "writer.documents")
indexing.run({"embedder": {"documents": [
    Document(content="IRIS is a multimodel database.", meta={"category": "db"}),
    Document(content="Haystack builds LLM pipelines.", meta={"category": "ai"}),
]}})
# Semantic search
query_pipeline = Pipeline()
query_pipeline.add_component("embedder", SentenceTransformersTextEmbedder(model=MODEL))
query_pipeline.add_component("retriever", IRISEmbeddingRetriever(store, top_k=3))
query_pipeline.connect("embedder.embedding", "retriever.query_embedding")
result = query_pipeline.run({"embedder": {"text": "what is vector search?"}})
# BM25 keyword search
bm25 = IRISBm25Retriever(store, top_k=3)
result = bm25.run(query="multimodel database")
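Both calls above return a dictionary. The sketch below shows one way to print the hits, assuming the retrievers follow the usual Haystack convention of returning their matches under a "documents" key (so the pipeline result would be read as result["retriever"]["documents"] and the standalone one as result["documents"]). FakeDocument is a hypothetical stand-in so the snippet runs without a database, and format_hits is an illustrative helper, not part of the library.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeDocument:
    """Hypothetical stand-in with the content, meta and score attributes of a Haystack Document."""
    content: str
    meta: dict = field(default_factory=dict)
    score: Optional[float] = None


def format_hits(documents: list) -> list:
    """Render each hit as 'rank. [score] content meta'."""
    lines = []
    for rank, doc in enumerate(documents, start=1):
        score = "n/a" if doc.score is None else f"{doc.score:.3f}"
        lines.append(f"{rank}. [{score}] {doc.content} {doc.meta}")
    return lines


# Assumed output shape of a retriever run: {"documents": [...]}
result = {"documents": [
    FakeDocument("IRIS is a multimodel database.", {"category": "db"}, 0.8123),
]}
for line in format_hits(result["documents"]):
    print(line)
```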
Documentation
The full documentation is built with MkDocs Material and covers installation, all components, API reference, and a contributor guide.
Serve locally
With hatch (recommended)
# Install hatch if you don't have it
pip install hatch
# Serve docs with live reload at http://127.0.0.1:8000
hatch run docs:serve
With pip
pip install mkdocs-material "mkdocstrings[python]" \
    mkdocs-git-revision-date-localized-plugin \
    mkdocs-minify-plugin pymdown-extensions mike
mkdocs serve
Development
Setup
git clone https://github.com/s-c-ai/iris-haystack.git
cd iris-haystack
# Start IRIS and run the example
cd examples/
docker-compose up -d
hatch run example:run
# Run all tests
hatch run test:all
Test commands
| Command | Description |
|---|---|
| hatch run test:unit | Unit tests (no IRIS required) |
| hatch run test:integration | Integration tests (IRIS must be running) |
| hatch run test:all | All tests |
| hatch run test:cov | All tests with coverage report |
Code quality
hatch run fmt # format and fix lint issues
hatch run fmt-check # check only (used in CI)
hatch run type-check # mypy
License
Apache 2.0 — see LICENSE.