An integration of InterSystems IRIS with Haystack, providing a document store and retrievers

intersystems-iris-haystack

Overview

An integration of the InterSystems IRIS database with Haystack 2.x by deepset. Document embeddings are stored in the native IRIS VECTOR(DOUBLE, N) type, and the VECTOR_COSINE function performs dense retrieval inside the database using SIMD operations.
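As a reference for what a cosine score means, here is a minimal pure-Python sketch of cosine similarity. It illustrates the math only: the function name `cosine_similarity` is hypothetical, and IRIS evaluates VECTOR_COSINE inside the database engine rather than in Python.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Scores lie in the range -1 to 1. Vectors pointing in the same direction score 1 regardless of their length, which is why cosine similarity is a common choice for comparing embeddings.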

The library allows using InterSystems IRIS as a DocumentStore, implementing the required Protocol methods. You can start working with the implementation by importing it from the package:

from intersystems_iris_haystack.document_stores import IRISDocumentStore
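For orientation, the methods that the Haystack 2.x DocumentStore protocol expects can be sketched with a toy in-memory store. This is not the IRISDocumentStore implementation: `DocumentStoreLike` and `InMemoryStore` are hypothetical names, documents are plain dicts instead of Haystack `Document` objects, and filters and duplicate policies are ignored.

```python
from typing import Any, Dict, List, Optional, Protocol, runtime_checkable


@runtime_checkable
class DocumentStoreLike(Protocol):
    """Method names mirror the Haystack 2.x DocumentStore protocol."""

    def to_dict(self) -> Dict[str, Any]: ...
    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "DocumentStoreLike": ...
    def count_documents(self) -> int: ...
    def filter_documents(self, filters: Optional[Dict[str, Any]] = None) -> List[Any]: ...
    def write_documents(self, documents: List[Any], policy: Any = None) -> int: ...
    def delete_documents(self, document_ids: List[str]) -> None: ...


class InMemoryStore:
    """Toy store keyed by document id (assumption: each doc is a dict with an 'id')."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}

    def to_dict(self) -> Dict[str, Any]:
        return {"type": "InMemoryStore", "init_parameters": {}}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "InMemoryStore":
        return cls()

    def count_documents(self) -> int:
        return len(self._docs)

    def filter_documents(self, filters: Optional[Dict[str, Any]] = None) -> List[Any]:
        # Filters are ignored in this sketch; every stored document is returned.
        return list(self._docs.values())

    def write_documents(self, documents: List[Any], policy: Any = None) -> int:
        # Always overwrites on id collision; a real store honours the policy.
        for doc in documents:
            self._docs[doc["id"]] = doc
        return len(documents)

    def delete_documents(self, document_ids: List[str]) -> None:
        for doc_id in document_ids:
            self._docs.pop(doc_id, None)
```

Because Haystack components depend only on these methods, any store that provides them can be passed to components such as `DocumentWriter`.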

In addition to the IRISDocumentStore, the library includes the following Haystack components which can be used in a pipeline:

  • IRISEmbeddingRetriever - A component used to query the vector store and find semantically related Documents. It uses VECTOR_COSINE natively in the database.

  • IRISBm25Retriever - A keyword-based retriever that implements Okapi BM25 over the stored documents.
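To show what Okapi BM25 scoring involves, here is a small self-contained sketch. It is not the code used by IRISBm25Retriever: the function name `bm25_scores`, the regex tokenizer, the smoothed IDF variant, and the defaults `k1=1.5` and `b=0.75` are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter
from typing import List


def _tokenize(text: str) -> List[str]:
    # Lowercase and keep word characters only (a deliberately simple tokenizer).
    return re.findall(r"\w+", text.lower())


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Return one Okapi BM25 score per document, in input order."""
    tokenized = [_tokenize(d) for d in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avgdl = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    df: Counter = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in _tokenize(query):
            if tf[term] == 0:
                continue
            # Smoothed IDF that stays non-negative for very common terms.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            denom = tf[term] + k1 * (1.0 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1.0) / denom
        scores.append(score)
    return scores
```

Unlike embedding retrieval, a document scores above zero only if it shares at least one token with the query, so the two retrievers tend to complement each other.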

The intersystems-iris-haystack library uses the official intersystems-iris Python driver to talk to the database and handles all SQL internally.

                                   +-----------------------------+
                                   |   InterSystems IRIS DB      |
                                   +-----------------------------+
                                   |                             |
                                   |      +----------------+     |
                                   |      |  document_table|     |
                write_documents    |      +----------------+     |
          +------------------------+----->|  id (VARCHAR)  |     |
          |                        |      |  content (CLOB)|     |
+---------+----------+             |      |  meta (JSON)   |     |
|                    |             |      |  embedding     |     |
| IRISDocumentStore  |             |      +--------+-------+     |
|                    |             |               |             |
+---------+----------+             |               |             |
          |                        |               |             |
          |                        |      +--------+--------+    |
          |                        |      | VECTOR_COSINE   |    |
          +----------------------->|      | SIMD execution  |    |
               query_embeddings    |      +-----------------+    |
                                   |                             |
                                   +-----------------------------+

In the above diagram:

  • Documents are stored as rows in a dedicated relational table.
  • Meta properties are stored as natively queryable JSON.
  • The embedding is stored in a VECTOR column.
  • Retrievals are executed by the database engine directly, eliminating the need for an external vector database.
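The kind of statement the database might run for a dense query can be sketched as follows. This is an illustration under assumptions, not the SQL the library actually issues: the helper name `build_cosine_query` is hypothetical, the table and column names are taken from the diagram, and the statement relies only on the IRIS SQL functions TO_VECTOR and VECTOR_COSINE and the TOP clause.

```python
from typing import List, Sequence, Tuple


def build_cosine_query(
    table: str, query_embedding: Sequence[float], top_k: int = 3
) -> Tuple[str, List[str]]:
    """Build a parameterized top-k cosine query and its parameter list."""
    # Identifiers cannot be bound as parameters, so validate the table name.
    if not table.replace("_", "").isalnum():
        raise ValueError(f"unsafe table name: {table!r}")
    if top_k < 1:
        raise ValueError("top_k must be at least 1")

    dim = len(query_embedding)
    # TO_VECTOR accepts a comma-separated string of numbers.
    vector_literal = ",".join(repr(float(x)) for x in query_embedding)
    sql = (
        f"SELECT TOP {int(top_k)} id, content, "
        f"VECTOR_COSINE(embedding, TO_VECTOR(?, DOUBLE, {dim})) AS score "
        f"FROM {table} ORDER BY score DESC"
    )
    return sql, [vector_literal]
```

Passing the vector as a bound parameter rather than interpolating it keeps the statement text stable across queries.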

Installation

Install the integration via pip:

pip install intersystems-iris-haystack

Note: For the examples below, you will also need an embedder like sentence-transformers.

Requires: Python 3.10+ (Recommended/Tested on 3.12) and a running InterSystems IRIS instance.

Running InterSystems IRIS

Start IRIS locally with Docker:

docker run -d --name iris -p 1972:1972 -p 52773:52773 \
  intersystemsdc/iris-community:latest

Open an interactive IRIS terminal in the container:

docker exec -it iris iris session IRIS

Or log in to the Management Portal at http://localhost:52773/csp/sys/%25CSP.Portal.Home.zen

The default username is _SYSTEM and password is SYS; you will be prompted to change this password after logging in.
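The container can take a little while to start accepting connections. A minimal sketch of a readiness check, assuming the port mapping from the docker command above, is shown below. The helper name `wait_for_port` is hypothetical, and an open port only shows that something is listening, not that IRIS has finished initializing.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 1972, timeout: float = 30.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means a process is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # not ready yet; retry until the deadline
    return False
```

A script might call `wait_for_port()` once before creating the document store and exit with a clear message if it returns False.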


Quick start

Create a .env file from the .env.example template and set the default connection credentials for InterSystems IRIS:

IRIS_CONNECTION_STRING="localhost:1972/USER"
IRIS_USERNAME="_system"
IRIS_PASSWORD="SYS"
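The connection string follows a host:port/NAMESPACE shape. The sketch below reads the three variables and splits that string. It assumes the values have already been exported to the process environment (for example with a tool such as python-dotenv), and the names `IrisSettings` and `load_iris_settings` are hypothetical. How the library itself consumes these variables is not shown here.

```python
import os
from typing import Mapping, NamedTuple, Optional


class IrisSettings(NamedTuple):
    host: str
    port: int
    namespace: str
    username: str
    password: str


def load_iris_settings(env: Optional[Mapping[str, str]] = None) -> IrisSettings:
    """Parse IRIS_* variables from env (defaults to os.environ)."""
    env = os.environ if env is None else env
    conn = env["IRIS_CONNECTION_STRING"]
    # Split "host:port/NAMESPACE" into its three parts.
    host_port, _, namespace = conn.partition("/")
    host, _, port = host_port.partition(":")
    if not host or not port.isdigit() or not namespace:
        raise ValueError(f"expected host:port/NAMESPACE, got {conn!r}")
    return IrisSettings(host, int(port), namespace, env["IRIS_USERNAME"], env["IRIS_PASSWORD"])
```

Failing early on a malformed connection string usually gives a clearer error than a failed connection attempt later on.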

Example (RAG)

from haystack import Document, Pipeline
from haystack.components.embedders import (
    SentenceTransformersDocumentEmbedder,
    SentenceTransformersTextEmbedder,
)
from haystack.components.writers import DocumentWriter
from haystack.document_stores.types import DuplicatePolicy

from intersystems_iris_haystack.document_stores import IRISDocumentStore
from intersystems_iris_haystack.components.retrievers import (
    IRISEmbeddingRetriever,
    IRISBm25Retriever,
)

MODEL = "sentence-transformers/all-MiniLM-L6-v2"
store = IRISDocumentStore(embedding_dim=384)

# Indexing
indexing = Pipeline()
indexing.add_component("embedder", SentenceTransformersDocumentEmbedder(model=MODEL))
indexing.add_component("writer", DocumentWriter(store, policy=DuplicatePolicy.OVERWRITE))
indexing.connect("embedder.documents", "writer.documents")
indexing.run({"embedder": {"documents": [
    Document(content="IRIS is a multimodel database.", meta={"category": "db"}),
    Document(content="Haystack builds LLM pipelines.",  meta={"category": "ai"}),
]}})

# Semantic search
query_pipeline = Pipeline()
query_pipeline.add_component("embedder", SentenceTransformersTextEmbedder(model=MODEL))
query_pipeline.add_component("retriever", IRISEmbeddingRetriever(store, top_k=3))
query_pipeline.connect("embedder.embedding", "retriever.query_embedding")

result = query_pipeline.run({"embedder": {"text": "what is vector search?"}})

# BM25 keyword search
bm25 = IRISBm25Retriever(store, top_k=3)
result = bm25.run(query="multimodel database")
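To inspect what a retriever returned, a small formatting helper can be useful. The sketch below relies only on the `content`, `score`, and `meta` attributes that a Haystack `Document` exposes. `format_hits` and the stand-in class `HitLike` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Optional


@dataclass
class HitLike:
    """Stand-in carrying the Document attributes this helper reads."""
    content: str
    score: Optional[float] = None
    meta: Dict[str, Any] = field(default_factory=dict)


def format_hits(documents: Iterable[Any], width: int = 60) -> List[str]:
    """Return one 'rank. [score] content meta' line per retrieved document."""
    lines = []
    for rank, doc in enumerate(documents, start=1):
        score = "n/a" if doc.score is None else f"{doc.score:.3f}"
        text = (doc.content or "")[:width]  # truncate long passages
        lines.append(f"{rank}. [{score}] {text} {doc.meta}")
    return lines
```

Assuming the retriever follows the usual Haystack convention of returning its hits under a `documents` key, the BM25 result above could be printed with `print("\n".join(format_hits(result["documents"])))`.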

Documentation

The full documentation is built with MkDocs Material and covers installation, all components, API reference, and a contributor guide.

Serve locally

With hatch (recommended)

# Install hatch if you don't have it
pip install hatch

# Serve docs with live reload at http://127.0.0.1:8000
hatch run docs:serve

With pip

pip install mkdocs-material mkdocstrings[python] \
mkdocs-git-revision-date-localized-plugin \
mkdocs-minify-plugin pymdown-extensions mike

mkdocs serve

Development

Setup

git clone https://github.com/s-c-ai/iris-haystack.git
cd iris-haystack

# Start IRIS and run the example
cd examples/
docker-compose up -d
hatch run example:run

# Run all tests
hatch run test:all

Test commands

Command                       Description
hatch run test:unit           Unit tests (no IRIS required)
hatch run test:integration    Integration tests (IRIS must be running)
hatch run test:all            All tests
hatch run test:cov            All tests with coverage report

Code quality

hatch run fmt          # format and fix lint issues
hatch run fmt-check    # check only (used in CI)
hatch run type-check   # mypy

License

Apache 2.0 — see LICENSE.


