
Declarative Document Indexing (DDI) Schemas for RAG — LLM-powered pre-indexing and hybrid retrieval.


Ennoia


Ennoia introduces Declarative Document Indexing Schemas (DDI Schemas) for RAG — a new pre-indexing approach where LLM-powered extraction is defined through schemas and executed before documents enter any store, replacing naive chunk-and-embed with structured, queryable indices.

Traditional RAG is like feeding your documents through a shredder and then trying to answer questions by pulling out strips of paper one by one.

Ennoia is like reading each document first, taking structured notes on what matters, and then searching your notes — while keeping the originals on the shelf.

Install

pip install "ennoia[ollama,sentence-transformers,cli]"

Available extras: ollama, openai, anthropic, sentence-transformers, filesystem (Parquet + NumPy stores), cli (ennoia CLI), all (everything above).

Quick start (SDK)

from datetime import date
from typing import Literal

from ennoia import BaseSemantic, BaseStructure, Pipeline, Store
from ennoia.adapters.embedding.sentence_transformers import SentenceTransformerEmbedding
from ennoia.adapters.llm.ollama import OllamaAdapter
from ennoia.store import InMemoryStructuredStore, InMemoryVectorStore


# DDI Schema #1 — structured extraction. Field types drive filter
# operators automatically (Literal → eq/in, date → range ops); the
# docstring is the LLM prompt.
class DocMeta(BaseStructure):
    """Extract basic document metadata."""

    category: Literal["legal", "medical", "financial"]
    doc_date: date


# DDI Schema #2 — semantic extraction. The docstring is the question the
# LLM answers; the answer is embedded for vector search.
class Summary(BaseSemantic):
    """What is the main topic of this document?"""


# Configure the pipeline: schemas + a two-phase store (structured filter
# → vector search) + LLM and embedding adapters.
pipeline = Pipeline(
    schemas=[DocMeta, Summary],
    store=Store(vector=InMemoryVectorStore(), structured=InMemoryStructuredStore()),
    llm=OllamaAdapter(model="qwen3:0.6b"),
    embedding=SentenceTransformerEmbedding(model="all-MiniLM-L6-v2"),
)

# Pre-indexing: every schema runs against the document once. Structured
# fields go to the structured store and embedded answers to the vector
# store, all before any query touches them.
pipeline.index(text="The court held that...", source_id="doc_001")

# Hybrid search: `filters` narrows candidates via the structured store,
# then vector similarity ranks within that subset.
results = pipeline.search(
    query="court holdings on liability",
    filters={"category": "legal"},
    top_k=5,
)
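The two-phase behaviour described in the last comment (filter first, then rank by similarity) can be illustrated without Ennoia at all. The following is a minimal standalone sketch, not Ennoia's implementation: the record layout, the `hybrid_search` and `cosine` helpers, and the toy two-dimensional vectors are all hypothetical, and only equality filters are handled.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(records, query_vec, filters, top_k):
    """Hypothetical helper: structured filter first, vector ranking second."""
    # Phase 1: keep only records whose structured fields match every filter.
    candidates = [
        r for r in records
        if all(r["fields"].get(key) == value for key, value in filters.items())
    ]
    # Phase 2: rank the surviving candidates by similarity to the query.
    ranked = sorted(
        candidates,
        key=lambda r: cosine(r["vector"], query_vec),
        reverse=True,
    )
    return [r["source_id"] for r in ranked[:top_k]]


# Toy data: doc_002 is the closest vector but is not in the "legal" category.
records = [
    {"source_id": "doc_001", "fields": {"category": "legal"}, "vector": [0.9, 0.1]},
    {"source_id": "doc_002", "fields": {"category": "medical"}, "vector": [1.0, 0.0]},
    {"source_id": "doc_003", "fields": {"category": "legal"}, "vector": [0.2, 0.8]},
]

hits = hybrid_search(records, [1.0, 0.0], {"category": "legal"}, top_k=5)
print(hits)
```

Because the filter runs before ranking, `doc_002` never reaches the similarity step even though its vector matches the query most closely.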

See docs/quickstart.md for the full walkthrough.
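The comment on `DocMeta` notes that field types drive the available filter operators (Literal → eq/in, date → range ops). One way such a mapping could be derived from type annotations is sketched below using only the standard library. This is an assumption about the general technique, not Ennoia's code: `operators_for` is a hypothetical helper, the class is a plain stand-in rather than a `BaseStructure`, and the exact operator names other than `eq`, `in`, and `gte` are guesses.

```python
from datetime import date
from typing import Literal, get_origin, get_type_hints


class DocMeta:
    """Plain stand-in for the schema above (no Ennoia base class)."""

    category: Literal["legal", "medical", "financial"]
    doc_date: date


def operators_for(annotation):
    """Hypothetical mapping from a field's type to its filter operators."""
    # Literal fields have a closed set of values: equality and membership.
    if get_origin(annotation) is Literal:
        return ("eq", "in")
    # Ordered types additionally support range comparisons (assumed names).
    if annotation in (date, int, float):
        return ("eq", "gt", "gte", "lt", "lte")
    # Anything else falls back to plain equality.
    return ("eq",)


ops = {name: operators_for(tp) for name, tp in get_type_hints(DocMeta).items()}
print(ops)
```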

Quick start (CLI)

# Iterate on a schema against a single document
ennoia try ./sample.txt --schema my_schemas.py

# Index a folder into a filesystem-backed store
ennoia index ./docs \
  --schema my_schemas.py \
  --store ./my_index \
  --llm ollama:qwen3:0.6b \
  --embedding sentence-transformers:all-MiniLM-L6-v2

# Hybrid search
ennoia search "employer duty to accommodate disability" \
  --schema my_schemas.py \
  --store ./my_index \
  --filter "jurisdiction=WA" \
  --filter "date_decided__gte=2020-01-01" \
  --top-k 5

See docs/cli.md.
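The `--filter` values above follow a `field=value` or `field__operator=value` shape. A minimal sketch of how such strings could be split into their parts follows. It is an illustration only: `parse_filter` is a hypothetical helper, and treating a bare `field=value` as equality is an assumption rather than documented CLI behaviour.

```python
def parse_filter(expr):
    """Split 'field__op=value' into (field, op, value); hypothetical helper."""
    key, sep, value = expr.partition("=")
    if not sep or not key:
        raise ValueError(f"expected 'field=value', got {expr!r}")
    # The operator, if any, follows the last double underscore in the key.
    head, marker, tail = key.rpartition("__")
    if marker and head and tail:
        return head, tail, value
    # No operator suffix: assume equality.
    return key, "eq", value


print(parse_filter("jurisdiction=WA"))
print(parse_filter("date_decided__gte=2020-01-01"))
```

Values stay as strings in this sketch; converting `2020-01-01` to a date would depend on the field type declared in the schema.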


License

Apache 2.0. See LICENSE.txt and NOTICE.

