
Declarative Document Indexing (DDI) Schemas for RAG — LLM-powered pre-indexing and hybrid retrieval.


Ennoia


Ennoia introduces Declarative Document Indexing Schemas (DDI Schemas) for RAG — a new pre-indexing approach where LLM-powered extraction is defined through schemas and executed before documents enter any store, replacing naive chunk-and-embed with structured, queryable indices.

Traditional RAG is like feeding your documents through a shredder and then trying to answer questions by pulling out strips of paper one by one.

Ennoia is like reading each document first, taking structured notes on what matters, and then searching your notes — while keeping the originals on the shelf.

Install

pip install "ennoia[ollama,sentence-transformers,cli]"

Available extras:

  • ollama, openai, anthropic
  • sentence-transformers
  • filesystem (Parquet + NumPy stores)
  • cli (ennoia CLI)
  • qdrant (Qdrant vector + hybrid stores)
  • pgvector (PostgreSQL + pgvector hybrid store)
  • server (FastAPI REST + FastMCP)
  • docs (mkdocs-material site)
  • benchmark (CUAD comparison harness; see benchmark/README.md)
  • all (everything above)

Quick start (SDK)

from datetime import date
from typing import Literal

from ennoia import BaseSemantic, BaseStructure, Pipeline, Store
from ennoia.adapters.embedding.sentence_transformers import SentenceTransformerEmbedding
from ennoia.adapters.llm.ollama import OllamaAdapter
from ennoia.store import InMemoryStructuredStore, InMemoryVectorStore


# DDI Schema #1 — structured extraction. Field types drive filter
# operators automatically (Literal → eq/in, date → range ops); the
# docstring is the LLM prompt.
class DocMeta(BaseStructure):
    """Extract basic document metadata."""

    category: Literal["legal", "medical", "financial"]
    doc_date: date


# DDI Schema #2 — semantic extraction. The docstring is the question the
# LLM answers; the answer is embedded for vector search.
class Summary(BaseSemantic):
    """What is the main topic of this document?"""


# DDI Schema #3 — collection extraction. The LLM iterates until it has
# captured every entity; each entity is embedded as its own searchable row.
from ennoia import BaseCollection


class Party(BaseCollection):
    """Extract every party mentioned in the document."""

    company_name: str
    participation_year: int

    def template(self) -> str:
        return f"{self.company_name} ({self.participation_year})"


# Configure the pipeline: schemas + a two-phase store (structured filter
# → vector search) + LLM and embedding adapters.
pipeline = Pipeline(
    schemas=[DocMeta, Summary, Party],
    store=Store(vector=InMemoryVectorStore(), structured=InMemoryStructuredStore()),
    llm=OllamaAdapter(model="qwen3:0.6b"),
    embedding=SentenceTransformerEmbedding(model="all-MiniLM-L6-v2"),
)

# Pre-indexing: every schema runs against the document once, writing
# structured fields to the structured store and embedded answers to the
# vector store before any query ever touches them.
pipeline.index(text="The court held that...", source_id="doc_001")

# Hybrid search: `filters` narrows candidates via the structured store,
# then vector similarity ranks within that subset.
results = pipeline.search(
    query="court holdings on liability",
    filters={"category": "legal"},
    top_k=5,
)
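The two-phase search above (structured filter first, vector ranking second) can be sketched with a self-contained toy; the names and data here are illustrative, not ennoia internals:

```python
import math

# Toy index rows: structured fields plus a pre-computed embedding each.
ROWS = [
    {"id": "doc_001", "category": "legal",   "vec": [0.9, 0.1]},
    {"id": "doc_002", "category": "medical", "vec": [0.8, 0.2]},
    {"id": "doc_003", "category": "legal",   "vec": [0.1, 0.9]},
]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def hybrid_search(query_vec: list[float], filters: dict, top_k: int = 5) -> list[str]:
    # Phase 1: the structured filter narrows the candidate set.
    candidates = [r for r in ROWS if all(r[k] == v for k, v in filters.items())]
    # Phase 2: vector similarity ranks only within that subset.
    candidates.sort(key=lambda r: cosine(query_vec, r["vec"]), reverse=True)
    return [r["id"] for r in candidates[:top_k]]

# doc_002 is close to the query vector but is filtered out before ranking.
print(hybrid_search([1.0, 0.0], {"category": "legal"}))  # → ['doc_001', 'doc_003']
```

The point of the two phases is that the vector index never has to rank documents the structured filter has already excluded.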

See docs/quickstart.md for the full walkthrough.
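The "field types drive filter operators" behavior noted in the schema comments can be illustrated with a rough stand-alone sketch (a hypothetical function, not ennoia's actual derivation code):

```python
from datetime import date
from typing import Literal, get_origin

def derive_operators(annotation: object) -> set[str]:
    """Illustrative mapping from a field annotation to filter operators."""
    if get_origin(annotation) is Literal:
        return {"eq", "in"}                      # closed vocabulary
    if annotation in (date, int, float):
        return {"eq", "gt", "gte", "lt", "lte"}  # orderable -> range operators
    return {"eq"}                                # default: exact match only

# For a schema like DocMeta: category gets eq/in, doc_date gets range ops.
assert derive_operators(Literal["legal", "medical", "financial"]) == {"eq", "in"}
assert "gte" in derive_operators(date)
```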

Quick start (CLI)

# Iterate on a schema against a single document
ennoia try ./sample.txt --schema my_schemas.py

# Index a folder into a filesystem-backed store
ennoia index ./docs \
  --schema my_schemas.py \
  --store ./my_index \
  --collection cases \
  --llm ollama:qwen3:0.6b \
  --embedding sentence-transformers:all-MiniLM-L6-v2

# …or into a production Qdrant / pgvector backend
ennoia index ./docs \
  --schema my_schemas.py \
  --store qdrant:cases \
  --qdrant-url http://localhost:6333 \
  --llm openai:gpt-4o-mini \
  --embedding openai-embedding:text-embedding-3-small

# Hybrid search
ennoia search "employer duty to accommodate disability" \
  --schema my_schemas.py \
  --store ./my_index \
  --collection cases \
  --filter "jurisdiction=WA" \
  --filter "date_decided__gte=2020-01-01" \
  --top-k 5

See docs/cli.md.
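The --filter values above follow a Django-style field__op=value convention; parsing one such expression could look roughly like this (a sketch, not the CLI's actual parser):

```python
def parse_filter(expr: str, default_op: str = "eq") -> tuple[str, str, str]:
    """Split 'field__op=value' (or bare 'field=value') into its parts."""
    key, _, value = expr.partition("=")
    field, sep, op = key.rpartition("__")
    if not sep:                       # no '__' present -> default operator
        field, op = key, default_op
    return field, op, value

assert parse_filter("jurisdiction=WA") == ("jurisdiction", "eq", "WA")
assert parse_filter("date_decided__gte=2020-01-01") == ("date_decided", "gte", "2020-01-01")
```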

Serve an index (REST + MCP)

Stage 3 ships two remote interfaces. Both accept the same --store prefix scheme (filesystem path, qdrant:<collection>, or pgvector:<collection>) as ennoia index:

# REST — full CRUD for application integration.
export ENNOIA_API_KEY=sekret
ennoia api --store ./my_index --schema my_schemas.py --port 8080

# MCP — read-only tools (discover_schema, filter, search, retrieve) for agents,
# pointed at a production Qdrant collection.
export ENNOIA_QDRANT_URL=http://localhost:6333
ennoia mcp --store qdrant:cases --schema my_schemas.py --transport sse --port 8090

Agents consume the MCP flow discover_schema → filter → search(filter_ids=...) → retrieve out of the box. See docs/serve.md.
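That four-step flow can be mocked end-to-end with an in-memory stub (hypothetical class and toy relevance scoring; the real MCP tools run over a transport against a live store):

```python
class StubMCPClient:
    """Minimal in-memory stand-in for the four read-only MCP tools."""

    def __init__(self, rows: list[dict]):
        self.rows = rows

    def discover_schema(self) -> list[str]:
        # Step 1: report which structured fields are filterable.
        return sorted({k for r in self.rows for k in r if k not in ("id", "text")})

    def filter(self, **conditions) -> list[str]:
        # Step 2: structured filtering returns candidate ids.
        return [r["id"] for r in self.rows
                if all(r.get(k) == v for k, v in conditions.items())]

    def search(self, query: str, filter_ids=None) -> list[dict]:
        # Step 3: rank only within the filtered subset (toy word-overlap score).
        pool = [r for r in self.rows if filter_ids is None or r["id"] in filter_ids]
        return sorted(pool,
                      key=lambda r: -len(set(query.split()) & set(r["text"].split())))

    def retrieve(self, doc_id: str) -> str:
        # Step 4: fetch the full source document.
        return next(r["text"] for r in self.rows if r["id"] == doc_id)

client = StubMCPClient([
    {"id": "doc_001", "category": "legal",   "text": "court holdings on liability"},
    {"id": "doc_002", "category": "medical", "text": "patient intake notes"},
])
ids = client.filter(category="legal")                    # step 2
hits = client.search("court liability", filter_ids=ids)  # step 3
text = client.retrieve(hits[0]["id"])                    # step 4
```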

Benchmarks

A reproducible CUAD legal-QA benchmark compares ennoia DDI+RAG against a textbook LangChain shred-and-embed RAG baseline using identical models (gpt-5.4-nano for generation, text-embedding-3-small for embeddings, gpt-5.4 as judge):

[CUAD benchmark results chart]

See benchmark/README.md for methodology, the one-command reproduction, and the cookbook walkthrough at docs/cookbook/cuad-benchmark.md.


License

Apache 2.0. See LICENSE.txt and NOTICE.

