Production-grade Python framework for building agentic RAG applications. Multilingual-capable with a roadmap toward Spanish/LATAM-first features.

cenote

Production-grade RAG primitives for Python — Protocol-based, multi-tenant by design, type-strict from day one. Targeting Spanish/LATAM workloads from M1.1.

Why cenote

cenote is not a LangChain alternative. LangChain is a kitchen-sink framework with ~100k stars and a full-time team. cenote is the opposite: a small, opinionated set of primitives for teams that hit framework complexity ceilings.

  • Production minimalist: clear Protocol interfaces, composition over inheritance, and production hardening (batching, rate limiting, transactional upserts) built in.
  • Type-strict: mypy --strict clean. py.typed shipped. Your IDE catches wiring errors before runtime.
  • Multi-tenant by design: namespace is mandatory on every store and retriever method. Cross-tenant leakage is impossible by construction.
  • LATAM-first roadmap: Spanish-aware BM25, ES evaluation datasets, and fiscal/regulatory document support land in M1.1+. Multilingual embedders (Voyage, Cohere) already work today.
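
To make the "mandatory namespace" idea concrete, here is a minimal sketch of the pattern using a toy key-value store. `TinyNamespacedStore` and its methods are hypothetical and are not cenote's `InMemoryVectorStore`; the only point is that a keyword-only argument with no default cannot be omitted by the caller.

```python
from __future__ import annotations

from collections import defaultdict


class TinyNamespacedStore:
    """Hypothetical toy store for illustration; not part of cenote."""

    def __init__(self) -> None:
        # One isolated mapping of id -> text per namespace.
        self._data: dict[str, dict[str, str]] = defaultdict(dict)

    def upsert(self, items: dict[str, str], *, namespace: str) -> None:
        # `namespace` is keyword-only and has no default, so it cannot be omitted.
        self._data[namespace].update(items)

    def get(self, item_id: str, *, namespace: str) -> str | None:
        # A lookup only ever reads the caller's own namespace.
        return self._data[namespace].get(item_id)


store = TinyNamespacedStore()
store.upsert({"d1": "tenant A note"}, namespace="tenant-a")

found_for_a = store.get("d1", namespace="tenant-a")  # the stored text
found_for_b = store.get("d1", namespace="tenant-b")  # None: other tenant's view
```

Calling `store.get("d1")` without a namespace raises `TypeError` at the call site, and a strict type checker would flag it before the code runs.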

The name comes from cenotes — natural deep wells in the Yucatán Peninsula used by the Maya as sacred sources of fresh water and knowledge. The metaphor maps to RAG: a deep, structured source of knowledge from which you retrieve context.

Status

| Module | M1.0 (released) | M1.1+ (planned) |
| --- | --- | --- |
| cenote.models | ✓ Document, Chunk, EmbeddedChunk, RetrievalResult | |
| cenote.errors | ✓ CenoteError hierarchy | |
| cenote.types | ✓ Vector, Namespace, ModelId, ContentHash | |
| cenote.chunkers | ✓ Chunker Protocol, RecursiveCharacterChunker | MarkdownChunker, token-aware chunking |
| cenote.embedders | ✓ Embedder Protocol, MockEmbedder, VoyageEmbedder, CohereEmbedder, EmbeddingCache, InMemoryCache, CachedEmbedder | Streaming embed, SqliteCache, RedisCache |
| cenote.stores | ✓ VectorStore Protocol, InMemoryVectorStore, PgVectorStore | |
| cenote.retrievers | ✓ Retriever Protocol, VectorRetriever | BM25Retriever, HybridRetriever (RRF), Spanish-aware tokenizer |
| cenote.rerankers | ✓ Reranker Protocol (no impl) | VoyageReranker, CohereReranker |
| cenote.observability | ✓ Tracer Protocol, NoopTracer | OTel adapter, Langfuse adapter |
| cenote.eval | ✓ precision_at_k, recall_at_k, mean_reciprocal_rank | DeepEval integration, bilingual EN/ES dataset |
| cenote.llm | | Anthropic Claude wrapper with prompt-cache awareness |
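
For readers unfamiliar with the three metrics listed under cenote.eval, the sketch below shows their textbook definitions over lists of document ids. The function names match the table, but the signatures and the handling of edge cases here are assumptions; cenote's actual implementations may differ.

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k slots filled by relevant ids (divides by k)."""
    if k <= 0:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant ids that appear in the top k."""
    if not relevant or k <= 0:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """Mean of 1/rank of the first relevant hit per query (0 when none is found)."""
    if not runs:
        return 0.0
    total = 0.0
    for retrieved, relevant in runs:
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(runs)


retrieved = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}

p_at_2 = precision_at_k(retrieved, relevant, k=2)   # 1 hit in 2 slots
r_at_4 = recall_at_k(retrieved, relevant, k=4)      # 2 of 3 relevant ids found
mrr = mean_reciprocal_rank([(["x", "a"], {"a"}), (["a"], {"a"})])  # (1/2 + 1) / 2
```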

Quickstart

```shell
pip install cenote-core
```

```python
import asyncio

from cenote.chunkers import RecursiveCharacterChunker
from cenote.embedders import MockEmbedder
from cenote.models import Document
from cenote.retrievers import VectorRetriever
from cenote.stores import InMemoryVectorStore


async def main() -> None:
    chunker = RecursiveCharacterChunker(chunk_size=512, chunk_overlap=64)
    # MockEmbedder needs no API key; embedder and store use the same dimensions.
    embedder = MockEmbedder(dimensions=128)
    store = InMemoryVectorStore(dimensions=128)
    retriever = VectorRetriever(embedder=embedder, store=store)

    # Ingest: split the document, embed the chunks, write them to a namespace.
    doc = Document(id="d1", content="Cenotes are natural sinkholes in the Yucatán Peninsula.")
    chunks = chunker.chunk(doc)
    embedded = await embedder.embed(chunks)
    await store.upsert(embedded, namespace="quickstart")

    # Query: retrieval is scoped to the same namespace used at ingest time.
    results = await retriever.retrieve("What is a cenote?", namespace="quickstart", limit=3)
    for r in results:
        print(f"[{r.score:.3f}] {r.chunk.content}")


asyncio.run(main())
```

For real semantic retrieval, swap MockEmbedder for VoyageEmbedder(api_key=..., model="voyage-3") or CohereEmbedder(api_key=..., model="embed-multilingual-v3.0"). For production storage, use PgVectorStore.connect(dsn, dimensions=...).

→ Full quickstart: https://jovandyaz.github.io/cenote/quickstart/

Extending cenote

Every primitive is a typing.Protocol — implement the interface and plug it in. No inheritance required.

```python
from cenote.models import Chunk, EmbeddedChunk
from cenote.types import Vector


class MyEmbedder:
    """Satisfies the Embedder protocol via structural typing."""

    @property
    def model_id(self) -> str:
        return "my-provider:my-model"

    @property
    def dimensions(self) -> int:
        return 768

    async def embed(self, chunks: list[Chunk]) -> list[EmbeddedChunk]:
        ...

    async def embed_query(self, query: str) -> Vector:
        ...
```

→ Full example: examples/custom_embedder.py

→ Custom chunker: https://jovandyaz.github.io/cenote/extending/custom-chunker/
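
The same structural-typing approach applies to chunkers. The sketch below is self-contained and runs without cenote installed: `Doc`, `Piece`, and `ChunkerLike` are hypothetical stand-ins for `Document`, `Chunk`, and the Chunker Protocol, and the single `chunk(document)` method is assumed from the quickstart call `chunker.chunk(doc)`. The real field names and signature may differ, so treat the linked guide as authoritative.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Doc:
    """Stand-in for cenote.models.Document (assumed fields)."""
    id: str
    content: str


@dataclass(frozen=True)
class Piece:
    """Stand-in for cenote.models.Chunk (assumed fields)."""
    document_id: str
    index: int
    content: str


class ChunkerLike(Protocol):
    """Stand-in for the Chunker Protocol (assumed single method)."""

    def chunk(self, document: Doc) -> list[Piece]: ...


class ParagraphChunker:
    """Splits on blank lines. Note: no base class, only a matching method."""

    def chunk(self, document: Doc) -> list[Piece]:
        paragraphs = [p.strip() for p in document.content.split("\n\n")]
        non_empty = [p for p in paragraphs if p]
        return [
            Piece(document_id=document.id, index=i, content=text)
            for i, text in enumerate(non_empty)
        ]


def count_chunks(chunker: ChunkerLike, document: Doc) -> int:
    # Accepts any object whose `chunk` method matches the protocol.
    return len(chunker.chunk(document))


doc = Doc(id="d1", content="First paragraph.\n\n\n\nSecond paragraph.\n")
pieces = ParagraphChunker().chunk(doc)
total = count_chunks(ParagraphChunker(), doc)
```

A type checker accepts `ParagraphChunker()` wherever `ChunkerLike` is expected because the method shapes match, which is the property the Protocol-based design relies on.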

Architecture

Three diagrams document the system at different zoom levels:

GitHub renders .drawio files inline natively (since 2024). Click any link above to view.

→ Full architecture page: https://jovandyaz.github.io/cenote/architecture/

Roadmap

  • M1.0 (released as v0.1.0) — Core primitives: chunker, embedders, stores, retrievers, future-API stubs
  • 🚧 M1.1 — MarkdownChunker, BM25 + Hybrid retrievers, Spanish-aware tokenizer, concrete rerankers, DeepEval integration
  • 📋 M1.2+ — OTel/Langfuse adapters, LLM client (Anthropic Claude with prompt caching), agent primitives, CFDI domain pack
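
The status table pairs the planned HybridRetriever with RRF (reciprocal rank fusion). As background, the sketch below shows the general technique for merging two ranked lists of chunk ids. It is not cenote code: the function name, the `k=60` smoothing constant (a commonly used default), and the alphabetical tie-break are all assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse ranked id lists: each list adds 1 / (k + rank) to an id's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vector_hits = ["c1", "c2", "c3"]  # e.g. from a vector retriever
bm25_hits = ["c3", "c1", "c4"]    # e.g. from a keyword retriever

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
fused_ids = [chunk_id for chunk_id, _ in fused]
```

Because only ranks are used, the two retrievers' raw scores never need to be put on a common scale. Ids that appear in both lists (`c1`, `c3`) end up ahead of ids that appear in only one.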

See CHANGELOG.md for a granular record of what shipped when.

Downstream products

cenote is the shared core for two products in development:

  • knowtis-ai — RAG + research agent over the Knowtis notes platform
  • cfdi-agent — Accounting reconciliation + CFDI 4.0 compliance for Mexican PYMEs

Each downstream product validates cenote from opposite ends: knowtis-ai favors creative synthesis, cfdi-agent demands deterministic correctness with audit trails.

License

Apache 2.0.

Author

Jovan Díaz — github.com/jovandyaz

Contributions: see CONTRIBUTING.md. Security: see SECURITY.md.
