Production-grade Python framework for building agentic RAG applications. Multilingual-capable with a roadmap toward Spanish/LATAM-first features.
cenote
Production-grade RAG primitives for Python — Protocol-based, multi-tenant by design, type-strict from day one. Targeting Spanish/LATAM workloads from M1.1.
Why cenote
cenote is not a LangChain alternative. LangChain is a kitchen-sink framework with ~100k stars and a full-time team. cenote is the opposite: a small, opinionated set of primitives for teams that hit framework complexity ceilings.
- Production minimalist: clear `Protocol` interfaces, composition over inheritance, engineering hardenings (batching, rate limiting, transactional upserts) built in.
- Type-strict: `mypy --strict` clean. `py.typed` shipped. Your IDE catches wiring errors before runtime.
- Multi-tenant by design: `namespace` is mandatory on every store and retriever method. Cross-tenant leakage is impossible by construction.
- LATAM-first roadmap: Spanish-aware BM25, ES evaluation datasets, fiscal/regulatory document support land in M1.1+. Multilingual embedders (Voyage, Cohere) already work today.
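To make the "mandatory namespace" idea concrete, here is a minimal, self-contained sketch. `TinyNamespacedStore` and its methods are hypothetical and are not cenote's actual store API; the only point is that a keyword-only `namespace` parameter with no default turns a forgotten tenant into a type-checker error and a runtime `TypeError`.

```python
from __future__ import annotations

from collections import defaultdict


class TinyNamespacedStore:
    """Hypothetical toy store: every method requires an explicit namespace."""

    def __init__(self) -> None:
        # One isolated mapping of id -> text per namespace.
        self._data: dict[str, dict[str, str]] = defaultdict(dict)

    def upsert(self, items: dict[str, str], *, namespace: str) -> None:
        # Keyword-only with no default: callers cannot omit the tenant.
        self._data[namespace].update(items)

    def get(self, item_id: str, *, namespace: str) -> str | None:
        # Lookups only ever touch the partition for this namespace.
        return self._data.get(namespace, {}).get(item_id)


store = TinyNamespacedStore()
store.upsert({"d1": "tenant A secret"}, namespace="tenant-a")
store.upsert({"d1": "tenant B note"}, namespace="tenant-b")

same_tenant = store.get("d1", namespace="tenant-a")
other_tenant = store.get("d1", namespace="tenant-b")
missing = store.get("d1", namespace="tenant-c")
```

The same id `d1` resolves to different content per tenant, and calling `store.get("d1")` without a namespace fails immediately instead of falling back to a shared default.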
The name comes from cenotes — natural deep wells in the Yucatán Peninsula used by the Maya as sacred sources of fresh water and knowledge. The metaphor maps to RAG: a deep, structured source of knowledge from which you retrieve context.
Status
| Module | M1.0 (released) | M1.1+ (planned) |
|---|---|---|
| `cenote.models` | ✓ Document, Chunk, EmbeddedChunk, RetrievalResult | — |
| `cenote.errors` | ✓ CenoteError hierarchy | — |
| `cenote.types` | ✓ Vector, Namespace, ModelId, ContentHash | — |
| `cenote.chunkers` | ✓ Chunker Protocol, RecursiveCharacterChunker | MarkdownChunker, token-aware chunking |
| `cenote.embedders` | ✓ Embedder Protocol, MockEmbedder, VoyageEmbedder, CohereEmbedder, EmbeddingCache, InMemoryCache, CachedEmbedder | Streaming embed, SqliteCache, RedisCache |
| `cenote.stores` | ✓ VectorStore Protocol, InMemoryVectorStore, PgVectorStore | — |
| `cenote.retrievers` | ✓ Retriever Protocol, VectorRetriever | BM25Retriever, HybridRetriever (RRF), Spanish-aware tokenizer |
| `cenote.rerankers` | ✓ Reranker Protocol (no impl) | VoyageReranker, CohereReranker |
| `cenote.observability` | ✓ Tracer Protocol, NoopTracer | OTel adapter, Langfuse adapter |
| `cenote.eval` | ✓ precision_at_k, recall_at_k, mean_reciprocal_rank | DeepEval integration, bilingual EN/ES dataset |
| `cenote.llm` | — | Anthropic Claude wrapper with prompt-cache awareness |
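For readers unfamiliar with the three metrics listed under `cenote.eval`, the sketch below implements their standard textbook definitions in plain Python. The function names carry a `_sketch` suffix because the signatures in `cenote.eval` are not shown here and may differ; treat this as an illustration of the metrics, not of the library's API.

```python
from __future__ import annotations


def precision_at_k_sketch(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k slots filled by relevant ids (denominator is k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def recall_at_k_sketch(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant ids that appear in the top-k results."""
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0  # Convention chosen here: no relevant ids means recall 0.
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_reciprocal_rank_sketch(runs: list[tuple[list[str], set[str]]]) -> float:
    """Average of 1/rank of the first relevant id across queries (0 if none)."""
    if not runs:
        return 0.0
    total = 0.0
    for retrieved, relevant in runs:
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(runs)


retrieved = ["a", "b", "c", "d"]
relevant = {"b", "d", "z"}

p_at_2 = precision_at_k_sketch(retrieved, relevant, k=2)   # 1 hit in 2 slots
r_at_4 = recall_at_k_sketch(retrieved, relevant, k=4)      # 2 of 3 relevant found
mrr = mean_reciprocal_rank_sketch([(retrieved, relevant), (["x", "y"], {"x"})])
```

Here the first query's first relevant hit is at rank 2 and the second query's is at rank 1, so the mean reciprocal rank is the average of 1/2 and 1.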
Quickstart
```shell
pip install cenote-core
```

```python
import asyncio

from cenote.chunkers import RecursiveCharacterChunker
from cenote.embedders import MockEmbedder
from cenote.models import Document
from cenote.retrievers import VectorRetriever
from cenote.stores import InMemoryVectorStore


async def main() -> None:
    # Wire the primitives together; embedder and store dimensions must match.
    chunker = RecursiveCharacterChunker(chunk_size=512, chunk_overlap=64)
    embedder = MockEmbedder(dimensions=128)
    store = InMemoryVectorStore(dimensions=128)
    retriever = VectorRetriever(embedder=embedder, store=store)

    # Indexing path: chunk, embed, then upsert under a namespace.
    doc = Document(id="d1", content="Cenotes are natural sinkholes in the Yucatán Peninsula.")
    chunks = chunker.chunk(doc)
    embedded = await embedder.embed(chunks)
    await store.upsert(embedded, namespace="quickstart")

    # Query path: retrieve from the same namespace.
    results = await retriever.retrieve("What is a cenote?", namespace="quickstart", limit=3)
    for r in results:
        print(f"[{r.score:.3f}] {r.chunk.content}")


asyncio.run(main())
```
For real semantic retrieval, swap `MockEmbedder` for `VoyageEmbedder(api_key=..., model="voyage-3")` or `CohereEmbedder(api_key=..., model="embed-multilingual-v3.0")`. For production storage, use `PgVectorStore.connect(dsn, dimensions=...)`.
→ Full quickstart: https://jovandyaz.github.io/cenote/quickstart/
Extending cenote
Every primitive is a typing.Protocol — implement the interface and plug it in. No inheritance required.
```python
from cenote.models import Chunk, EmbeddedChunk
from cenote.types import Vector


class MyEmbedder:
    """Satisfies the Embedder protocol via structural typing."""

    @property
    def model_id(self) -> str:
        return "my-provider:my-model"

    @property
    def dimensions(self) -> int:
        return 768

    async def embed(self, chunks: list[Chunk]) -> list[EmbeddedChunk]:
        ...

    async def embed_query(self, query: str) -> Vector:
        ...
```
→ Full example: examples/custom_embedder.py
→ Custom chunker: https://jovandyaz.github.io/cenote/extending/custom-chunker/
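If structural typing is new to you, the following standalone sketch shows the mechanism using only the standard library. `EmbedderLike`, `HashEmbedder`, and `NotAnEmbedder` are hypothetical names invented for this illustration; they are not part of cenote, and whether cenote's own protocols are marked `runtime_checkable` is not stated here.

```python
import asyncio
from typing import Protocol, runtime_checkable


@runtime_checkable
class EmbedderLike(Protocol):
    """Hypothetical stand-in for an embedder protocol (not cenote's own)."""

    @property
    def model_id(self) -> str: ...

    @property
    def dimensions(self) -> int: ...

    async def embed_query(self, query: str) -> list[float]: ...


class HashEmbedder:
    """No base class: it matches EmbedderLike purely by shape."""

    @property
    def model_id(self) -> str:
        return "toy:hash"

    @property
    def dimensions(self) -> int:
        return 4

    async def embed_query(self, query: str) -> list[float]:
        # Deterministic toy vector derived from the character codes.
        total = sum(ord(ch) for ch in query)
        return [float((total >> shift) & 0xFF) for shift in (0, 2, 4, 6)]


class NotAnEmbedder:
    """Lacks the required members, so it does not match."""


vec = asyncio.run(HashEmbedder().embed_query("cenote"))
is_match = isinstance(HashEmbedder(), EmbedderLike)
is_not_match = isinstance(NotAnEmbedder(), EmbedderLike)
```

Note that the runtime `isinstance` check only verifies that the members exist. Signature and return-type compatibility is what a static checker such as mypy verifies at the call site.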
Architecture
Three diagrams document the system at different zoom levels:
- Ecosystem — cenote's position in the wider RAG ecosystem
- Internal architecture — 5 layers + future-API stubs
- Runtime flow — indexing path and query path sequence
GitHub renders .drawio files inline natively (since 2024). Click any link above to view.
→ Full architecture page: https://jovandyaz.github.io/cenote/architecture/
Roadmap
- ✅ M1.0 (released as v0.1.0) — Core primitives: chunker, embedders, stores, retrievers, future-API stubs
- 🚧 M1.1 — MarkdownChunker, BM25 + Hybrid retrievers, Spanish-aware tokenizer, concrete rerankers, DeepEval integration
- 📋 M1.2+ — OTel/Langfuse adapters, LLM client (Anthropic Claude with prompt caching), agent primitives, CFDI domain pack
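The hybrid retriever planned for M1.1 is described elsewhere in this README as using RRF (reciprocal rank fusion). Since that component is not released yet, the sketch below shows only the general technique, not cenote's implementation: each document scores `1 / (k + rank)` in every ranking it appears in, and the scores are summed. The function name, the tie-break rule, and the conventional default `k=60` are assumptions for illustration.

```python
from __future__ import annotations

from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked id lists into one list sorted by summed RRF score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Ranks are 1-based; a larger k flattens the gap between ranks.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by id for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vector_hits = ["c1", "c2", "c3"]   # e.g. from a dense vector retriever
bm25_hits = ["c3", "c1", "c4"]     # e.g. from a lexical retriever

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
fused_ids = [doc_id for doc_id, _ in fused]
```

Chunks that appear in both rankings (`c1`, `c3`) rise above those found by only one retriever, which is the main reason RRF is a common default for combining lexical and dense results without score calibration.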
See CHANGELOG.md for a granular record of what shipped when.
Downstream products
cenote is the shared core for two products in development:
- knowtis-ai — RAG + research agent over the Knowtis notes platform
- cfdi-agent — Accounting reconciliation + CFDI 4.0 compliance for Mexican PYMEs
Each downstream product validates cenote from opposite ends: knowtis-ai favors creative synthesis, cfdi-agent demands deterministic correctness with audit trails.
License
Author
Jovan Díaz — github.com/jovandyaz
Contributions: see CONTRIBUTING.md. Security: see SECURITY.md.