Make directories AI-ready, not just files — turn a directory into a portable knowledge space.
indx
Make directories AI-ready, not just files. Point indx at a folder and get back a knowledge space: structure, folder lineage, file-to-file relationships, and semantic metadata that AI agents and RAG systems can reason over. Open-source · Python · CLI + SDK · Apache-2.0.
See it: indx demo (build → inspect → query, fully offline)
One command builds, inspects, and queries a bundled sample corpus — no user data, no installs, no API keys. Real captured output:
$ indx demo
indx demo — building a sample 'team handbook' knowledge space…
stage: walk
stage: parse
stage: chunk
stage: relate
stage: enrich
stage: embed-pack
✓ 7 docs · 7 chunks · 19 relations → /tmp/indx-demo-XXXX/demo (0.01s)
components: parser=plaintext llm=none embedder=hash store=jsonl format=.indx
/tmp/indx-demo-XXXX/demo schema=1 indx=0.0.1
documents=7 chunks=7 relations=19 embeddings=7 embedding=hash/256
Types            Relations
type      count  type        count
markdown  6      references  14
text      1      sibling     5
sample query (keyword/lexical, offline): how do I onboard?
score source text
0.121 engineering/code-review.md # Code Review Code review keeps our codebase…
0.098 people/remote-work.md # Remote Work Policy Acme Robotics is remote-…
0.095 handbook/welcome.md # Welcome to Acme Robotics This is the Acme …
✓ that's the whole flow — built offline with keyword/lexical retrieval, no API key.
run it on your own folder: indx ./your-docs --out ./ai-ready.indx --offline
The recording above is a trimmed, ANSI-stripped transcript of an actual indx demo run.
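The relate stage is what turns a pile of files into a knowledge space. The demo reports 19 relations, split into references and sibling edges. As a rough illustration only, here is a minimal sketch that assumes a sibling edge links two files in the same folder and a references edge is a relative Markdown link from one file to another. The function `relate` and both rules are hypothetical, not indx's actual relate stage.

```python
import re
import tempfile
from itertools import combinations
from pathlib import Path

# Assumed link syntax: [text](relative/path.md); a trailing "#anchor" is ignored.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#\s]+)[^)]*\)")


def relate(root: Path) -> list[tuple[str, str, str]]:
    """Return (source, relation, target) edges for the files under root.

    Hypothetical rules, for illustration only:
    - "sibling": two files that share a parent folder (one edge per pair)
    - "references": a relative Markdown link that resolves to another file
    """
    files = sorted(p for p in root.rglob("*") if p.is_file())
    name = {p: p.relative_to(root).as_posix() for p in files}
    by_resolved = {p.resolve(): p for p in files}
    edges: list[tuple[str, str, str]] = []

    # Sibling edges come from folder structure alone; no file contents are read.
    by_folder: dict[Path, list[Path]] = {}
    for p in files:
        by_folder.setdefault(p.parent, []).append(p)
    for group in by_folder.values():
        for a, b in combinations(group, 2):
            edges.append((name[a], "sibling", name[b]))

    # Reference edges come from links inside Markdown files that hit a known file.
    for p in files:
        if p.suffix != ".md":
            continue
        for target in LINK_RE.findall(p.read_text(encoding="utf-8")):
            q = by_resolved.get((p.parent / target).resolve())
            if q is not None and q != p:
                edges.append((name[p], "references", name[q]))
    return edges


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "handbook").mkdir()
        (root / "people").mkdir()
        (root / "handbook" / "welcome.md").write_text(
            "# Welcome\nSee the [remote work policy](../people/remote-work.md).\n",
            encoding="utf-8",
        )
        (root / "handbook" / "faq.md").write_text("# FAQ\n", encoding="utf-8")
        (root / "people" / "remote-work.md").write_text("# Remote Work\n", encoding="utf-8")
        for edge in relate(root):
            print(edge)
```

On the three-file tree in the demo block this yields one sibling edge (the two handbook files) and one references edge (welcome.md to remote-work.md). The point of the sketch is that sibling edges need no parsing at all: they fall out of the directory layout that file-level parsers discard.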
pip install indx
indx demo # instant: build → inspect → query a bundled sample, fully offline, no data needed
indx ./docs --out ./ai-ready.indx --offline # index your own folder, fully offline (zero extra deps)
indx inspect ./ai-ready.indx
indx query ./ai-ready.indx "how do I onboard?"
indx app # visual, config-driven tester: build → inspect → query in the browser (pip install "indx[app]")
The default stack targets cloud backends (docling parser, OpenAI LLM + embeddings, qdrant store); install it with pip install "indx[cloud]" and set the matching API keys. --offline selects the zero-dependency core stack (plaintext parser → hash embedder → jsonl no-DB store → .indx archive), so the demo, build, inspect, and query commands above run as-is on a bare pip install indx, with no extras and nothing to configure.
For a fully managed single-vendor build, three cloud profile extras wire every slot to that cloud's services with one install and one flag:
- AWS: pip install "indx[aws]", then indx ./docs --out ./out --aws (Textract → Bedrock → Titan → S3 Vectors)
- Azure: pip install "indx[azure]", then indx ./docs --out ./out --azure (Document Intelligence → Azure OpenAI → AI Search)
- GCP: pip install "indx[gcp]", then indx ./docs --out ./out --gcp (Document AI → Gemini → gemini-embedding → BigQuery)
Note what the offline core does and doesn't do. The hash embedder is a deterministic hashing trick, so offline query is keyword/lexical retrieval, not semantic vector search; true semantic search needs a real embedder extra (e.g. bge or openai) selected explicitly. Likewise, the offline enrich step derives metadata (type, topics, tags, summary) locally, without an LLM call; LLM/VLM enrichment is opt-in via the cloud/local extras.
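To make the lexical-versus-semantic distinction concrete, here is a minimal sketch of the hashing trick. It assumes a 256-dimension vector (matching the embedding=hash/256 line in the demo output), lowercase word tokens, signed hashing, and cosine-similarity ranking. The names `hash_embed` and `search` are hypothetical; indx's actual hash embedder may tokenize, weight, and normalize differently.

```python
import hashlib
import math
import re

DIM = 256  # assumed from "embedding=hash/256" in the demo output


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a deliberately simple assumption."""
    return re.findall(r"[a-z0-9]+", text.lower())


def hash_embed(text: str, dim: int = DIM) -> list[float]:
    """Hashing trick: each token adds +1 or -1 to one bucket picked by a stable hash.

    The result is deterministic (no model, no network), but two words only
    interact if they are the same token or happen to collide, so "onboard"
    and "onboarding" stay unrelated: this is lexical, not semantic, matching.
    """
    vec = [0.0] * dim
    for token in tokenize(text):
        # blake2b is stable across runs, unlike Python's per-process salted hash().
        h = int.from_bytes(hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest(), "big")
        sign = -1.0 if h >> 63 else 1.0  # top bit picks a sign, so collisions add zero-mean noise
        vec[h % dim] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def search(query: str, docs: dict[str, str], k: int = 3) -> list[tuple[float, str]]:
    """Rank docs by cosine similarity (vectors are unit length, so a dot product)."""
    q = hash_embed(query)
    scored = [(sum(a * b for a, b in zip(q, hash_embed(body))), path) for path, body in docs.items()]
    return sorted(scored, reverse=True)[:k]
```

Because scores depend only on shared tokens, a query can rank a document highly only if it reuses that document's words. A real embedder maps related phrasings close together, which is the gap the bge or openai extras fill.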
indx composes file parsers (Docling, Unstructured, …) rather than replacing them, then layers on what they discard — the arrangement of files. Every major component (parser, LLM, embedder, vector store, output) is a swappable, typed slot, so you can run the cloud default stack or the fully offline core from the same CLI.
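One common way to build a swappable, typed slot in Python is a `typing.Protocol` for the interface plus a registry that maps backend names to factories. The sketch below shows only an embedder slot. `Embedder`, `EMBEDDERS`, `register_embedder`, `build_embedder`, and the `toy` backend are hypothetical names chosen for illustration, not indx's real registry API.

```python
from typing import Callable, Protocol, TypeVar


class Embedder(Protocol):
    """Interface every embedder backend must satisfy (hypothetical)."""

    def embed(self, texts: list[str]) -> list[list[float]]: ...


F = TypeVar("F", bound=Callable[[], Embedder])

# backend name -> zero-argument factory that returns an Embedder
EMBEDDERS: dict[str, Callable[[], Embedder]] = {}


def register_embedder(name: str) -> Callable[[F], F]:
    """Decorator that records a backend factory under `name`."""
    def decorator(factory: F) -> F:
        EMBEDDERS[name] = factory
        return factory
    return decorator


@register_embedder("toy")
class ToyEmbedder:
    """Stand-in backend: one feature per text (its length). Illustration only."""

    def embed(self, texts: list[str]) -> list[list[float]]:
        return [[float(len(t))] for t in texts]


def build_embedder(name: str) -> Embedder:
    """Look up a backend by name, failing loudly with the registered choices."""
    try:
        factory = EMBEDDERS[name]
    except KeyError:
        raise ValueError(f"unknown embedder {name!r}; registered: {sorted(EMBEDDERS)}") from None
    return factory()


print(build_embedder("toy").embed(["hi", "hello"]))  # [[2.0], [5.0]]
```

Because pipeline stages would depend only on the Protocol, swapping a backend means registering another factory and changing a name (for instance from a CLI flag), without touching the code that calls `embed`.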
Plug a knowledge space into an AI agent
A .indx archive is a portable knowledge space — carry it like a USB drive and plug it
into any agent framework in one line:
from indx.agent import connect
kb = connect("ai-ready/handbook.indx") # load the "USB drive"
tools = kb.openai() # OpenAI Agents SDK …or .langchain() / .pydantic_ai() / .claude()
Or serve it to any MCP client — Claude Desktop, Cursor, the TypeScript Mastra framework — with no Python glue on the client side:
pip install "indx[agent]" # all framework adapters + the MCP server
indx mcp ai-ready/handbook.indx # serve indx_search / indx_overview / indx_get_document
Every connector exposes the same three read-only tools — search, overview, get-document — built on the same retrieval path as the CLI. See the AI agents guide.
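On the wire, an MCP client invokes these tools with JSON-RPC 2.0 tools/call requests; for a locally launched server the transport is stdio, where each message is one line of JSON. The sketch below builds such a request for indx_search. The argument name `query` is an assumption, not taken from indx's tool schema, so check what the server advertises via tools/list before relying on it.

```python
import json


def tools_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as one JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # stdio transport is newline-delimited, so the JSON itself must stay on one line;
    # json.dumps escapes any newlines inside string values.
    return json.dumps(message, separators=(",", ":")) + "\n"


# Hypothetical arguments: "query" is an assumed parameter name.
line = tools_call(1, "indx_search", {"query": "how do I onboard?"})
print(line, end="")
```

A real client must first complete the initialize handshake before sending tools/call; the official MCP SDKs, and clients such as Claude Desktop or Cursor, handle that for you.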
Status
Alpha (0.0.1). The zero-dependency core path (plaintext parser → hash embedder →
jsonl no-DB store → .indx archive) runs end to end and is fully air-gapped — reach it
with indx demo or by adding --offline to any build. The optional cloud/local backends
(docling, openai, ollama, bge-m3, qdrant, plus the managed AWS/Azure/GCP profiles, …) are
implemented and selected through the registry: install the matching extra
(e.g. pip install "indx[cloud]") and provide credentials to switch a slot onto it. The
.indx archive format is at schema_version "1"; public APIs may still shift before
1.0 — see the CHANGELOG and the
documentation.
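Switching a slot onto a cloud backend needs two things: the extra's package must be importable, and its credentials must be set. A quick preflight for both can be sketched with the standard library. `backend_ready` is a hypothetical helper, not part of indx; the openai / OPENAI_API_KEY pairing follows the OpenAI SDK's own conventions, and other backends would need their own module and variable names.

```python
import importlib.util
import os


def backend_ready(module: str, env_vars: tuple[str, ...] = ()) -> bool:
    """True if top-level `module` is importable and every env var is non-empty.

    find_spec only locates the module without importing it, so the check is cheap.
    """
    if importlib.util.find_spec(module) is None:
        return False
    return all(os.environ.get(name) for name in env_vars)


if __name__ == "__main__":
    # Assumed pairing: the OpenAI SDK imports as "openai" and reads OPENAI_API_KEY.
    if backend_ready("openai", ("OPENAI_API_KEY",)):
        print("openai slot looks usable")
    else:
        print('openai slot not ready: pip install "indx[cloud]" and set OPENAI_API_KEY, or build with --offline')
```

The helper only reports readiness; it deliberately does not pick a backend, since the README says a real embedder should be selected explicitly rather than by silent fallback.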
Documentation
Full documentation — quickstart, guides, the pipeline & stages, and the API/CLI reference — lives at docs.indx.jp.
Development
python -m venv .venv && . .venv/bin/activate
pip install -e ".[dev]"
nox -s tests # fast offline suite: unit + corpus
nox -l # list every session (integration / docker / airgap / live / record-fixtures)
License
Apache-2.0.
Download files
Source Distribution: indx-0.0.1.tar.gz
Built Distribution: indx-0.0.1-py3-none-any.whl
File details
Details for the file indx-0.0.1.tar.gz.
File metadata
- Download URL: indx-0.0.1.tar.gz
- Upload date:
- Size: 283.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bf6be3851e3f14c677e4df483b452930b63b4d432befac298cf8e4e8257abfc1 |
| MD5 | 78bf347af49c691cd3d70df6ef2a0193 |
| BLAKE2b-256 | e4468e09ddbb913c99e60e5741b6aa371846a6ca8839bfaece6578c71d4599d7 |
Provenance
The following attestation bundles were made for indx-0.0.1.tar.gz:
Publisher: release.yml on indxjp/indx
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: indx-0.0.1.tar.gz
- Subject digest: bf6be3851e3f14c677e4df483b452930b63b4d432befac298cf8e4e8257abfc1
- Sigstore transparency entry: 1740258198
- Sigstore integration time:
- Permalink: indxjp/indx@a5aafa998cbf86a5c4ccdd7d75a2bc16b58cd150
- Branch / Tag: refs/tags/v0.0.1
- Owner: https://github.com/indxjp
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@a5aafa998cbf86a5c4ccdd7d75a2bc16b58cd150
- Trigger Event: push
File details
Details for the file indx-0.0.1-py3-none-any.whl.
File metadata
- Download URL: indx-0.0.1-py3-none-any.whl
- Upload date:
- Size: 224.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9cc93cbdc6b3147ecd5e363dbf4c543ba5c2609747e52b6a61de1837f091947f |
| MD5 | 518bdfc61baec6e22377334de8c695da |
| BLAKE2b-256 | 90bdf25d44261f6a6123d57283345a81fbb139342b7a58138afc40098222f6bb |
Provenance
The following attestation bundles were made for indx-0.0.1-py3-none-any.whl:
Publisher: release.yml on indxjp/indx
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: indx-0.0.1-py3-none-any.whl
- Subject digest: 9cc93cbdc6b3147ecd5e363dbf4c543ba5c2609747e52b6a61de1837f091947f
- Sigstore transparency entry: 1740258227
- Sigstore integration time:
- Permalink: indxjp/indx@a5aafa998cbf86a5c4ccdd7d75a2bc16b58cd150
- Branch / Tag: refs/tags/v0.0.1
- Owner: https://github.com/indxjp
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@a5aafa998cbf86a5c4ccdd7d75a2bc16b58cd150
- Trigger Event: push