ContextIQ
ContextIQ turns messy files into agent-ready context.
It is a local-first ingestion pipeline for developers building RAG systems, agent memory layers, document search, and eval datasets. Point it at a folder and it produces clean JSONL and Markdown exports with chunked, traceable content.
Why it exists
Most AI tooling starts after your data is already clean. Real projects get stuck much earlier:
- PDFs are noisy
- Word docs lose structure
- repos and notes mix formats
- chunks are inconsistent
- source traceability is easy to lose
ContextIQ focuses on the missing middle: consistent ingestion, chunking, and export.
Features
- Local-first CLI
- Recursive file ingestion
- Built-in support for:
  - .txt, .md, .rst
  - .json, .jsonl
  - .csv, .tsv
  - .html, .htm
  - optional .pdf via pypdf
  - optional .docx via python-docx
- Document-aware chunking
- Source-preserving metadata
- JSONL and Markdown exports
- Run manifest with counts, warnings, and timings
Quickstart
python -m venv .venv
. .venv/bin/activate
pip install -e ".[dev]"
contextiq ingest ./examples --out ./build/context
On Windows PowerShell:
python -m venv .venv
.venv\Scripts\Activate.ps1
pip install -e .[dev]
contextiq ingest .\examples --out .\build\context
CLI
contextiq ingest <path> --out <directory>
Useful flags:
- --include-ext .md,.txt,.json
- --exclude-glob "*.min.js,*.lock"
- --chunk-size 1200
- --chunk-overlap 150
- --formats jsonl,markdown
- --fail-on-warning
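To make the extension and glob filters more concrete, here is a minimal Python sketch of how such filtering could work during a recursive walk. It is not ContextIQ's implementation: `select_files` is a hypothetical helper, and matching exclude patterns against the file name only (rather than the full path) is an assumption.

```python
from fnmatch import fnmatch
from pathlib import Path


def select_files(root, include_ext=None, exclude_globs=()):
    """Recursively yield files under root that pass both filters.

    Hypothetical helper for illustration; ContextIQ may behave differently.
    """
    # Normalize extensions so ".MD" and ".md" compare equal.
    allowed = {ext.lower() for ext in include_ext} if include_ext else None
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        if allowed is not None and path.suffix.lower() not in allowed:
            continue
        # Assumption: exclude patterns are matched against the file name only.
        if any(fnmatch(path.name, pattern) for pattern in exclude_globs):
            continue
        yield path
```

For example, `list(select_files("./examples", [".md", ".txt", ".json"], ["*.min.js", "*.lock"]))` would mirror the flag values shown above.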
Output
contextiq ingest writes:
- documents.jsonl: normalized source documents
- chunks.jsonl: chunked outputs for RAG/agents
- chunks.md: human-readable review file
- manifest.json: summary of the run
Each chunk preserves:
- source path
- document id
- chunk id
- byte and character ranges when available
- headings / section hints
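Because each chunk keeps its source path and ids, downstream code can trace chunks back to their files. The sketch below groups JSONL chunk records by source. The key names (`source_path`, `document_id`, `chunk_id`) are assumptions made for illustration, so check a real chunks.jsonl for the actual schema.

```python
import json
from collections import defaultdict


def group_chunks_by_source(lines, source_key="source_path"):
    """Group JSONL chunk records by the file they came from."""
    grouped = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        grouped[record.get(source_key, "<unknown>")].append(record)
    return dict(grouped)


# Illustrative records only; the key names are assumptions, not the real schema.
sample = [
    '{"source_path": "docs/a.md", "document_id": "d1", "chunk_id": "d1-0"}',
    '{"source_path": "docs/a.md", "document_id": "d1", "chunk_id": "d1-1"}',
    '{"source_path": "docs/b.txt", "document_id": "d2", "chunk_id": "d2-0"}',
]
by_source = group_chunks_by_source(sample)
```

With real output, an open file handle can be passed in place of `sample`, since the function only needs an iterable of lines.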
Example
contextiq ingest ./docs --out ./dist/context --chunk-size 900 --chunk-overlap 120
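To see how a chunk size of 900 and an overlap of 120 interact, the following sketch computes the ranges of a plain fixed-size sliding window. This is a simplification under stated assumptions: sizes are treated as character counts, and `chunk_spans` is a hypothetical function. ContextIQ describes its chunking as document-aware, so its real boundaries will likely differ.

```python
def chunk_spans(text_length, chunk_size=900, chunk_overlap=120):
    """Return (start, end) character ranges for a fixed sliding window."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    # Each new chunk starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    spans = []
    start = 0
    while start < text_length:
        end = min(start + chunk_size, text_length)
        spans.append((start, end))
        if end == text_length:
            break  # the last chunk reached the end of the text
        start += step
    return spans


spans = chunk_spans(2000)
# [(0, 900), (780, 1680), (1560, 2000)]
```

Each chunk starts 780 characters after the previous one (900 minus 120), so consecutive chunks share 120 characters of context.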
Development
pip install -e ".[dev]"
pytest
Roadmap
- embeddings plugin interface
- vector DB exporters
- OCR pipeline
- table extraction
- citation-aware retrieval benchmarks