Turn messy files into agent-ready context.



Turn messy files into agent-ready context for RAG, search, and AI workflows.

ContextIQ

ContextIQ is a local-first ingestion pipeline for developers building RAG systems, agent memory layers, document search, and eval datasets.

Point it at a folder of mixed files and it produces clean, traceable JSONL and Markdown outputs that AI systems can actually use.

Why ContextIQ

Most AI tooling starts after your data is already clean. Real projects usually break much earlier:

PDFs are noisy
Word docs lose structure
JSON and CSV need normalization
repos and notes mix formats
chunks become inconsistent
source traceability gets lost

ContextIQ focuses on the missing middle: ingestion, normalization, chunking, and export.

Installation

Install from PyPI:

pip install contextiq

Run the CLI:

contextiq ingest ./docs --out ./build/context

Or with module execution:

python -m contextiq ingest ./docs --out ./build/context

Quickstart

Use the built-in example content:

contextiq ingest ./examples --out ./build/context

PowerShell example:

contextiq ingest .\examples --out .\build\context

Generated output:

documents.jsonl - normalized source documents
chunks.jsonl - chunked outputs for RAG and agents
chunks.md - human-readable review output
manifest.json - run summary, warnings, and config
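The JSONL files hold one JSON object per line. This section does not list the field names ContextIQ writes, so the minimal Python sketch below (standard library only) loads a file and reports which top-level keys appear before you depend on any of them. `iter_jsonl`, `load_jsonl`, and `field_names` are hypothetical helpers and are not part of the contextiq package.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Parse one JSON object per non-blank line, reporting the line number on errors."""
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank or trailing lines
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no}: invalid JSON: {exc.msg}") from exc


def load_jsonl(path: Path) -> list[dict]:
    """Read a whole JSONL file (e.g. chunks.jsonl) into a list of dicts."""
    with path.open(encoding="utf-8") as fh:
        return list(iter_jsonl(fh))


def field_names(records: Iterable[dict]) -> set[str]:
    """Union of top-level keys, useful for inspecting an unfamiliar export."""
    keys: set[str] = set()
    for record in records:
        keys.update(record)
    return keys
```

For example, `field_names(load_jsonl(Path("build/context/chunks.jsonl")))` lists the keys present in a run's chunk export, assuming the output directory used in the Quickstart.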

What It Supports

Built-in file types

.txt, .md, .rst
.json, .jsonl
.csv, .tsv
.html, .htm
optional .pdf via pypdf
optional .docx via python-docx

Output behavior

recursive directory ingestion
normalized plain-text extraction
document-aware chunking
source-preserving metadata
JSONL and Markdown export
manifest output for reproducibility

CLI

Basic usage

contextiq ingest <path> --out <directory>

Useful flags

--include-ext .md,.txt,.json
--exclude-glob "*.min.js,*.lock"
--chunk-size 1200
--chunk-overlap 150
--formats jsonl,markdown
--fail-on-warning
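This section does not spell out how ContextIQ applies the two filter flags. A plausible reading, sketched below as an assumption, is that `--include-ext` compares file suffixes and `--exclude-glob` matches shell-style patterns against file names, with each flag taking a comma-separated list. `split_csv_flag` and `is_selected` are hypothetical names used for illustration.

```python
from fnmatch import fnmatch
from pathlib import PurePath
from typing import Optional


def split_csv_flag(value: Optional[str]) -> list[str]:
    """Split a comma-separated flag value such as ".md,.txt,.json", dropping blanks."""
    if not value:
        return []
    return [part.strip() for part in value.split(",") if part.strip()]


def _normalize_ext(ext: str) -> str:
    # Accept "md" or ".MD" as well as ".md" (an assumption about leniency).
    ext = ext.lower()
    return ext if ext.startswith(".") else "." + ext


def is_selected(path: str, include_ext: list[str], exclude_glob: list[str]) -> bool:
    """Return True if a file passes both filters (hypothetical semantics)."""
    p = PurePath(path)
    # Exclusions win: a file matching any exclude pattern is dropped outright.
    if any(fnmatch(p.name, pattern) for pattern in exclude_glob):
        return False
    # An empty include list means "no extension filter".
    if include_ext:
        return p.suffix.lower() in {_normalize_ext(e) for e in include_ext}
    return True
```

Checking exclusions first means an excluded file is dropped even when its extension appears in the include list.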

Example commands

contextiq ingest ./docs --out ./dist/context --chunk-size 900 --chunk-overlap 120

contextiq ingest ./knowledge-base --out ./build/export --include-ext .md,.txt,.json

How It Works

ContextIQ runs in four stages:

1. Discovery

Recursively finds supported files while skipping common noise such as virtualenvs, caches, and build directories.
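Here is a minimal sketch of this discovery step using `os.walk`. Both sets are assumptions: the extensions mirror the Built-in file types list, while ContextIQ's real list of ignored directories may differ. Pruning `dirnames` in place keeps the walk from descending into skipped directories at all, and sorting makes the output order deterministic across runs.

```python
import os
from pathlib import Path
from typing import Iterator

# Assumed skip list; ContextIQ's actual ignored directories may differ.
SKIP_DIRS = {".git", ".venv", "venv", "env", "__pycache__", ".mypy_cache",
             ".pytest_cache", "node_modules", "build", "dist"}

# Extensions from the "Built-in file types" list (.pdf/.docx need optional extras).
SUPPORTED_EXTS = {".txt", ".md", ".rst", ".json", ".jsonl", ".csv", ".tsv",
                  ".html", ".htm", ".pdf", ".docx"}


def discover(root: Path) -> Iterator[Path]:
    """Recursively yield supported files under root, pruning noise directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Assigning to dirnames[:] stops os.walk from entering the removed directories.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            path = Path(dirpath) / name
            if path.suffix.lower() in SUPPORTED_EXTS:
                yield path
```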

2. Loading and normalization

Converts each file into normalized plain text:

Markdown and text are read directly
JSON and JSONL are pretty-printed into readable text
CSV and TSV become row-based text
HTML is stripped to visible text
PDF and DOCX are supported through optional extras
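The standard library alone can approximate the text conversions above, as the sketch below shows. This section does not document the exact formats ContextIQ produces; the `header: value` row layout, the blank line between JSONL records, and the helper names are assumptions for illustration. PDF and DOCX are left out because they depend on the optional pypdf and python-docx extras.

```python
import csv
import io
import json
from html.parser import HTMLParser


def json_to_text(raw: str) -> str:
    """Pretty-print JSON so nested structure survives as readable, indented text."""
    return json.dumps(json.loads(raw), indent=2, ensure_ascii=False)


def jsonl_to_text(raw: str) -> str:
    """Pretty-print each JSONL record, separating records with a blank line."""
    return "\n\n".join(json_to_text(line) for line in raw.splitlines() if line.strip())


def table_to_text(raw: str, delimiter: str = ",") -> str:
    """Render each data row as 'header: value' pairs, one row per line (use "\\t" for TSV)."""
    rows = list(csv.reader(io.StringIO(raw), delimiter=delimiter))
    if not rows:
        return ""
    header, *body = rows
    return "\n".join("; ".join(f"{h}: {v}" for h, v in zip(header, row)) for row in body)


class _VisibleText(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.parts: list[str] = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(raw: str) -> str:
    parser = _VisibleText()
    parser.feed(raw)
    parser.close()
    return "\n".join(parser.parts)
```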

3. Chunking

Splits documents into retrieval-friendly chunks with:

target chunk size
overlap between chunks
paragraph and sentence-aware boundaries
source path and character ranges preserved
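One way to get these properties is a sliding character window that backs off to the nearest natural boundary. The sketch below is an assumption about the approach, not ContextIQ's chunker. It treats `size` and `overlap` as character counts, which fits the character ranges mentioned above. It prefers a paragraph break, then a sentence or line break, then a space, and it never cuts in the first half of a window, so chunks do not shrink to fragments. `Chunk` and `chunk_text` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    start: int  # inclusive character offset into the source document
    end: int    # exclusive character offset

# Boundary groups, tried in order of preference.
_BOUNDARIES = (("\n\n",), (". ", "! ", "? ", "\n"), (" ",))


def _break_point(text: str, start: int, end: int) -> int:
    """Pick a cut position in (start, end], keeping the separator with the left chunk."""
    floor = start + (end - start) // 2  # never cut in the first half of the window
    for group in _BOUNDARIES:
        cuts = []
        for sep in group:
            pos = text.rfind(sep, floor, end)
            if pos != -1:
                cuts.append(pos + len(sep))
        if cuts:
            return max(cuts)  # latest boundary of the most preferred kind
    return end  # no natural boundary: hard cut at the size limit


def chunk_text(text: str, size: int = 1200, overlap: int = 150) -> list[Chunk]:
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    chunks: list[Chunk] = []
    start = 0
    while start < len(text):
        end = min(start + size, len(text))
        if end < len(text):
            end = _break_point(text, start, end)
        chunks.append(Chunk(text[start:end], start, end))
        if end == len(text):
            break
        # Step back by `overlap` characters, but always make forward progress.
        start = max(end - overlap, start + 1)
    return chunks
```

Because each chunk stores `start` and `end`, `text[c.start:c.end] == c.text` always holds, which keeps chunks traceable to their source. With `size=1200, overlap=150`, every chunk is at most 1200 characters. Each chunk after the first starts 150 characters before the previous one ended, since any non-final chunk is longer than half the window. A production chunker would likely also snap the overlap start to a word boundary.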

4. Export

Writes machine-friendly and human-readable outputs for downstream AI workflows.
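Here is a minimal sketch of writing three of the outputs listed under Quickstart: chunks.jsonl, chunks.md, and manifest.json. The record fields (`source`, `start`, `end`, `text`) and the manifest keys are assumptions chosen for illustration, not ContextIQ's published schema. `export` is a hypothetical function.

```python
import json
from pathlib import Path


def export(chunks: list[dict], out_dir: Path, config: dict) -> Path:
    """Write chunks as JSONL and Markdown plus a small run manifest.

    Assumes each chunk dict has illustrative keys: source, start, end, text.
    """
    out_dir.mkdir(parents=True, exist_ok=True)

    # Machine-friendly: one JSON object per line; ensure_ascii=False keeps non-ASCII readable.
    with (out_dir / "chunks.jsonl").open("w", encoding="utf-8") as fh:
        for chunk in chunks:
            fh.write(json.dumps(chunk, ensure_ascii=False) + "\n")

    # Human-readable: one heading per chunk showing its source and character range.
    md_lines = []
    for i, chunk in enumerate(chunks, start=1):
        md_lines.append(f"## Chunk {i}: {chunk['source']} [{chunk['start']}:{chunk['end']}]")
        md_lines.append("")
        md_lines.append(chunk["text"].strip())
        md_lines.append("")
    (out_dir / "chunks.md").write_text("\n".join(md_lines), encoding="utf-8")

    # Run summary: counts, sources, and the config that produced this export.
    manifest = {
        "chunk_count": len(chunks),
        "sources": sorted({c["source"] for c in chunks}),
        "config": config,
    }
    (out_dir / "manifest.json").write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return out_dir
```

This manifest omits timestamps, so two runs with the same inputs and config produce identical files. That makes exports easy to diff.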

Project Structure

src/contextiq/
|- cli.py
|- pipeline.py
|- loaders.py
|- chunking.py
|- exporters.py
|- discovery.py
|- models.py
`- utils.py

Use Cases

RAG ingestion

Prepare mixed files for vector indexing and retrieval pipelines.

Agent memory and context packing

Turn project docs into clean, bounded chunks for coding and research agents.

Search systems

Produce normalized text and chunk exports for semantic or hybrid retrieval.

Eval datasets

Create stable, traceable corpora for retrieval benchmarking and prompt evaluation.
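Stable corpora benefit from chunk identifiers that do not change between runs. One common approach hashes each chunk's source path, character range, and text with SHA-256. It is sketched here as an assumption, not ContextIQ's documented behavior, and `chunk_id` is a hypothetical helper.

```python
import hashlib


def chunk_id(source: str, start: int, end: int, text: str) -> str:
    """Derive a deterministic ID from a chunk's provenance and content."""
    # Normalize Windows separators so the same corpus hashes identically on any OS.
    source = source.replace("\\", "/")
    h = hashlib.sha256()
    for part in (source, str(start), str(end), text):
        h.update(part.encode("utf-8"))
        h.update(b"\x00")  # separator so ("ab", "c") and ("a", "bc") hash differently
    return h.hexdigest()[:16]
```

Re-ingesting an unchanged file then yields the same IDs, while any edit to a chunk's text or range produces a new one. Benchmark labels can therefore be tied to exact chunk versions.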

Development

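Install the package in editable mode with development dependencies (the quotes keep shells such as zsh from expanding the brackets):

pip install -e ".[dev]"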

Run tests:

pytest

Run the demo (PowerShell):

.\demo.ps1

Roadmap

embeddings plugin interface
vector database exporters
OCR pipeline
table extraction
citation-aware retrieval benchmarks

Contributing

Contributions are welcome, for example to:

improve loaders
add exporters
extend chunking strategies
improve docs and examples

Open an issue or submit a PR if you want to help shape ContextIQ.

License

MIT License - see LICENSE

Project details


Release history

0.1.1 (this version) - Jun 16, 2026
0.1.0 - Jun 16, 2026

Download files


Source Distribution

contextiq-0.1.1.tar.gz (12.3 kB)

Uploaded Jun 16, 2026 Source

Built Distribution


contextiq-0.1.1-py3-none-any.whl (12.0 kB)

Uploaded Jun 16, 2026 Python 3

File details

Details for the file contextiq-0.1.1.tar.gz.

File metadata

Download URL: contextiq-0.1.1.tar.gz
Upload date: Jun 16, 2026
Size: 12.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for contextiq-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`d01ae23d2750cf422f5c82052d53aebd4c6d41a01ac06a283aeaa99bedcf54a2`
MD5	`f439391e13b5f9421e58e95f39ff182f`
BLAKE2b-256	`64df7153ed63e5206336890ac64c1ceabe241bb39724404a2bf2fda6ef0f16e2`


File details

Details for the file contextiq-0.1.1-py3-none-any.whl.

File metadata

Download URL: contextiq-0.1.1-py3-none-any.whl
Upload date: Jun 16, 2026
Size: 12.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for contextiq-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a3488b853db5186f5297a3b99d02616f398c794da79c807ace82b94c0207d10d`
MD5	`59f0fccc9c8bab5c19777a08b98d6582`
BLAKE2b-256	`89d774a717a0cac49bcfc99ca390abc397ae429b64c14ae251a3395f453152a3`

