ContextIQ

ContextIQ turns messy files into agent-ready context.

It is a local-first ingestion pipeline for developers building RAG systems, agent memory layers, document search, and eval datasets. Point it at a folder and it produces clean JSONL and Markdown exports with chunked, traceable content.

Why it exists

Most AI tooling starts after your data is already clean. Real projects get stuck much earlier:

  • PDFs are noisy
  • Word docs lose structure
  • repos and notes mix formats
  • chunks are inconsistent
  • source traceability is easy to lose

ContextIQ focuses on the missing middle: consistent ingestion, chunking, and export.

Features

  • Local-first CLI
  • Recursive file ingestion
  • Built-in support for:
    • .txt, .md, .rst
    • .json, .jsonl
    • .csv, .tsv
    • .html, .htm
    • optional .pdf via pypdf
    • optional .docx via python-docx
  • Document-aware chunking
  • Source-preserving metadata
  • JSONL and Markdown exports
  • Run manifest with counts, warnings, and timings
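
To make the recursive ingestion and extension filtering above concrete, here is a minimal sketch of how a folder walk with an extension allow-list and exclude globs could work. It is not ContextIQ's actual implementation: the names `DEFAULT_EXTS`, `is_selected`, and `discover` are hypothetical, and matching the exclude patterns against the file name only is an assumption.

```python
from fnmatch import fnmatch
from pathlib import Path

# Extensions listed as built-in above; .pdf and .docx are optional and omitted here.
DEFAULT_EXTS = frozenset(
    {".txt", ".md", ".rst", ".json", ".jsonl", ".csv", ".tsv", ".html", ".htm"}
)


def is_selected(path: Path, include_ext=DEFAULT_EXTS, exclude_globs=()) -> bool:
    """Return True if the path has an allowed extension and matches no exclude glob."""
    if path.suffix.lower() not in include_ext:
        return False
    # Assumption: patterns are matched against the file name, not the full path.
    return not any(fnmatch(path.name, pattern) for pattern in exclude_globs)


def discover(root: Path, include_ext=DEFAULT_EXTS, exclude_globs=()) -> list[Path]:
    """Recursively collect selected files under root, in sorted order."""
    return sorted(
        p
        for p in root.rglob("*")
        if p.is_file() and is_selected(p, include_ext, exclude_globs)
    )
```

Calling `discover(Path("./examples"), exclude_globs=("*.lock",))` would return every supported file under `./examples` except those whose names end in `.lock`.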

Quickstart

python -m venv .venv
. .venv/bin/activate
pip install -e ".[dev]"
contextiq ingest ./examples --out ./build/context

On Windows PowerShell:

python -m venv .venv
.venv\Scripts\Activate.ps1
pip install -e .[dev]
contextiq ingest .\examples --out .\build\context

CLI

contextiq ingest <path> --out <directory>

Useful flags:

  • --include-ext .md,.txt,.json
  • --exclude-glob "*.min.js,*.lock"
  • --chunk-size 1200
  • --chunk-overlap 150
  • --formats jsonl,markdown
  • --fail-on-warning
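
The interaction between --chunk-size and --chunk-overlap can be illustrated with a plain sliding window. This is a sketch only: it assumes both values are measured in characters and ignores the document-aware boundaries ContextIQ advertises; `chunk_text` and the `char_start` / `char_end` keys are hypothetical names, not the tool's API.

```python
def chunk_text(text: str, chunk_size: int = 1200, chunk_overlap: int = 150) -> list[dict]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append({"char_start": start, "char_end": end, "text": text[start:end]})
        if end == len(text):
            break  # the last window reached the end of the text
        start += step
    return chunks
```

With the values shown above, each window advances by 1200 - 150 = 1050 characters, so a 3000-character document yields three chunks covering the ranges 0-1200, 1050-2250, and 2100-3000.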

Output

contextiq ingest writes:

  • documents.jsonl: normalized source documents
  • chunks.jsonl: chunked outputs for RAG/agents
  • chunks.md: human-readable review file
  • manifest.json: summary of the run

Each chunk preserves:

  • source path
  • document id
  • chunk id
  • byte and character ranges when available
  • headings / section hints
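
Because chunks.jsonl holds one JSON object per line, it can be consumed with the standard library alone. The sketch below groups chunks by their source file; the field name `source_path` is an assumption for illustration, so check a real chunks.jsonl for the actual keys before relying on it.

```python
import json
from collections import defaultdict
from typing import Iterable, Iterator


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one record per non-empty line of JSONL input."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def group_by_source(records: Iterable[dict], key: str = "source_path") -> dict[str, list[dict]]:
    """Group chunk records by source; `key` is a hypothetical field name."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.get(key, "<unknown>")].append(record)
    return dict(grouped)


# Usage, assuming the Quickstart output directory:
# with open("build/context/chunks.jsonl", encoding="utf-8") as fh:
#     by_source = group_by_source(parse_jsonl(fh))
```

Grouping by source is a quick way to spot files that produced unexpectedly few or many chunks during review.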

Example

contextiq ingest ./docs --out ./dist/context --chunk-size 900 --chunk-overlap 120

Development

pip install -e ".[dev]"
pytest

Roadmap

  • embeddings plugin interface
  • vector DB exporters
  • OCR pipeline
  • table extraction
  • citation-aware retrieval benchmarks
