Tools to make contexts for AI

contaix

Tools to make contexts (knowledge bases) for AI.

contaix turns diverse sources — code repos, documentation sites, file collections — into clean markdown contexts for AI agents to consume.

pip install contaix

Claude Code Skills

contaix ships with Claude Code skills so an AI agent can use it directly. After installing, run:

python -m contaix.skills

This symlinks each skill into ~/.claude/skills/, where Claude Code discovers them automatically.

| Skill | When the agent uses it |
| --- | --- |
| contaix_docs_to_markdown | "Get me the docs", "download documentation", "make a markdown of this docs site" |
| contaix_web_aggregate | "Aggregate these pages into markdown"; falls back to manual curation when sites need it |
| dn-repair (from dn) | "Clean up this markdown", "fix broken links" |
| pdfdol-read (from pdfdol) | "Extract text from these PDFs" |
| pdfdol-convert (from pdfdol) | "Convert this to PDF" |

python -m contaix.skills also picks up skills shipped by the related packages (dn, pdfdol) when those are installed — one command wires up the whole ecosystem.
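To make the symlinking step concrete, here is a minimal sketch of what such an installer could look like. It is not the contaix implementation: the function name `install_skills`, the "one folder per skill, each containing a SKILL.md" layout, and the refresh/skip policy are all assumptions made for illustration.

```python
from pathlib import Path


def install_skills(source_dir, target_dir):
    """Symlink every skill folder (one containing SKILL.md) into target_dir.

    Hypothetical helper: the layout and policy here are assumptions,
    not the behavior of `python -m contaix.skills`.
    """
    source_dir = Path(source_dir)
    target_dir = Path(target_dir).expanduser()
    target_dir.mkdir(parents=True, exist_ok=True)
    installed = []
    for skill in sorted(source_dir.iterdir()):
        if not (skill / "SKILL.md").is_file():
            continue  # not a skill folder
        link = target_dir / skill.name
        if link.is_symlink():
            link.unlink()  # refresh a stale link from a previous install
        elif link.exists():
            continue  # never clobber a real directory the user created
        link.symlink_to(skill.resolve(), target_is_directory=True)
        installed.append(skill.name)
    return installed
```

Because the targets are symlinks rather than copies, upgrading the package that ships a skill would update the installed skill without re-running the installer, as long as the package's install path is unchanged.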

Quick Start

Documentation site → single markdown file

from contaix.web import site_to_markdown

site_to_markdown(
    'https://fal.ai/docs/documentation',
    output_file='~/Downloads/fal_ai_docs.md',
    verbose=True,
)

site_to_markdown first probes the site for a publisher-provided /llms-full.txt bundle (Mintlify, Docusaurus, Fern, and others ship one). If found, it returns that directly — one HTTP request, clean markdown, no scraping. Otherwise it falls back to discovering navigation, fetching every page (with caching), and aggregating.
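The probe-then-fall-back idea can be sketched as follows. This is an illustration only: the names `candidate_bundle_urls` and `probe`, and the choice to walk from the deepest path up to the site root, are assumptions and may not match the order or locations that contaix actually checks. The `fetch` argument is any callable that maps a URL to text (or `None` when the resource is missing), which keeps the sketch free of network code.

```python
from urllib.parse import urlsplit, urlunsplit


def candidate_bundle_urls(site_url, filename="llms-full.txt"):
    """Yield candidate bundle URLs, from the deepest path up to the site root."""
    parts = urlsplit(site_url)
    segments = [s for s in parts.path.split("/") if s]
    for depth in range(len(segments), -1, -1):
        path = "/" + "/".join(segments[:depth] + [filename])
        yield urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def probe(site_url, fetch):
    """Return (url, text) for the first candidate that `fetch` resolves, else None."""
    for url in candidate_bundle_urls(site_url):
        text = fetch(url)
        if text is not None:
            return url, text
    return None  # caller falls back to crawling the navigation
```

For `https://fal.ai/docs/documentation` this sketch would try `/docs/documentation/llms-full.txt`, then `/docs/llms-full.txt`, then `/llms-full.txt`.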

To bypass the fast path or grab the bundle yourself:

from contaix.web import find_llms_full_url, fetch_llms_full

bundle_url = find_llms_full_url('https://fal.ai/docs')   # str or None
markdown   = fetch_llms_full('https://fal.ai/docs')      # str or None

Code base → single markdown file

from contaix import code_aggregate

# A local package
md = code_aggregate('contaix')

# A directory
md = code_aggregate('/path/to/project')

# A GitHub repo (downloaded via hubcap)
md = code_aggregate('https://github.com/thorwhalen/contaix')

# Save instead of return
code_aggregate('contaix', egress='/tmp/contaix_code.md')
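For intuition about what "code base to single markdown file" means, here is a rough stand-in written with the standard library only. It is not how code_aggregate works internally; the function name `aggregate_python_files`, the heading format, and the restriction to `*.py` files are assumptions for this sketch.

```python
from pathlib import Path

# Built programmatically so this example nests cleanly inside markdown.
FENCE = "`" * 3


def aggregate_python_files(root):
    """Concatenate every .py file under root into one markdown string.

    Each file becomes a section: a heading with the relative path,
    followed by the source in a fenced python block.
    """
    root = Path(root)
    sections = []
    for path in sorted(root.rglob("*.py")):
        rel = path.relative_to(root).as_posix()
        body = path.read_text(encoding="utf-8").rstrip()
        sections.append(f"## {rel}\n\n{FENCE}python\n{body}\n{FENCE}")
    return "\n\n".join(sections)
```

Keeping the relative path in each heading lets an agent reading the aggregate map a snippet back to its file.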

Files → markdown

from pathlib import Path

from contaix import bytes_to_markdown, bytes_store_to_markdown_store
from dol import Files

# Single file (auto-detects format)
md = bytes_to_markdown(Path('report.pdf').read_bytes(), input_format='pdf')

# A whole folder
src = Files('/path/to/documents')
target = {}
bytes_store_to_markdown_store(src, target)

Markdown conversion is delegated to dn.src and supports PDF, DOCX, XLSX, PPTX, HTML, IPYNB, and more.
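The store-to-store pattern can be illustrated with plain dictionaries, since any mapping of keys to bytes can play the role of the source. The sketch below is an assumption-laden stand-in, not the contaix or dn API: `convert_store`, the extension-based dispatch, and the toy `text_to_markdown` converter are hypothetical names introduced here.

```python
from pathlib import PurePosixPath


def text_to_markdown(data: bytes) -> str:
    """Toy converter: plain text is already valid markdown."""
    return data.decode("utf-8")


def convert_store(src, target, converters):
    """Convert each src value with the converter registered for its extension.

    Results are written to `target` under the same key with a .md suffix.
    Keys with no registered converter are skipped and returned.
    """
    skipped = []
    for key, data in src.items():
        suffix = PurePosixPath(key).suffix.lower()
        convert = converters.get(suffix)
        if convert is None:
            skipped.append(key)
            continue
        target[str(PurePosixPath(key).with_suffix(".md"))] = convert(data)
    return skipped
```

A usage example under the same assumptions:

```python
src = {"notes.txt": b"hello", "image.png": b"\x89PNG"}
target = {}
skipped = convert_store(src, target, {".txt": text_to_markdown})
# target holds "notes.md"; skipped lists "image.png"
```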

Cleanup of messy markdown (e.g. ChatGPT exports)

from contaix import remove_hyperlink_crap, remove_improperly_double_newlines

# Reads from clipboard if no arg, copies result back
clean = remove_hyperlink_crap()

Pure transforms live in dn.repair; the contaix wrappers add clipboard integration.
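To show what a "pure transform" looks like in this setting, here is a small sketch with no clipboard or I/O involved. The function names and regular expressions are illustrative assumptions; they are not the rules that dn.repair or the contaix wrappers apply.

```python
import re

# [label](url), but not ![alt](url): the lookbehind leaves images alone.
_LINK = re.compile(r"(?<!!)\[([^\]]+)\]\([^)\s]+\)")


def strip_links(text):
    """Replace [label](url) with just the label."""
    return _LINK.sub(r"\1", text)


def collapse_blank_lines(text):
    """Squeeze runs of three or more newlines down to a single blank line."""
    return re.sub(r"\n{3,}", "\n\n", text)


def clean(text):
    """Compose the two transforms; string in, string out, no side effects."""
    return collapse_blank_lines(strip_links(text))
```

Keeping the transforms pure makes them easy to test and to compose; a wrapper can then add clipboard reads and writes around `clean` without touching the logic.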

URLs in prose → article PDFs

from contaix import extract_urls, download_articles, verify_urls

# Inspect what URLs are in a string
list(extract_urls("[OpenAI](https://openai.com) and https://anthropic.com"))
# -> [('OpenAI', 'https://openai.com'), ('https://anthropic.com', 'https://anthropic.com')]

# Verify they resolve
verify_urls(['https://openai.com', 'https://anthropic.com'])
# -> {'https://openai.com': 200, 'https://anthropic.com': 200}

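# Download as PDFs (delegates to pdfdol)
some_md_string = "[OpenAI](https://openai.com) and https://anthropic.com"  # any text containing URLs
download_articles(some_md_string, save_dir='~/Downloads/articles')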
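The (label, url) pairs shown for extract_urls can be reproduced with a single regular expression. The sketch below is one possible approach, not the library's code: `extract_pairs` and the pattern are assumptions, and the pattern is deliberately simple (for example, a bare URL followed directly by a period would keep the period).

```python
import re

# Either a markdown link [label](url) or a bare http(s) URL.
_URL = re.compile(r"\[([^\]]+)\]\((https?://[^\s)]+)\)|(https?://[^\s)\]]+)")


def extract_pairs(text):
    """Yield (label, url) pairs; a bare URL serves as its own label."""
    for match in _URL.finditer(text):
        label, linked, bare = match.groups()
        if linked:
            yield (label, linked)
        else:
            yield (bare, bare)


pairs = list(extract_pairs("[OpenAI](https://openai.com) and https://anthropic.com"))
# -> [('OpenAI', 'https://openai.com'), ('https://anthropic.com', 'https://anthropic.com')]
```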

Package Map

contaix/
  web.py           # site_to_markdown, find_llms_full_url, RSC extraction
  code.py          # code_aggregate, PackageCodeContexts, GitHub fetch
  markdown.py      # bytes_to_markdown (delegates to dn.src)
  urls.py          # extract_urls, verify_urls, download_articles
  aggregation.py   # aggregate_store with deduplication and chunking
  util.py          # markdown_of_site, scrape utilities, clipboard helpers
  skills.py        # discover and install ecosystem skills
  data/skills/     # ships SKILL.md files for AI agents
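As an illustration of the "deduplication and chunking" noted for aggregation.py, the following sketch deduplicates by content hash and splits the output by a character budget. The function name `aggregate`, the `max_chars` parameter, and both strategies are assumptions; the real aggregate_store may work differently.

```python
import hashlib


def aggregate(store, max_chars=1000):
    """Join a mapping of key -> text into markdown chunks.

    Documents with identical content are emitted once (first key wins, in
    sorted key order). A new chunk starts when adding a section would push
    the current chunk past max_chars; a single oversized section still gets
    its own chunk rather than being split.
    """
    seen = set()
    chunks, current, size = [], [], 0
    for key in sorted(store):
        text = store[key]
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate of an earlier document
        seen.add(digest)
        section = f"## {key}\n\n{text}\n"
        if current and size + len(section) > max_chars:
            chunks.append("\n".join(current))
            current, size = [], 0
        current.append(section)
        size += len(section)
    if current:
        chunks.append("\n".join(current))
    return chunks
```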

Architecture

  • Functions are the primary interface; classes are reserved for stateful workflows (e.g. PackageCodeContexts).
  • File and data access goes through dol store abstractions throughout.
  • Smart defaults with keyword-only configurability: works out of the box for common cases, fully parameterizable when needed.
  • Pure transforms live in domain packages (dn, pdfdol); contaix is the orchestration and UX layer.
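The first three principles above can be seen together in a small sketch: a plain function as the interface, a mapping as the data source, and keyword-only options with defaults. `join_documents` and its parameters are hypothetical names for illustration, not part of contaix.

```python
from collections.abc import Mapping


def join_documents(store: Mapping, *, header="## {key}", separator="\n\n", keys=None):
    """Join the texts in `store` into one markdown string.

    Everything after `store` is keyword-only, so the common call is just
    join_documents(store), and every option must be named when it is used.
    """
    selected = sorted(store) if keys is None else keys
    return separator.join(f"{header.format(key=k)}\n\n{store[k]}" for k in selected)
```

Because the function only relies on the mapping interface, a dict, a dol store, or any other key-value view over files could be passed in without changes.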

Optional dependencies

  • pdfdol — for download_articles and PDF round-trips (pip install contaix[pdf])
  • dn[all] — for the full set of file-format converters (pip install contaix[all])
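One common way to handle optional dependencies is to check for them at call time and fail with an actionable message. The helper below is a sketch of that pattern; `require` is a hypothetical name and nothing here is taken from the contaix source. The extras name in the message is quoted because some shells (zsh, for instance) treat square brackets as glob characters.

```python
from importlib.util import find_spec


def require(module_name, extra):
    """Raise a helpful ImportError if an optional top-level module is missing."""
    if find_spec(module_name) is None:
        raise ImportError(
            f"{module_name} is not installed; try: pip install 'contaix[{extra}]'"
        )
    return True
```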

Related packages

  • dn — pure markdown conversion and repair
  • pdfdol — PDF reading, writing, and format conversion
  • dol — store abstractions used everywhere
  • scraped — scraping primitives
  • hubcap — GitHub repo access

Download files

Source Distribution

contaix-0.0.13.tar.gz (188.1 kB view details)

Uploaded Source

Built Distribution

contaix-0.0.13-py3-none-any.whl (168.8 kB view details)

Uploaded Python 3

File details

Details for the file contaix-0.0.13.tar.gz.

File metadata

  • Download URL: contaix-0.0.13.tar.gz
  • Upload date:
  • Size: 188.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.8

File hashes

Hashes for contaix-0.0.13.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 4efa105261a16722fa4b4bc3d1737e153d50214800915eac348c690cfc6943c5 |
| MD5 | ba503dbd7185dd4c965cb4aba4a8e512 |
| BLAKE2b-256 | 9a715e6a5a94594326df96ace22952ce189d090f2073352af3cbbc828975e435 |

File details

Details for the file contaix-0.0.13-py3-none-any.whl.

File metadata

  • Download URL: contaix-0.0.13-py3-none-any.whl
  • Upload date:
  • Size: 168.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.8

File hashes

Hashes for contaix-0.0.13-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 087e88ab2d6c31dca6f6daacbe725f7183b2cf6d5913114b583924252f50b0fc |
| MD5 | cc1b4f00c8c0cfa39e5ce9340b805a8c |
| BLAKE2b-256 | 098a85e425d8751a0e76d16d2a634c9250226de980078b59b0fe0883911bc3f6 |

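A downloaded file can be checked against the SHA256 digests listed above using only the standard library. This is a generic sketch; the helper names `sha256_of` and `verify` are made up for this example, and the local file path in the usage note is an assumption.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True if the file's SHA256 matches the published digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `verify('contaix-0.0.13.tar.gz', '4efa105261a16722fa4b4bc3d1737e153d50214800915eac348c690cfc6943c5')` should return True for an intact copy of the source distribution saved under that name in the current directory.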
