contaix

Tools to make contexts (knowledge bases) for AI.

contaix turns diverse sources — code repos, documentation sites, file collections — into clean markdown contexts for AI agents to consume.

pip install contaix

Claude Code Skills

contaix ships with Claude Code skills so an AI agent can use it directly. After installing, run:

python -m contaix.skills

This symlinks each skill into ~/.claude/skills/, where Claude Code discovers them automatically.

Skill                          When the agent uses it
contaix_docs_to_markdown       "Get me the docs", "download documentation", "make a markdown of this docs site"
contaix_web_aggregate          "Aggregate these pages into markdown"; falls back to manual curation when sites need it
dn-repair (from dn)            "Clean up this markdown", "fix broken links"
pdfdol-read (from pdfdol)      "Extract text from these PDFs"
pdfdol-convert (from pdfdol)   "Convert this to PDF"

python -m contaix.skills also picks up skills shipped by the related packages (dn, pdfdol) when those are installed — one command wires up the whole ecosystem.
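
The symlinking step can be pictured with a minimal sketch. This is not the code in contaix.skills: the function name link_skills and the one-folder-per-skill layout (each folder holding a SKILL.md) are assumptions made for illustration.

```python
from pathlib import Path


def link_skills(source_dir, target_dir):
    """Symlink each subfolder of source_dir that contains a SKILL.md into target_dir.

    Hypothetical helper: the real installer may discover and link skills differently.
    """
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    linked = []
    for skill in sorted(source_dir.iterdir()):
        if not (skill / "SKILL.md").is_file():
            continue  # not a skill folder
        link = target_dir / skill.name
        if link.is_symlink() or link.exists():
            continue  # leave existing entries untouched
        link.symlink_to(skill.resolve(), target_is_directory=True)
        linked.append(link.name)
    return linked
```

Calling it twice on the same folders links nothing the second time, so in this sketch re-running the install step is harmless.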

Quick Start

Documentation site → single markdown file

from contaix.web import site_to_markdown

site_to_markdown(
    'https://fal.ai/docs/documentation',
    output_file='~/Downloads/fal_ai_docs.md',
    verbose=True,
)

site_to_markdown first probes the site for a publisher-provided /llms-full.txt bundle (Mintlify, Docusaurus, Fern, and others ship one). If found, it returns that directly — one HTTP request, clean markdown, no scraping. Otherwise it falls back to discovering navigation, fetching every page (with caching), and aggregating.

To bypass the fast path or grab the bundle yourself:

from contaix.web import find_llms_full_url, fetch_llms_full

bundle_url = find_llms_full_url('https://fal.ai/docs')   # str or None
markdown   = fetch_llms_full('https://fal.ai/docs')      # str or None
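
One way to think about the probe is as a walk from the given URL up to the site root, checking for the bundle at each level. The sketch below illustrates that idea only: candidate_bundle_urls and first_bundle are hypothetical names, and the search order contaix actually uses may differ. The fetch argument is any callable that returns the page text or None, which keeps the sketch free of network code.

```python
from urllib.parse import urlsplit, urlunsplit


def candidate_bundle_urls(site_url, filename="llms-full.txt"):
    """Yield candidate bundle URLs, deepest path first, ending at the site root."""
    parts = urlsplit(site_url)
    segments = [s for s in parts.path.split("/") if s]
    for depth in range(len(segments), -1, -1):
        path = "/" + "/".join(segments[:depth] + [filename])
        yield urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def first_bundle(site_url, fetch):
    """Return (url, text) for the first candidate that fetch() resolves, else None."""
    for url in candidate_bundle_urls(site_url):
        text = fetch(url)  # assumed to return None when the URL does not resolve
        if text is not None:
            return url, text
    return None


print(list(candidate_bundle_urls("https://fal.ai/docs")))
# -> ['https://fal.ai/docs/llms-full.txt', 'https://fal.ai/llms-full.txt']
```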

Code base → single markdown file

from contaix import code_aggregate

# A local package
md = code_aggregate('contaix')

# A directory
md = code_aggregate('/path/to/project')

# A GitHub repo (downloaded via hubcap)
md = code_aggregate('https://github.com/thorwhalen/contaix')

# Save instead of return
code_aggregate('contaix', egress='/tmp/contaix_code.md')
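
To make "single markdown file" concrete, here is a minimal sketch of the aggregation idea for a local directory. It is not code_aggregate itself: the function name aggregate_python_files, the one-heading-per-file layout, and the restriction to .py files are assumptions.

```python
from pathlib import Path

FENCE = "`" * 3  # built programmatically so this example can sit inside a fenced block


def aggregate_python_files(root):
    """Concatenate every .py file under root into one markdown string.

    Hypothetical layout: a heading with the relative path, then the source in a code fence.
    """
    root = Path(root)
    sections = []
    for path in sorted(root.rglob("*.py")):
        rel = path.relative_to(root).as_posix()
        code = path.read_text(encoding="utf-8").rstrip()
        sections.append(f"## {rel}\n\n{FENCE}python\n{code}\n{FENCE}\n")
    return "\n".join(sections)
```

Using relative paths as headings lets a reader (or an agent) map each section back to its file.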

Files → markdown

from contaix import bytes_to_markdown, bytes_store_to_markdown_store
from dol import Files

# Single file (the format is passed explicitly here via input_format)
with open('report.pdf', 'rb') as f:
    md = bytes_to_markdown(f.read(), input_format='pdf')

# A whole folder
src = Files('/path/to/documents')
target = {}
bytes_store_to_markdown_store(src, target)

Markdown conversion is delegated to dn.src and supports PDF, DOCX, XLSX, PPTX, HTML, IPYNB, and more.
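
The store-to-store pattern works on any mapping of keys to bytes, so it can be sketched with plain dicts. The following is an illustration, not the implementation of bytes_store_to_markdown_store: the name convert_store, the .md key renaming, and the error handling are assumptions. The converter is called with an input_format keyword, mirroring the bytes_to_markdown call shown above.

```python
from collections.abc import Mapping, MutableMapping
from pathlib import PurePosixPath


def convert_store(src: Mapping, target: MutableMapping, convert, *, skip_errors=True):
    """Convert each bytes value in src and write the result to target under a .md key.

    Returns a dict of {source key: exception} for the items that failed.
    """
    failed = {}
    for key, data in src.items():
        path = PurePosixPath(key)
        ext = path.suffix.lstrip(".").lower()  # e.g. "pdf", "docx"
        try:
            target[path.with_suffix(".md").as_posix()] = convert(data, input_format=ext)
        except Exception as error:
            if not skip_errors:
                raise
            failed[key] = error
    return failed
```

Collecting failures instead of stopping at the first one is a design choice that suits large folders where a few files are unreadable.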

Cleanup of messy markdown (e.g. ChatGPT exports)

from contaix import remove_hyperlink_crap, remove_improperly_double_newlines

# Reads from clipboard if no arg, copies result back
clean = remove_hyperlink_crap()

Pure transforms live in dn.repair; the contaix wrappers add clipboard integration.
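
A "pure transform" here means a plain string-in, string-out function with no clipboard or file access. The sketch below shows two such transforms; it does not reproduce what dn.repair or the contaix wrappers actually do, and the names strip_links, collapse_blank_lines, and clean are hypothetical.

```python
import re

# [label](url), but not image syntax ![alt](url)
_MD_LINK = re.compile(r"(?<!!)\[([^\]]+)\]\(([^)\s]+)\)")
_BLANK_RUN = re.compile(r"\n{3,}")


def strip_links(text):
    """Replace each markdown link with its label only."""
    return _MD_LINK.sub(r"\1", text)


def collapse_blank_lines(text):
    """Collapse runs of three or more newlines into a single blank line."""
    return _BLANK_RUN.sub("\n\n", text)


def clean(text):
    """Compose the two transforms."""
    return collapse_blank_lines(strip_links(text))


print(repr(clean("See [docs](https://example.com).\n\n\n\nDone.")))
# -> 'See docs.\n\nDone.'
```

Because the transforms touch nothing but their argument, a clipboard wrapper only needs to read the text, call clean, and write the result back.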

URLs in prose → article PDFs

from contaix import extract_urls, download_articles, verify_urls

# Inspect what URLs are in a string
list(extract_urls("[OpenAI](https://openai.com) and https://anthropic.com"))
# -> [('OpenAI', 'https://openai.com'), ('https://anthropic.com', 'https://anthropic.com')]

# Verify they resolve
verify_urls(['https://openai.com', 'https://anthropic.com'])
# -> {'https://openai.com': 200, 'https://anthropic.com': 200}

# Download as PDFs (delegates to pdfdol)
download_articles(some_md_string, save_dir='~/Downloads/articles')
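
The (label, url) pairs shown for extract_urls can be reproduced with a single regular expression. This is a sketch under that assumption, not the library's implementation; iter_labeled_urls is a hypothetical name, and the pattern is deliberately simple (a bare URL followed directly by a period would keep the period).

```python
import re

_URL = re.compile(
    r"\[([^\]]+)\]\((https?://[^\s)]+)\)"  # markdown link: [label](url)
    r"|(https?://[^\s)\]]+)"               # bare URL
)


def iter_labeled_urls(text):
    """Yield (label, url) pairs; a bare URL serves as its own label."""
    for match in _URL.finditer(text):
        label, linked, bare = match.groups()
        if linked:
            yield label, linked
        else:
            yield bare, bare


print(list(iter_labeled_urls("[OpenAI](https://openai.com) and https://anthropic.com")))
# -> [('OpenAI', 'https://openai.com'), ('https://anthropic.com', 'https://anthropic.com')]
```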

Package Map

contaix/
  web.py           # site_to_markdown, find_llms_full_url, RSC extraction
  code.py          # code_aggregate, PackageCodeContexts, GitHub fetch
  markdown.py      # bytes_to_markdown (delegates to dn.src)
  urls.py          # extract_urls, verify_urls, download_articles
  aggregation.py   # aggregate_store with deduplication and chunking
  util.py          # markdown_of_site, scrape utilities, clipboard helpers
  skills.py        # discover and install ecosystem skills
  data/skills/     # ships SKILL.md files for AI agents

Architecture

  • Functions are the primary interface; classes are reserved for stateful workflows (e.g. PackageCodeContexts).
  • dol store abstractions handle file and data access throughout.
  • Smart defaults with keyword-only configurability: common cases work out of the box, and everything is parameterizable when needed.
  • Pure transforms live in domain packages (dn, pdfdol); contaix is the orchestration and UX layer.
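
The keyword-only style can be illustrated with the egress option used in the code_aggregate example earlier. The sketch below is an assumption about the general pattern, not contaix code: deliver is a hypothetical name, and what the real functions return when egress is given is not stated in this README.

```python
from pathlib import Path


def deliver(text, *, egress=None):
    """Return text when egress is None; otherwise write it to that path and return the path.

    The bare * makes egress keyword-only, so call sites stay self-describing.
    """
    if egress is None:
        return text
    path = Path(egress).expanduser()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return str(path)
```

With this signature, deliver(md) covers the default case, deliver(md, egress='out.md') saves to disk, and deliver(md, 'out.md') raises a TypeError rather than silently misreading the second argument.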

Optional dependencies

  • pdfdol — for download_articles and PDF round-trips (pip install contaix[pdf])
  • dn[all] — for the full set of file-format converters (pip install contaix[all])
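
Optional dependencies are commonly guarded with a lazy import that points the user at the right extra. The helper below is a generic sketch of that pattern; the name require is hypothetical and contaix may handle missing packages differently. The extras names come from the list above.

```python
import importlib


def require(module_name, *, extra):
    """Import module_name, or raise an ImportError naming the contaix extra that provides it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as error:
        raise ImportError(
            f"{module_name} is required for this feature; "
            f"install it with: pip install 'contaix[{extra}]'"
        ) from error
```

A function such as download_articles could then call require('pdfdol', extra='pdf') at the point of use, so the base install stays light.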

Related packages

  • dn — pure markdown conversion and repair
  • pdfdol — PDF reading, writing, and format conversion
  • dol — store abstractions used everywhere
  • scraped — scraping primitives
  • hubcap — GitHub repo access

Download files

Source Distribution

contaix-0.0.16.tar.gz (193.1 kB)

Uploaded Source

Built Distribution

contaix-0.0.16-py3-none-any.whl (168.8 kB)

Uploaded Python 3

File details

Details for the file contaix-0.0.16.tar.gz.

File metadata

  • Download URL: contaix-0.0.16.tar.gz
  • Upload date:
  • Size: 193.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.16 (publish subcommand, run in CI on Ubuntu 24.04 noble)

File hashes

Hashes for contaix-0.0.16.tar.gz
Algorithm Hash digest
SHA256 c154335580eb0970ff9f214a24869587d3f71a2e0f938f36d27ddabc508c19d5
MD5 cc1a9e8cca84eccd52da56dec99ee38d
BLAKE2b-256 3a7272e44016b3b2d4ca3a2ee1d1faadf2d0ee56bba0fa8d3b00f09e70f6b52c

File details

Details for the file contaix-0.0.16-py3-none-any.whl.

File metadata

  • Download URL: contaix-0.0.16-py3-none-any.whl
  • Upload date:
  • Size: 168.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.16 (publish subcommand, run in CI on Ubuntu 24.04 noble)

File hashes

Hashes for contaix-0.0.16-py3-none-any.whl
Algorithm Hash digest
SHA256 8c83688130550152c9ca4559836a86af7cb08dad32cb145e0398a39103aee411
MD5 508b14ce3b3bde190b3a140ca9d77de3
BLAKE2b-256 fa7554cc6e130ab17ebf43b4de6a4c75b8fcc315ad175bce54a6b0d0e80e9b58
