contaix
Tools to make contexts (knowledge bases) for AI.
contaix turns diverse sources — code repos, documentation sites, file collections — into clean markdown contexts for AI agents to consume.
pip install contaix
Claude Code Skills
contaix ships with Claude Code skills so an AI agent can use it directly. After installing, run:
python -m contaix.skills
This symlinks each skill into ~/.claude/skills/, where Claude Code discovers
them automatically.
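As a rough illustration of what symlinking skills into a discovery directory involves, here is a minimal sketch. It is not the code behind `python -m contaix.skills`: the helper name `link_skills`, the rule "a skill is a directory containing `SKILL.md`", and the choice to skip existing entries are all assumptions made for this example. It runs against a temporary directory rather than the real `~/.claude/skills/`.

```python
import tempfile
from pathlib import Path


def link_skills(source_root: Path, target_root: Path) -> list[Path]:
    """Symlink every directory under source_root that contains a SKILL.md.

    Hypothetical helper: the name and the skip-if-present policy are
    assumptions for illustration, not contaix's actual behavior.
    """
    target_root.mkdir(parents=True, exist_ok=True)
    links = []
    skill_dirs = sorted(
        p for p in source_root.iterdir() if (p / "SKILL.md").is_file()
    )
    for skill_dir in skill_dirs:
        link = target_root / skill_dir.name
        if link.is_symlink() or link.exists():
            continue  # leave anything already installed untouched
        link.symlink_to(skill_dir.resolve(), target_is_directory=True)
        links.append(link)
    return links


# Demo in a throwaway directory instead of the real ~/.claude/skills/
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "package_skills"
    (src / "demo_skill").mkdir(parents=True)
    (src / "demo_skill" / "SKILL.md").write_text("# demo skill\n")
    created = link_skills(src, Path(tmp) / "claude_skills")
    created_names = [link.name for link in created]
    link_is_symlink = created[0].is_symlink()
```

Because the entries are symlinks rather than copies, upgrading the package that owns a skill updates the installed skill without rerunning the command.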
| Skill | When the agent uses it |
|---|---|
| contaix_docs_to_markdown | "Get me the docs", "download documentation", "make a markdown of this docs site" |
| contaix_web_aggregate | "Aggregate these pages into markdown"; falls back to manual curation when sites need it |
| dn-repair (from dn) | "Clean up this markdown", "fix broken links" |
| pdfdol-read (from pdfdol) | "Extract text from these PDFs" |
| pdfdol-convert (from pdfdol) | "Convert this to PDF" |
python -m contaix.skills also picks up skills shipped by the related
packages (dn, pdfdol) when those are installed — one command wires up
the whole ecosystem.
Quick Start
Documentation site → single markdown file
from contaix.web import site_to_markdown

site_to_markdown(
    'https://fal.ai/docs/documentation',
    output_file='~/Downloads/fal_ai_docs.md',
    verbose=True,
)
site_to_markdown first probes the site for a publisher-provided
/llms-full.txt bundle (Mintlify, Docusaurus, Fern, and others ship one).
If found, it returns that directly — one HTTP request, clean markdown, no
scraping. Otherwise it falls back to discovering navigation, fetching every
page (with caching), and aggregating.
To bypass the fast path or grab the bundle yourself:
from contaix.web import find_llms_full_url, fetch_llms_full
bundle_url = find_llms_full_url('https://fal.ai/docs') # str or None
markdown = fetch_llms_full('https://fal.ai/docs') # str or None
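To make the fast path more concrete, the sketch below lists the URLs a prober might try when looking for a `/llms-full.txt` bundle. How contaix actually locates the bundle is not documented here; walking up the URL path from the deepest directory to the site root is an assumption, and `candidate_llms_full_urls` is a hypothetical name. The function only builds URLs and makes no HTTP requests.

```python
from urllib.parse import urlsplit, urlunsplit


def candidate_llms_full_urls(site_url: str) -> list[str]:
    """List places an llms-full.txt bundle might live, deepest path first.

    Hypothetical helper: the walk-up-the-path strategy is an assumption,
    not a description of contaix's probing logic.
    """
    parts = urlsplit(site_url)
    segments = [s for s in parts.path.split("/") if s]
    candidates = []
    for depth in range(len(segments), -1, -1):
        path = "/" + "/".join(segments[:depth] + ["llms-full.txt"])
        # Query string and fragment are dropped on purpose
        candidates.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return candidates


urls = candidate_llms_full_urls("https://fal.ai/docs/documentation")
# urls == [
#     "https://fal.ai/docs/documentation/llms-full.txt",
#     "https://fal.ai/docs/llms-full.txt",
#     "https://fal.ai/llms-full.txt",
# ]
```

A prober would request each candidate in order and stop at the first one that returns markdown, falling back to navigation discovery when none does.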
Code base → single markdown file
from contaix import code_aggregate
# A local package
md = code_aggregate('contaix')
# A directory
md = code_aggregate('/path/to/project')
# A GitHub repo (downloaded via hubcap)
md = code_aggregate('https://github.com/thorwhalen/contaix')
# Save instead of return
code_aggregate('contaix', egress='/tmp/contaix_code.md')
Files → markdown
from contaix import bytes_to_markdown, bytes_store_to_markdown_store
from dol import Files
# Single file (auto-detects format)
with open('report.pdf', 'rb') as f:  # close the file handle once the bytes are read
    md = bytes_to_markdown(f.read(), input_format='pdf')
# A whole folder
src = Files('/path/to/documents')
target = {}
bytes_store_to_markdown_store(src, target)
Markdown conversion is delegated to dn.src
and supports PDF, DOCX, XLSX, PPTX, HTML, IPYNB, and more.
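The store-to-store pattern used above is worth spelling out: both source and target are just mappings, so a plain `dict` works as well as a folder-backed store. The sketch below shows the general shape under stated assumptions. `convert_store`, its `converter` and `key_map` parameters, and the `.md` suffix rule are hypothetical and do not describe the signature of `bytes_store_to_markdown_store`.

```python
from collections.abc import Callable, Mapping, MutableMapping


def convert_store(
    src: Mapping[str, bytes],
    target: MutableMapping[str, str],
    *,
    converter: Callable[[bytes], str],
    key_map: Callable[[str], str] = lambda key: key + ".md",
) -> MutableMapping[str, str]:
    """Convert every value of src and write it to target under a mapped key.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    for key, raw in src.items():
        target[key_map(key)] = converter(raw)
    return target


# Plain dicts stand in for file-backed stores in this demo
src = {"notes.txt": b"hello", "readme.txt": b"# Title"}
target = {}
convert_store(src, target, converter=lambda raw: raw.decode("utf-8"))
# target == {"notes.txt.md": "hello", "readme.txt.md": "# Title"}
```

Writing against the mapping interface means the same loop can read from a directory and write to memory, a database, or another directory without changes.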
Cleanup of messy markdown (e.g. ChatGPT exports)
from contaix import remove_hyperlink_crap, remove_improperly_double_newlines
# Reads from clipboard if no arg, copies result back
clean = remove_hyperlink_crap()
Pure transforms live in dn.repair;
the contaix wrappers add clipboard integration.
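The split between a pure transform and a clipboard-aware wrapper can be sketched as follows. Everything here is hypothetical: `collapse_blank_lines` is a stand-in transform, not a function from dn.repair, and `with_clipboard` takes injected `read` and `write` callables so the example runs without a real clipboard.

```python
import re
from collections.abc import Callable
from typing import Optional


def collapse_blank_lines(text: str) -> str:
    """Pure transform: squeeze runs of three or more newlines down to two."""
    return re.sub(r"\n{3,}", "\n\n", text)


def with_clipboard(
    transform: Callable[[str], str],
    *,
    read: Callable[[], str],
    write: Callable[[str], None],
) -> Callable[[Optional[str]], str]:
    """Wrap a pure transform so it reads from a clipboard when given no text."""

    def wrapper(text: Optional[str] = None) -> str:
        source = read() if text is None else text
        result = transform(source)
        write(result)  # copy the result back, as the wrappers above do
        return result

    return wrapper


# A dict plays the role of the system clipboard in this demo
fake_clipboard = {"content": "one\n\n\n\ntwo"}
clean_from_clipboard = with_clipboard(
    collapse_blank_lines,
    read=lambda: fake_clipboard["content"],
    write=lambda value: fake_clipboard.update(content=value),
)
cleaned = clean_from_clipboard()
# cleaned == "one\n\ntwo", and fake_clipboard["content"] now holds the same text
```

Keeping the transform free of I/O makes it easy to test and reuse, while the wrapper carries the convenience behavior.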
URLs in prose → article PDFs
from contaix import extract_urls, download_articles, verify_urls
# Inspect what URLs are in a string
list(extract_urls("[OpenAI](https://openai.com) and https://anthropic.com"))
# -> [('OpenAI', 'https://openai.com'), ('https://anthropic.com', 'https://anthropic.com')]
# Verify they resolve
verify_urls(['https://openai.com', 'https://anthropic.com'])
# -> {'https://openai.com': 200, 'https://anthropic.com': 200}
# Download as PDFs (delegates to pdfdol)
download_articles(some_md_string, save_dir='~/Downloads/articles')
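For readers curious how labelled and bare URLs can come out of the same pass, here is a minimal regex-based sketch that yields the same `(label, url)` pairs as the example above. It is an assumption about one way to do it, not the implementation of `extract_urls`; the name `iter_urls` and the pattern are invented for this example.

```python
import re
from collections.abc import Iterator

# Markdown links are tried first, then bare URLs (hypothetical pattern)
_URL_PATTERN = re.compile(
    r"\[([^\]]+)\]\((https?://[^\s)]+)\)"  # [label](url)
    r"|(https?://[^\s)\]]+)"  # bare url
)


def iter_urls(text: str) -> Iterator[tuple[str, str]]:
    """Yield (label, url) pairs; a bare URL serves as its own label."""
    for match in _URL_PATTERN.finditer(text):
        label, linked_url, bare_url = match.groups()
        if linked_url is not None:
            yield label, linked_url
        else:
            yield bare_url, bare_url


found = list(iter_urls("[OpenAI](https://openai.com) and https://anthropic.com"))
# found == [('OpenAI', 'https://openai.com'),
#           ('https://anthropic.com', 'https://anthropic.com')]
```

This simple pattern keeps trailing punctuation attached to bare URLs (for example a sentence-ending period), so real-world text needs extra cleanup.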
Package Map
contaix/
    web.py           # site_to_markdown, find_llms_full_url, RSC extraction
    code.py          # code_aggregate, PackageCodeContexts, GitHub fetch
    markdown.py      # bytes_to_markdown (delegates to dn.src)
    urls.py          # extract_urls, verify_urls, download_articles
    aggregation.py   # aggregate_store with deduplication and chunking
    util.py          # markdown_of_site, scrape utilities, clipboard helpers
    skills.py        # discover and install ecosystem skills
    data/skills/     # ships SKILL.md files for AI agents
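The map mentions aggregation with deduplication and chunking. As a rough idea of what that can mean, the sketch below merges a mapping of documents into markdown chunks, skipping exact duplicates and starting a new chunk when a size limit would be exceeded. It is not `aggregate_store`: the name `aggregate_chunks`, the hash-based duplicate check, the `## key` section header, and the `max_chars` parameter are all assumptions.

```python
import hashlib
from collections.abc import Iterator, Mapping


def aggregate_chunks(
    store: Mapping[str, str], *, max_chars: int = 10_000
) -> Iterator[str]:
    """Yield markdown chunks built from store, skipping duplicate texts.

    Hypothetical helper: a single section longer than max_chars still
    gets a chunk of its own rather than being split.
    """
    seen = set()
    chunk, size = [], 0
    for key, text in store.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate of an earlier document
        seen.add(digest)
        section = f"## {key}\n\n{text}\n"
        if chunk and size + len(section) > max_chars:
            yield "\n".join(chunk)
            chunk, size = [], 0
        chunk.append(section)
        size += len(section)
    if chunk:
        yield "\n".join(chunk)


pages = {"a.md": "alpha", "b.md": "alpha", "c.md": "gamma"}
chunks = list(aggregate_chunks(pages, max_chars=20))
# chunks == ["## a.md\n\nalpha\n", "## c.md\n\ngamma\n"]
```

Here `b.md` is dropped because its text matches `a.md`, and the 20-character limit forces the two remaining sections into separate chunks.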
Architecture
- Functions are the primary interface; classes are reserved for stateful workflows (e.g. PackageCodeContexts).
- dol store abstractions for file/data access throughout.
- Smart defaults with keyword-only configurability: usable out of the box for common cases, fully parameterizable when needed.
- Pure transforms live in domain packages (dn, pdfdol); contaix is the orchestration and UX layer.
Optional dependencies
- pdfdol: for download_articles and PDF round-trips (pip install contaix[pdf])
- dn[all]: for the full set of file-format converters (pip install contaix[all])
Related packages