Zero-Click AEO toolkit: crawl -> ingest -> index -> search, composing markitdown + turbovec + headroom.

aeo-kit

Zero-Click AEO toolkit — turn any document, file, or website into LLM-readable Markdown + an llms.txt outline, then index and search it locally. A thin, clean composition of best-in-class open-source parts; it does not reinvent conversion, compression, or vector indexing.

crawl  →  ingest      →  index       →  search
(site)    (md +          (turbovec)     (top-k)
          llms.txt)

Why

"AI Engine Optimization" (AEO/GEO) starts with getting messy real-world content into a form a model can use. aeo-kit composes:

  • markitdown (MIT) — PDF/DOCX/PPTX/XLSX/HTML/CSV → Markdown
  • headroom-ai (Apache-2.0, optional) — compress verbose tool/RAG context 60–95%
  • turbovec — air-gapped quantized vector index
  • a tiny zero-dependency crawler (no AGPL, no hosted API key)

Everything runs locally. No API keys required.

Install

pip install aeo-kit            # core
pip install "aeo-kit[compress]"  # + headroom-ai for context compression

CLI

# 1) crawl a site (local interlinked html or a live URL) -> markdown
aeo-crawl ./site/index.html --out build/site
aeo-crawl https://example.com --max-pages 5 --out build/site

# 2) ingest any file / folder / URL -> per-doc md + site-level llms.txt
aeo-ingest ./company_docs --out build/aeo --compress

# 3) local retrieval over the generated Markdown (TF-IDF -> turbovec)
aeo-search build/site "consumption tax filing" --k 3
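
The lexical retrieval in step 3 can be illustrated with a minimal, standard-library-only sketch of TF-IDF scoring and top-k ranking. This is not aeo-kit's implementation and it does not use turbovec; the names `tokenize`, `build_index`, and `search` are hypothetical, and the smoothed IDF formula is an assumption chosen for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def normalise(vec):
    """Scale a sparse vector (dict) to unit L2 length; empty if all-zero."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def build_index(docs):
    """docs maps a name to its text. Returns (idf, unit TF-IDF vectors)."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n = len(docs)
    df = Counter()
    for c in counts.values():
        df.update(c.keys())  # document frequency: one count per document
    # Smoothed IDF (an assumption for this sketch, not aeo-kit's formula).
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = {
        name: normalise({t: tf * idf[t] for t, tf in c.items()})
        for name, c in counts.items()
    }
    return idf, vectors


def search(query, idf, vectors, k=3):
    """Return up to k (name, cosine score) pairs with a non-zero score."""
    q_counts = Counter(tokenize(query))
    q = normalise({t: tf * idf[t] for t, tf in q_counts.items() if t in idf})
    scores = {
        name: sum(w * vec.get(t, 0.0) for t, w in q.items())
        for name, vec in vectors.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(name, s) for name, s in ranked[:k] if s > 0]


if __name__ == "__main__":
    docs = {
        "tax.md": "Consumption tax filing deadlines and consumption tax rates",
        "hr.md": "Vacation policy and holiday calendar",
        "faq.md": "How to submit a tax form",
    }
    idf, vectors = build_index(docs)
    for name, score in search("consumption tax filing", idf, vectors, k=3):
        print(f"{score:.3f}  {name}")
```

Because the scoring is purely lexical, a query only matches documents that share at least one token with it; documents with no overlap are dropped rather than returned with a zero score.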

Library

from adapters import markitdown_aeo as mk

# Convert a local HTML file to Markdown via the markitdown adapter
conv = mk.convert("page.html")
# Print the llms.txt seed derived from the heading structure
print(mk.aeo_extract(conv))
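
To make "llms.txt seed from the heading structure" concrete, here is a small sketch of how an outline could be derived from Markdown headings. It is an illustration under assumptions, not the code behind `aeo_extract`: the functions `heading_outline` and `llms_txt_seed` are hypothetical, only ATX-style (`#`) headings are handled, and the nested-bullet output is a simplified outline rather than the full llms.txt layout.

```python
import re

# ATX heading: 1-6 '#' characters, whitespace, text, optional closing '#'s.
HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*#*\s*$")


def heading_outline(markdown):
    """Return [(level, text), ...] for headings outside fenced code blocks."""
    outline = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # toggle on every fence delimiter
            continue
        if in_fence:
            continue  # '#' inside code is a comment, not a heading
        match = HEADING.match(line)
        if match:
            outline.append((len(match.group(1)), match.group(2)))
    return outline


def llms_txt_seed(title, outline):
    """Render a title line plus an indented bullet per heading."""
    lines = [f"# {title}", ""]
    for level, text in outline:
        indent = "  " * max(level - 1, 0)
        lines.append(f"{indent}- {text}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "\n".join([
        "# Tax Guide",
        "",
        "## Filing",
        "",
        "```",
        "# not a heading",
        "```",
        "",
        "### Deadlines",
    ])
    outline = heading_outline(sample)
    print(llms_txt_seed("Tax Guide", outline[1:]))
```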

Notes

  • Crawler politeness: HTTP mode is same-domain only, respects robots.txt, rate-limits, and is bounded by --max-pages / --max-depth.
  • Search scope: aeo-search uses TF-IDF (lexical) retrieval over turbovec — no embedding model / network required. Swap in a sentence-transformer for semantic search; the turbovec layer is unchanged.
  • Compression: headroom-ai protects user messages and compresses tool/log/RAG content; clean short prose may compress little, by design.
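
The crawler politeness rules in the first note (same-domain only, robots.txt, rate limiting, page and depth bounds) can be sketched with the standard library alone. This is an illustration under assumptions, not aeo-kit's crawler: `polite_crawl` is a hypothetical name, `fetch_links` is a caller-supplied function that returns the hrefs found on a page (so the sketch runs without network access), and `robots_lines` is the already-downloaded robots.txt content.

```python
import time
from collections import deque
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser


def polite_crawl(start_url, fetch_links, robots_lines,
                 max_pages=5, max_depth=2, delay=0.0,
                 user_agent="example-bot"):
    """Breadth-first crawl bounded by page count, depth, host, and robots.txt.

    fetch_links(url) -> iterable of hrefs (hypothetical callback).
    Returns the list of URLs visited, in visit order.
    """
    robots = RobotFileParser()
    robots.parse(robots_lines)  # parse rules from pre-fetched lines
    host = urlparse(start_url).netloc

    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited = []

    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        if not robots.can_fetch(user_agent, url):
            continue  # disallowed by robots.txt
        if visited and delay:
            time.sleep(delay)  # simple fixed-interval rate limit
        visited.append(url)
        if depth >= max_depth:
            continue  # do not follow links beyond the depth bound
        for href in fetch_links(url):
            link = urljoin(url, href)
            # Same-domain only: skip links that leave the start host.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


if __name__ == "__main__":
    graph = {
        "https://example.com/": ["/a", "/private/x", "https://other.org/"],
        "https://example.com/a": ["/b"],
        "https://example.com/b": ["/c"],
    }
    rules = ["User-agent: *", "Disallow: /private/"]
    for page in polite_crawl("https://example.com/",
                             lambda u: graph.get(u, []), rules,
                             max_pages=10, max_depth=2):
        print(page)
```

Injecting `fetch_links` keeps the bounding logic separate from HTTP and HTML parsing, which also makes the limits easy to test against an in-memory link graph.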

Develop

pip install -e .
python experiment.py   # end-to-end real run -> poc/out/
python audit.py        # deterministic checks (exit 0 = all pass)

License

MIT (this toolkit). Dependencies are installed separately and retain their own licenses; see THIRD_PARTY_NOTICES.md.

Download files

Source Distribution

aeo_kit-0.1.0.tar.gz (15.1 kB)

Uploaded Source

Built Distribution

aeo_kit-0.1.0-py3-none-any.whl (16.7 kB)

Uploaded Python 3

File details

Details for the file aeo_kit-0.1.0.tar.gz.

File metadata

  • Download URL: aeo_kit-0.1.0.tar.gz
  • Size: 15.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for aeo_kit-0.1.0.tar.gz
Algorithm Hash digest
SHA256 dde6691ae15a6fb9573818057856c3ee8541cf27a76ca3dc1b44efe4a04c03e4
MD5 0cbf023afe97cf21148b841af697482c
BLAKE2b-256 0e56675ac32c15a90f73c484403b0d537a862392039e0ecdd57f8fe008fc179d

File details

Details for the file aeo_kit-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: aeo_kit-0.1.0-py3-none-any.whl
  • Size: 16.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for aeo_kit-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 6f19dd75a174b984f6f9a6a5ee416e7a685f3bbf544f1f046495506f343c9345
MD5 f2b48ea89319ad54051e54acb05d8625
BLAKE2b-256 e96f394a3d6944c81146caf5116279e0c16875e007b90f616894d2d3cbb5e5e4
