Crawl API documentation (OpenAPI, Swagger, ReadMe, Mintlify, Fern, llms.txt, plain HTML) into structured, searchable markdown

These details have not been verified by PyPI

Project links

Homepage

Project description

ApiCrawl

Crawl API documentation into structured, searchable markdown.

Point it at any API docs URL: an OpenAPI/Swagger spec, a Swagger UI / Redoc / Stoplight / Scalar page, ReadMe / Mintlify / Fern hosted docs, a Postman collection, a Google Discovery document, an llms.txt index, or plain HTML docs. It discovers the underlying spec where one exists and crawls the pages where one doesn't. It then classifies the content with an LLM, extracts authentication instructions, and writes everything as a local markdown tree you can grep, read, or feed to any tool.

pip install apicrawl
playwright install chromium   # used to render JS-heavy docs sites

export GOOGLE_API_KEY=...     # LLM access is required (Gemini primary)
export GROQ_API_KEY=...       # optional fallback provider

apicrawl https://petstore3.swagger.io --output ./api-docs

Output layout:

api-docs/<catalog_id>/
  index.md            # API name, metadata, description, auth instructions
  manifest.json       # listing of everything ingested (also the completion marker)
  sections/<slug>.md  # docs pages / spec tag groups (markdown + frontmatter)
  endpoints/<slug>.md # one file per endpoint: parameters, examples, TypeScript types
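
Because the result is an ordinary directory tree, it can be inspected with a few lines of standard-library Python. The sketch below relies only on the layout shown above (manifest.json as the completion marker, one markdown file per endpoint). The helper names `completed_catalogs` and `endpoint_files` are hypothetical and not part of apicrawl, and the contents of manifest.json are deliberately not parsed, since its schema is not described here.

```python
from pathlib import Path


def completed_catalogs(output_dir):
    """Return catalog directories under output_dir that contain manifest.json.

    Hypothetical helper: a missing manifest.json is treated as an
    unfinished ingestion, since the manifest is the completion marker.
    """
    root = Path(output_dir)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_dir() and (p / "manifest.json").is_file()
    )


def endpoint_files(catalog_dir):
    """List the per-endpoint markdown files of one catalog, sorted by name."""
    return sorted((Path(catalog_dir) / "endpoints").glob("*.md"))
```

For example, `completed_catalogs("./api-docs")` would list every catalog directory whose ingestion finished, and `endpoint_files(...)` on one of them would give the files to grep or feed to another tool.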

Library usage:

import asyncio
from apicrawl import ingest_to_dir

result = asyncio.run(ingest_to_dir("https://docs.example.com/api", "./api-docs"))
print(result.entry.name, result.pages_ingested)

Custom storage — implement IngestionSink and receive the parsed catalog entry, sections, and endpoints as plain pydantic models, streamed in batches:

import asyncio
from apicrawl import IngestionSink, ingest

class MySink(IngestionSink):
    async def emit_sections(self, sections): ...
    async def emit_endpoints(self, endpoints): ...

asyncio.run(ingest("https://docs.example.com/api", MySink()))
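
As a rough illustration of what a sink body might do, the sketch below accumulates every batch in memory. It is written without the apicrawl import so it runs on its own; in real use the class would presumably subclass IngestionSink as in the snippet above. Whether IngestionSink requires further methods is not stated here, so treat that as an open assumption. The `replay` coroutine is a hypothetical stand-in for `ingest`, used only to show the batched calls.

```python
import asyncio


class CollectingSink:
    # Assumption: in real use this would subclass apicrawl.IngestionSink;
    # the base class is omitted so the sketch runs without the package.
    def __init__(self):
        self.sections = []
        self.endpoints = []

    async def emit_sections(self, sections):
        # Each call delivers one batch; keep everything in arrival order.
        self.sections.extend(sections)

    async def emit_endpoints(self, endpoints):
        self.endpoints.extend(endpoints)


async def replay(sink, section_batches, endpoint_batches):
    """Hypothetical driver standing in for ingest(): feeds batches to a sink."""
    for batch in section_batches:
        await sink.emit_sections(batch)
    for batch in endpoint_batches:
        await sink.emit_endpoints(batch)
    return sink
```

Keeping all batches in memory is only reasonable for small APIs; a sink backed by a database or search index would write each batch as it arrives instead.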

Notes:

LLM keys are required — page classification and auth extraction are LLM-powered. Set GOOGLE_API_KEY (and optionally GROQ_API_KEY) in the environment or a .env file.
Node.js is optional — if a node binary is on PATH, endpoint pages include generated TypeScript request/response types (via a bundled openapi-typescript). Without Node, ingestion still works; the TS sections are simply omitted.
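
The two notes above can be turned into a small pre-flight check before starting a crawl. This is a sketch using only the standard library. The function name `preflight` is hypothetical, and it inspects the process environment only, so keys supplied through a .env file would not be seen by it.

```python
import os
import shutil


def preflight(env=None, which=shutil.which):
    """Return (problems, notes) describing the local setup.

    problems: missing requirements (the Gemini key).
    notes: optional pieces that are absent (Groq key, Node.js).
    Assumption: only environment variables are checked, not .env files.
    """
    env = os.environ if env is None else env
    problems = []
    notes = []
    if not env.get("GOOGLE_API_KEY"):
        problems.append("GOOGLE_API_KEY is not set")
    if not env.get("GROQ_API_KEY"):
        notes.append("GROQ_API_KEY is not set (optional fallback)")
    if which("node") is None:
        notes.append("node not found on PATH; TypeScript types will be omitted")
    return problems, notes
```

Calling `preflight()` with no arguments checks the current environment; a non-empty first list means ingestion is unlikely to get past the LLM steps.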

License: Apache-2.0

Project details

Release history

This version

0.1.0

Jun 12, 2026

Download files

Source Distribution

apicrawl-0.1.0.tar.gz (2.3 MB)

Uploaded Jun 12, 2026 Source

Built Distribution

apicrawl-0.1.0-py3-none-any.whl (2.4 MB)

Uploaded Jun 12, 2026 Python 3

File details

Details for the file apicrawl-0.1.0.tar.gz.

File metadata

Download URL: apicrawl-0.1.0.tar.gz
Upload date: Jun 12, 2026
Size: 2.3 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.1 CPython/3.13.3 Darwin/24.6.0

File hashes

Hashes for apicrawl-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`5ba91ddb6ca4f351019ea489af9a1ce343bda2e7918c99452f149eec21e5d9a8`
MD5	`df3f01070b6c29b98364d33d99404312`
BLAKE2b-256	`ba3d2ecb929da1b02ed9fea1ff467294a43f449446884365792ea43edcc74877`
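
A downloaded file can be checked against the SHA256 digest listed above with the standard library. The sketch below assumes the archive has already been downloaded to a local path. The function names `sha256_of` and `matches` are illustrative only.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """True if the file's SHA256 equals the published digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For instance, `matches("apicrawl-0.1.0.tar.gz", "5ba91ddb6ca4f351019ea489af9a1ce343bda2e7918c99452f149eec21e5d9a8")` should return True for an intact copy of the source distribution.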

File details

Details for the file apicrawl-0.1.0-py3-none-any.whl.

File metadata

Download URL: apicrawl-0.1.0-py3-none-any.whl
Upload date: Jun 12, 2026
Size: 2.4 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.1 CPython/3.13.3 Darwin/24.6.0

File hashes

Hashes for apicrawl-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7526409910e5fab5c07a26f536f4b3a314496a64a39aa662500eef63fbf3ce97`
MD5	`8f65a8da94702f03bef10faa1294d93c`
BLAKE2b-256	`56bd4c765bebf2209c54547737e2a2a1295f898680b1c3617bc50e8f4e6d7f4b`

