The easiest way to crawl a website and produce LLM-ready markdown files
Project description
url2llm
I needed a super simple tool to crawl a website (or the links in an llms.txt) into a single formatted markdown file (without headers, navigation, etc.) that I could add to Claude or ChatGPT project documents.
I couldn't find an easy solution. There are some web-based tools with a few free credits, but if you are already paying for an LLM API, why also pay someone else?
Quickstart
With uv (recommended):
Thanks to uv, you can easily run it from anywhere without installing anything:
uv run \
--with url2llm \
url2llm \
--depth 1 \
--url "https://modelcontextprotocol.io/llms.txt" \
--instruction "I need documents related to developing MCP (model context protocol) servers" \
--provider "gemini/gemini-2.5-flash-preview-04-17" \
--api_key ${GEMINI_API_KEY} \
--output-dir ~/Desktop/
Then drag ~/Desktop/model-context-protocol-documentation.md into ChatGPT/Claude!
With pip (alternative):
pip install url2llm
What it does
The script uses Crawl4AI and works in three steps:
- It crawls the given URL and produces a markdown version of each page it visits.
- It asks the LLM to extract from each page only the content relevant to the given instruction.
- It merges all pages into one document and saves the merged file.
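The merge step can be pictured with the minimal sketch below. It is not url2llm's own code: merge_pages and save_merged are hypothetical helpers, and the sketch assumes the crawl and LLM-extraction steps have already produced a dict mapping each URL to its filtered markdown. It also folds in the --min_chars behaviour described in the hints (pages shorter than the threshold are dropped) and, as an assumption of this sketch, tags each page with an HTML comment naming its source URL.

```python
from pathlib import Path


def merge_pages(pages: dict, min_chars: int = 1000) -> str:
    """Merge per-page markdown into one document (hypothetical helper).

    `pages` maps each crawled URL to the markdown the LLM kept for it.
    Pages whose stripped text is shorter than `min_chars` are dropped,
    mirroring the --min_chars hint.
    """
    kept = [
        # Assumption: mark each section with its source URL for traceability.
        f"<!-- source: {url} -->\n\n{text.strip()}"
        for url, text in pages.items()
        if len(text.strip()) >= min_chars
    ]
    # Separate pages with a horizontal rule so the merged file stays readable.
    return "\n\n---\n\n".join(kept)


def save_merged(pages: dict, output_path: Path, min_chars: int = 1000) -> Path:
    """Write the merged markdown to `output_path` and return the path."""
    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_text(merge_pages(pages, min_chars), encoding="utf-8")
    return output_path


if __name__ == "__main__":
    demo = {
        "https://example.com/a": "# Page A\n" + "relevant text " * 100,
        "https://example.com/b": "too short",
    }
    # Only page A survives the min_chars filter.
    print(merge_pages(demo, min_chars=50)[:80])
```

Filtering before merging keeps short, mostly empty pages (for example index pages the LLM stripped down to nothing) out of the final document.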
Command args and hints
- To use another LLM provider, just change --provider to e.g. openai/gpt-4o.
- Always set --api-key; it is not always inferred correctly from env vars.
- Provide a clear goal to --instruction. This will guide the LLM to filter out irrelevant pages.
- Recommended depth (default = 2): 2 or 1 for a normal website, 1 for an llms.txt.
- If you need the single pages, use --keep_pages true.
- You can specify the concurrency with --concurrency (default = 16).
- The script deletes files shorter than --min_chars (default = 1000).
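If you call url2llm from another Python script, the hints above can be encoded in a small command builder. This is a hedged sketch: build_url2llm_command is a hypothetical helper, not part of url2llm, and the flag spellings are copied from the Quickstart example (including --api_key and --output-dir). The depth rule (1 for an llms.txt, 2 otherwise) follows the recommended-depth hint.

```python
import os
from typing import List, Optional


def build_url2llm_command(
    url: str,
    instruction: str,
    api_key: str,
    provider: str = "gemini/gemini-2.5-flash-preview-04-17",
    output_dir: str = "~/Desktop/",
    depth: Optional[int] = None,
    keep_pages: bool = False,
    concurrency: Optional[int] = None,
    min_chars: Optional[int] = None,
) -> List[str]:
    """Build an argv list for running url2llm through uv (hypothetical helper)."""
    # Recommended depth: 1 for an llms.txt, otherwise the default of 2.
    if depth is None:
        depth = 1 if url.rstrip("/").endswith("llms.txt") else 2
    cmd = [
        "uv", "run", "--with", "url2llm", "url2llm",
        "--depth", str(depth),
        "--url", url,
        "--instruction", instruction,
        "--provider", provider,
        # Always pass the key explicitly instead of relying on env-var inference.
        "--api_key", api_key,
        "--output-dir", os.path.expanduser(output_dir),
    ]
    # Optional flags are only added when set, so url2llm's own defaults apply otherwise.
    if keep_pages:
        cmd += ["--keep_pages", "true"]
    if concurrency is not None:
        cmd += ["--concurrency", str(concurrency)]
    if min_chars is not None:
        cmd += ["--min_chars", str(min_chars)]
    return cmd
```

To run it, pass the list to subprocess.run(cmd, check=True), reading the key from an environment variable such as os.environ["GEMINI_API_KEY"]. Using an argv list rather than a single shell string avoids quoting problems in long --instruction values.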
Caution: If you need to do more complex things, use Crawl4AI directly and build it yourself: https://docs.crawl4ai.com/
Download files
Source Distribution: url2llm-0.2.0.tar.gz
Built Distribution: url2llm-0.2.0-py3-none-any.whl
File details
Details for the file url2llm-0.2.0.tar.gz.
File metadata
- Download URL: url2llm-0.2.0.tar.gz
- Upload date:
- Size: 6.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a2bab5ed07f7897ac0eb021bc4a30aa976f3dc9f764bcb82b981db306d94a7b5 |
| MD5 | 0797ab31a03b31a8f5713d4a714d1efd |
| BLAKE2b-256 | e3f59be66d3b078355b262c1a920476988455acc700345993a941d075f476c5c |
File details
Details for the file url2llm-0.2.0-py3-none-any.whl.
File metadata
- Download URL: url2llm-0.2.0-py3-none-any.whl
- Upload date:
- Size: 6.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ab36b4d8beceadda9783a877c5b5aac4463f13bbfeba2cc2e071545095b50886 |
| MD5 | d4c6cd66cfa7dac44287c68f0f07e1c1 |
| BLAKE2b-256 | 26b04f792c36b3f87ead1d75f2b6378a08f35da51682a0d08061fe71fd68412b |