
The easiest way to crawl a website and produce LLM-ready markdown files

Project description

url2llm

I needed a super simple tool to crawl a website (or the links in an llms.txt) into formatted markdown files (without headers, navigation, etc.) to add to Claude or ChatGPT project documents.

I couldn't find an easy solution. There are some web-based tools with a few free credits, but if you are already paying for an LLM API, why also pay someone else?

What it does

The script uses Crawl4AI:

  1. For each URL in the crawl, the script produces a markdown file.
  2. It then asks the LLM to extract only the content relevant to the given instruction and saves each result to disk (a rough sketch of these two steps follows the list).
  3. Finally, it merges all the files into one and saves the merged file.
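
A minimal sketch of steps 1–2 for a single URL, assuming Crawl4AI's AsyncWebCrawler plus a litellm-style completion call; the prompt wording, the crawl_and_filter helper, and the file-naming scheme are illustrative assumptions, not the actual url2llm implementation:

# Illustrative sketch only -- not the actual url2llm code.
# Assumes crawl4ai's AsyncWebCrawler and litellm's completion(); check both docs for the current APIs.
import asyncio
from pathlib import Path

from crawl4ai import AsyncWebCrawler
from litellm import completion


async def crawl_and_filter(url: str, instruction: str, provider: str, api_key: str, out_dir: Path) -> None:
    # Step 1: fetch the page and let Crawl4AI turn it into markdown.
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=url)
    raw_md = str(result.markdown)

    # Step 2: ask the LLM to keep only the content relevant to the instruction.
    response = completion(
        model=provider,  # e.g. "gemini/gemini-2.5-flash-preview-04-17" or "openai/gpt-4o"
        api_key=api_key,
        messages=[{
            "role": "user",
            "content": f"{instruction}\n\nFrom the page below, keep only the relevant content "
                       f"(no headers, navigation or footers) and answer in markdown:\n\n{raw_md}",
        }],
    )
    filtered_md = response.choices[0].message.content

    # Save one markdown file per URL; the merge step combines them afterwards.
    out_dir.mkdir(parents=True, exist_ok=True)
    name = url.rstrip("/").split("/")[-1] or "index"
    (out_dir / f"{name}.md").write_text(filtered_md)


if __name__ == "__main__":
    asyncio.run(crawl_and_filter(
        "https://docs.crawl4ai.com/core/quickstart/",
        "I need documents related to crawling web pages",
        "gemini/gemini-2.5-flash-preview-04-17",
        "YOUR_API_KEY",
        Path("crawl_out"),
    ))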

Installation

  1. Clone the repo, then:

    • (Recommended, with uv) – Nothing to do

    • (Alternative, with pip) – Install crawl4ai and fire (pip install crawl4ai fire)

How to use

Run script with arguments:

uv run main.py \
   --url "<URL_OR_LLMS.TXT>" \
   --depth 1 \
   --instruction "I need documents related to <GOAL>" \
   --provider "<PROVIDER>/<MODELNAME>" \
   --api-key ${GEMINI_API_KEY} \
   --output-dir "<OUTPUT_DIR>"
  • To use another LLM provider, just change --provider, e.g. to openai/gpt-4o
    • Always set --api-key; it is not always inferred correctly from env vars
  • Provide a clear goal to --instruction. This will guide the LLM to filter out irrelevant pages.
  • Recommended depth (default = 2):
    • 2 or 1 for a normal website
    • 1 for an llms.txt
  • You can specify the concurrency with --concurrency (default = 16)
  • The script deletes files shorter than --min_chars characters (default = 1000); a rough sketch of this filter and the merge step follows below
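
As a rough illustration of the --min_chars filter and the merge step (the filter_and_merge name, the separator, and the merged/ layout are assumptions based on the behaviour described above, not the actual implementation):

# Illustrative sketch of the "drop short files, then merge" behaviour -- not the actual url2llm code.
from pathlib import Path


def filter_and_merge(out_dir: Path, merged_name: str = "merged.md", min_chars: int = 1000) -> Path:
    kept = []
    for md_file in sorted(out_dir.glob("*.md")):
        text = md_file.read_text()
        if len(text) < min_chars:
            md_file.unlink()  # too short to be useful context
        else:
            kept.append(text)

    # Write the merged document into a merged/ subfolder of the output directory.
    merged_dir = out_dir / "merged"
    merged_dir.mkdir(parents=True, exist_ok=True)
    merged_path = merged_dir / merged_name
    merged_path.write_text("\n\n---\n\n".join(kept))
    return merged_path


print(filter_and_merge(Path("crawl_out")))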

[!CAUTION] If you need to do something more complex, use Crawl4AI directly and build it yourself: https://docs.crawl4ai.com/
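
For reference, calling Crawl4AI directly looks roughly like this; the CrawlerRunConfig options shown are assumptions based on the Crawl4AI docs, so verify the parameter names there:

# Sketch of direct Crawl4AI usage with a custom run config -- verify against https://docs.crawl4ai.com/.
import asyncio

from crawl4ai import AsyncWebCrawler, CrawlerRunConfig


async def main() -> None:
    config = CrawlerRunConfig(
        excluded_tags=["nav", "header", "footer"],  # strip boilerplate before markdown conversion
        word_count_threshold=10,                    # ignore very small text blocks
    )
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url="https://docs.crawl4ai.com/", config=config)
        print(result.markdown)


asyncio.run(main())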

How I use it

Thanks to uv, I can easily run it from anywhere on my system:

uv \
   --directory ~/Dev/url2llm/ \
   run main.py \
   --url "https://modelcontextprotocol.io/llms.txt" \
   --instruction "I need documents related to developing MCP (model context protocol) servers" \
   --provider "gemini/gemini-2.5-flash-preview-04-17" \
   --api_key ${GEMINI_API_KEY} \
   --output-dir ~/Desktop/crawl_out/

And drag ~/Desktop/crawl_out/merged/model-context-protocol-documentation.md into ChatGPT/Claude!

Install locally

uv pip install .

Publish

uv run pip install --upgrade twine

twine upload dist/*

(This assumes the dist/ directory has already been built, e.g. with uv build or python -m build.)

Download files

Download the file for your platform.

Source Distribution

url2llm-0.1.0.tar.gz (6.2 kB)

Uploaded Source

Built Distribution


url2llm-0.1.0-py3-none-any.whl (6.7 kB)

Uploaded Python 3

File details

Details for the file url2llm-0.1.0.tar.gz.

File metadata

  • Download URL: url2llm-0.1.0.tar.gz
  • Upload date:
  • Size: 6.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.11

File hashes

Hashes for url2llm-0.1.0.tar.gz:

  • SHA256: 41c338b541a3b743d7ee3c54ddca88b92995219b998e07ee33fdc627bfde06f5
  • MD5: d1ccacbe7ab40919d5b13f7541ff9f86
  • BLAKE2b-256: 2e9f5bbb6bb4454886f490d06ede82adaa6c40c9511f676456bdeb96abc20b71


File details

Details for the file url2llm-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: url2llm-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 6.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.11

File hashes

Hashes for url2llm-0.1.0-py3-none-any.whl:

  • SHA256: c97345e38ffa57de672d30c2200a53b23bbcd5ff42df79f2d05358c69a4f59ec
  • MD5: 6ba40ab7ff9a6242e5e2a1c69014ae41
  • BLAKE2b-256: 0217f56e483300c894643875eff3b102fb9f485874c6bc94cfe153e85ee01b72

