A tool to crawl documentation sites and convert them to Markdown format
docs-to-markdown 🚀 | Convert online documentation into Markdown by simply providing a URL
Perfect for converting libraries, SDKs, and other documentation for use with LLMs and AI Agents.
What is docs-to-markdown?
docs-to-markdown is a simple and fast tool that crawls online documentation from a given URL and converts it into Markdown files. Just provide the URL of the documentation you want to convert, and the tool will handle the rest. Whether you need a single consolidated file or multiple files, it streamlines your workflow for feeding content into your LLM and optimizing your AI prompts.
Features ✨
- Flexible Conversion: Convert documentation into one file or split into multiple files based on your needs.
- Intelligent Filtering: Extract only the sections you need to include in your LLM's context.
- AI-Optimized: Tailored for seamless integration with LLMs, AI Agents, and other AI systems.
- User-Friendly: Easy-to-use commands and a straightforward interface for quick results.
Installation 💻
Install directly from PyPI:
pip install docs-to-markdown
Usage 🚀
The tool provides flexible options for converting online documentation to Markdown format:
Basic Usage
docs-to-markdown https://example.com/docs --doc_name example_docs
With LLM Filtering (using GPT-4)
docs-to-markdown https://example.com/docs --llm-filtering --doc_name example_docs
Note: When using --llm-filtering, you need to set your OpenAI API key via:
- Command line: --openai-key "sk-..."
- Environment variable: OPENAI_API_KEY
- .env file
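The three key sources listed above can be illustrated with a small Python sketch. The precedence used here (command line first, then environment variable, then .env file) is an assumption for illustration; the README does not say in which order docs-to-markdown checks them. The helper names `read_dotenv_value` and `resolve_openai_key` are hypothetical and not part of the tool.

```python
import os
from pathlib import Path
from typing import Optional


def read_dotenv_value(name: str, dotenv_path: Path) -> Optional[str]:
    """Return the value of `name` from a simple KEY=VALUE .env file, if present."""
    if not dotenv_path.is_file():
        return None
    for raw_line in dotenv_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            # Drop surrounding whitespace and optional quotes.
            return value.strip().strip("\"'")
    return None


def resolve_openai_key(cli_key: Optional[str] = None,
                       dotenv_path: Path = Path(".env")) -> Optional[str]:
    """Assumed lookup order: CLI value, then environment, then .env file."""
    if cli_key:
        return cli_key
    env_key = os.environ.get("OPENAI_API_KEY")
    if env_key:
        return env_key
    return read_dotenv_value("OPENAI_API_KEY", dotenv_path)
```

Keeping the key in the environment or in a .env file avoids leaving it in shell history, which is a common reason to prefer those options over the command-line flag.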
Output Options
Generate multiple files (preserving site structure):
docs-to-markdown https://example.com/docs --doc_name example_docs --output multiple
Generate a single consolidated file:
docs-to-markdown https://example.com/docs --doc_name example_docs --output single
Additional Parameters
- --max_depth: Maximum crawling depth (default: 2)
- --output_dir: Output directory (default: current directory)
- --llm-filtering: Use GPT-4 to filter and clean content
- --openai-key: OpenAI API key for LLM filtering
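When the tool is driven from a script, the flags documented in this section can be assembled programmatically. The following is a minimal sketch that only builds and runs the command line; `build_command` and `run_conversion` are hypothetical helper names, and the sketch assumes the `docs-to-markdown` executable is on your PATH after installation.

```python
import subprocess
from typing import List, Optional


def build_command(url: str,
                  doc_name: str,
                  output: Optional[str] = None,
                  max_depth: Optional[int] = None,
                  output_dir: Optional[str] = None,
                  llm_filtering: bool = False,
                  openai_key: Optional[str] = None) -> List[str]:
    """Build an argument list using only the flags documented in this README."""
    cmd = ["docs-to-markdown", url, "--doc_name", doc_name]
    if output is not None:
        cmd += ["--output", output]  # "single" or "multiple"
    if max_depth is not None:
        cmd += ["--max_depth", str(max_depth)]
    if output_dir is not None:
        cmd += ["--output_dir", output_dir]
    if llm_filtering:
        cmd.append("--llm-filtering")
    if openai_key is not None:
        cmd += ["--openai-key", openai_key]
    return cmd


def run_conversion(**kwargs) -> None:
    """Run the CLI and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_command(**kwargs), check=True)
```

For example, `build_command("https://example.com/docs", "example_docs", output="single", max_depth=3)` yields the same arguments you would type by hand. Passing a list rather than a single string to `subprocess.run` avoids shell quoting problems with URLs.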
The tool will create a directory named after the value of --doc_name, containing the Markdown files.
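Once the output directory exists, its contents can be loaded into an LLM prompt. The sketch below assumes the generated files use the `.md` extension and sit somewhere under the directory named by --doc_name; the function name `load_markdown_context` and the character budget are illustrative choices, not features of the tool.

```python
from pathlib import Path
from typing import List


def load_markdown_context(doc_dir: Path, max_chars: int = 20000) -> str:
    """Concatenate the .md files under doc_dir into one string, up to max_chars."""
    parts: List[str] = []
    total = 0
    for md_file in sorted(doc_dir.rglob("*.md")):
        text = md_file.read_text(encoding="utf-8")
        # Label each chunk with its relative path so the model can tell pages apart.
        header = f"\n\n<!-- source: {md_file.relative_to(doc_dir).as_posix()} -->\n"
        chunk = header + text
        if total + len(chunk) > max_chars:
            break  # Stop before exceeding the character budget.
        parts.append(chunk)
        total += len(chunk)
    return "".join(parts).strip()
```

A call such as `load_markdown_context(Path("example_docs"))` returns a single string that can be pasted into a prompt. The budget is counted in characters, which is only a rough proxy for tokens.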
Development 🛠️
Developer Setup
Clone the project from GitHub:
git clone https://github.com/fdagostino/docs-to-markdown.git
cd docs-to-markdown
(Optional) It's recommended to create and activate a virtual environment before installing dependencies:
python -m venv .venv
source .venv/bin/activate # On macOS/Linux; use `.venv\Scripts\activate` on Windows
Install the required dependencies:
pip install -r requirements.txt
To run and test the project locally, simply invoke:
python docs_to_markdown.py https://example.com/docs --doc_name example_docs
You can also use additional flags (e.g., --llm-filtering, --output multiple) as needed during development.
Developed 100% with AI 🤖
This project was entirely developed using 🤖🤖🤖 and the amazing library Crawl4AI.
Contributing & Reporting Issues 🤝
We welcome contributions and feedback to help improve docs-to-markdown.
How to Contribute
- Fork the Repository: Click the "Fork" button on GitHub to create your own copy.
- Clone Your Fork:
git clone https://github.com/YOUR_USERNAME/docs-to-markdown.git
cd docs-to-markdown
- Create a Feature Branch:
git checkout -b feature/your-feature-name
- Implement Your Changes
- Submit a Pull Request: Open a pull request against the main repository when your changes are ready.
How to Report Issues
- Visit the Issues Page: Please report bugs or feature requests at https://github.com/fdagostino/docs-to-markdown/issues.
- Before Reporting: Check if the issue has already been reported.
- Provide Details: When opening a new issue, include a clear description, steps to reproduce (if applicable), and any relevant error messages.
Your contributions and feedback are highly appreciated!
Support Me ❤️
If you find this tool useful, please consider supporting me on ko‑fi:
My 🤖 makes 🔧🔧🔧 for 🫵. Help me buy some ⚡⚡⚡ to feed them!
License 📄
This project is licensed under the terms found in LICENSE.md.
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file docs_to_markdown-0.1.0.tar.gz.
File metadata
- Download URL: docs_to_markdown-0.1.0.tar.gz
- Upload date:
- Size: 9.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 66815f50a8596dc8eb2ed0795e4228398200f95a39345ff5dcd04e340884dce8 |
| MD5 | b4e67f495abd1bc240521180f4e336c0 |
| BLAKE2b-256 | 678251a3eacb65c9652ba1972b1da656092d2e93c1faf08d7c0c6708d771773b |
File details
Details for the file docs_to_markdown-0.1.0-py3-none-any.whl.
File metadata
- Download URL: docs_to_markdown-0.1.0-py3-none-any.whl
- Upload date:
- Size: 9.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 377beacda05132714f7d12afd60f3b4feda1cce77072e835e265eb8dfb458d64 |
| MD5 | 7d7a72d5d26db03da7b4f0e83ab66151 |
| BLAKE2b-256 | 8653ea69e616193c7e5fa1608daf2aa8b43417e246da308a768f572131abc752 |
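If you download either file manually, its SHA256 digest can be compared against the values listed for the two files. This is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of_file` and `verify_download` are hypothetical, and the sketch assumes the downloaded file keeps its original file name.

```python
import hashlib
from pathlib import Path

# SHA256 digests as listed in the file details for release 0.1.0.
EXPECTED_SHA256 = {
    "docs_to_markdown-0.1.0.tar.gz":
        "66815f50a8596dc8eb2ed0795e4228398200f95a39345ff5dcd04e340884dce8",
    "docs_to_markdown-0.1.0-py3-none-any.whl":
        "377beacda05132714f7d12afd60f3b4feda1cce77072e835e265eb8dfb458d64",
}


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_download(path: Path) -> bool:
    """Return True if the file's SHA256 matches the digest listed for its name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no known digest for {path.name}")
    return sha256_of_file(path) == expected
```

A call such as `verify_download(Path("docs_to_markdown-0.1.0.tar.gz"))` returns False if the file was corrupted or altered in transit.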