A tool to crawl documentation sites and convert them to Markdown format

docs-to-markdown 🚀 | Convert online documentation into Markdown by simply providing a URL

Perfect for converting libraries, SDKs, and other documentation for use with LLMs and AI Agents.

What is docs-to-markdown?

docs-to-markdown is a simple and fast tool that crawls online documentation from a given URL and converts it into Markdown files. Just provide the URL of the documentation you want to convert, and the tool will handle the rest. The output can be a single consolidated file or multiple files, ready to be passed to an LLM as context.

Features ✨

  • Flexible Conversion: Convert documentation into one file or split it into multiple files, depending on your needs.
  • Intelligent Filtering: Extract only the sections you need to include in your LLM's context.
  • AI-Optimized: Tailored for seamless integration with LLMs, AI Agents, and other AI systems.
  • User-Friendly: Easy-to-use commands and a straightforward interface for quick results.

Installation 💻

Install directly from PyPI:

pip install docs-to-markdown

Usage 🚀

The tool provides flexible options for converting online documentation to Markdown format:

Basic Usage

docs-to-markdown https://example.com/docs --doc_name example_docs

With LLM Filtering (using GPT-4)

docs-to-markdown https://example.com/docs --llm-filtering --doc_name example_docs

Note: --llm-filtering requires an OpenAI API key, which you can supply in any of these ways:

  • Command line: --openai-key "sk-..."
  • Environment variable: OPENAI_API_KEY
  • .env file
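
The three key sources listed above could be resolved as in the following minimal sketch. The README does not state which source wins when several are set, so the precedence used here (command line, then environment variable, then .env file) is an assumption, and the helper names `read_dotenv_key` and `resolve_openai_key` are hypothetical rather than part of docs-to-markdown.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def read_dotenv_key(path: Path, name: str = "OPENAI_API_KEY") -> Optional[str]:
    """Return the value of `name` from a simple KEY=VALUE .env file, if present."""
    if not path.is_file():
        return None
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            # Drop surrounding quotes; treat an empty value as missing.
            return value.strip().strip("'\"") or None
    return None


def resolve_openai_key(cli_key: Optional[str] = None,
                       env: Optional[Mapping[str, str]] = None,
                       dotenv_path: Path = Path(".env")) -> Optional[str]:
    """Assumed precedence: --openai-key, then OPENAI_API_KEY, then the .env file."""
    env = os.environ if env is None else env
    return cli_key or env.get("OPENAI_API_KEY") or read_dotenv_key(dotenv_path)
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.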

Output Options

Generate multiple files (preserving site structure):

docs-to-markdown https://example.com/docs --doc_name example_docs --output multiple

Generate a single consolidated file:

docs-to-markdown https://example.com/docs --doc_name example_docs --output single
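
If you drive the tool from a script, the commands above can be assembled programmatically. The sketch below only builds the argument list from flags documented in this README; the function name `build_command` is hypothetical, and it assumes the `docs-to-markdown` executable is on your PATH after installation.

```python
from typing import List, Optional


def build_command(url: str, doc_name: str, output: Optional[str] = None,
                  llm_filtering: bool = False) -> List[str]:
    """Assemble a docs-to-markdown invocation from the documented flags."""
    if output not in (None, "single", "multiple"):
        raise ValueError("output must be 'single' or 'multiple'")
    cmd = ["docs-to-markdown", url, "--doc_name", doc_name]
    if output is not None:
        cmd += ["--output", output]
    if llm_filtering:
        # Requires an OpenAI API key, as described in the note above.
        cmd.append("--llm-filtering")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; building it as a list avoids shell quoting problems with URLs.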

Additional Parameters

  • --max_depth: Maximum crawling depth (default: 2)
  • --output_dir: Output directory (default: current directory)
  • --llm-filtering: Use GPT-4 to filter and clean content
  • --openai-key: OpenAI API key for LLM filtering
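
To illustrate what a crawling depth limit such as --max_depth bounds, here is a small breadth-first traversal over a hypothetical in-memory link graph. This is not the tool's implementation (crawling is handled through Crawl4AI), and counting the start page as depth 0 is an assumption made for the example.

```python
from collections import deque
from typing import Dict, List


def crawl_order(links: Dict[str, List[str]], start: str, max_depth: int = 2) -> List[str]:
    """Return pages reachable from `start` within `max_depth` link hops, in BFS order."""
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        order.append(page)
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return order


# Hypothetical site structure: each page maps to the pages it links to.
site = {
    "/docs": ["/docs/install", "/docs/usage"],
    "/docs/usage": ["/docs/usage/cli"],
    "/docs/usage/cli": ["/docs/usage/cli/flags"],
}
print(crawl_order(site, "/docs", max_depth=2))
```

With a limit of 2, the page three hops away (`/docs/usage/cli/flags`) is never visited; raising the limit would include it at the cost of a longer crawl.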

The tool creates a directory named after the --doc_name value and writes the Markdown files into it.
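
Once a run has finished, the generated files can be gathered into one string for an LLM prompt. This sketch assumes the output directory sits at `<output_dir>/<doc_name>` and that the files use the `.md` extension; neither detail is confirmed by this README, and `load_markdown` is a hypothetical helper.

```python
from pathlib import Path


def load_markdown(doc_name: str, output_dir: str = ".") -> str:
    """Concatenate every .md file under <output_dir>/<doc_name>, sorted by path."""
    root = Path(output_dir) / doc_name
    if not root.is_dir():
        raise FileNotFoundError(f"no output directory at {root}")
    parts = []
    for path in sorted(root.rglob("*.md")):
        rel = path.relative_to(root).as_posix()
        text = path.read_text(encoding="utf-8").strip()
        # Label each chunk with its relative path so the source stays traceable.
        parts.append(f"<!-- {rel} -->\n{text}")
    return "\n\n".join(parts)
```

For example, `load_markdown("example_docs")` would read the directory produced by the basic usage command when run from the same working directory.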

Development 🛠️

Developer Setup

Clone the project from GitHub:

git clone https://github.com/fdagostino/docs-to-markdown.git
cd docs-to-markdown

Install the required dependencies:

pip install -r requirements.txt

(Optional) A virtual environment is recommended for development. Create and activate it before installing the dependencies:

python -m venv .venv
source .venv/bin/activate  # On macOS/Linux or `.venv\Scripts\activate` on Windows

To run and test the project locally, simply invoke:

python docs_to_markdown.py https://example.com/docs --doc_name example_docs

You can also use additional flags (e.g., --llm-filtering, --output multiple) as needed during development.

Developed 100% with AI 🤖

This project was entirely developed using 🤖🤖🤖 and the amazing library Crawl4AI.

Contributing & Reporting Issues 🤝

We welcome contributions and feedback to help improve docs-to-markdown.

How to Contribute

  • Fork the Repository: Click the "Fork" button on GitHub to create your own copy.
  • Clone Your Fork:
    git clone https://github.com/YOUR_USERNAME/docs-to-markdown.git
    cd docs-to-markdown
    
  • Create a Feature Branch:
    git checkout -b feature/your-feature-name
    
  • Implement Your Changes
  • Submit a Pull Request: Open a pull request against the main repository when your changes are ready.

How to Report Issues

  • Visit the Issues Page: Please report bugs or feature requests at https://github.com/fdagostino/docs-to-markdown/issues.
  • Before Reporting: Check if the issue has already been reported.
  • Provide Details: When opening a new issue, include a clear description, steps to reproduce (if applicable), and any relevant error messages.

Your contributions and feedback are highly appreciated!

Support Me ❤️

If you find this tool useful, please consider supporting me on Ko-fi.

My 🤖 makes 🔧🔧🔧 for 🫵. Help me buy some ⚡⚡⚡ to feed them!

License 📄

This project is licensed under the terms found in LICENSE.md.

