A tool to crawl documentation sites and convert them to Markdown format

docs-to-markdown 🚀 | Convert online documentation into Markdown by simply providing a URL

Perfect for converting libraries, SDKs, and other documentation for use with LLMs and AI Agents.

What is docs-to-markdown?

docs-to-markdown crawls online documentation starting from a URL you provide and converts the pages it finds into Markdown. The output can be a single consolidated file or multiple files, ready to feed into an LLM's context or your AI prompts.

Features ✨

  • Flexible Conversion: Convert documentation into one file or split into multiple files based on your needs.
  • Intelligent Filtering: Extract only the sections you need to include in your LLM's context.
  • AI-Optimized: Tailored for seamless integration with LLMs, AI Agents, and other AI systems.
  • User-Friendly: Easy-to-use commands and a straightforward interface for quick results.

Installation 💻

Install directly from PyPI:

pip install docs-to-markdown

Usage 🚀

The tool provides flexible options for converting online documentation to Markdown format:

Basic Usage

docs-to-markdown https://example.com/docs --doc_name example_docs

With LLM Filtering (using GPT-4)

docs-to-markdown https://example.com/docs --llm-filtering --doc_name example_docs

Note: --llm-filtering requires an OpenAI API key, which you can provide in any of the following ways:

  • Command line: --openai-key "sk-..."
  • Environment variable: OPENAI_API_KEY
  • .env file
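
As a rough illustration of how the three sources above could be checked before a run, here is a minimal Python sketch. The helper names (read_dotenv_key, resolve_openai_key) are hypothetical, and the precedence order (command line, then environment variable, then .env file) is an assumption; the tool's actual lookup order is not documented here.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def read_dotenv_key(path: Path, name: str = "OPENAI_API_KEY") -> Optional[str]:
    """Return the value of `name` from a simple KEY=VALUE .env file, if present."""
    if not path.is_file():
        return None
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            # Drop surrounding whitespace and optional quotes.
            return value.strip().strip("\"'") or None
    return None


def resolve_openai_key(
    cli_key: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
    dotenv_path: Path = Path(".env"),
) -> Optional[str]:
    """Pick a key from the three sources; this precedence order is an assumption."""
    environ = os.environ if environ is None else environ
    return cli_key or environ.get("OPENAI_API_KEY") or read_dotenv_key(dotenv_path)


if __name__ == "__main__":
    key = resolve_openai_key()
    print("OpenAI key found" if key else "No OpenAI key configured")
```

Running this in your project directory before using --llm-filtering gives a quick check that a key is visible from at least one of the three places.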

Output Options

Generate multiple files (preserving site structure):

docs-to-markdown https://example.com/docs --doc_name example_docs --output multiple

Generate a single consolidated file:

docs-to-markdown https://example.com/docs --doc_name example_docs --output single
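
If you want to drive these commands from a script, the following Python sketch assembles the argument list from the flags shown so far. The function name build_command is hypothetical, and the sketch assumes that --output accepts only the two values used in the examples above.

```python
import shlex
from typing import List, Optional


def build_command(
    url: str,
    doc_name: str,
    output: Optional[str] = None,
    llm_filtering: bool = False,
) -> List[str]:
    """Assemble a docs-to-markdown argument list from the documented flags."""
    cmd = ["docs-to-markdown", url, "--doc_name", doc_name]
    if output is not None:
        # Assumption: only the two values shown in the examples are valid.
        if output not in ("single", "multiple"):
            raise ValueError("output must be 'single' or 'multiple'")
        cmd += ["--output", output]
    if llm_filtering:
        cmd.append("--llm-filtering")
    return cmd


if __name__ == "__main__":
    cmd = build_command("https://example.com/docs", "example_docs", output="multiple")
    # Print the command as a copy-pastable shell line.
    print(shlex.join(cmd))
    # To actually run it (requires docs-to-markdown to be installed):
    # import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list and passing it to subprocess.run avoids shell quoting problems when URLs or names contain special characters.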

Additional Parameters

  • --max_depth: Maximum crawling depth (default: 2)
  • --output_dir: Output directory (default: current directory)
  • --llm-filtering: Use GPT-4 to filter and clean content
  • --openai-key: OpenAI API key for LLM filtering

The tool creates a directory named after the value of --doc_name and writes the Markdown files into it.
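
To make the effect of --max_depth more concrete, here is a small Python sketch of a depth-limited, breadth-first crawl over an in-memory link graph. It is not the tool's implementation (crawling is delegated to Crawl4AI); it assumes that depth counts link hops from the starting URL, with the start page at depth 0. The function name crawl_order and the sample site are hypothetical.

```python
from collections import deque
from typing import Dict, List


def crawl_order(links: Dict[str, List[str]], start: str, max_depth: int = 2) -> List[str]:
    """Return pages reachable from `start` within `max_depth` link hops (breadth-first)."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order


if __name__ == "__main__":
    # Hypothetical site: each key links to the pages in its list.
    site = {
        "/docs": ["/docs/intro", "/docs/api"],
        "/docs/intro": ["/docs/intro/install"],
        "/docs/intro/install": ["/docs/intro/install/windows"],
    }
    print(crawl_order(site, "/docs", max_depth=2))
```

Under this assumption, the default depth of 2 reaches /docs/intro/install but not the page it links to; a deeper documentation tree would need a larger --max_depth.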

Development 🛠️

Developer Setup

Clone the project from GitHub:

git clone https://github.com/fdagostino/docs-to-markdown.git
cd docs-to-markdown

Install the required dependencies:

pip install -r requirements.txt

(Optional) A virtual environment is recommended for development. Create and activate it before installing the dependencies:

python -m venv .venv
source .venv/bin/activate  # macOS/Linux
# On Windows, run this instead: .venv\Scripts\activate

To run and test the project locally, simply invoke:

python docs_to_markdown.py https://example.com/docs --doc_name example_docs

You can also use additional flags (e.g., --llm-filtering, --output multiple) as needed during development.

Developed 100% with AI 🤖

This project was entirely developed using 🤖🤖🤖 and the amazing library Crawl4AI.

Contributing & Reporting Issues 🤝

We welcome contributions and feedback to help improve docs-to-markdown.

How to Contribute

  • Fork the Repository: Click the "Fork" button on GitHub to create your own copy.
  • Clone Your Fork:
    git clone https://github.com/YOUR_USERNAME/docs-to-markdown.git
    cd docs-to-markdown
    
  • Create a Feature Branch:
    git checkout -b feature/your-feature-name
    
  • Implement Your Changes
  • Submit a Pull Request: Open a pull request against the main repository when your changes are ready.

How to Report Issues

  • Visit the Issues Page: Please report bugs or feature requests at https://github.com/fdagostino/docs-to-markdown/issues.
  • Before Reporting: Check if the issue has already been reported.
  • Provide Details: When opening a new issue, include a clear description, steps to reproduce (if applicable), and any relevant error messages.

Your contributions and feedback are highly appreciated!

Support Me ❤️

If you find this tool useful, please consider supporting me on Ko-fi.

My 🤖 makes 🔧🔧🔧 for 🫵. Help me buy some ⚡⚡⚡ to feed them!

License 📄

This project is licensed under the terms found in LICENSE.md.
