Extensible Next.js/DeepWiki content extractor with zero external dependencies

deepwiki-to-md

English README. Japanese version: README_JP.md

Zero-dependency CLI and Python library to extract Markdown from Next.js/DeepWiki HTML. Includes a small search helper for public repository indexes and an optional chat helper.

  • CLI: deepwiki-to-md
  • Requirements: Python 3.8+
  • Dependencies: Standard library only (optional extras for dev/docs)

Install

pip install deepwiki-to-md

Usage

  • From local HTML/string (CLI and Python):
# CLI
echo "<html>...</html>" | deepwiki-to-md
# Python API
from deepwiki_to_md import ContentExtractor

html = """
<!doctype html>
<html>...</html>
"""

extractor = ContentExtractor()
md = extractor.extract_from_html(html)
print(md)
  • From URL (files are saved only when the input is a URL):
# CLI
# Files under .deepwiki are created only for URL input
deepwiki-to-md https://deepwiki.com/microsoft/vscode/some-page --path ./.deepwiki
# Python API (same behavior as the CLI)
from deepwiki_to_md import ContentExtractor, save_markdown_to_library

url = "https://deepwiki.com/microsoft/vscode/some-page"
base_dir = "./.deepwiki"  # equivalent to --path (optional)

extractor = ContentExtractor()
md = extractor.extract_from_url(url)

result = save_markdown_to_library(md, url, base_dir)
print("saved files:")
for p in result["saved_files"]:
    print(" -", p)
print("library index:", result["library_file"])  # .deepwiki/<username>/<library>.md
  • Search public repository indexes:
# CLI (JSON by default)
deepwiki-to-md --search "Gemini"

# Human-readable development-log style
deepwiki-to-md --search "Gemini" --devlog
# Python API (same search capability)
from search_repository import search_repositories, API_URL

print(API_URL)  # => https://api.devin.ai/ada/list_public_indexes
result = search_repositories("Gemini")
indices = result.get("indices", [])
print("indices:", len(indices))
  • Chat with Devin API (via CLI):
# Positional argument must be a DeepWiki URL
# JSON output by default
deepwiki-to-md https://deepwiki.com/microsoft/vscode --chat "What is the purpose of this repository?"

# Human-readable output for development logs
deepwiki-to-md https://deepwiki.com/microsoft/vscode --chat "Summarize top features" --devlog
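
The URL example earlier in this list notes that the library index is written to .deepwiki/<username>/<library>.md. As a rough illustration of that mapping, the sketch below derives such a path from a DeepWiki URL using only the standard library. The helper name library_index_path is hypothetical and is not part of deepwiki-to-md; the real save_markdown_to_library may resolve paths differently.

```python
from pathlib import Path
from urllib.parse import urlparse


def library_index_path(url: str, base_dir: str = "./.deepwiki") -> Path:
    """Hypothetical helper: map a DeepWiki URL to <base_dir>/<username>/<library>.md."""
    # Keep only the non-empty path segments, e.g. ["microsoft", "vscode", "some-page"].
    parts = [segment for segment in urlparse(url).path.split("/") if segment]
    if len(parts) < 2:
        raise ValueError(f"expected /<username>/<library> in URL path, got {url!r}")
    username, library = parts[0], parts[1]
    return Path(base_dir) / username / f"{library}.md"


index_path = library_index_path("https://deepwiki.com/microsoft/vscode/some-page")
print(index_path.as_posix())
```

Any page segments after the first two are ignored here, on the assumption that the index is kept per repository rather than per page.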

Options for chat via deepwiki-to-md:

  • --chat MESSAGE: Message to send. Requires a DeepWiki URL as the positional input.
  • --deep-research: Enable deep research mode for chat.
  • --config-file PATH: Path to chat config JSON (default: ./config.json). The file must exist and contain complete settings.
  • --devlog: When used with --chat, prints a human-readable response body and reference files.
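
Because --config-file requires a file that exists and contains complete settings, it can help to check a config before starting a chat. The following is a minimal sketch of such a check, not the tool's own validation. The function load_chat_config and the key names api_url and timeout are hypothetical, since the required settings are not listed here.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterable


def load_chat_config(path: str, required_keys: Iterable[str]) -> Dict[str, Any]:
    """Hypothetical check: the file must exist and every required key must be set."""
    config_path = Path(path)
    if not config_path.is_file():
        raise FileNotFoundError(f"chat config not found: {config_path}")
    with config_path.open(encoding="utf-8") as fh:
        config = json.load(fh)
    if not isinstance(config, dict):
        raise ValueError("chat config must be a JSON object")
    # Treat absent, null, and empty-string values as incomplete settings.
    missing = [key for key in required_keys
               if key not in config or config[key] in (None, "")]
    if missing:
        raise ValueError(f"chat config is missing settings: {', '.join(missing)}")
    return config


# Usage with a throwaway file; the keys below are placeholders, not the real schema.
with tempfile.TemporaryDirectory() as tmp:
    cfg_file = Path(tmp) / "config.json"
    cfg_file.write_text(
        json.dumps({"api_url": "https://example.invalid", "timeout": 30}),
        encoding="utf-8",
    )
    loaded = load_chat_config(str(cfg_file), ["api_url", "timeout"])
    print(loaded["timeout"])
```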

License

This project is licensed under the MIT License; see the LICENSE file for details.

More documentation

Chat (Devin API) result object: ChatResult

The chat helper (src/chat.py) returns a ChatResult object instead of a plain dict.

  • Highlights

    • Inherits from dict → works with json.dumps(result) directly.
    • Convenient attribute access (e.g., result.response_message) and to_dict().
    • print(result) shows a human-readable summary.
  • Main properties

    • sent_message: str
    • response_message: Optional[str]
    • status_code: Any
    • reference_files: List[str]
    • reference_file_contents: Dict[str, str]
  • Example (excerpt)

import asyncio
import json
from chat import load_or_create_config, send_chat_message, ChatResult

async def main() -> None:

    result: ChatResult = await send_chat_message(
        wiki_url='https://deepwiki.com/microsoft/vscode',
        message='What is the purpose of this repository?',
        use_deep_research=False,
    )

    print(result)  # human-readable summary via __str__
    print(result.response_message)  # attribute access
    print(json.dumps(result, indent=2, ensure_ascii=False))  # still a dict

if __name__ == '__main__':
    asyncio.run(main())
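
To illustrate why a dict subclass gives both JSON serialization and attribute access, here is a small stand-in class. It is a sketch of the general pattern only and is not the source of ChatResult; the class name SketchResult and the summary format in __str__ are assumptions.

```python
import json
from typing import Any, Dict, List, Optional


class SketchResult(dict):
    """Illustrative stand-in for a dict-based result object (not the real ChatResult)."""

    @property
    def sent_message(self) -> str:
        return self.get("sent_message", "")

    @property
    def response_message(self) -> Optional[str]:
        return self.get("response_message")

    @property
    def reference_files(self) -> List[str]:
        return self.get("reference_files", [])

    def to_dict(self) -> Dict[str, Any]:
        # Return a plain dict copy, detached from the subclass.
        return dict(self)

    def __str__(self) -> str:
        # Human-readable summary; the layout here is an assumption.
        return (
            f"sent: {self.sent_message}\n"
            f"response: {self.response_message}\n"
            f"references: {len(self.reference_files)}"
        )


result = SketchResult(
    sent_message="hi",
    response_message="hello",
    reference_files=["a.py"],
)
print(result)              # summary via __str__
print(json.dumps(result))  # serializes like any dict
```

Since the data lives in the dict itself, json.dumps needs no custom encoder, and the properties are only read-only views over the same keys.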

Arguments for chat.py:

  • --url: URL of the chat interface.
  • --message: Message to send.
  • --selector: CSS selector for the chat input (default: textarea).
  • --button: CSS selector for the submit button (default: button).
  • --wait: Time to wait for response in seconds (default: 30).
  • --debug: Enable debug mode.
  • --output: Output directory (default: ChatResponses).
  • --deep: Enable "Deep Research" mode (specific to some interfaces).
  • --headless: Run browser in headless mode.
  • --format: Output format(s): html, md, yaml, or comma-separated list (default: html).

Note: The chat scraper uses Selenium, which requires a compatible browser to be installed.
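
For reference, the flags and defaults listed above could be declared with argparse roughly as follows. This is a sketch reconstructed from the list, not the actual chat.py source; the helper names build_parser and parse_formats are hypothetical, and whether --url and --message are mandatory is not stated, so they are left optional here.

```python
import argparse
from typing import List

ALLOWED_FORMATS = {"html", "md", "yaml"}


def parse_formats(value: str) -> List[str]:
    """Split a comma-separated format list and reject unknown entries."""
    formats = [item.strip() for item in value.split(",") if item.strip()]
    unknown = [fmt for fmt in formats if fmt not in ALLOWED_FORMATS]
    if not formats or unknown:
        raise argparse.ArgumentTypeError(f"invalid format list: {value!r}")
    return formats


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Chat scraper options (sketch)")
    parser.add_argument("--url", help="URL of the chat interface")
    parser.add_argument("--message", help="Message to send")
    parser.add_argument("--selector", default="textarea",
                        help="CSS selector for the chat input")
    parser.add_argument("--button", default="button",
                        help="CSS selector for the submit button")
    parser.add_argument("--wait", type=int, default=30,
                        help="Seconds to wait for a response")
    parser.add_argument("--debug", action="store_true", help="Enable debug mode")
    parser.add_argument("--output", default="ChatResponses", help="Output directory")
    parser.add_argument("--deep", action="store_true",
                        help="Enable Deep Research mode")
    parser.add_argument("--headless", action="store_true",
                        help="Run the browser in headless mode")
    parser.add_argument("--format", type=parse_formats, default=["html"],
                        help="html, md, yaml, or a comma-separated list")
    return parser


args = build_parser().parse_args(
    ["--url", "https://example.invalid/chat", "--message", "hi", "--format", "md,yaml"]
)
print(args.format, args.wait, args.selector)
```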

