Extensible Next.js/DeepWiki content extractor with zero external dependencies
deepwiki-to-md
English README. Japanese version → README_JP.md
Zero-dependency CLI and Python library to extract Markdown from Next.js/DeepWiki HTML. Includes a small search helper for public repository indexes and an optional chat helper.
- CLI: deepwiki-to-md
- Requirements: Python 3.8+
- Dependencies: Standard library only (optional extras for dev/docs)
Install
pip install deepwiki-to-md
Usage
- From local HTML/string (CLI and Python):
# CLI
echo "<html>...</html>" | deepwiki-to-md
# Python API
from deepwiki_to_md import ContentExtractor
html = """
<!doctype html>
<html>...</html>
"""
extractor = ContentExtractor()
md = extractor.extract_from_html(html)
print(md)
- From URL (files are saved only when the input is a URL):
# CLI
# Files under .deepwiki are created only for URL input
deepwiki-to-md https://deepwiki.com/microsoft/vscode/some-page --path ./.deepwiki
# Python API (same behavior as the CLI)
from deepwiki_to_md import ContentExtractor, save_markdown_to_library
url = "https://deepwiki.com/microsoft/vscode/some-page"
base_dir = "./.deepwiki" # equivalent to --path (optional)
extractor = ContentExtractor()
md = extractor.extract_from_url(url)
result = save_markdown_to_library(md, url, base_dir)
print("saved files:")
for p in result["saved_files"]:
    print(" -", p)
print("library index:", result["library_file"]) # .deepwiki/<username>/<library>.md
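The comment on the last line above gives the library index location as `.deepwiki/<username>/<library>.md`. As an illustration of that layout only, the sketch below derives such a path from a DeepWiki URL. `library_index_path` is a hypothetical helper, not part of `deepwiki_to_md`, and it assumes that the first two URL path segments are the username and the library name.

```python
from pathlib import Path
from urllib.parse import urlparse


def library_index_path(url: str, base_dir: str = "./.deepwiki") -> Path:
    """Hypothetical helper: guess the library index path for a DeepWiki URL."""
    # Assumption: the URL path looks like /<username>/<library>/...
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) < 2:
        raise ValueError(f"expected /<username>/<library> in URL path: {url!r}")
    username, library = segments[0], segments[1]
    return Path(base_dir) / username / f"{library}.md"


print(library_index_path("https://deepwiki.com/microsoft/vscode/some-page"))
```

The path actually written is whatever `save_markdown_to_library` reports in `result["library_file"]`; treat the helper as a way to reason about the layout, not as a replacement for that value.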
- Search public repository indexes:
# CLI (JSON by default)
deepwiki-to-md --search "Gemini"
# Human-readable development-log style
deepwiki-to-md --search "Gemini" --devlog
# Python API (same search capability)
from search_repository import search_repositories, API_URL
print(API_URL) # => https://api.devin.ai/ada/list_public_indexes
result = search_repositories("Gemini")
indices = result.get("indices", [])
print("indices:", len(indices))
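Because the search CLI prints JSON by default, its output can be handed to other tools. The sketch below shows one way to read it, assuming the CLI emits an object with the same top-level "indices" list as the Python result above. `parse_search_output` is a hypothetical helper, and the sample payload with its `repo_name` field is a placeholder, not real API output.

```python
import json
from typing import Any, Dict, List


def parse_search_output(stdout: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: pull the "indices" list out of CLI JSON output."""
    data = json.loads(stdout)
    # Assumption: the CLI emits an object with an "indices" list, mirroring
    # the Python API result.
    indices = data.get("indices", [])
    if not isinstance(indices, list):
        raise TypeError('"indices" is not a list')
    return indices


# Placeholder payload for illustration; real entries may carry other fields.
sample = '{"indices": [{"repo_name": "example/repo"}]}'
print(len(parse_search_output(sample)))
```

In real use the string could come from `subprocess.run(["deepwiki-to-md", "--search", "Gemini"], capture_output=True, text=True).stdout`.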
- Chat with Devin API (via CLI):
# Positional argument must be a DeepWiki URL
# JSON output by default
deepwiki-to-md https://deepwiki.com/microsoft/vscode --chat "What is the purpose of this repository?"
# Human-readable output for development logs
deepwiki-to-md https://deepwiki.com/microsoft/vscode --chat "Summarize top features" --devlog
Options for chat via deepwiki-to-md:
- --chat MESSAGE: Message to send. Requires a DeepWiki URL as the positional input.
- --deep-research: Enable deep research mode for chat.
- --config-file PATH: Path to chat config JSON (default: ./config.json). The file must exist and contain complete settings.
- --devlog: When used with --chat, prints a human-readable response body and reference files.
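When the chat CLI is driven from a script, the options above can be assembled into an argument list. The following is a minimal sketch using only the flags documented here. `build_chat_command` is a hypothetical helper, and its URL prefix check is an assumption about what counts as a DeepWiki URL, not the CLI's own validation.

```python
from typing import List, Optional


def build_chat_command(
    url: str,
    message: str,
    deep_research: bool = False,
    config_file: Optional[str] = None,
    devlog: bool = False,
) -> List[str]:
    """Hypothetical helper: assemble argv for a deepwiki-to-md chat call."""
    # --chat requires a DeepWiki URL as the positional input.
    # Assumption: such URLs start with https://deepwiki.com/
    if not url.startswith("https://deepwiki.com/"):
        raise ValueError("--chat requires a DeepWiki URL")
    cmd = ["deepwiki-to-md", url, "--chat", message]
    if deep_research:
        cmd.append("--deep-research")
    if config_file is not None:
        cmd += ["--config-file", config_file]
    if devlog:
        cmd.append("--devlog")
    return cmd


print(build_chat_command(
    "https://deepwiki.com/microsoft/vscode",
    "Summarize top features",
    devlog=True,
))
```

The resulting list can be passed to `subprocess.run`, which avoids shell quoting problems in the message text.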
License
MIT License
More documentation
- Library reference (includes both Python API and CLI examples): deepwiki_to_md.md
Chat (Devin API) result object: ChatResult
The chat helper (src/chat.py) returns a ChatResult object instead of a plain dict.
- Highlights
  - Inherits from dict → works with json.dumps(result) directly.
  - Convenient attribute access (e.g., result.response_message) and to_dict().
  - print(result) shows a human-readable summary.
- Main properties
  - sent_message: str
  - response_message: Optional[str]
  - status_code: Any
  - reference_files: List[str]
  - reference_file_contents: Dict[str, str]
- Example (excerpt)
import asyncio
import json
from chat import load_or_create_config, send_chat_message, ChatResult
async def main() -> None:
    result: ChatResult = await send_chat_message(
        wiki_url='https://deepwiki.com/microsoft/vscode',
        message='What is the purpose of this repository?',
        use_deep_research=False,
    )
    print(result)  # human-readable summary via __str__
    print(result.response_message)  # attribute access
    print(json.dumps(result, indent=2, ensure_ascii=False))  # still a dict

if __name__ == '__main__':
    asyncio.run(main())
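To make the dict-inheritance pattern described above concrete, here is a minimal sketch of a dict subclass with attribute access, to_dict(), and a readable __str__. `ResultSketch` is a hypothetical stand-in written for illustration. It is not the library's ChatResult, whose actual implementation may differ.

```python
import json
from typing import Any, Dict


class ResultSketch(dict):
    """Illustrative dict subclass; not the library's ChatResult."""

    def __getattr__(self, name: str) -> Any:
        # Called only when normal attribute lookup fails, so dict methods
        # such as .get() keep working.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def to_dict(self) -> Dict[str, Any]:
        # Return a plain dict copy of the stored keys.
        return dict(self)

    def __str__(self) -> str:
        return (
            f"sent: {self.get('sent_message')}\n"
            f"response: {self.get('response_message')}"
        )


result = ResultSketch(sent_message="hi", response_message="hello", status_code=200)
print(result.response_message)  # attribute access
print(json.dumps(result))       # serializes like any dict
print(result)                   # human-readable summary via __str__
```

Subclassing dict keeps the object compatible with code that expects plain mappings, at the cost that key names can shadow nothing but also cannot override existing dict method names.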
Arguments for chat.py:
- --url: URL of the chat interface.
- --message: Message to send.
- --selector: CSS selector for the chat input (default: textarea).
- --button: CSS selector for the submit button (default: button).
- --wait: Time to wait for response in seconds (default: 30).
- --debug: Enable debug mode.
- --output: Output directory (default: ChatResponses).
- --deep: Enable "Deep Research" mode (specific to some interfaces).
- --headless: Run browser in headless mode.
- --format: Output format(s): html, md, yaml, or comma-separated list (default: html).
Note: The chat scraper uses Selenium, which requires a compatible browser installed.
License
This project is licensed under the MIT License - see the LICENSE file for details.
File details
Details for the file deepwiki_to_md-2.0.3.tar.gz.
File metadata
- Download URL: deepwiki_to_md-2.0.3.tar.gz
- Upload date:
- Size: 26.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 28a8b1e018e25db30b40bda27a08eb1a1999695c0fb596b9c7d396873a856235 |
| MD5 | d5369fe8188da8b51333f6dfc756d497 |
| BLAKE2b-256 | 3a8ca046247d039a8e467ad56ab7e294f53d1188535aed3a1b7953349ec93a89 |
File details
Details for the file deepwiki_to_md-2.0.3-py3-none-any.whl.
File metadata
- Download URL: deepwiki_to_md-2.0.3-py3-none-any.whl
- Upload date:
- Size: 25.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 58efeea773474eda2ca020d0663ee6a67c269bcf40b6b8c7879e3aaf3c3adcf1 |
| MD5 | c77ea4bd439dd1e190e7986a4081595a |
| BLAKE2b-256 | b34855bee017238cbc496f31b89df2e7e69d353a60b0f555de63d479b2efcea2 |