Extract URLs from markdown files and convert them to individual markdown documents.

LinkWeaver 🕸️

Extract URLs from markdown files and convert them to individual markdown documents. LinkWeaver is a CLI tool that finds all URLs in your markdown files, fetches their content, converts it to markdown, and saves each result as a separate file in an organized resource folder.

✨ Features

  • 📝 Markdown URL Extraction: Finds and extracts all URLs from your markdown files automatically
  • 🌐 Multi-format Support: Handles web pages, PDFs, YouTube videos, and more, thanks to MarkItDown
  • 📁 Resource Mode: Creates resource folders with each link saved as individual markdown files
  • 📋 List Mode: Quickly list all unique URLs found across files
  • 🔗 XML Cat Mode: Concatenate files with their resources in structured XML format
  • 🏷️ Smart Naming: Uses URL-based filenames that are human-readable and identifiable
  • Duplicate Removal: Automatically removes duplicate URLs before processing
  • 📊 Progress Tracking: Shows real-time progress with clear status indicators
  • 🔄 Retry Logic: Automatic retry with exponential backoff for failed requests
  • 🛠️ Post-processing: Execute custom commands on content before saving
  • 💻 CLI Interface: Simple command-line interface for easy integration into workflows
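
The extraction and duplicate-removal features above can be pictured with a short Python sketch. This is not LinkWeaver's actual code: the regular expression and the `extract_unique_urls` helper are assumptions made for illustration, and the real tool may recognise URLs differently.

```python
import re

# Hypothetical pattern: matches http(s) URLs and stops at whitespace or at
# characters that usually delimit links in markdown.
URL_PATTERN = re.compile(r"https?://[^\s<>()\[\]\"']+")


def extract_unique_urls(markdown_text: str) -> list[str]:
    """Return URLs in order of first appearance, without duplicates."""
    seen: set[str] = set()
    unique: list[str] = []
    for match in URL_PATTERN.finditer(markdown_text):
        # Drop punctuation that belongs to the sentence, not the URL.
        url = match.group(0).rstrip(".,;:!?")
        if url not in seen:
            seen.add(url)
            unique.append(url)
    return unique


sample = (
    "See [docs](https://example.com/page) and https://example.com/page. "
    "Also <https://github.com/user/repo>."
)
print(extract_unique_urls(sample))
```

Keeping first-appearance order, rather than sorting, means the output follows the reading order of the notes.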

📦 Installation

# Install as a tool with uv (recommended)
uv tool install linkweaver

# Or install from PyPI
pip install linkweaver

# Or install from source
git clone https://github.com/davidgasquez/linkweaver
cd linkweaver
uv sync

🚀 Quick Start

# Process files and create resource folders (default mode)
linkweaver my-notes.md
# Creates: my-notes-resources/ folder with individual .md files

# List all unique URLs found in files
linkweaver --list-links notes/*.md

# Concatenate files with their resources in XML format
linkweaver --xml-cat my-notes.md

# Process with custom command (e.g., clean up content)
linkweaver -x 'llm -t clean' my-notes.md

# Force redownload all resources
linkweaver --force my-notes.md

# Preview what would be done (dry run)
linkweaver --dry-run my-notes.md
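
One plausible way to picture the `-x` option is as a filter: the converted content goes to the command's standard input, and whatever the command prints is what gets saved. The README does not say how content is handed to the command, so the stdin/stdout contract and the `post_process` helper below are assumptions, not LinkWeaver's documented behaviour.

```python
import subprocess


def post_process(content: str, command: str) -> str:
    """Pipe content through a shell command and return its stdout.

    Hypothetical helper: assumes the command reads stdin and writes stdout.
    """
    result = subprocess.run(
        command,
        shell=True,           # the option is described as a shell command
        input=content,
        capture_output=True,
        text=True,
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


# Any filter-style command works; `tr` stands in for something like `llm -t clean`.
print(post_process("hello\n", "tr a-z A-Z"))
```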

🔧 CLI Options

Main Commands

linkweaver [OPTIONS] input_files...

Options:
  -h, --help            Show help message and exit
  -l, --list-links      List all unique URLs found in files (no downloading)
  --xml-cat             Concatenate files with their resources in XML structure
  -x, --exec COMMAND    Execute shell command on content before saving
  -v, --verbose         Enable verbose output with detailed progress
  --retries N           Number of retry attempts for failed fetches (default: 3)
  -q, --quiet           Minimize output (still shows errors/warnings)
  --dry-run             Show what would be done without actually doing it
  --no-color            Disable colored output
  -f, --force           Force redownload even if files already exist

Common Usage Patterns

# Preview first 10 URLs across multiple files
linkweaver --list-links *.md | head -10

# See what would be fetched/skipped
linkweaver --dry-run notes.md

# Preview forced redownload
linkweaver --force --dry-run notes.md

# Disable retries for speed
linkweaver --retries 0 notes.md

# Browse concatenated content
linkweaver --xml-cat notes.md | less

# Quiet mode with custom retry count
linkweaver --quiet --retries 5 notes.md

# Process multiple files
linkweaver *.md
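
The interplay between `--force` and `--dry-run` in the patterns above comes down to one decision per resource: fetch it, or skip it because a file is already there. The sketch below assumes that an existing file is the only reason to skip; `plan_action` is a hypothetical helper, not part of LinkWeaver.

```python
import tempfile
from pathlib import Path


def plan_action(target: Path, force: bool = False) -> str:
    """Return "fetch" or "skip" for one resource file (hypothetical helper)."""
    if force or not target.exists():
        return "fetch"
    return "skip"


with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / "example.com-page-title.md"
    existing.write_text("# Example Page Title\n", encoding="utf-8")
    missing = Path(tmp) / "github.com-user-repo.md"

    # A dry run would only report these decisions instead of acting on them.
    for target in (existing, missing):
        for force in (False, True):
            print(f"{target.name} force={force}: {plan_action(target, force)}")
```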

📁 Output Structure

For each input file, linkweaver creates a resource folder named after the file (my-notes.md becomes my-notes-resources/) that holds one markdown file per URL:

my-notes.md
my-notes-resources/
├── example.com-page-title.md
├── github.com-user-repo.md
└── youtube.com-watch-v-abc123.md

Each resource file includes:

  • Clean, URL-based filename
  • Original source URL in metadata
  • Full markdown-converted content
  • Error information if fetch failed
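
To make the URL-based naming concrete, the sketch below derives filenames that match the examples in the tree above. The exact rules LinkWeaver applies are not documented here, so `url_to_filename`, the stripping of a leading `www.`, and the handling of query strings are assumptions chosen to reproduce those three names.

```python
import re
from urllib.parse import urlsplit


def url_to_filename(url: str) -> str:
    """Build a readable filename from a URL (hypothetical naming rules)."""
    parts = urlsplit(url)
    host = parts.netloc.lower().removeprefix("www.")  # assumption: drop "www."
    raw = f"{parts.path} {parts.query}"
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", raw).strip("-")
    return f"{host}-{slug}.md" if slug else f"{host}.md"


for url in (
    "https://example.com/page/title",
    "https://github.com/user/repo",
    "https://www.youtube.com/watch?v=abc123",
):
    print(url_to_filename(url))
```

A scheme like this keeps the host visible at the start of every name, which is what makes the files easy to identify in a directory listing.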

Example resource file content:

# Example Page Title

**Source:** https://example.com/page/title

[Converted markdown content here...]
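
The layout shown above (title heading, source line, then the converted body) can be assembled with a few lines of Python. `render_resource` is a hypothetical helper written for illustration; how LinkWeaver records error information for failed fetches is not shown in this README, so the sketch leaves that out.

```python
def render_resource(title: str, source_url: str, body: str) -> str:
    """Assemble a resource file: heading, source line, then the content."""
    return f"# {title}\n\n**Source:** {source_url}\n\n{body.strip()}\n"


print(
    render_resource(
        "Example Page Title",
        "https://example.com/page/title",
        "[Converted markdown content here...]",
    )
)
```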

📜 License

MIT License - see LICENSE for details.

Download files

Source Distribution

linkweaver-0.5.0.tar.gz (10.6 kB)

Uploaded Source

Built Distribution

linkweaver-0.5.0-py3-none-any.whl (11.2 kB)

Uploaded Python 3

File details

Details for the file linkweaver-0.5.0.tar.gz.

File metadata

  • Download URL: linkweaver-0.5.0.tar.gz
  • Upload date:
  • Size: 10.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.7.21

File hashes

Hashes for linkweaver-0.5.0.tar.gz
Algorithm     Hash digest
SHA256        cdbddccff99ec2bb719c13d2eb9af0da51933353960614f530bdabfb72e7af21
MD5           89d7d0fc4d1b950ebce5b97137189ec4
BLAKE2b-256   97e856d9da144863b1269656130b316c47d0b2357da140d285c02c5abaf6caa7

File details

Details for the file linkweaver-0.5.0-py3-none-any.whl.

File metadata

  • Download URL: linkweaver-0.5.0-py3-none-any.whl
  • Upload date:
  • Size: 11.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.7.21

File hashes

Hashes for linkweaver-0.5.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        a2cffd22a32639fc11f96a5ad8995df8580dc49de975cc3f7937b703badb1a96
MD5           da5e33cc3e3b28c4d978920bf19fb3a1
BLAKE2b-256   23b322db0a3942e3307fd1dd33082de77e5de0efbaab7d6a685f9125b8ebff29
