Extract URLs from markdown files and convert them to individual markdown documents.

Link Weaver 🕸️

Link Weaver is a CLI tool that finds all URLs in your markdown files, fetches their content, converts it to markdown, and saves each result as a separate file in an organized resource folder.

✨ Features

  • 📝 Markdown URL Extraction: Finds and extracts all URLs from your markdown files automatically
  • 🌐 Multi-format Support: Handles web pages, PDFs, YouTube videos, and more thanks to MarkItDown
  • 📁 Resource Mode: Creates resource folders with each link saved as individual markdown files
  • 📋 List Mode: Quickly list all unique URLs found across files
  • 🔗 XML Cat Mode: Concatenate files with their resources in structured XML format
  • 🏷️ Smart Naming: Uses URL-based filenames that are human-readable and identifiable
  • Duplicate Removal: Automatically removes duplicate URLs before processing
  • 📊 Progress Tracking: Shows real-time progress with clear status indicators
  • 💻 CLI Interface: Simple command-line interface for easy integration into workflows
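
The extraction and duplicate-removal features above can be illustrated with a minimal sketch. It is not taken from the linkweaver source: the `URL_PATTERN` regex and the `extract_urls` helper are assumptions made for illustration, and the real tool may match URLs differently (for example, this pattern stops at parentheses and brackets, so URLs containing them are truncated).

```python
import re

# Assumption: a deliberately simple pattern for http(s) URLs.
# It stops at whitespace, angle brackets, parentheses, and square brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>()\[\]]+")


def extract_urls(markdown_text: str) -> list[str]:
    """Return the unique URLs found in the text, in first-seen order."""
    seen: dict[str, None] = {}
    for match in URL_PATTERN.finditer(markdown_text):
        # Drop punctuation that usually belongs to the sentence, not the URL.
        url = match.group(0).rstrip(".,;:!?")
        seen.setdefault(url, None)
    return list(seen)


sample = (
    "See [the article](https://example.com/article) and "
    "https://github.com/user/repo. Again: https://example.com/article"
)
for url in extract_urls(sample):
    print(url)
```

A dict is used instead of a set so that the output order matches the order in which links appear in the notes.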

📦 Installation

# Install from PyPI
pip install linkweaver

# Or install from source
git clone https://github.com/davidgasquez/linkweaver
cd linkweaver
uv sync

🚀 Quick Start

# Process files and create resource folders (default mode)
linkweaver my-notes.md
# Creates: my-notes-resources/ folder with individual .md files

# List all unique URLs found in files
linkweaver --list-links notes/*.md

# Concatenate files with their resources in XML format
linkweaver --xml-cat my-notes.md

🔧 CLI API

Basic Usage

$ linkweaver --help
usage: linkweaver [-h] [--list-links] [--xml-cat] input_files [input_files ...]

Extract URLs from markdown files and save each link as individual markdown files in resource folders

positional arguments:
  input_files           One or more markdown files to process

options:
  -h, --help            show this help message and exit
  --list-links, -l      List all unique URLs found in the files (no downloading)
  --xml-cat             Concatenate all markdown files with their resource folders in XML structure
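
For readers who want to see how an interface like this is typically declared, here is a sketch of an `argparse` parser that accepts the same flags as the help text above. The `build_parser` function is hypothetical and the actual linkweaver implementation may be organized differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the options shown in the help output (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="linkweaver",
        description=(
            "Extract URLs from markdown files and save each link as "
            "individual markdown files in resource folders"
        ),
    )
    parser.add_argument(
        "input_files", nargs="+", help="One or more markdown files to process"
    )
    parser.add_argument(
        "--list-links", "-l", action="store_true",
        help="List all unique URLs found in the files (no downloading)",
    )
    parser.add_argument(
        "--xml-cat", action="store_true",
        help="Concatenate all markdown files with their resource folders in XML structure",
    )
    return parser


args = build_parser().parse_args(["--list-links", "notes/a.md", "notes/b.md"])
print(args.list_links, args.xml_cat, args.input_files)
```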

📁 Output Structure

For each input file, linkweaver creates a resource folder named after it, with one markdown file per link:

my-notes.md
my-notes-resources/
├── example.com-page-title.md
├── github.com-user-repo.md
└── youtube.com-watch-v-abc123.md
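
The naming scheme is not documented in detail, but the filenames above suggest a rule along these lines: drop the scheme, keep the host, and collapse every run of other non-alphanumeric characters into a single hyphen. The `url_to_filename` helper below is a sketch inferred from those examples only, not the tool's actual code, and it may differ from linkweaver on edge cases such as fragments, very long URLs, or name collisions.

```python
import re
from urllib.parse import urlsplit


def url_to_filename(url: str) -> str:
    """Derive a readable .md filename from a URL (rule inferred from the examples)."""
    parts = urlsplit(url)
    raw = parts.netloc + parts.path
    if parts.query:
        raw += "?" + parts.query
    # Keep letters, digits, and dots; turn every other run of characters into "-".
    slug = re.sub(r"[^A-Za-z0-9.]+", "-", raw).strip("-")
    return f"{slug}.md"


for example in (
    "https://example.com/page/title",
    "https://github.com/user/repo",
    "https://youtube.com/watch?v=abc123",
):
    print(url_to_filename(example))
```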

Each resource file includes:

  • Clean, URL-based filename
  • Original source URL in metadata
  • Full markdown-converted content
  • Error information if the fetch failed

Example resource file content:

# Example Page Title

**Source:** https://example.com/page/title

[Converted markdown content here...]
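
As a rough sketch of how such a file could be assembled, the hypothetical `render_resource` function below builds the title, source line, and body shown above. In the real tool the body would come from MarkItDown; here it is passed in as a plain string. The `**Error:**` line is an assumed format, since the README does not show what a failed fetch looks like.

```python
from typing import Optional


def render_resource(
    title: str, url: str, content: Optional[str], error: Optional[str] = None
) -> str:
    """Render one resource file: heading, source URL, then content or an error note."""
    lines = [f"# {title}", "", f"**Source:** {url}", ""]
    if error is not None:
        # Assumed format: the README only says error information is included.
        lines.append(f"**Error:** {error}")
    else:
        lines.append(content or "")
    return "\n".join(lines) + "\n"


print(render_resource(
    "Example Page Title",
    "https://example.com/page/title",
    "[Converted markdown content here...]",
))
```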

🔄 Workflow Example

Default Mode (Resource Creation)

$ linkweaver research-notes.md
Extracting URLs from research-notes.md...
Found 5 URLs in research-notes.md

Total unique URLs found: 5

Processing research-notes.md -> research-notes-resources/
Processing 5 unique URLs using markitdown...
Fetching 1/5: https://example.com/article
✓ Saved: example.com-article.md
Fetching 2/5: https://github.com/user/repo
✓ Saved: github.com-user-repo.md
...

List Links Mode

$ linkweaver --list-links research-notes.md
https://example.com/article
https://github.com/user/repo
https://youtube.com/watch?v=abc123
...

XML Cat Mode

$ linkweaver --xml-cat research-notes.md
<document source='research-notes.md'>
[Original file content]

<resources>
<resource file='example.com-article.md'>
[Resource content]
</resource>
...
</resources>
</document>
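
The structure above can be reproduced with a few lines of Python. The `xml_cat` function below is a sketch under stated assumptions: it looks for a sibling folder named `<stem>-resources`, includes every `.md` file in it in sorted order, and does no XML escaping. None of this is confirmed by the linkweaver source, which may order or escape content differently.

```python
from pathlib import Path


def xml_cat(markdown_path: Path) -> str:
    """Wrap a markdown file and its resource files in the XML layout shown above."""
    # Assumption: resources live next to the file in "<stem>-resources/".
    resources_dir = markdown_path.with_name(f"{markdown_path.stem}-resources")
    parts = [
        f"<document source='{markdown_path.name}'>",
        markdown_path.read_text(encoding="utf-8"),
        "",
        "<resources>",
    ]
    if resources_dir.is_dir():
        for resource in sorted(resources_dir.glob("*.md")):
            parts.append(f"<resource file='{resource.name}'>")
            parts.append(resource.read_text(encoding="utf-8"))
            parts.append("</resource>")
    parts += ["</resources>", "</document>"]
    return "\n".join(parts)
```

Calling `print(xml_cat(Path("research-notes.md")))` after a default-mode run would then print the note followed by each saved resource.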

📜 License

MIT License - see LICENSE for details.
