Link Weaver 🕸️

Extract URLs from markdown files and convert them to individual markdown documents. Link Weaver is a CLI tool that finds all URLs in your markdown files, fetches their content, converts it to markdown, and saves each one as a separate file in an organized resource folder.

✨ Features

  • 📝 Markdown URL Extraction: Finds and extracts all URLs from your markdown files automatically
  • 🌐 Multi-format Support: Handles web pages, PDFs, YouTube videos, and more, thanks to MarkItDown
  • 📁 Resource Mode: Creates resource folders with each link saved as individual markdown files
  • 📋 List Mode: Quickly lists all unique URLs found across files
  • 🔗 XML Cat Mode: Concatenates files with their resources in a structured XML format
  • 🏷️ Smart Naming: Uses URL-based filenames that are human-readable and identifiable
  • Duplicate Removal: Automatically removes duplicate URLs before processing
  • 📊 Progress Tracking: Shows real-time progress with clear status indicators
  • 💻 CLI Interface: Simple command-line interface for easy integration into workflows
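
The README does not say how URLs are matched. As a rough illustration of the extraction and duplicate-removal features, here is a minimal Python sketch; the regex, the hypothetical `extract_urls` helper, and the trailing-punctuation rule are assumptions, not linkweaver's code.

```python
import re

# Assumed pattern: an http(s) URL runs until whitespace or a character that
# usually delimits a URL in markdown: ( ) < > [ ] " '
URL_PATTERN = re.compile(r"https?://[^\s()<>\[\]\"']+")

# Punctuation that often follows a URL in prose but is rarely part of it.
TRAILING_PUNCTUATION = ".,;:!?"


def extract_urls(markdown_text: str) -> list:
    """Return unique URLs in order of first appearance (hypothetical helper)."""
    found = [
        match.group(0).rstrip(TRAILING_PUNCTUATION)
        for match in URL_PATTERN.finditer(markdown_text)
    ]
    # dict.fromkeys keeps only the first occurrence of each URL, in order.
    return list(dict.fromkeys(found))
```

Excluding parentheses keeps `[text](url)` links clean but truncates URLs that legitimately contain `(`, such as some Wikipedia links; a real implementation might parse the markdown instead of relying on a regex.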

📦 Installation

# Install from PyPI
pip install linkweaver

# Or install from source
git clone https://github.com/davidgasquez/linkweaver
cd linkweaver
uv sync

🚀 Quick Start

# Process files and create resource folders (default mode)
linkweaver my-notes.md
# Creates: my-notes-resources/ folder with individual .md files

# List all unique URLs found in files
linkweaver --list-links notes/*.md

# Concatenate files with their resources in XML format
linkweaver --xml-cat my-notes.md

🔧 CLI API

Basic Usage

$ linkweaver --help
usage: linkweaver [-h] [--list-links] [--xml-cat] input_files [input_files ...]

Extract URLs from markdown files and save each link as individual markdown files in resource folders

positional arguments:
  input_files           One or more markdown files to process

options:
  -h, --help            show this help message and exit
  --list-links, -l      List all unique URLs found in the files (no downloading)
  --xml-cat             Concatenate all markdown files with their resource folders in XML structure
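
The help text maps directly onto Python's `argparse`. This hypothetical `build_parser` reproduces the documented flags and help strings; it is a sketch of the interface, not the tool's source, and it does not encode whether `--list-links` and `--xml-cat` may be combined, which the README does not specify.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the `linkweaver --help` output above."""
    parser = argparse.ArgumentParser(
        prog="linkweaver",
        description=(
            "Extract URLs from markdown files and save each link as individual "
            "markdown files in resource folders"
        ),
    )
    # nargs="+" requires at least one file, matching "input_files [input_files ...]".
    parser.add_argument("input_files", nargs="+",
                        help="One or more markdown files to process")
    parser.add_argument("--list-links", "-l", action="store_true",
                        help="List all unique URLs found in the files (no downloading)")
    parser.add_argument("--xml-cat", action="store_true",
                        help="Concatenate all markdown files with their resource "
                             "folders in XML structure")
    return parser
```

For example, `build_parser().parse_args(["-l", "a.md", "b.md"])` yields `list_links=True`, `xml_cat=False`, and `input_files=["a.md", "b.md"]`.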

📁 Output Structure

For each input file, linkweaver creates a resource folder containing one markdown file per link:

my-notes.md
my-notes-resources/
├── example.com-page-title.md
├── github.com-user-repo.md
└── youtube.com-watch-v-abc123.md
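
One naming rule consistent with every example filename in this README: drop the scheme, keep the host, path, and query, and replace each run of other characters with `-`. The hypothetical `url_to_filename` below implements that guess; linkweaver's real rule (length limits, collision handling) may differ.

```python
import re
from urllib.parse import urlsplit


def url_to_filename(url: str) -> str:
    """Guess at a URL-based filename; reproduces the examples, not linkweaver's code."""
    parts = urlsplit(url)
    raw = parts.netloc + parts.path + ("?" + parts.query if parts.query else "")
    # Collapse every run of characters other than letters, digits, and dots into "-",
    # then trim dashes left over from leading or trailing separators.
    slug = re.sub(r"[^A-Za-z0-9.]+", "-", raw).strip("-")
    return f"{slug}.md"
```

Under this rule `https://youtube.com/watch?v=abc123` becomes `youtube.com-watch-v-abc123.md`. Very long URLs would produce very long names, which a production version would likely truncate.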

Each resource file includes:

  • Clean, URL-based filename
  • Original source URL in metadata
  • Full markdown-converted content
  • Error information if the fetch failed

Example resource file content:

# Example Page Title

**Source:** https://example.com/page/title

[Converted markdown content here...]
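
A resource file with this layout could be produced by something like the hypothetical `render_resource` below. The title fallback and the `**Error:**` line are assumptions; the README only says that error information is included when a fetch fails.

```python
from typing import Optional


def render_resource(url: str, title: Optional[str], content: str,
                    error: Optional[str] = None) -> str:
    """Build the text of one resource file (hypothetical layout helper)."""
    heading = title or url  # assumption: fall back to the URL when no title is known
    lines = [f"# {heading}", "", f"**Source:** {url}", ""]
    if error is not None:
        # Assumed wording for a failed fetch; the README does not show it.
        lines.append(f"**Error:** {error}")
    else:
        lines.append(content)
    return "\n".join(lines) + "\n"
```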

🔄 Workflow Example

Default Mode (Resource Creation)

$ linkweaver research-notes.md
Extracting URLs from research-notes.md...
Found 5 URLs in research-notes.md

Total unique URLs found: 5

Processing research-notes.md -> research-notes-resources/
Processing 5 unique URLs using markitdown...
Fetching 1/5: https://example.com/article
✓ Saved: example.com-article.md
Fetching 2/5: https://github.com/user/repo
✓ Saved: github.com-user-repo.md
...
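
The log above suggests a simple loop: fetch each unique URL, convert it, write one file, and report progress. The sketch below follows that shape. `process_urls` and `markitdown_converter` are hypothetical names; the converter calls MarkItDown's `MarkItDown().convert()` and reads `title` and `text_content` from the result, and it is injectable so the loop can be tried without network access. The failure message is invented, since the README does not show linkweaver's output for a failed fetch.

```python
import re
from pathlib import Path
from typing import Callable, List, Optional, Tuple

# A converter takes a URL and returns (title, markdown_text).
Converter = Callable[[str], Tuple[Optional[str], str]]


def markitdown_converter(url: str) -> Tuple[Optional[str], str]:
    """Fetch and convert one URL with MarkItDown (requires `pip install markitdown`)."""
    from markitdown import MarkItDown  # lazy import: the rest of the sketch runs without it

    result = MarkItDown().convert(url)
    return result.title, result.text_content


def process_urls(urls: List[str], out_dir: Path,
                 convert: Converter = markitdown_converter) -> List[Path]:
    """Write one markdown file per URL into out_dir, printing progress as it goes."""
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for i, url in enumerate(urls, start=1):
        print(f"Fetching {i}/{len(urls)}: {url}")
        # Simplified, hypothetical naming: drop the scheme, turn separators into "-".
        stem = re.sub(r"[^A-Za-z0-9.]+", "-", url.split("://", 1)[-1]).strip("-")
        path = out_dir / f"{stem}.md"
        try:
            title, text = convert(url)
            path.write_text(f"# {title or url}\n\n**Source:** {url}\n\n{text}\n",
                            encoding="utf-8")
            print(f"✓ Saved: {path.name}")
        except Exception as exc:
            # Keep going and record the failure in the file; this message is invented.
            path.write_text(f"# {url}\n\n**Source:** {url}\n\n**Error:** {exc}\n",
                            encoding="utf-8")
            print(f"✗ Failed: {url} ({exc})")
        saved.append(path)
    return saved
```

Catching the exception per URL means one dead link does not abort the whole run, which matches the README's note that failed fetches still produce a file with error information.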

List Links Mode

$ linkweaver --list-links research-notes.md
https://example.com/article
https://github.com/user/repo
https://youtube.com/watch?v=abc123
...
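
With `--list-links` over several files, duplicates have to be removed across the whole set rather than per file. A hypothetical sketch using a dict as an ordered set, where the first appearance wins (linkweaver's actual output order is not documented):

```python
import re
from pathlib import Path

# Same assumed URL pattern as a plain-regex extractor would use.
URL_PATTERN = re.compile(r"https?://[^\s()<>\[\]\"']+")


def list_links(paths) -> list:
    """Collect unique URLs across several files, in order of first appearance."""
    seen = {}  # dict keys act as an insertion-ordered set
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        for url in URL_PATTERN.findall(text):
            seen.setdefault(url.rstrip(".,;:!?"), None)
    return list(seen)
```

With `notes/*.md`, the shell expands the glob before the program runs, so in this sketch the file order, and therefore the URL order, follows that expansion.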

XML Cat Mode

$ linkweaver --xml-cat research-notes.md
<document source='research-notes.md'>
[Original file content]

<resources>
<resource file='example.com-article.md'>
[Resource content]
</resource>
...
</resources>
</document>
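
The XML cat layout can be assembled with plain string joins. This hypothetical `xml_cat` assumes the resource folder already exists as `<stem>-resources/` beside the input file (for example, after a default-mode run) and inserts content verbatim, as the sample output appears to, so the result is XML-like rather than guaranteed well-formed XML when content contains `<` or `&`.

```python
from pathlib import Path


def xml_cat(md_file) -> str:
    """Wrap one markdown file and its resource files in <document>/<resources> tags."""
    source = str(md_file)  # keep the path as given, e.g. "research-notes.md"
    md_path = Path(md_file)
    parts = [f"<document source='{source}'>", md_path.read_text(encoding="utf-8")]
    resource_dir = md_path.with_name(f"{md_path.stem}-resources")
    if resource_dir.is_dir():
        parts.append("<resources>")
        # Sorted for a stable order; the real tool's ordering is not documented.
        for resource in sorted(resource_dir.glob("*.md")):
            parts.append(f"<resource file='{resource.name}'>")
            parts.append(resource.read_text(encoding="utf-8"))
            parts.append("</resource>")
        parts.append("</resources>")
    parts.append("</document>")
    return "\n".join(parts)
```

Wrapping each resource in its own tag with a `file` attribute lets a downstream reader (for instance, an LLM prompt) tell which text came from which link.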

📜 License

MIT License - see LICENSE for details.

Download files

Source Distribution

linkweaver-0.4.0.tar.gz (5.9 kB)

Built Distribution

linkweaver-0.4.0-py3-none-any.whl (6.5 kB)

File details

Details for the file linkweaver-0.4.0.tar.gz.

File metadata

  • Download URL: linkweaver-0.4.0.tar.gz
  • Size: 5.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.7.21

File hashes

Hashes for linkweaver-0.4.0.tar.gz
Algorithm    Hash digest
SHA256       698c90a24bc15d80843328fe1cd7e5bc41e71bcd58e5f778e91d4b1031ed67f7
MD5          2dbf6a701d03404ff729b56b30b59030
BLAKE2b-256  c40f262c89c3437a587969961862a2abb663c7ab7f68f8465b98e5906551b35a

File details

Details for the file linkweaver-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: linkweaver-0.4.0-py3-none-any.whl
  • Size: 6.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.7.21

File hashes

Hashes for linkweaver-0.4.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       b70b93acdffa35f96d3b5da52ed0505ee04f3f3e8dd3d9fcbb1db6c356cd7586
MD5          1fd3fcf525e004657a5be3174d8ebd01
BLAKE2b-256  2d15536bc7561eb166f621bf220a7a4bb4ebdd4da4f16dcc02748dba96fd2e4b
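
To check a download against the digests listed for the 0.4.0 files, Python's `hashlib` can compute all three algorithms in one pass over the file (BLAKE2b-256 is BLAKE2b with a 32-byte digest). `file_digests` and `verify` are illustrative helpers, not part of linkweaver.

```python
import hashlib
from pathlib import Path

# SHA256 digests as published for linkweaver 0.4.0.
EXPECTED_SHA256 = {
    "linkweaver-0.4.0.tar.gz":
        "698c90a24bc15d80843328fe1cd7e5bc41e71bcd58e5f778e91d4b1031ed67f7",
    "linkweaver-0.4.0-py3-none-any.whl":
        "b70b93acdffa35f96d3b5da52ed0505ee04f3f3e8dd3d9fcbb1db6c356cd7586",
}


def file_digests(path) -> dict:
    """Compute SHA256, MD5, and BLAKE2b-256 hex digests of a file in one read."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as f:
        # Read in 64 KiB chunks so large files need not fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def verify(path) -> bool:
    """True if the file's SHA256 matches the published digest for its filename."""
    return file_digests(path)["SHA256"] == EXPECTED_SHA256[Path(path).name]
```

pip can also enforce this check itself in hash-checking mode, using `--hash=sha256:...` entries in a requirements file together with `--require-hashes`.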
