Link Weaver 🕸️

Extract URLs from markdown files and process them in one of three modes: save each link into an organized resource folder, list all links, or concatenate files with their resources in XML format.

✨ Features

  • 📝 Markdown URL Extraction: Finds and extracts all URLs from your markdown files automatically
  • 🌐 Multi-format Support: Handles web pages, PDFs, YouTube videos, and more, thanks to MarkItDown
  • 📁 Resource Mode: Creates resource folders with each link saved as individual markdown files
  • 📋 List Mode: Quickly list all unique URLs found across files
  • 🔗 XML Cat Mode: Concatenate files with their resources in structured XML format
  • 🏷️ Smart Naming: Uses URL-based filenames that are human-readable and identifiable
  • Duplicate Removal: Automatically removes duplicate URLs before processing
  • 📊 Progress Tracking: Shows real-time progress with clear status indicators
  • 💻 CLI Interface: Simple command-line interface for easy integration into workflows
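
To make the extraction and duplicate-removal features concrete, here is a minimal sketch of how URLs could be pulled out of markdown text and deduplicated in order of first appearance. The regular expression and the `extract_unique_urls` helper are assumptions for illustration, not linkweaver's actual code, and the real tool may match URLs differently.

```python
import re

# Assumption: a simple pattern for http(s) URLs; the real tool may match differently.
URL_PATTERN = re.compile(r"https?://[^\s<>()\[\]\"']+")


def extract_unique_urls(markdown_text: str) -> list[str]:
    """Return URLs in order of first appearance, without duplicates."""
    seen = set()
    urls = []
    for match in URL_PATTERN.finditer(markdown_text):
        # Drop punctuation that usually belongs to the sentence, not the URL.
        url = match.group(0).rstrip(".,;:!?")
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


sample = (
    "See [the article](https://example.com/article) and "
    "https://github.com/user/repo. Again: https://example.com/article"
)
print(extract_unique_urls(sample))
# → ['https://example.com/article', 'https://github.com/user/repo']
```

Keeping a `seen` set next to the result list preserves the order in which links appear, which a plain `set` would not.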

📦 Installation

# Install from PyPI (when available)
pip install linkweaver

# Or install from source
git clone https://github.com/davidgasquez/linkweaver
cd linkweaver
uv sync

🚀 Quick Start

# Process files and create resource folders (default mode)
linkweaver my-notes.md
# Creates: my-notes-resources/ folder with individual .md files

# List all unique URLs found in files
linkweaver --list-links notes/*.md

# Concatenate files with their resources in XML format
linkweaver --xml-cat my-notes.md

🔧 CLI API

Basic Usage

$ linkweaver --help
usage: linkweaver [-h] [--list-links] [--xml-cat] input_files [input_files ...]

Extract URLs from markdown files and save each link as individual markdown files in resource folders

positional arguments:
  input_files           One or more markdown files to process

options:
  -h, --help            show this help message and exit
  --list-links, -l      List all unique URLs found in the files (no downloading)
  --xml-cat             Concatenate all markdown files with their resource folders in XML structure
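
For readers who want to wrap or mirror this interface, the following is a sketch of an `argparse` parser that accepts the same arguments as the help text above. Whether linkweaver itself is built on `argparse` is an assumption, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented linkweaver options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="linkweaver",
        description=(
            "Extract URLs from markdown files and save each link as "
            "individual markdown files in resource folders"
        ),
    )
    parser.add_argument(
        "input_files", nargs="+", help="One or more markdown files to process"
    )
    parser.add_argument(
        "--list-links", "-l", action="store_true",
        help="List all unique URLs found in the files (no downloading)",
    )
    parser.add_argument(
        "--xml-cat", action="store_true",
        help="Concatenate all markdown files with their resource folders in XML structure",
    )
    return parser


args = build_parser().parse_args(["--list-links", "notes/a.md", "notes/b.md"])
print(args.list_links, args.xml_cat, args.input_files)
# → True False ['notes/a.md', 'notes/b.md']
```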

📁 Output Structure

For each input file, linkweaver creates a resource folder with individual markdown files:

my-notes.md
my-notes-resources/
├── example.com-page-title.md
├── github.com-user-repo.md
└── youtube.com-watch-v-abc123.md
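
The filenames in the tree above follow a visible pattern: host, path, and query joined with hyphens. A minimal sketch that reproduces those names is shown here. The `url_to_filename` helper and its exact rules are assumptions inferred from the examples, so the real naming logic may handle edge cases differently.

```python
import re
from urllib.parse import urlsplit


def url_to_filename(url: str) -> str:
    """Turn a URL into a readable .md filename (hypothetical helper)."""
    parts = urlsplit(url)
    raw = parts.netloc + parts.path
    if parts.query:
        raw += "-" + parts.query
    # Collapse every run of characters other than letters, digits, and dots
    # into a single hyphen, then trim hyphens from both ends.
    slug = re.sub(r"[^A-Za-z0-9.]+", "-", raw).strip("-")
    return f"{slug}.md"


print(url_to_filename("https://example.com/page/title"))
# → example.com-page-title.md
print(url_to_filename("https://youtube.com/watch?v=abc123"))
# → youtube.com-watch-v-abc123.md
```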

Each resource file includes:

  • Clean, URL-based filename
  • Original source URL in metadata
  • Full markdown-converted content
  • Error information if fetch failed

Example resource file content:

# Example Page Title

**Source:** https://example.com/page/title

[Converted markdown content here...]
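
As a rough illustration of that layout, the sketch below assembles a resource file from a title, source URL, and converted content. The `render_resource` function is hypothetical, and the `**Error:**` line is an assumed format, since the source does not show what a failed fetch looks like.

```python
from typing import Optional


def render_resource(
    title: str, url: str, content: str = "", error: Optional[str] = None
) -> str:
    """Assemble resource file text in the layout shown above (illustrative)."""
    lines = [f"# {title}", "", f"**Source:** {url}", ""]
    if error is not None:
        # Assumption: the real error format is not documented.
        lines.append(f"**Error:** {error}")
    else:
        lines.append(content)
    return "\n".join(lines) + "\n"


print(render_resource(
    "Example Page Title",
    "https://example.com/page/title",
    "[Converted markdown content here...]",
))
```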

🔄 Workflow Example

Default Mode (Resource Creation)

$ linkweaver research-notes.md
Extracting URLs from research-notes.md...
Found 5 URLs in research-notes.md

Total unique URLs found: 5

Processing research-notes.md -> research-notes-resources/
Processing 5 unique URLs using markitdown...
Fetching 1/5: https://example.com/article
✓ Saved: example.com-article.md
Fetching 2/5: https://github.com/user/repo
✓ Saved: github.com-user-repo.md
...

List Links Mode

$ linkweaver --list-links research-notes.md
https://example.com/article
https://github.com/user/repo
https://youtube.com/watch?v=abc123
...

XML Cat Mode

$ linkweaver --xml-cat research-notes.md
<document source='research-notes.md'>
[Original file content]

<resources>
<resource file='example.com-article.md'>
[Resource content]
</resource>
...
</resources>
</document>
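
The XML structure above can be produced by plain string assembly. The sketch below is one way to do it, assuming resources are passed as a filename-to-content mapping. The `xml_cat` function is a hypothetical name, and like the sample output it performs no XML escaping of the embedded content.

```python
def xml_cat(source: str, content: str, resources: dict[str, str]) -> str:
    """Wrap a document and its resources in the XML layout shown above (illustrative)."""
    parts = [f"<document source='{source}'>", content, "", "<resources>"]
    for filename, body in resources.items():
        parts.append(f"<resource file='{filename}'>")
        parts.append(body)
        parts.append("</resource>")
    parts.append("</resources>")
    parts.append("</document>")
    return "\n".join(parts)


print(xml_cat(
    "research-notes.md",
    "[Original file content]",
    {"example.com-article.md": "[Resource content]"},
))
```

Because nothing is escaped, the result is meant as a readable wrapper (for example, as context for a language model) and is not guaranteed to be valid XML if the content itself contains angle brackets.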

📜 License

MIT License - see LICENSE for details.
