Link Weaver 🕸️
Extract URLs from markdown files and convert the pages they point to into individual markdown documents. Link Weaver is a CLI tool that finds all URLs in your markdown files, fetches the content behind each one, converts it to markdown, and saves each result as a separate file in an organized resource folder.
✨ Features
- 📝 Markdown URL Extraction: Finds and extracts all URLs from your markdown files automatically
- 🌐 Multi-format Support: Handles web pages, PDFs, YouTube videos, and more, thanks to MarkItDown
- 📁 Resource Mode: Creates a resource folder for each input file, with every link saved as its own markdown file
- 📋 List Mode: Quickly lists all unique URLs found across files
- 🔗 XML Cat Mode: Concatenates files with their resources in a structured XML format
- 🏷️ Smart Naming: Uses URL-based filenames that are human-readable and identifiable
- ⚡ Duplicate Removal: Automatically removes duplicate URLs before processing
- 📊 Progress Tracking: Shows real-time progress with clear status indicators
- 💻 CLI: Simple command-line interface for easy integration into workflows
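The extraction and duplicate-removal features above can be pictured with a minimal sketch. It is not Link Weaver's actual implementation: the regular expression, the function name `extract_urls`, and the trailing-punctuation cleanup are all assumptions made for illustration.

```python
import re

# Assumed pattern: an http(s) URL runs until whitespace or a bracket character.
# URLs that legitimately contain parentheses would be cut short by this sketch.
URL_PATTERN = re.compile(r"https?://[^\s<>()\[\]]+")


def extract_urls(markdown_text: str) -> list[str]:
    """Return the unique URLs in the text, in order of first appearance."""
    seen: set[str] = set()
    urls: list[str] = []
    for match in URL_PATTERN.finditer(markdown_text):
        # Drop sentence punctuation that happens to follow a bare URL.
        url = match.group(0).rstrip(".,;:!?")
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


sample = (
    "See [the article](https://example.com/article) and "
    "<https://github.com/user/repo>. Again: https://example.com/article."
)
print(extract_urls(sample))
# ['https://example.com/article', 'https://github.com/user/repo']
```

Keeping first-appearance order (rather than sorting or using a bare `set`) means the output follows the reading order of the source notes.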
📦 Installation
# Install from PyPI
pip install linkweaver
# Or install from source
git clone https://github.com/davidgasquez/linkweaver
cd linkweaver
uv sync
🚀 Quick Start
# Process files and create resource folders (default mode)
linkweaver my-notes.md
# Creates: my-notes-resources/ folder with individual .md files
# List all unique URLs found in files
linkweaver --list-links notes/*.md
# Concatenate files with their resources in XML format
linkweaver --xml-cat my-notes.md
🔧 CLI API
Basic Usage
$ linkweaver --help
usage: linkweaver [-h] [--list-links] [--xml-cat] input_files [input_files ...]
Extract URLs from markdown files and save each link as individual markdown files in resource folders
positional arguments:
  input_files       One or more markdown files to process
options:
  -h, --help        show this help message and exit
  --list-links, -l  List all unique URLs found in the files (no downloading)
  --xml-cat         Concatenate all markdown files with their resource folders in XML structure
📁 Output Structure
For each input file, linkweaver creates a resource folder with individual markdown files:
my-notes.md
my-notes-resources/
├── example.com-page-title.md
├── github.com-user-repo.md
└── youtube.com-watch-v-abc123.md
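The filenames in the tree above suggest a simple URL-to-slug rule. The sketch below is one way to reproduce those names; the function name `url_to_filename` and the exact character rules are assumptions, and the real tool may treat edge cases (fragments, very long URLs, name collisions) differently.

```python
import re
from urllib.parse import urlsplit


def url_to_filename(url: str) -> str:
    """Build a readable .md filename from a URL (hypothetical helper)."""
    parts = urlsplit(url)
    raw = parts.netloc + parts.path
    if parts.query:
        raw += "-" + parts.query
    # Collapse every run of characters other than letters, digits, and dots
    # into a single hyphen, then trim hyphens from both ends.
    slug = re.sub(r"[^A-Za-z0-9.]+", "-", raw).strip("-")
    return f"{slug}.md"


for url in (
    "https://example.com/page/title",
    "https://github.com/user/repo",
    "https://youtube.com/watch?v=abc123",
):
    print(url_to_filename(url))
# example.com-page-title.md
# github.com-user-repo.md
# youtube.com-watch-v-abc123.md
```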
Each resource file includes:
- Clean, URL-based filename
- Original source URL in metadata
- Full markdown-converted content
- Error information if the fetch failed
Example resource file content:
# Example Page Title
**Source:** https://example.com/page/title
[Converted markdown content here...]
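As an illustration of that layout, the sketch below assembles a resource file from a title, a source URL, and either converted content or an error message. The function `render_resource` and the `**Error:**` line are assumptions; the README only says that error information is included when a fetch fails, not how it is formatted. The conversion step itself is handled by MarkItDown and is left out here.

```python
from typing import Optional


def render_resource(
    title: str,
    source_url: str,
    content: Optional[str] = None,
    error: Optional[str] = None,
) -> str:
    """Assemble the text of one resource file (hypothetical helper)."""
    lines = [f"# {title}", "", f"**Source:** {source_url}", ""]
    if error is not None:
        # Assumed format for a failed fetch.
        lines.append(f"**Error:** {error}")
    else:
        lines.append(content or "")
    return "\n".join(lines) + "\n"


ok_page = render_resource(
    "Example Page Title",
    "https://example.com/page/title",
    content="[Converted markdown content here...]",
)
failed_page = render_resource(
    "Example Page Title",
    "https://example.com/page/title",
    error="HTTP 404",
)
print(ok_page)
print(failed_page)
```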
🔄 Workflow Example
Default Mode (Resource Creation)
$ linkweaver research-notes.md
Extracting URLs from research-notes.md...
Found 5 URLs in research-notes.md
Total unique URLs found: 5
Processing research-notes.md -> research-notes-resources/
Processing 5 unique URLs using markitdown...
Fetching 1/5: https://example.com/article
✓ Saved: example.com-article.md
Fetching 2/5: https://github.com/user/repo
✓ Saved: github.com-user-repo.md
...
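The run above can be approximated in a few lines. This is a rough sketch under stated assumptions, not the tool's source: `resource_dir_for`, `slug`, and `process` are hypothetical names, the fetcher is injected so the example runs without network access, and the failure message format is invented for illustration.

```python
import re
import tempfile
from pathlib import Path
from typing import Callable


def resource_dir_for(markdown_file: Path) -> Path:
    """Map my-notes.md to a sibling my-notes-resources directory."""
    return markdown_file.with_name(f"{markdown_file.stem}-resources")


def slug(url: str) -> str:
    """Assumed naming rule: scheme dropped, separators turned into hyphens."""
    return re.sub(r"[^A-Za-z0-9.]+", "-", url.split("://", 1)[-1]).strip("-") + ".md"


def process(markdown_file: Path, urls: list[str], fetch: Callable[[str], str]) -> list[Path]:
    """Fetch each URL and write one file per URL, reporting progress."""
    out_dir = resource_dir_for(markdown_file)
    out_dir.mkdir(parents=True, exist_ok=True)
    print(f"Processing {markdown_file.name} -> {out_dir.name}/")
    written: list[Path] = []
    for index, url in enumerate(urls, start=1):
        print(f"Fetching {index}/{len(urls)}: {url}")
        target = out_dir / slug(url)
        try:
            body = fetch(url)
            status = "✓ Saved"
        except Exception as exc:
            # Assumed behavior: a failed fetch still yields a file with the error.
            body = f"**Error:** {exc}"
            status = "✗ Failed"
        target.write_text(f"**Source:** {url}\n\n{body}\n", encoding="utf-8")
        print(f"{status}: {target.name}")
        written.append(target)
    return written


def fake_fetch(url: str) -> str:
    """Stand-in for the real fetch-and-convert step."""
    if "broken" in url:
        raise RuntimeError("simulated network failure")
    return f"Content of {url}"


with tempfile.TemporaryDirectory() as tmp:
    notes = Path(tmp) / "research-notes.md"
    written = process(
        notes,
        ["https://example.com/article", "https://example.com/broken"],
        fake_fetch,
    )
    written_names = [path.name for path in written]
    broken_text = written[1].read_text(encoding="utf-8")
```

Passing the fetcher as a parameter keeps the loop testable; a real fetcher would call out to MarkItDown instead of `fake_fetch`.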
List Links Mode
$ linkweaver --list-links research-notes.md
https://example.com/article
https://github.com/user/repo
https://youtube.com/watch?v=abc123
...
XML Cat Mode
$ linkweaver --xml-cat research-notes.md
<document source='research-notes.md'>
[Original file content]
<resources>
<resource file='example.com-article.md'>
[Resource content]
</resource>
...
</resources>
</document>
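A small sketch of how such output could be assembled is shown below. It assumes that the resource folder sits next to the input file and follows the `<name>-resources` convention, that resources are emitted in sorted filename order, and that content is embedded without XML escaping, as the sample output suggests. The function name `xml_cat` is hypothetical.

```python
import tempfile
from pathlib import Path


def xml_cat(markdown_file: Path) -> str:
    """Wrap a markdown file and its resource files in the XML-like layout."""
    resources_dir = markdown_file.with_name(f"{markdown_file.stem}-resources")
    lines = [
        f"<document source='{markdown_file.name}'>",
        markdown_file.read_text(encoding="utf-8").rstrip("\n"),
        "<resources>",
    ]
    if resources_dir.is_dir():
        for resource in sorted(resources_dir.glob("*.md")):
            lines.append(f"<resource file='{resource.name}'>")
            lines.append(resource.read_text(encoding="utf-8").rstrip("\n"))
            lines.append("</resource>")
    lines.append("</resources>")
    lines.append("</document>")
    return "\n".join(lines)


with tempfile.TemporaryDirectory() as tmp:
    notes = Path(tmp) / "research-notes.md"
    notes.write_text("# Notes\nSee https://example.com/article\n", encoding="utf-8")
    resources = Path(tmp) / "research-notes-resources"
    resources.mkdir()
    (resources / "example.com-article.md").write_text("# Article\n", encoding="utf-8")
    xml_output = xml_cat(notes)

print(xml_output)
```

Because the content is not escaped, the result is convenient to paste into a prompt or read by eye but is not guaranteed to parse as strict XML.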
📜 License
MIT License - see LICENSE for details.
File details
Details for the file linkweaver-0.4.0.tar.gz.
File metadata
- Download URL: linkweaver-0.4.0.tar.gz
- Upload date:
- Size: 5.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.7.21
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 698c90a24bc15d80843328fe1cd7e5bc41e71bcd58e5f778e91d4b1031ed67f7 |
| MD5 | 2dbf6a701d03404ff729b56b30b59030 |
| BLAKE2b-256 | c40f262c89c3437a587969961862a2abb663c7ab7f68f8465b98e5906551b35a |
File details
Details for the file linkweaver-0.4.0-py3-none-any.whl.
File metadata
- Download URL: linkweaver-0.4.0-py3-none-any.whl
- Upload date:
- Size: 6.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.7.21
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b70b93acdffa35f96d3b5da52ed0505ee04f3f3e8dd3d9fcbb1db6c356cd7586 |
| MD5 | 1fd3fcf525e004657a5be3174d8ebd01 |
| BLAKE2b-256 | 2d15536bc7561eb166f621bf220a7a4bb4ebdd4da4f16dcc02748dba96fd2e4b |