Clip, organize, and manage markdown articles - LLM workflow assistant
clipmd
A CLI tool for saving, organizing, and managing markdown articles with YAML frontmatter. Designed to assist LLM-based workflows by preprocessing files and executing file operations reliably.
Key Features:
- Fetch web content and convert to markdown with frontmatter
- Preprocess articles (clean URLs, sanitize filenames, fix frontmatter)
- Extract metadata in LLM-optimized format (95%+ token reduction)
- Move files based on simple categorization lists
- Detect duplicates by URL or content hash
- Statistics and folder health monitoring
Installation
pip install clipmd
# or with uv
uv add clipmd
# With language detection support
pip install clipmd[lang]
Quick Start
# Initialize in your articles directory
cd ~/Documents/Articles
clipmd init
# Fetch articles from URLs
clipmd fetch "https://example.com/article"
clipmd fetch -f urls.txt # Or from file
# Preprocess files (clean URLs, sanitize filenames, fix frontmatter)
clipmd preprocess
# Extract metadata for LLM categorization
clipmd extract --folders > articles-metadata.txt
# [LLM or human creates categorization.txt]
# Execute categorization
clipmd move categorization.txt
# View results
clipmd stats
Core Workflow
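clipmd splits an LLM-assisted workflow into two roles: an LLM or human orchestrator that makes the decisions, and clipmd as the executor that carries them out: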
┌─────────────────────────────────────┐
│  LLM/Human (Orchestrator)           │
│  - Reads clipmd output              │
│  - Makes categorization decisions   │
│  - Generates simple action lists    │
└─────────────────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────┐
│  clipmd (Executor)                  │
│  - Fetches and converts content     │
│  - Extracts metadata (minimal)      │
│  - Executes file operations         │
│  - Handles edge cases reliably      │
└─────────────────────────────────────┘
Commands
Fetch & Capture
# Fetch single URL
clipmd fetch "https://example.com/article"
# Fetch multiple URLs
clipmd fetch -f urls.txt
# Dry run (preview without saving)
clipmd fetch --dry-run "https://example.com/article"
Preprocess
# Clean and prepare articles
clipmd preprocess
# Auto-remove duplicates
clipmd preprocess --auto-remove-dupes
# Dry run
clipmd preprocess --dry-run
What it does:
- Fixes invalid YAML frontmatter
- Cleans tracking parameters from URLs
- Sanitizes filenames
- Adds date prefixes (from frontmatter or content)
- Detects duplicates
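The URL-cleaning step can be illustrated with a minimal sketch using only the Python standard library. This is not clipmd's actual implementation; the function name `clean_url` and the `TRACKING_PARAMS` default are hypothetical, with the parameter names borrowed from the example configuration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical default list; clipmd reads its own list from configuration.
TRACKING_PARAMS = frozenset({"utm_source", "utm_medium", "fbclid", "gclid"})


def clean_url(url: str, remove_params: frozenset[str] = TRACKING_PARAMS) -> str:
    """Return url with the named query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in remove_params
    ]
    # Rebuild the URL with only the surviving query parameters.
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(clean_url("https://example.com/article?id=7&utm_source=news&fbclid=abc"))
# prints: https://example.com/article?id=7
```

Parsing and re-encoding the query string, rather than using a regular expression, keeps non-tracking parameters and the URL fragment intact.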
Extract Metadata
# Extract for LLM (markdown format)
clipmd extract > metadata.txt
# With existing folders list
clipmd extract --folders > metadata.txt
# Include word count and language
clipmd extract --include-stats > metadata.txt
# JSON output
clipmd extract --format json > metadata.json
Output example:
# Articles Metadata
# Total: 79 articles
## Existing Folders
AI-Tools, Science, Tech, Misc
## Needs Categorization (79 articles)
1. 20240115-Some-Article.md
   URL: blog.example.com
   Title: Some Article Title
   Desc: First 150 characters of description...
2. 20240116-Another-Article.md
   URL: news.example.com
   Title: Another Article
   Desc: Description preview...
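To show why this format is compact, here is a rough sketch of how one entry could be built: the URL is reduced to its host and the description is cut to a fixed length. The function `compact_entry` and its parameters are hypothetical, and the 150-character default is taken from the example output above, not from clipmd's source.

```python
from urllib.parse import urlsplit


def compact_entry(
    index: int,
    filename: str,
    url: str,
    title: str,
    description: str,
    max_chars: int = 150,  # assumed default, based on the example output
) -> str:
    """Format one article as a short, numbered metadata entry."""
    if len(description) > max_chars:
        description = description[:max_chars].rstrip() + "..."
    return "\n".join(
        [
            f"{index}. {filename}",
            f"   URL: {urlsplit(url).netloc}",  # host only, path and query dropped
            f"   Title: {title}",
            f"   Desc: {description}",
        ]
    )


print(
    compact_entry(
        1,
        "20240115-Some-Article.md",
        "https://blog.example.com/2024/some-article?ref=feed",
        "Some Article Title",
        "A short description.",
    )
)
```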
Move Files
# Move based on categorization file
clipmd move categorization.txt
# Dry run
clipmd move --dry-run categorization.txt
Input format (categorization.txt):
# Format: Category - filename.md
# Use TRASH to delete
1. AI-Tools - 20240115-Article-One.md
2. Science - 20240116-Article-Two.md
3. TRASH - duplicate-article.md
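The input format is simple enough to parse with a single regular expression. The sketch below is an illustration under assumptions, not clipmd's parser: the names `parse_categorization` and `plan` are hypothetical, and it assumes category names contain no spaces and filenames end in `.md`.

```python
import re

# Assumed line shape: optional "N. " prefix, category, " - ", filename.
LINE_RE = re.compile(
    r"^(?:\d+\.\s*)?(?P<category>\S+)\s+-\s+(?P<filename>.+\.md)\s*$"
)


def parse_categorization(text: str) -> list[tuple[str, str]]:
    """Return (category, filename) pairs, skipping blanks and # comments."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match:
            pairs.append((match["category"], match["filename"]))
    return pairs


def plan(pairs: list[tuple[str, str]]) -> tuple[list[tuple[str, str]], list[str]]:
    """Split pairs into (filename, folder) moves and filenames to trash."""
    moves = [(name, cat) for cat, name in pairs if cat != "TRASH"]
    trash = [name for cat, name in pairs if cat == "TRASH"]
    return moves, trash


sample = """\
# Format: Category - filename.md
1. AI-Tools - 20240115-Article-One.md
2. Science - 20240116-Article-Two.md
3. TRASH - duplicate-article.md
"""
moves, trash = plan(parse_categorization(sample))
print(moves)
print(trash)
```

Keeping the plan separate from execution mirrors the `--dry-run` idea: the list of moves can be printed and reviewed before any file is touched.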
Statistics
# View folder statistics
clipmd stats
# Only show warnings
clipmd stats --warnings-only
# JSON output
clipmd stats --format json
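As a sketch of what a folder warning might mean, the following assumes that `warn_below` and `warn_above` (see the example configuration) are thresholds on the number of articles per folder. That interpretation, the function name `folder_warnings`, and the message wording are all assumptions, not documented clipmd behavior.

```python
def folder_warnings(
    counts: dict[str, int],
    warn_below: int = 10,  # values taken from the example configuration
    warn_above: int = 45,
) -> dict[str, str]:
    """Map each out-of-range folder to a short warning message."""
    warnings = {}
    for folder, count in sorted(counts.items()):
        if count < warn_below:
            warnings[folder] = f"only {count} articles (below {warn_below})"
        elif count > warn_above:
            warnings[folder] = f"{count} articles (above {warn_above})"
    return warnings


print(folder_warnings({"AI-Tools": 52, "Science": 20, "Misc": 3}))
```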
Other Commands
# Find duplicates
clipmd duplicates --by-url
clipmd duplicates --by-hash
# Move files to trash
clipmd trash file1.md file2.md
# Validate configuration
clipmd validate
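Duplicate detection by content hash can be sketched as grouping files whose bodies produce the same digest. This is an illustration only: the function `group_by_hash` is hypothetical, and both the choice of SHA-256 and the whitespace normalization are assumptions about what a reasonable implementation might do.

```python
import hashlib
from collections import defaultdict


def group_by_hash(articles: dict[str, str]) -> list[list[str]]:
    """Group filenames whose body text hashes identically."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, body in articles.items():
        # Assumption: collapse whitespace so trivial reflows still match.
        normalized = " ".join(body.split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups[digest].append(name)
    # Only groups with more than one file are duplicates.
    return [sorted(names) for names in groups.values() if len(names) > 1]


print(
    group_by_hash(
        {
            "a.md": "Hello   world\n",
            "b.md": "Hello world",
            "c.md": "Something else",
        }
    )
)
```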
Configuration
Configuration is searched in this order:
1. ./config.yaml (current directory)
2. ./.clipmd/config.yaml (project directory)
3. ~/.config/clipmd/config.yaml (user-wide)
Minimal Config
version: 1
paths:
  root: "."
Example Config
version: 1
paths:
  root: "."
  cache: ".clipmd/cache.json"
frontmatter:
  source_url:
    - source
    - url
    - original_url
  title:
    - title
    - name
dates:
  output_format: "%Y%m%d"
  extract_from_content: true
url_cleaning:
  remove_params:
    - utm_source
    - utm_medium
    - fbclid
    - gclid
filenames:
  replacements:
    " ": "-"
    "_": "-"
  max_length: 100
  collapse_dashes: true
folders:
  warn_below: 10
  warn_above: 45
See SPEC.md for full configuration reference.
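To make the `filenames` settings concrete, here is one plausible reading of them as code. It is a sketch under assumptions: the function `sanitize_filename` is hypothetical, and it assumes `max_length` applies to the whole filename including the extension, which SPEC.md may define differently.

```python
import re


def sanitize_filename(
    name: str,
    replacements: dict[str, str],
    max_length: int = 100,
    collapse_dashes: bool = True,
) -> str:
    """Apply character replacements, collapse dashes, and cap the length."""
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no extension present
        stem, ext = name, ""
    for old, new in replacements.items():
        stem = stem.replace(old, new)
    if collapse_dashes:
        stem = re.sub(r"-{2,}", "-", stem)
    suffix = f".{ext}" if dot else ""
    # Assumption: the limit covers the stem plus the extension.
    stem = stem[: max(max_length - len(suffix), 1)].strip("-")
    return stem + suffix


print(sanitize_filename("My  Great_Article.md", {" ": "-", "_": "-"}))
# prints: My-Great-Article.md
```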
Example Workflow
Triage New Articles
# 1. Fetch articles
clipmd fetch -f reading-list.txt
# 2. Preprocess (clean, dedupe)
clipmd preprocess --auto-remove-dupes
# 3. Extract metadata for LLM
clipmd extract --folders > articles-metadata.txt
# 4. [LLM reads articles-metadata.txt and generates categorization.txt]
# Example LLM prompt:
# "Categorize these articles into the existing folders.
# Output format: 'N. FolderName - filename.md'"
# 5. Execute categorization
clipmd move categorization.txt
# 6. View results
clipmd stats
Reorganize Existing Folders
# Check which folders need attention
clipmd stats --warnings-only
# Extract metadata from problematic folder
clipmd extract "Too-Big-Folder/" --max-chars 100 > reorganize.txt
# [LLM reads reorganize.txt and writes a new categorization to reorganization.txt]
# Execute
clipmd move reorganization.txt
# Verify
clipmd stats
LLM Integration
clipmd reduces token usage in LLM workflows by giving the model compact metadata to work from. Approximate figures:
| Scenario | Without clipmd | With clipmd | Savings |
|---|---|---|---|
| 100 articles triage | ~200k tokens | ~5k tokens | 97% |
| 50 articles reorganize | ~100k tokens | ~3k tokens | 97% |
| Duplicate detection | ~50k tokens | ~2k tokens | 96% |
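The savings column follows from the two token columns as 100 × (1 − with / without). The short check below recomputes it from the approximate figures in the table; the function name `savings_percent` is only for illustration.

```python
def savings_percent(without_tokens: int, with_tokens: int) -> float:
    """Percentage of tokens saved relative to the baseline."""
    return 100 * (1 - with_tokens / without_tokens)


scenarios = [
    ("100 articles triage", 200_000, 5_000),
    ("50 articles reorganize", 100_000, 3_000),
    ("Duplicate detection", 50_000, 2_000),
]
for label, without_tokens, with_tokens in scenarios:
    print(f"{label}: {savings_percent(without_tokens, with_tokens):.1f}%")
```

The first row works out to 97.5%, which the table reports as 97%; the other two rows match exactly.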
Development
# Install dependencies
make dev
# Run checks (lint, typecheck, tests)
make check
# Run tests with coverage
make test-cov
# Format code
make format
Requirements
- Python 3.13+
- uv (recommended) or pip
License
MIT