clipmd

Clip, organize, and manage markdown articles - LLM workflow assistant.

A CLI tool for saving, organizing, and managing markdown articles with YAML frontmatter. Designed to assist LLM-based workflows by preprocessing files and executing file operations reliably.

Key Features:

  • 📥 Fetch web content and convert to markdown with frontmatter
  • 🧹 Preprocess articles (clean URLs, sanitize filenames, fix frontmatter)
  • 📊 Extract metadata in LLM-optimized format (95%+ token reduction)
  • 🗂️ Move files based on simple categorization lists
  • 🔍 Detect duplicates by URL or content hash
  • 📈 Statistics and folder health monitoring

Installation

pip install clipmd
# or with uv
uv add clipmd

# With language detection support
pip install "clipmd[lang]"  # quotes keep shells like zsh from globbing the brackets

Quick Start

# Initialize in your articles directory
cd ~/Documents/Articles
clipmd init

# Fetch articles from URLs
clipmd fetch "https://example.com/article"
clipmd fetch -f urls.txt  # Or from file

# Preprocess files (clean URLs, sanitize filenames, fix frontmatter)
clipmd preprocess

# Extract metadata for LLM categorization
clipmd extract --folders > articles-metadata.txt

# [LLM or human creates categorization.txt]

# Execute categorization
clipmd move categorization.txt

# View results
clipmd stats

Core Workflow

clipmd is designed for LLM-assisted workflows:

┌─────────────────────────────────────┐
│  LLM/Human (Orchestrator)           │
│  - Reads clipmd output              │
│  - Makes categorization decisions   │
│  - Generates simple action lists    │
└─────────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────┐
│  clipmd (Executor)                  │
│  - Fetches and converts content     │
│  - Extracts metadata (minimal)      │
│  - Executes file operations         │
│  - Handles edge cases reliably      │
└─────────────────────────────────────┘

Commands

Fetch & Capture

# Fetch single URL
clipmd fetch "https://example.com/article"

# Fetch multiple URLs
clipmd fetch -f urls.txt

# Dry run (preview without saving)
clipmd fetch --dry-run "https://example.com/article"

Preprocess

# Clean and prepare articles
clipmd preprocess

# Auto-remove duplicates
clipmd preprocess --auto-remove-dupes

# Dry run
clipmd preprocess --dry-run

What it does:

  • Fixes invalid YAML frontmatter
  • Cleans tracking parameters from URLs
  • Sanitizes filenames
  • Adds date prefixes (from frontmatter or content)
  • Detects duplicates
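
The URL-cleaning step can be sketched in Python with the standard library. This is a minimal illustration, not clipmd's actual implementation: the function name `clean_url` and the default parameter set are assumptions (the set mirrors the `remove_params` entries in the example config of this README).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical defaults, copied from the example config's remove_params list.
TRACKING_PARAMS = frozenset({"utm_source", "utm_medium", "fbclid", "gclid"})


def clean_url(url: str, remove_params: frozenset = TRACKING_PARAMS) -> str:
    """Return the URL with the listed query parameters removed."""
    parts = urlsplit(url)
    # Keep every query pair whose key is not a tracking parameter.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in remove_params
    ]
    # Rebuild the URL; scheme, host, path, and fragment are left untouched.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

For example, `clean_url("https://example.com/a?utm_source=x&id=7")` keeps only the `id=7` parameter.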

Extract Metadata

# Extract for LLM (markdown format)
clipmd extract > metadata.txt

# With existing folders list
clipmd extract --folders > metadata.txt

# Include word count and language
clipmd extract --include-stats > metadata.txt

# JSON output
clipmd extract --format json > metadata.json

Output example:

# Articles Metadata
# Total: 79 articles

## Existing Folders
AI-Tools, Science, Tech, Misc

## Needs Categorization (79 articles)

1. 20240115-Some-Article.md
   URL: blog.example.com
   Title: Some Article Title
   Desc: First 150 characters of description...

2. 20240116-Another-Article.md
   URL: news.example.com
   Title: Another Article
   Desc: Description preview...
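
To show why this format is compact, here is a rough Python sketch that renders one entry in the same shape as the output example above. It is an illustration under assumptions, not clipmd's code: the helper name `format_entry` is hypothetical, and reducing the URL to its hostname and appending "..." after truncation are inferred from the example output.

```python
from urllib.parse import urlsplit


def format_entry(index: int, filename: str, url: str, title: str,
                 description: str, max_chars: int = 150) -> str:
    """Render one article as a short, numbered metadata entry."""
    host = urlsplit(url).netloc  # assumption: only the hostname is shown
    desc = description.strip()
    if len(desc) > max_chars:
        # Truncate long descriptions and mark the cut with an ellipsis.
        desc = desc[:max_chars].rstrip() + "..."
    return "\n".join([
        f"{index}. {filename}",
        f"   URL: {host}",
        f"   Title: {title}",
        f"   Desc: {desc}",
    ])
```

Each article costs only four short lines, regardless of how long the article body is.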

Move Files

# Move based on categorization file
clipmd move categorization.txt

# Dry run
clipmd move --dry-run categorization.txt

Input format (categorization.txt):

# Format: Category - filename.md
# Use TRASH to delete

1. AI-Tools - 20240115-Article-One.md
2. Science - 20240116-Article-Two.md
3. TRASH - duplicate-article.md
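
The input format above is simple enough to parse with one regular expression. The following Python sketch shows one way to do it; the names `Action` and `parse_categorization` are hypothetical, and the handling of comments and malformed lines is an assumption rather than documented clipmd behavior.

```python
import re
from typing import NamedTuple


class Action(NamedTuple):
    category: str  # target folder, or "TRASH" to delete
    filename: str


# Optional "N." prefix, then "Category - filename.md".
LINE_RE = re.compile(
    r"^(?:\d+\.\s*)?(?P<category>\S.*?)\s+-\s+(?P<filename>\S.*\.md)$"
)


def parse_categorization(text: str) -> list[Action]:
    """Parse categorization lines, skipping blanks and '#' comments."""
    actions = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        match = LINE_RE.match(stripped)
        if match is None:
            # Assumption: fail loudly instead of silently skipping bad lines.
            raise ValueError(f"unrecognized line: {line!r}")
        actions.append(Action(match["category"], match["filename"]))
    return actions
```

Because the separator is a dash surrounded by whitespace, hyphenated folder names such as `AI-Tools` parse without ambiguity.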

Statistics

# View folder statistics
clipmd stats

# Only show warnings
clipmd stats --warnings-only

# JSON output
clipmd stats --format json
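
As a rough illustration of folder health checks, the Python sketch below flags folders whose article count falls outside a range. It assumes that the `warn_below` and `warn_above` settings in the example config are per-folder article counts; the function name `folder_warnings` and the message wording are hypothetical.

```python
def folder_warnings(counts: dict[str, int], warn_below: int = 10,
                    warn_above: int = 45) -> dict[str, str]:
    """Map folder name to a warning for folders outside the healthy range."""
    warnings = {}
    for folder, count in sorted(counts.items()):
        if count < warn_below:
            warnings[folder] = f"only {count} articles (fewer than {warn_below})"
        elif count > warn_above:
            warnings[folder] = f"{count} articles (more than {warn_above})"
    return warnings
```

A folder holding between 10 and 45 articles (with the defaults shown) produces no warning.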

Other Commands

# Find duplicates
clipmd duplicates --by-url
clipmd duplicates --by-hash

# Move files to trash
clipmd trash file1.md file2.md

# Validate configuration
clipmd validate
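
The two duplicate-detection modes can be sketched in Python as grouping articles by a key: either a normalized source URL or a hash of the body. This is an illustration only. The function name `find_duplicates`, the choice of SHA-256, and the URL normalization (trimming whitespace and a trailing slash) are assumptions, not clipmd's documented behavior.

```python
import hashlib
from collections import defaultdict


def find_duplicates(articles: dict[str, tuple[str, str]],
                    by: str = "url") -> list[list[str]]:
    """Group filenames that share a URL or a content hash.

    `articles` maps filename -> (source_url, body_text).
    """
    groups = defaultdict(list)
    for name, (url, body) in articles.items():
        if by == "url":
            key = url.strip().rstrip("/")  # assumed normalization
        elif by == "hash":
            key = hashlib.sha256(body.encode("utf-8")).hexdigest()
        else:
            raise ValueError(f"unknown mode: {by!r}")
        groups[key].append(name)
    # Only keys shared by more than one file count as duplicates.
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

URL matching catches the same article saved twice; hash matching catches identical content saved from different URLs.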

Configuration

clipmd looks for its configuration file in this order and uses the first one it finds:

  1. ./config.yaml (current directory)
  2. ./.clipmd/config.yaml (project directory)
  3. ~/.config/clipmd/config.yaml (user-wide)

Minimal Config

version: 1
paths:
  root: "."

Example Config

version: 1

paths:
  root: "."
  cache: ".clipmd/cache.json"

frontmatter:
  source_url:
    - source
    - url
    - original_url
  title:
    - title
    - name

dates:
  output_format: "%Y%m%d"
  extract_from_content: true

url_cleaning:
  remove_params:
    - utm_source
    - utm_medium
    - fbclid
    - gclid

filenames:
  replacements:
    " ": "-"
    "_": "-"
  max_length: 100
  collapse_dashes: true

folders:
  warn_below: 10
  warn_above: 45

See SPEC.md for full configuration reference.

Example Workflow

Triage New Articles

# 1. Fetch articles
clipmd fetch -f reading-list.txt

# 2. Preprocess (clean, dedupe)
clipmd preprocess --auto-remove-dupes

# 3. Extract metadata for LLM
clipmd extract --folders > articles-metadata.txt

# 4. [LLM reads articles-metadata.txt and generates categorization.txt]
# Example LLM prompt:
# "Categorize these articles into the existing folders.
#  Output format: 'N. FolderName - filename.md'"

# 5. Execute categorization
clipmd move categorization.txt

# 6. View results
clipmd stats

Reorganize Existing Folders

# Check which folders need attention
clipmd stats --warnings-only

# Extract metadata from problematic folder
clipmd extract "Too-Big-Folder/" --max-chars 100 > reorganize.txt

# [LLM reads reorganize.txt and writes its suggested moves to reorganization.txt]

# Execute
clipmd move reorganization.txt

# Verify
clipmd stats

LLM Integration

clipmd minimizes token usage for LLM workflows:

Scenario                 Without clipmd   With clipmd   Savings
100 articles triage      ~200k tokens     ~5k tokens    97%
50 articles reorganize   ~100k tokens     ~3k tokens    97%
Duplicate detection      ~50k tokens      ~2k tokens    96%
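
The savings column follows from the two token columns. As a quick check in Python (the helper name `savings_percent` is hypothetical, and truncating to a whole percent is an assumption that happens to match the figures in the table):

```python
def savings_percent(tokens_without: int, tokens_with: int) -> int:
    """Percentage of tokens saved, truncated to a whole percent."""
    saved = tokens_without - tokens_with
    return (100 * saved) // tokens_without


for scenario, without, with_clipmd in [
    ("100 articles triage", 200_000, 5_000),
    ("50 articles reorganize", 100_000, 3_000),
    ("Duplicate detection", 50_000, 2_000),
]:
    print(f"{scenario}: {savings_percent(without, with_clipmd)}%")
```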

Development

# Install dependencies
make dev

# Run checks (lint, typecheck, tests)
make check

# Run tests with coverage
make test-cov

# Format code
make format

Requirements

  • Python 3.13+
  • uv (recommended) or pip

License

MIT
