# SQLDown

Bidirectional markdown ↔ SQLite conversion with column limit protection. Load, query, and dump with ease.
## Status

✅ MVP Features Implemented:

- ✅ Column limit validation with `--max-columns` flag
- ✅ Top-N section extraction with `--top-sections` flag
- ✅ Import with dynamic schema generation
- ✅ Watch mode for auto-refresh
- 🚧 Dump command (planned)
- 🚧 Info command (planned)

See SPECIFICATION.md for the complete v0.1 design. See REVIEW.md for Papa Bear's detailed design review.
## Philosophy

- **Bidirectional**: Load markdown → SQLite, dump SQLite → markdown
- **Simple**: Use sqlite3 for queries, Python for parsing
- **Smart**: Dynamic schema, path-aware dumps, change detection
- **Database as authority**: Enforces schema consistency
## Quick Start (Future v0.1)

```bash
# Load markdown files into database
sqldown load ~/tasks

# Query with sqlite3
sqlite3 sqldown.db "SELECT * FROM docs WHERE status='active'"

# Dump back to markdown
sqldown dump -d sqldown.db -o ~/restored

# Get info
sqldown info -d sqldown.db
```
## Import Command

```bash
python3 bin/import.py --db DATABASE --table TABLE --root PATH [OPTIONS]
```

What it does:

- Scans markdown files recursively
- Parses YAML frontmatter → columns
- Extracts H2 sections → `section_*` columns
- Creates the schema dynamically based on discovered fields
- Upserts into SQLite (idempotent: safe to run multiple times)

Options:

- `--db PATH` - Database file (created if it doesn't exist)
- `--table NAME` - Table name (default: `docs`)
- `--root PATH` - Directory containing markdown files
- `--pattern GLOB` - File pattern (default: `**/*.md`)
- `--max-columns N` - Maximum allowed columns (default: 1800; SQLite limit: 2000)
- `--top-sections N` - Extract only the top N most common sections (default: 20, 0 = all)
- `--watch`, `-w` - Watch for file changes and auto-update
- `--verbose`, `-v` - Show progress
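The frontmatter and H2 extraction described above can be sketched in a few lines. This is a minimal illustration, not the actual `bin/import.py` code; `parse_doc` and its regexes are hypothetical, though PyYAML is one of the project's declared dependencies:

```python
import re
import yaml  # PyYAML, one of the declared dependencies

def parse_doc(text: str) -> dict:
    """Split a markdown document into frontmatter fields and H2 sections."""
    fields = {}
    # YAML frontmatter: a leading block delimited by '---' lines
    m = re.match(r"^---\n(.*?)\n---\n(.*)$", text, re.DOTALL)
    if m:
        fields.update(yaml.safe_load(m.group(1)) or {})
        body = m.group(2)
    else:
        body = text
    # Each H2 heading becomes a section_* column; content runs to the next H2
    for name, content in re.findall(
        r"^## (.+?)\n(.*?)(?=^## |\Z)", body, re.DOTALL | re.MULTILINE
    ):
        key = "section_" + re.sub(r"\W+", "_", name.strip().lower()).strip("_")
        fields[key] = content.strip()
    fields["body"] = body
    return fields
```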
## Using SQLite3

Once imported, use sqlite3 directly for all queries:

```bash
# List tables
sqlite3 cache.db ".tables"

# Show schema
sqlite3 cache.db ".schema tasks"

# Query
sqlite3 cache.db "SELECT title, status FROM tasks WHERE status='active' LIMIT 10"

# Aggregate
sqlite3 cache.db "SELECT status, COUNT(*) FROM tasks GROUP BY status"

# Complex queries
sqlite3 cache.db "
SELECT project, COUNT(*) as active_count
FROM tasks
WHERE status='active'
GROUP BY project
ORDER BY active_count DESC
"

# Export to CSV
sqlite3 -csv cache.db "SELECT * FROM tasks WHERE status='active'" > active.csv

# Interactive mode
sqlite3 cache.db
```
## Dynamic Schema Example

From this markdown:

```markdown
---
status: active
project: agents
priority: high
---

# Add SQLite caching

## Objective
Create a cache layer...

## Implementation Plan
1. Parser
2. Schema
```

Creates these columns automatically:

- Core: `_id`, `_path`, `_sections`, `title`, `body`, `lead`, `file_modified`
- Frontmatter: `status`, `project`, `priority`
- Sections: `section_objective`, `section_implementation_plan`

Real example: 87 tasks → 181 columns (no schema design needed!)
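To make "dynamic schema" concrete, here is a stdlib-only sketch of how columns can be added on the fly and rows upserted idempotently. sqldown itself builds on sqlite-utils; `upsert_dynamic` is a hypothetical illustration, and it assumes every row has an `_id` plus at least one other column (the `ON CONFLICT` upsert needs SQLite ≥ 3.24):

```python
import sqlite3

def upsert_dynamic(conn, table, row):
    """Create the table and any missing columns on demand, then upsert by _id."""
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ("_id" TEXT PRIMARY KEY)')
    existing = {r[1] for r in conn.execute(f'PRAGMA table_info("{table}")')}
    for col in row:
        if col not in existing:
            conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{col}" TEXT')
    cols = list(row)
    collist = ", ".join(f'"{c}"' for c in cols)
    placeholders = ", ".join("?" for _ in cols)
    updates = ", ".join(f'"{c}"=excluded."{c}"' for c in cols if c != "_id")
    conn.execute(
        f'INSERT INTO "{table}" ({collist}) VALUES ({placeholders}) '
        f'ON CONFLICT("_id") DO UPDATE SET {updates}',
        [row[c] for c in cols],
    )
```

Re-running with the same `_id` updates the row instead of duplicating it, which is what makes repeated imports safe.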
## Column Limit Protection

SQLite has a hard limit of 2000 columns per table. With diverse markdown documents, you can easily hit this limit:

**Problem**: 5,225 tasks with diverse sections = 6,694 columns (💥 exceeds the limit!)

**Solution**: Use `--top-sections` to extract only the most common sections:

```bash
# Extract only the top 20 most common sections (default)
python3 bin/import.py --db cache.db --root ~/tasks --top-sections 20

# Result: 5,225 tasks → 116 columns ✅
# - 7 base columns (_id, _path, title, body, lead, _sections, file_modified)
# - 89 frontmatter columns (status, project, type, priority, etc.)
# - 20 section columns (overview, usage, objective, notes, next_steps, etc.)
```

What happens to the other sections?

- All content is preserved in the `body` field
- You can still search across all sections using SQLite FTS5
- Only the top N sections become queryable columns
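The frequency-based selection behind `--top-sections` can be sketched with a `Counter`. The helper name is hypothetical, and it assumes each parsed document carries a list of its section names:

```python
from collections import Counter

def pick_top_sections(docs, n=20):
    """Count section names across all documents and keep the N most common.

    n=0 mirrors the documented flag behaviour: extract all sections.
    """
    counts = Counter(name for doc in docs for name in doc["sections"])
    return {name for name, _ in counts.most_common(n)} if n else set(counts)
```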
Column limit validation:

```bash
# Check whether your documents will fit before importing
python3 bin/import.py --db test.db --root ~/docs --verbose

# Output shows the breakdown:
# 📊 Column breakdown:
#   - Base columns: 7
#   - Frontmatter columns: 89
#   - Section columns: 20
#   - Total: 116 (limit: 1800)
```

When approaching the limit (>90%):

- Shows a warning but continues the import
- Consider reducing `--top-sections` or increasing `--max-columns`

When exceeding the limit:

- Stops before importing to prevent database corruption
- Shows the breakdown and suggestions
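The two rules above reduce to a simple threshold check. A sketch (`validate_columns` is a hypothetical name, not the tool's actual function):

```python
def validate_columns(n_base, n_frontmatter, n_sections, max_columns=1800):
    """Warn above 90% of the column limit; abort (exit code 1) above it."""
    total = n_base + n_frontmatter + n_sections
    if total > max_columns:
        # Stop before touching the database
        raise SystemExit(f"❌ {total} columns exceeds limit of {max_columns}")
    if total > 0.9 * max_columns:
        print(f"⚠️  {total}/{max_columns} columns (>90% of limit)")
    return total
```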
## Multiple Collections

One database, multiple tables:

```bash
# Import different document types
python3 bin/import.py --db cache.db --table tasks --root ~/tasks
python3 bin/import.py --db cache.db --table notes --root ~/notes
python3 bin/import.py --db cache.db --table skills --root ~/.claude/skills

# Query them
sqlite3 cache.db "SELECT * FROM tasks WHERE status='active'"
sqlite3 cache.db "SELECT * FROM notes WHERE tags LIKE '%sqlite%'"

# Join across tables
sqlite3 cache.db "
SELECT t.title as task, n.title as note
FROM tasks t
JOIN notes n ON n.tags LIKE '%' || t.project || '%'
WHERE t.status='active'
"
```
## Refresh Strategy

**One-time import**: the import is idempotent, so just run it again:

```bash
# Add this to cron or a git hook
python3 bin/import.py --db cache.db --table tasks --root ~/tasks
```

**Watch mode (auto-refresh)**: use the `--watch` / `-w` flag to update the cache automatically when files change:

```bash
# Watch mode: import once, then auto-update on file changes
md-import --db cache.db --table tasks --root ~/tasks --watch

# Output:
# ✅ Imported 87 documents into cache.db:tasks
# 📋 Schema has 181 columns
#
# 👀 Watching /Users/admin/tasks for changes... (Ctrl-C to stop)
# [2025-01-15 10:23:45] Updated: AG-22_feat_add-configuration/README.md
# [2025-01-15 10:24:12] Added: AG-31_feat_new-feature/README.md
```

Watch mode is ideal for development workflows where you want the cache to stay in sync with your files.
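Watch mode can be approximated by polling file modification times. The real implementation may use filesystem events instead, so treat this as a sketch under assumptions; `snapshot`, `changes`, and `watch` are hypothetical names:

```python
import time
from pathlib import Path

def snapshot(root, pattern="**/*.md"):
    """Map each markdown file to its modification time."""
    return {str(p): p.stat().st_mtime for p in Path(root).glob(pattern)}

def changes(old, new):
    """Return (added, updated, removed) paths between two snapshots."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    updated = sorted(p for p in new.keys() & old.keys() if new[p] != old[p])
    return added, updated, removed

def watch(root, reimport, interval=1.0):
    """Poll for changes and re-run the import callback when any are found."""
    seen = snapshot(root)
    while True:
        time.sleep(interval)
        now = snapshot(root)
        added, updated, removed = changes(seen, now)
        if added or updated or removed:
            reimport()  # idempotent import makes a full re-run safe
            seen = now
```

Because the import upserts by `_id`, simply re-running it on any change is correct, if not maximally efficient.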
## Common Queries

```bash
# Active tasks
sqlite3 cache.db "SELECT title FROM tasks WHERE status='active'"

# Recent updates
sqlite3 cache.db "SELECT title, updated FROM tasks ORDER BY updated DESC LIMIT 10"

# By project
sqlite3 cache.db "SELECT project, COUNT(*) FROM tasks GROUP BY project"

# Search content
sqlite3 cache.db "SELECT title FROM tasks WHERE body LIKE '%cache%'"

# High priority, incomplete
sqlite3 cache.db "SELECT title FROM tasks WHERE priority='high' AND status != 'completed'"
```
## Why Not Use query.py?

You could! But sqlite3 gives you:

- Full SQL power (no wrapper limitations)
- A standard tool everyone knows
- CSV export, JSON mode, etc.
- An interactive shell with history
- Better performance (no Python overhead)

The import tool adds value (markdown parsing). Query wrappers don't.
## Requirements

Prerequisites:

- Python 3.7+ (the `sqlite3` module ships in the standard library)
- sqlite3 CLI (built-in on macOS 10.4+ and most Linux distributions)
- uv - Python package installer and runner

Dependencies (handled automatically by uv):

The md-import script uses uv's inline dependency specification, so dependencies are installed automatically on first run:

- sqlite-utils >= 3.30
- click >= 8.0
- PyYAML >= 6.0
- pathspec >= 0.11.0

No manual pip install or virtual environment setup needed!
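uv's inline dependency specification is the standard inline script metadata block (PEP 723) at the top of the script. Given the dependency list above, it would look roughly like this (a sketch, not the script's verbatim header):

```python
# /// script
# requires-python = ">=3.7"
# dependencies = [
#     "sqlite-utils>=3.30",
#     "click>=8.0",
#     "PyYAML>=6.0",
#     "pathspec>=0.11.0",
# ]
# ///
```

Running the script with `uv run` resolves and installs these into an ephemeral environment automatically.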
## Claude Code Integration

The SKILL.md file teaches Claude to:

- Use `bin/import.py` to create/refresh caches
- Use `sqlite3` for all queries
- Start with simple queries, then get more complex if needed
- Read full markdown files only when necessary (progressive disclosure)
## License

MIT
## Development Log

### 2025-01-13 - Design Session: Python Package Specification

Session Goals:

- Design conversion from uv single-file script to a proper Python package
- Plan bidirectional markdown ↔ SQLite workflow
- Define CLI interface and conventions

Key Decisions:

- Package Structure:
  - PyPI name: `sqldown`
  - CLI commands: `load`, `dump`, `info`
  - Modern packaging with `pyproject.toml` and `src/` layout
- Command Design:
  - `sqldown load` - markdown → database (with `--where` for simple filtering)
  - `sqldown dump` - database → markdown (with full SQL `--filter`)
  - `sqldown info` - database stats and table summaries
  - All commands support `--dry-run`
- Smart Features:
  - Path-aware dumping (only dump files under the target directory)
  - Change detection (only write files if content differs)
  - Frontmatter filtering during load (simple key=value in v0.1)
  - Automatic gitignore respect
  - Watch mode for file changes
- Configuration:
  - Default database: `./sqldown.db`
  - Environment variable: `SQLDOWN_DB`
  - Tab completion for bash/zsh/fish
- Conventions:
  - Recommended frontmatter field: `doctype` (for document type classification)
  - Short flags: `-d` (db), `-t` (table), `-o` (output), `-w` (watch), `-v` (verbose)
  - Database as schema authority

Testing:

- Imported 686 tasks from ~/tasks into a test database
- Hit SQLite's 2000-column limit (1,927 section columns, 72 frontmatter fields)
- Confirmed round-trip capability (the body field contains the full markdown)

Artifacts:

- SPECIFICATION.md - Complete v0.1 design (627 lines)
- REVIEW.md - Papa Bear's design review and recommendations
- Test database analysis showing column distribution

Future Enhancements (v0.2+):

- Full SQL WHERE during load (currently simple key=value)
- Bidirectional watch mode (database changes → file updates)
- JSON overflow columns for >2000-column scenarios
- FTS5 full-text search
- Configuration file support (.sqldown.toml)

Critical Items from Review:

- Address the 2000-column limit with JSON overflow in v0.1 (not deferred)
- Add comprehensive error handling and recovery
- Implement transaction batching for performance
- Add SQL injection protection in filter clauses
- Validate path traversal security in dump

Next Steps:

- Phase 0: Address critical review items (column overflow, error handling)
- Phase 1: Rename markdown-cache → sqldown throughout the codebase
- Phase 2: Create package structure (src/sqldown/, pyproject.toml)
- Phase 3: Implement core load/dump functions
- Phase 4: Implement CLI with Click
- Phase 5: Add tests (>80% coverage)
- Phase 6: Update documentation
### 2025-01-13 - Column Limit Protection Implemented

Session Goals:

- Address Papa Bear's critical review finding: SQLite's 2000-column limit
- Implement MVP column limit validation
- Add top-N section extraction feature

Implemented Features:

- Column Limit Validation:
  - Added `--max-columns` flag (default: 1800, leaving a safety margin)
  - Pre-import validation scans all documents and counts unique columns
  - Shows detailed breakdown: base/frontmatter/section columns
  - Warning at 90% threshold
  - Stops import with an error if the limit is exceeded (prevents database corruption)
- Top-N Section Extraction:
  - Added `--top-sections` flag (default: 20, 0 = extract all)
  - Analyzes section frequency across the entire document collection
  - Extracts only the N most common sections as queryable columns
  - All other sections preserved in the `body` field
  - Reduces column count from 6,694 → 116 for a real-world task collection

Testing Results:

- 5,225 task documents successfully imported
- Without top-sections: would create 6,694 columns (exceeds the 2000 limit)
- With top-20 sections: creates 116 columns (well within the limit)
- Top extracted sections: overview, usage, objective, notes, next_steps, troubleshooting, installation, etc.

Validation:

- ✅ Correctly stops import when the limit is exceeded
- ✅ Shows a warning when approaching the limit (90% threshold)
- ✅ Proper exit codes (1 on failure, 0 on success)
- ✅ Detailed column breakdown in verbose mode
- ✅ All content preserved in the body field

Documentation Updated:

- README.md: Added column limit protection section
- IDEAS.md: Added section whitelisting and body sync concepts
- Commit messages document rationale and implementation

Next Steps:

- Update SPECIFICATION.md to reflect implemented features
- Update SKILL.md with new CLI flags and patterns
- Create taskmaster tasks for ongoing sqldown development
- Consider: section whitelisting (explicit list vs. frequency-based)