Smart document-to-Markdown conversion for AI agents
shuck-file
Feed any document to your AI agent in one command.
shuck-file converts documents to clean Markdown for AI agents and LLMs. Small files are output directly; large files return a document map with section summaries, token counts, and actionable next steps, so agents pull only what they need.
Why shuck-file?
AI agents can't read binary documents. They need a bridge that's context-aware:
- Small file: `shuck report.docx` → full Markdown on stdout
- Large file: `shuck report.docx` → document map with sections and extraction options
- Targeted extraction: `shuck report.docx --sections s1,s3` → only what you need
- Search: `shuck report.docx --grep "revenue"` → find without reading everything
Supported Formats
| Format | Extension | Library | What's Preserved |
|---|---|---|---|
| Word | .docx | python-docx | Headings, bold/italic, lists, tables |
| PDF | .pdf | pdfplumber | Text content, page breaks |
| Excel | .xlsx | openpyxl | All sheets as Markdown tables |
| PowerPoint | .pptx | python-pptx | Titles, text, tables, speaker notes |
| CSV | .csv | stdlib | All rows/columns as a table |
Installation
Via pip (recommended)
pip install shuck-file
This installs the shuck CLI command and the MCP server.
From source
git clone https://github.com/Shan-Zhu/shuck-file.git
cd shuck-file
pip install -e .
Quick Start
# Convert a document
shuck report.docx
# Force full output (bypass map mode)
shuck large-report.pdf --all
# Search within a document
shuck report.pdf --grep "revenue"
Usage
Auto-Routing (default)
Small files are output directly; large files return a document map instead.
# Small file → direct Markdown output
shuck document.pdf
# Large file → document map with sections table + next steps
shuck large-report.pdf
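The routing rule can be pictured as a size check on the converted Markdown. The following is a minimal Python sketch, not shuck-file's `router.py`: the ~4-characters-per-token estimate and the `DIRECT_OUTPUT_MAX_TOKENS` cutoff are assumptions, since the actual threshold is not documented here.

```python
# Minimal sketch of size-based auto-routing (not shuck-file's actual router).
# ASSUMPTIONS: ~4 characters per token, and a hypothetical 5,000-token cutoff.
DIRECT_OUTPUT_MAX_TOKENS = 5_000


def estimate_tokens(text: str) -> int:
    """Rough token count using the common ~4 characters/token heuristic."""
    return len(text) // 4


def route(markdown: str, force_all: bool = False, force_map: bool = False) -> str:
    """Return "full" (print Markdown) or "map" (print a document map).

    force_all mirrors `--all` / `shuck pull`; force_map mirrors `shuck probe`.
    """
    if force_all:
        return "full"
    if force_map:
        return "map"
    if estimate_tokens(markdown) <= DIRECT_OUTPUT_MAX_TOKENS:
        return "full"
    return "map"


if __name__ == "__main__":
    print(route("# Memo\n\nShort note."))           # full
    print(route("word " * 10_000))                  # map (~12,500 tokens)
    print(route("word " * 10_000, force_all=True))  # full
```

A character-based estimate keeps the routing decision cheap; a real implementation might use a proper tokenizer instead.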
Extraction Options
# Force full output (bypass map mode)
shuck report.pdf --all
# Extract specific sections
shuck report.pdf --sections s1,s3
# Tables only
shuck report.pdf --tables-only
# Search within document
shuck report.pdf --grep "revenue"
# Token budget (smart compression)
shuck report.pdf --budget 4000
# Combinations work
shuck report.pdf --sections s2,s3 --budget 2000
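These flags compose naturally if a document is treated as a list of sections. The Python sketch below is illustrative only: the `Section` type, the 4-characters-per-token estimate, the case-insensitive literal match for `--grep`, and the max-min fair split used for `--budget` are all assumptions, not shuck-file's documented behavior.

```python
import re
from dataclasses import dataclass

TRUNCATION_MARKER = "\n[...truncated]"  # hypothetical marker


@dataclass
class Section:
    sid: str    # section id as shown in map mode, e.g. "s1"
    title: str
    text: str


def estimate_tokens(text: str) -> int:
    # ASSUMPTION: ~4 characters per token.
    return len(text) // 4


def select_sections(sections: list[Section], ids: str) -> list[Section]:
    """--sections s1,s3: keep the listed ids, in document order."""
    wanted = {part.strip() for part in ids.split(",") if part.strip()}
    return [s for s in sections if s.sid in wanted]


def grep_sections(sections: list[Section], needle: str) -> list[tuple[str, str]]:
    """--grep: (section id, line) for each line containing `needle`, ignoring case."""
    pattern = re.compile(re.escape(needle), re.IGNORECASE)
    return [
        (s.sid, line.strip())
        for s in sections
        for line in s.text.splitlines()
        if pattern.search(line)
    ]


def apply_budget(sections: list[Section], budget: int) -> list[Section]:
    """--budget: max-min fair split. Small sections stay whole; larger ones
    share what is left and are truncated (the marker adds a few tokens)."""
    order = sorted(range(len(sections)), key=lambda i: estimate_tokens(sections[i].text))
    allowance = [0] * len(sections)
    remaining = budget
    for rank, i in enumerate(order):
        fair_share = remaining // (len(order) - rank)
        allowance[i] = min(estimate_tokens(sections[i].text), fair_share)
        remaining -= allowance[i]
    result = []
    for s, allowed in zip(sections, allowance):
        if estimate_tokens(s.text) <= allowed:
            result.append(s)
        else:
            result.append(Section(s.sid, s.title, s.text[: allowed * 4] + TRUNCATION_MARKER))
    return result


if __name__ == "__main__":
    doc = [
        Section("s1", "Summary", "Revenue grew.\nCosts were flat."),
        Section("s2", "Details", "x" * 4000),
        Section("s3", "Appendix", "Revenue table follows."),
    ]
    print(grep_sections(doc, "revenue"))
    # [('s1', 'Revenue grew.'), ('s3', 'Revenue table follows.')]
    # Like `--sections s2,s3 --budget 200`: select first, then compress.
    for s in apply_budget(select_sections(doc, "s2,s3"), 200):
        print(s.sid, estimate_tokens(s.text))
```

Applying the section filter before the budget means the budget is spent only on the sections the agent asked for.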
Excel/CSV Specific
# Column headers and types
shuck data.xlsx --schema-only
# Headers + first N rows
shuck data.xlsx --sample 5
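For tabular files, `--schema-only` and `--sample N` amount to reading the header row plus some or all data rows. As an illustration only, here is a stdlib sketch for CSV text; the `int`/`float`/`text` type names and the inference rule are assumptions, and shuck-file's own `.xlsx` path goes through openpyxl instead.

```python
import csv
import io


def _read(csv_text: str) -> tuple[list[str], list[list[str]]]:
    """Split CSV text into (header, data rows)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return [], []
    return rows[0], rows[1:]


def _table(header: list[str], rows: list[list[str]]) -> str:
    """Render a Markdown table, escaping pipes inside cells."""
    def fmt(cells: list[str]) -> str:
        return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |"
    return "\n".join([fmt(header), "|" + "---|" * len(header)] + [fmt(r) for r in rows])


def infer_type(values: list[str]) -> str:
    """ASSUMED rule: "int" if every non-empty cell parses as int, else "float"
    if every one parses as float, else "text"; "empty" if there are no values."""
    cells = [v.strip() for v in values if v.strip()]
    if not cells:
        return "empty"
    for name, cast in (("int", int), ("float", float)):
        try:
            for cell in cells:
                cast(cell)
        except ValueError:
            continue  # some cell failed; try the next, looser type
        return name
    return "text"


def schema_only(csv_text: str) -> str:
    header, rows = _read(csv_text)
    types = [infer_type([r[i] if i < len(r) else "" for r in rows]) for i in range(len(header))]
    return _table(["Column", "Type"], [[h, t] for h, t in zip(header, types)])


def sample(csv_text: str, n: int) -> str:
    header, rows = _read(csv_text)
    return _table(header, rows[:n])


if __name__ == "__main__":
    data = "name,qty,price\nwidget,3,1.50\ngadget,10,2\n"
    print(schema_only(data))
    print(sample(data, 1))
```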
Power User Subcommands
# Force map mode (even on small files)
shuck probe document.docx
# Force full extraction (alias for --all)
shuck pull document.docx
Output Control
# Write to file
shuck document.pdf -o output.md
# Write to directory (auto-named)
shuck document.pdf -d ./converted/
# Skip YAML frontmatter
shuck document.pdf --no-frontmatter
# List supported formats
shuck --formats
Map Mode Output
When a file is large, shuck returns a document map:
# Document Map: quarterly-report.pdf
**6 pages | ~12,400 tokens | 6 sections**
## Sections
| # | Title | Type | Tokens | Density |
|---|-------|------|--------|---------|
| s1 | Executive Summary | narrative | 450 | high |
| s2 | Q3 Financial Results | mixed | 2,800 | high |
| s3 | Revenue Breakdown | tabular | 3,200 | high |
| ...
## Next Steps
- `shuck quarterly-report.pdf --all` -- full document (~12,400 tokens)
- `shuck quarterly-report.pdf --sections s1,s2` -- high-density (~3,250 tokens)
- `shuck quarterly-report.pdf --grep "..."` -- search for keywords
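A map like this is essentially a table rendered from per-section metadata. The sketch below is a hypothetical Python rendering of that layout, not shuck-file's `mapper.py`. How it picks the high-density suggestion is a guess: it takes high-density sections in document order while their running total stays under an assumed 4,000-token cap, which happens to reproduce the `s1,s2` (~3,250 tokens) suggestion for the example rows shown.

```python
from dataclasses import dataclass

SUGGESTION_CAP = 4_000  # ASSUMED cap for the "high-density" suggestion


@dataclass
class SectionInfo:
    sid: str
    title: str
    kind: str     # e.g. "narrative", "mixed", "tabular"
    tokens: int
    density: str  # e.g. "high"; how density is scored is not specified here


def suggest_high_density(sections: list[SectionInfo]) -> list[SectionInfo]:
    """Greedily take high-density sections, in order, while under the cap."""
    picked, total = [], 0
    for s in sections:
        if s.density == "high" and total + s.tokens <= SUGGESTION_CAP:
            picked.append(s)
            total += s.tokens
    return picked


def render_map(filename: str, pages: int, sections: list[SectionInfo]) -> str:
    total = sum(s.tokens for s in sections)
    lines = [
        f"# Document Map: {filename}",
        "",
        f"**{pages} pages | ~{total:,} tokens | {len(sections)} sections**",
        "",
        "## Sections",
        "",
        "| # | Title | Type | Tokens | Density |",
        "|---|-------|------|--------|---------|",
    ]
    lines += [f"| {s.sid} | {s.title} | {s.kind} | {s.tokens:,} | {s.density} |" for s in sections]
    lines += ["", "## Next Steps", "", f"- `shuck {filename} --all` -- full document (~{total:,} tokens)"]
    picked = suggest_high_density(sections)
    if picked:
        ids = ",".join(s.sid for s in picked)
        picked_tokens = sum(s.tokens for s in picked)
        lines.append(f"- `shuck {filename} --sections {ids}` -- high-density (~{picked_tokens:,} tokens)")
    lines.append(f'- `shuck {filename} --grep "..."` -- search for keywords')
    return "\n".join(lines)


if __name__ == "__main__":
    rows = [
        SectionInfo("s1", "Executive Summary", "narrative", 450, "high"),
        SectionInfo("s2", "Q3 Financial Results", "mixed", 2_800, "high"),
        SectionInfo("s3", "Revenue Breakdown", "tabular", 3_200, "high"),
    ]
    print(render_map("quarterly-report.pdf", 6, rows))
```

With only these three rows the header total is ~6,450 tokens rather than the full example's ~12,400, because the remaining sections are elided above.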
MCP Server
shuck-file includes an MCP (Model Context Protocol) server, making it available to any MCP-compatible AI tool.
Claude Code
claude mcp add shuck-file -- shuck-file
Or add to your project's .mcp.json:
{
"mcpServers": {
"shuck-file": {
"command": "shuck-file",
"args": []
}
}
}
Cursor
Add to ~/.cursor/mcp.json:
{
"mcpServers": {
"shuck-file": {
"command": "shuck-file",
"args": []
}
}
}
Windsurf
Add to your MCP configuration:
{
"mcpServers": {
"shuck-file": {
"command": "shuck-file",
"args": []
}
}
}
Any MCP Client
shuck-file registers as an MCP server via the `mcp.servers` entry point. Tools exposed:
- `shuck`: Convert a document to Markdown with all options (mode, sections, grep, budget, etc.)
- `list_formats`: List supported document formats
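Under the hood, an MCP client invokes these tools with JSON-RPC 2.0 `tools/call` requests; over the stdio transport each message is one line of JSON. The sketch below only builds such messages. The tool names `shuck` and `list_formats` come from the list above, but the argument names (`path`, `grep`) are hypothetical; the server's real input schema is whatever it reports in response to `tools/list`. A real session also begins with the `initialize` handshake, omitted here.

```python
import json


def tools_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as one newline-terminated
    JSON-RPC 2.0 message, the framing used by the stdio transport."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # json.dumps escapes embedded newlines, so the message stays on one line.
    return json.dumps(message) + "\n"


if __name__ == "__main__":
    # HYPOTHETICAL argument names; check the schema returned by `tools/list`.
    print(tools_call(2, "shuck", {"path": "report.pdf", "grep": "revenue"}), end="")
    print(tools_call(3, "list_formats", {}), end="")
```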
Claude Code Plugin
Install as a Claude Code plugin for the /shuck skill:
claude plugin add /path/to/shuck-file
Architecture
src/shuck_file/
├── cli.py              # CLI entrypoint
├── server.py           # MCP Server (FastMCP)
├── core/
│   ├── router.py       # Auto-routing logic
│   ├── segmenter.py    # Document segmentation
│   ├── mapper.py       # Map mode renderer
│   ├── budget.py       # Smart compression
│   ├── grep.py         # In-document search
│   ├── frontmatter.py  # YAML frontmatter
│   └── models.py       # Data models
└── extractors/
    ├── base.py         # Base extractor ABC
    ├── docx_ext.py     # Word extractor
    ├── pdf_ext.py      # PDF extractor
    ├── xlsx_ext.py     # Excel extractor
    ├── pptx_ext.py     # PowerPoint extractor
    └── csv_ext.py      # CSV extractor
plugin/                 # Claude Code plugin wrapper
tests/
├── test_extractors.py
├── test_router.py
├── test_segmenter.py
├── test_budget.py
└── test_grep.py
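The `extractors/` layout suggests one class per format behind a shared abstract base. The following Python sketch shows that pattern with a stdlib CSV extractor and an extension-keyed registry; the class names, the `extract(path) -> str` signature, and the registry are assumptions for illustration, not the contents of `base.py` or `csv_ext.py`.

```python
import csv
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Union


class BaseExtractor(ABC):
    """HYPOTHETICAL base class: each subclass handles some file extensions."""

    extensions: tuple[str, ...] = ()

    @abstractmethod
    def extract(self, path: Path) -> str:
        """Return the file's content as Markdown."""


def _escape(cell: str) -> str:
    return cell.replace("|", "\\|")


class CsvExtractor(BaseExtractor):
    extensions = (".csv",)

    def extract(self, path: Path) -> str:
        with path.open(newline="", encoding="utf-8") as f:
            rows = list(csv.reader(f))
        if not rows:
            return ""
        width = max(len(r) for r in rows)
        rows = [r + [""] * (width - len(r)) for r in rows]  # pad ragged rows
        lines = ["| " + " | ".join(map(_escape, r)) + " |" for r in rows]
        lines.insert(1, "|" + "---|" * width)  # separator after the header row
        return "\n".join(lines)


# Registry keyed by extension, so adding a format means adding a subclass.
EXTRACTORS: dict[str, BaseExtractor] = {
    ext: extractor for extractor in (CsvExtractor(),) for ext in extractor.extensions
}


def convert(path: Union[str, Path]) -> str:
    p = Path(path)
    extractor = EXTRACTORS.get(p.suffix.lower())
    if extractor is None:
        raise ValueError(f"unsupported format: {p.suffix or '(none)'}")
    return extractor.extract(p)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "data.csv"
        sample.write_text("a,b\n1,2\n3\n", encoding="utf-8")
        print(convert(sample))
```

Keeping format detection in one registry lets the CLI and the MCP server share the same dispatch path.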
License
MIT
Download files
File details
Details for the file shuck_file-2.0.1.tar.gz.
File metadata
- Download URL: shuck_file-2.0.1.tar.gz
- Size: 36.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8e065b22f62989c225026250b329ea6d62e7fc315a3e2bec61b8e0a318e1c3e2 |
| MD5 | e9a14efd89cc4f9d72d826ba099855c0 |
| BLAKE2b-256 | bb558a733a927ba586c8de6694fc8c9b7bbb35af6e05da70d9601a6b2b87a6c7 |
File details
Details for the file shuck_file-2.0.1-py3-none-any.whl.
File metadata
- Download URL: shuck_file-2.0.1-py3-none-any.whl
- Size: 23.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e625109b608e1d10a4c095ccdcbf30ff051a4d448981f4fe829594e5f435a209 |
| MD5 | 146115259c64eda76c90cdffd92e2ec4 |
| BLAKE2b-256 | 767acf098696149557e25b8fb17ac15494cffffb067d9826daf033759e754555 |