Smart document-to-Markdown conversion for AI agents

shuck-file

Feed any document to your AI agent in one command.

shuck-file converts documents to clean Markdown for AI agents and LLMs. Small files are output directly; large files return a document map with section summaries, token counts, and actionable next steps, so agents pull only what they need.

Why shuck-file?

AI agents can't read binary documents. They need a bridge that's context-aware:

  • Small file → shuck report.docx → full Markdown on stdout
  • Large file → shuck report.docx → document map with sections and extraction options
  • Targeted extraction → shuck report.docx --sections s1,s3 → only what you need
  • Search → shuck report.docx --grep "revenue" → find without reading everything

Supported Formats

| Format | Extension | Library | What's Preserved |
|--------|-----------|---------|------------------|
| Word | .docx | python-docx | Headings, bold/italic, lists, tables |
| PDF | .pdf | pdfplumber | Text content, page breaks |
| Excel | .xlsx | openpyxl | All sheets as Markdown tables |
| PowerPoint | .pptx | python-pptx | Titles, text, tables, speaker notes |
| CSV | .csv | stdlib | All rows/columns as a table |

Installation

Via pip (recommended)

pip install shuck-file

This installs the shuck CLI command and the MCP server.

From source

git clone https://github.com/Shan-Zhu/shuck-file.git
cd shuck-file
pip install -e .

Quick Start

# Convert a document
shuck report.docx

# Force full output (bypass map mode)
shuck large-report.pdf --all

# Search within a document
shuck report.pdf --grep "revenue"

Usage

Auto-Routing (default)

Small files are printed directly as Markdown; large files return a document map.

# Small file → direct Markdown output
shuck document.pdf

# Large file → document map with sections table + next steps
shuck large-report.pdf
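
The routing decision above can be pictured as a simple size check. The sketch below is illustrative only: the 4-characters-per-token estimate, the `MAP_THRESHOLD_TOKENS` value, and both function names are assumptions made for this example, not shuck-file's actual implementation or cutoff.

```python
# Hypothetical sketch of size-based routing; names and numbers are assumptions.
MAP_THRESHOLD_TOKENS = 4000  # assumed cutoff, not shuck-file's real value


def estimate_tokens(markdown: str) -> int:
    """Rough token estimate: about 4 characters per token (a common heuristic)."""
    return max(1, len(markdown) // 4)


def choose_mode(markdown: str, force_all: bool = False) -> str:
    """Return "full" for small documents or forced output, otherwise "map"."""
    if force_all or estimate_tokens(markdown) <= MAP_THRESHOLD_TOKENS:
        return "full"
    return "map"
```

Here `force_all=True` plays the role of `--all`: it bypasses the size check so that even a large document comes back in full.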

Extraction Options

# Force full output (bypass map mode)
shuck report.pdf --all

# Extract specific sections
shuck report.pdf --sections s1,s3

# Tables only
shuck report.pdf --tables-only

# Search within document
shuck report.pdf --grep "revenue"

# Token budget (smart compression)
shuck report.pdf --budget 4000

# Combinations work
shuck report.pdf --sections s2,s3 --budget 2000
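
When an agent or script drives the CLI, it can help to assemble the argument list programmatically. The following is a minimal sketch that uses only the flags listed above; the helper name `build_shuck_command` and its parameters are hypothetical and not part of shuck-file.

```python
import shlex
from typing import List, Optional, Sequence


def build_shuck_command(
    path: str,
    sections: Optional[Sequence[str]] = None,
    grep: Optional[str] = None,
    budget: Optional[int] = None,
    tables_only: bool = False,
    full: bool = False,
) -> List[str]:
    """Build an argv list for the shuck CLI from the documented flags."""
    argv = ["shuck", path]
    if full:
        argv.append("--all")
    if sections:
        argv += ["--sections", ",".join(sections)]
    if tables_only:
        argv.append("--tables-only")
    if grep is not None:
        argv += ["--grep", grep]
    if budget is not None:
        argv += ["--budget", str(budget)]
    return argv


cmd = build_shuck_command("report.pdf", sections=["s2", "s3"], budget=2000)
print(shlex.join(cmd))  # shuck report.pdf --sections s2,s3 --budget 2000
```

With shuck installed, the list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`. Passing a list rather than a shell string avoids quoting problems with search terms that contain spaces.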

Excel/CSV Specific

# Column headers and types
shuck data.xlsx --schema-only

# Headers + first N rows
shuck data.xlsx --sample 5
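
To make "column headers and types" and "headers + first N rows" concrete, here is a rough stdlib-only sketch for CSV text. It illustrates the idea, not shuck-file's code: the type names (`int`, `float`, `str`), the inference rule, and the function names are all assumptions, and the sketch expects a header row.

```python
import csv
import io
from typing import Dict, List, Tuple


def infer_type(values: List[str]) -> str:
    """Classify a column as "int", "float", or "str" from its non-empty cells."""
    cells = [v for v in values if v != ""]
    if not cells:
        return "str"
    for name, cast in (("int", int), ("float", float)):
        try:
            for cell in cells:
                cast(cell)
            return name
        except ValueError:
            continue
    return "str"


def schema_and_sample(text: str, n: int = 5) -> Tuple[Dict[str, str], List[List[str]]]:
    """Return ({column: type}, first n data rows) for CSV text with a header row."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    schema = {
        col: infer_type([row[i] for row in data if i < len(row)])
        for i, col in enumerate(header)
    }
    return schema, data[:n]
```

The schema half corresponds loosely to `--schema-only` and the sample half to `--sample N`; the real output format of those flags is not shown here.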

Power User Subcommands

# Force map mode (even on small files)
shuck probe document.docx

# Force full extraction (alias for --all)
shuck pull document.docx

Output Control

# Write to file
shuck document.pdf -o output.md

# Write to directory (auto-named)
shuck document.pdf -d ./converted/

# Skip YAML frontmatter
shuck document.pdf --no-frontmatter

# List supported formats
shuck --formats

Map Mode Output

When a file is large, shuck returns a document map:

# Document Map: quarterly-report.pdf

**6 pages | ~12,400 tokens | 6 sections**

## Sections

| # | Title | Type | Tokens | Density |
|---|-------|------|--------|---------|
| s1 | Executive Summary | narrative | 450 | high |
| s2 | Q3 Financial Results | mixed | 2,800 | high |
| s3 | Revenue Breakdown | tabular | 3,200 | high |
| ...

## Next Steps

- `shuck quarterly-report.pdf --all` -- full document (~12,400 tokens)
- `shuck quarterly-report.pdf --sections s1,s2` -- high-density (~3,250 tokens)
- `shuck quarterly-report.pdf --grep "..."` -- search for keywords
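
An agent can read the Sections table and decide what to pull next. The sketch below parses rows shaped like the example above and keeps sections, in document order, while they fit a token budget. The row pattern is inferred from this one example, and the greedy selection is a choice made for this illustration; it is not a description of how `--budget` works internally.

```python
import re
from typing import List, Tuple

# Matches rows such as: | s2 | Q3 Financial Results | mixed | 2,800 | high |
ROW = re.compile(
    r"^\|\s*(s\d+)\s*\|\s*(.*?)\s*\|\s*(\w+)\s*\|\s*([\d,]+)\s*\|\s*(\w+)\s*\|$"
)


def parse_sections(map_text: str) -> List[Tuple[str, str, int]]:
    """Extract (id, title, tokens) from the Sections table of a document map."""
    sections = []
    for line in map_text.splitlines():
        match = ROW.match(line.strip())
        if match:
            sec_id, title, _type, tokens, _density = match.groups()
            sections.append((sec_id, title, int(tokens.replace(",", ""))))
    return sections


def pick_within_budget(sections: List[Tuple[str, str, int]], budget: int) -> List[str]:
    """Keep sections in document order while their token total fits the budget."""
    chosen, used = [], 0
    for sec_id, _title, tokens in sections:
        if used + tokens <= budget:
            chosen.append(sec_id)
            used += tokens
    return chosen


example = """\
| # | Title | Type | Tokens | Density |
|---|-------|------|--------|---------|
| s1 | Executive Summary | narrative | 450 | high |
| s2 | Q3 Financial Results | mixed | 2,800 | high |
| s3 | Revenue Breakdown | tabular | 3,200 | high |
"""
picked = pick_within_budget(parse_sections(example), budget=4000)
print("shuck quarterly-report.pdf --sections " + ",".join(picked))
```

With a budget of 4,000 tokens this selects s1 and s2 (450 + 2,800 = 3,250 tokens), which matches the second suggestion under Next Steps.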

MCP Server

shuck-file includes an MCP (Model Context Protocol) server, making it available to any MCP-compatible AI tool.

Claude Code

claude mcp add shuck-file -- shuck-file

Or add to your project's .mcp.json:

{
  "mcpServers": {
    "shuck-file": {
      "command": "shuck-file",
      "args": []
    }
  }
}
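
If the project already has a .mcp.json with other servers, the entry can be merged in rather than pasted by hand. This is a small sketch under that assumption; `add_shuck_server` is a hypothetical helper, and it simply writes the same `mcpServers` entry shown above.

```python
import json
from pathlib import Path


def add_shuck_server(config_path: Path) -> dict:
    """Add the shuck-file entry to an MCP config file, keeping other servers."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["shuck-file"] = {"command": "shuck-file", "args": []}
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Calling `add_shuck_server(Path(".mcp.json"))` from the project root creates the file if it is missing and leaves any existing server entries untouched.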

Cursor

Add to ~/.cursor/mcp.json:

{
  "mcpServers": {
    "shuck-file": {
      "command": "shuck-file",
      "args": []
    }
  }
}

Windsurf

Add to your MCP configuration:

{
  "mcpServers": {
    "shuck-file": {
      "command": "shuck-file",
      "args": []
    }
  }
}

Any MCP Client

shuck-file registers as an MCP server via the mcp.servers entry point. Tools exposed:

  • shuck: Convert a document to Markdown with all options (mode, sections, grep, budget, etc.)
  • list_formats: List supported document formats

Claude Code Plugin

Install as a Claude Code plugin for the /shuck skill:

claude plugin add /path/to/shuck-file

Architecture

src/shuck_file/
├── cli.py                # CLI entrypoint
├── server.py             # MCP Server (FastMCP)
├── core/
│   ├── router.py          # Auto-routing logic
│   ├── segmenter.py       # Document segmentation
│   ├── mapper.py          # Map mode renderer
│   ├── budget.py          # Smart compression
│   ├── grep.py            # In-document search
│   ├── frontmatter.py     # YAML frontmatter
│   └── models.py          # Data models
├── extractors/
│   ├── base.py            # Base extractor ABC
│   ├── docx_ext.py        # Word extractor
│   ├── pdf_ext.py         # PDF extractor
│   ├── xlsx_ext.py        # Excel extractor
│   ├── pptx_ext.py        # PowerPoint extractor
│   └── csv_ext.py         # CSV extractor
plugin/                    # Claude Code plugin wrapper
tests/
├── test_extractors.py
├── test_router.py
├── test_segmenter.py
├── test_budget.py
└── test_grep.py
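
The layout suggests one extractor per format behind a shared abstract base class. The sketch below shows how such a design could look for CSV, using only the standard library. The class names, the `extract` signature, the `EXTRACTORS` registry, and the `convert` function are all hypothetical; the actual interfaces in `extractors/base.py` and `csv_ext.py` may differ.

```python
import csv
import io
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, Type


class BaseExtractor(ABC):
    """Hypothetical interface: turn raw file bytes into Markdown text."""

    @abstractmethod
    def extract(self, data: bytes) -> str: ...


class CsvExtractor(BaseExtractor):
    """Render all CSV rows and columns as one Markdown table."""

    def extract(self, data: bytes) -> str:
        rows = list(csv.reader(io.StringIO(data.decode("utf-8"))))
        if not rows:
            return ""
        header, body = rows[0], rows[1:]
        lines = [
            "| " + " | ".join(header) + " |",
            "|" + "|".join("---" for _ in header) + "|",
        ]
        lines += ["| " + " | ".join(row) + " |" for row in body]
        return "\n".join(lines)


# Assumed registry keyed by file extension; only CSV is sketched here.
EXTRACTORS: Dict[str, Type[BaseExtractor]] = {".csv": CsvExtractor}


def convert(path: str, data: bytes) -> str:
    """Dispatch to the extractor registered for the file's extension."""
    ext = Path(path).suffix.lower()
    if ext not in EXTRACTORS:
        raise ValueError(f"unsupported format: {ext}")
    return EXTRACTORS[ext]().extract(data)
```

Keeping the registry separate from the extractors means a new format only needs a new class and one dictionary entry. This sketch does not escape `|` characters inside cells, which a production extractor would need to handle.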

License

MIT
