Any file in, Markdown out: read only what matters.

shuck-file

shuck-file converts documents to clean Markdown for AI agents and LLMs. Small files are output directly; large files return a document map with section summaries, token counts, and actionable next steps, so agents pull only what they need.

Why shuck-file?

AI agents can't read binary documents. They need a bridge that's context-aware:

  • Small file → shuck report.docx → full Markdown on stdout
  • Large file → shuck report.docx → document map with sections and extraction options
  • Targeted extraction → shuck report.docx --sections s1,s3 → only what you need
  • Search → shuck report.docx --grep "revenue" → find without reading everything

Supported Formats

| Format | Extension | Library | What's Preserved |
|--------|-----------|---------|------------------|
| Word | .docx | python-docx | Headings, bold/italic, lists, tables |
| PDF | .pdf | pdfplumber | Text content, page breaks |
| Excel | .xlsx | openpyxl | All sheets as Markdown tables |
| PowerPoint | .pptx | python-pptx | Titles, text, tables, speaker notes |
| CSV | .csv | stdlib | All rows/columns as a table |
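
A caller may want to check whether a file is worth handing to shuck before invoking it. The following is a minimal sketch that copies the extension-to-library mapping from the table above; the names `SUPPORTED_FORMATS` and `backing_library` are hypothetical and are not part of shuck-file's API. At runtime, `shuck --formats` remains the authoritative list.

```python
from pathlib import Path
from typing import Optional

# Extension -> backing library, copied from the Supported Formats table.
# This dictionary is an illustration, not something shuck-file exports.
SUPPORTED_FORMATS = {
    ".docx": "python-docx",
    ".pdf": "pdfplumber",
    ".xlsx": "openpyxl",
    ".pptx": "python-pptx",
    ".csv": "stdlib",
}


def backing_library(path: str) -> Optional[str]:
    """Return the library listed for this file's extension, or None if unlisted."""
    return SUPPORTED_FORMATS.get(Path(path).suffix.lower())
```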

Installation

Via pip (recommended)

pip install shuck-file

This installs the shuck CLI command and the shuck-file MCP server command.

From source

git clone https://github.com/Shan-Zhu/shuck-file.git
cd shuck-file
pip install -e .

Quick Start

# Convert a document
shuck report.docx

# Force full output (bypass map mode)
shuck large-report.pdf --all

# Search within a document
shuck report.pdf --grep "revenue"

Usage

Auto-Routing (default)

Small files are output directly; large files return a document map.

# Small file → direct Markdown output
shuck document.pdf

# Large file → document map with sections table + next steps
shuck large-report.pdf
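
The routing decision can be pictured as a size check with two overrides. The sketch below is an assumption-laden illustration, not shuck-file's actual router: the 4-characters-per-token estimate, the `MAP_THRESHOLD_TOKENS` cutoff, and the function names are all hypothetical. The two override flags stand in for `--all` and the `probe` subcommand.

```python
# Hypothetical cutoff; the real threshold used by shuck-file is not documented here.
MAP_THRESHOLD_TOKENS = 4000


def estimate_tokens(text: str) -> int:
    """Rough estimate assuming about 4 characters per token (a rule of thumb)."""
    return max(1, len(text) // 4)


def choose_mode(markdown: str, force_all: bool = False, force_map: bool = False) -> str:
    """Return "full" or "map" for a converted document."""
    if force_all:   # corresponds to `shuck FILE --all`
        return "full"
    if force_map:   # corresponds to `shuck probe FILE`
        return "map"
    if estimate_tokens(markdown) <= MAP_THRESHOLD_TOKENS:
        return "full"
    return "map"
```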

Extraction Options

# Force full output (bypass map mode)
shuck report.pdf --all

# Extract specific sections
shuck report.pdf --sections s1,s3

# Tables only
shuck report.pdf --tables-only

# Search within document
shuck report.pdf --grep "revenue"

# Token budget (smart compression)
shuck report.pdf --budget 4000

# Combinations work
shuck report.pdf --sections s2,s3 --budget 2000
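
When calling the CLI from a script, it can help to build the argument list in one place. This is a minimal sketch that uses only the flags shown above; `build_shuck_args` and `run_shuck` are hypothetical helper names, and `run_shuck` assumes the `shuck` command is on `PATH`.

```python
import subprocess
from typing import List, Optional, Sequence


def build_shuck_args(
    path: str,
    sections: Optional[Sequence[str]] = None,
    grep: Optional[str] = None,
    budget: Optional[int] = None,
    tables_only: bool = False,
    full: bool = False,
) -> List[str]:
    """Build an argv list for the shuck CLI from the documented flags."""
    args = ["shuck", path]
    if full:
        args.append("--all")
    if sections:
        args += ["--sections", ",".join(sections)]  # e.g. s1,s3
    if tables_only:
        args.append("--tables-only")
    if grep is not None:
        args += ["--grep", grep]
    if budget is not None:
        args += ["--budget", str(budget)]
    return args


def run_shuck(args: List[str]) -> str:
    """Run the command and return stdout; raises CalledProcessError on failure."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout
```

Passing a list rather than a shell string avoids quoting problems when the search term contains spaces.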

Excel/CSV Specific

# Column headers and types
shuck data.xlsx --schema-only

# Headers + first N rows
shuck data.xlsx --sample 5
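
To make the "headers + first N rows" idea concrete, here is a small sketch that renders a CSV sample as a Markdown table using only the standard library. It illustrates the concept and is not shuck-file's implementation; the function name and the exact table layout are assumptions, and the real `--sample` output may differ.

```python
import csv
import io


def sample_as_markdown(csv_text: str, n: int = 5) -> str:
    """Render the header row plus the first n data rows as a Markdown table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""

    def render(cells):
        # Escape pipes so cell content cannot break the table layout.
        return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |"

    header, body = rows[0], rows[1 : 1 + n]
    lines = [render(header), "|" + "|".join(["---"] * len(header)) + "|"]
    lines += [render(row) for row in body]
    return "\n".join(lines)
```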

Power User Subcommands

# Force map mode (even on small files)
shuck probe document.docx

# Force full extraction (alias for --all)
shuck pull document.docx

Output Control

# Write to file
shuck document.pdf -o output.md

# Write to directory (auto-named)
shuck document.pdf -d ./converted/

# Skip YAML frontmatter
shuck document.pdf --no-frontmatter

# List supported formats
shuck --formats

Map Mode Output

When a file is large, shuck returns a document map:

# Document Map: quarterly-report.pdf

**6 pages | ~12,400 tokens | 6 sections**

## Sections

| # | Title | Type | Tokens | Density |
|---|-------|------|--------|---------|
| s1 | Executive Summary | narrative | 450 | high |
| s2 | Q3 Financial Results | mixed | 2,800 | high |
| s3 | Revenue Breakdown | tabular | 3,200 | high |
| ...

## Next Steps

- `shuck quarterly-report.pdf --all` -- full document (~12,400 tokens)
- `shuck quarterly-report.pdf --sections s1,s2` -- high-density (~3,250 tokens)
- `shuck quarterly-report.pdf --grep "..."` -- search for keywords
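
An agent can read the sections table from a map like the one above and decide which sections to request. The following sketch assumes the table layout shown in the example (five columns, ids of the form `s1`, comma-grouped token counts); `parse_sections` and `pick_within_budget` are hypothetical helpers, not part of shuck-file.

```python
import re
from typing import Dict, List

# Matches rows such as: | s2 | Q3 Financial Results | mixed | 2,800 | high |
ROW = re.compile(
    r"^\|\s*(s\d+)\s*\|\s*(.*?)\s*\|\s*(\w+)\s*\|\s*([\d,]+)\s*\|\s*(\w+)\s*\|\s*$"
)


def parse_sections(map_text: str) -> List[Dict[str, object]]:
    """Extract section rows from a document map; other lines are skipped."""
    sections = []
    for line in map_text.splitlines():
        match = ROW.match(line.strip())
        if match:
            sid, title, kind, tokens, density = match.groups()
            sections.append({
                "id": sid,
                "title": title,
                "type": kind,
                "tokens": int(tokens.replace(",", "")),
                "density": density,
            })
    return sections


def pick_within_budget(sections: List[Dict[str, object]], budget: int) -> List[str]:
    """Greedily keep sections, in document order, while they fit the budget."""
    picked, used = [], 0
    for section in sections:
        if used + section["tokens"] <= budget:
            picked.append(section["id"])
            used += section["tokens"]
    return picked
```

The resulting ids can be joined with commas and passed to `--sections`.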

MCP Server

shuck-file includes an MCP (Model Context Protocol) server, making it available to any MCP-compatible AI tool.

Claude Code

claude mcp add shuck-file -- shuck-file

Or add to your project's .mcp.json:

{
  "mcpServers": {
    "shuck-file": {
      "command": "shuck-file",
      "args": []
    }
  }
}

Cursor

Add to ~/.cursor/mcp.json:

{
  "mcpServers": {
    "shuck-file": {
      "command": "shuck-file",
      "args": []
    }
  }
}

Windsurf

Add to your MCP configuration:

{
  "mcpServers": {
    "shuck-file": {
      "command": "shuck-file",
      "args": []
    }
  }
}

Any MCP Client

shuck-file registers as an MCP server via the mcp.servers entry point. Tools exposed:

  • shuck: Convert a document to Markdown with all options (mode, sections, grep, budget, etc.)
  • list_formats: List supported document formats

Claude Code Plugin

Install as a Claude Code plugin for the /shuck skill:

claude plugin add /path/to/shuck-file

Architecture

src/shuck_file/
├── cli.py                # CLI entrypoint
├── server.py             # MCP Server (FastMCP)
├── core/
│   ├── router.py          # Auto-routing logic
│   ├── segmenter.py       # Document segmentation
│   ├── mapper.py          # Map mode renderer
│   ├── budget.py          # Smart compression
│   ├── grep.py            # In-document search
│   ├── frontmatter.py     # YAML frontmatter
│   └── models.py          # Data models
├── extractors/
│   ├── base.py            # Base extractor ABC
│   ├── docx_ext.py        # Word extractor
│   ├── pdf_ext.py         # PDF extractor
│   ├── xlsx_ext.py        # Excel extractor
│   ├── pptx_ext.py        # PowerPoint extractor
│   └── csv_ext.py         # CSV extractor
plugin/                    # Claude Code plugin wrapper
tests/
├── test_extractors.py
├── test_router.py
├── test_segmenter.py
├── test_budget.py
└── test_grep.py

License

MIT
