AI doesn't eat shells

shuck-file

Feed any document to your AI agent in one command.

shuck-file converts documents to clean Markdown for AI agents and LLMs. Small files output directly; large files return a document map with section summaries, token counts, and actionable next steps, so agents only pull what they need.

Why shuck-file?

AI agents can't read binary documents. They need a bridge that's context-aware:

  • Small file → shuck report.docx → full Markdown on stdout
  • Large file → shuck report.docx → document map with sections and extraction options
  • Targeted extraction → shuck report.docx --sections s1,s3 → only what you need
  • Search → shuck report.docx --grep "revenue" → find without reading everything

Supported Formats

| Format | Extension | Library | What's Preserved |
|--------|-----------|---------|------------------|
| Word | .docx | python-docx | Headings, bold/italic, lists, tables |
| PDF | .pdf | pdfplumber | Text content, page breaks |
| Excel | .xlsx | openpyxl | All sheets as Markdown tables |
| PowerPoint | .pptx | python-pptx | Titles, text, tables, speaker notes |
| CSV | .csv | stdlib | All rows/columns as a table |

Installation

Via pip (recommended)

pip install shuck-file

This installs the shuck CLI command and the MCP server.

From source

git clone https://github.com/Shan-Zhu/shuck-file.git
cd shuck-file
pip install -e .

Quick Start

# Convert a document
shuck report.docx

# Force full output (bypass map mode)
shuck large-report.pdf --all

# Search within a document
shuck report.pdf --grep "revenue"

Usage

Auto-Routing (default)

Small files are output directly; large files return a document map.

# Small file → direct Markdown output
shuck document.pdf

# Large file → document map with sections table + next steps
shuck large-report.pdf
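
The routing decision can be pictured as a size check against a token threshold. The sketch below is illustrative only: the 4-characters-per-token estimate, the `MAP_THRESHOLD_TOKENS` value, and both function names are assumptions, not shuck-file's actual internals.

```python
# Hypothetical sketch of auto-routing; not shuck-file's real implementation.
MAP_THRESHOLD_TOKENS = 4000  # assumed cutoff, chosen only for illustration


def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def route(markdown: str, force_all: bool = False) -> str:
    """Return "full" to print the Markdown directly, or "map" for a document map."""
    if force_all:  # mirrors the documented --all flag
        return "full"
    if estimate_tokens(markdown) <= MAP_THRESHOLD_TOKENS:
        return "full"
    return "map"
```

Under these assumptions, a short note routes to `"full"`, while a long report routes to `"map"` unless `force_all` is set.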

Extraction Options

# Force full output (bypass map mode)
shuck report.pdf --all

# Extract specific sections
shuck report.pdf --sections s1,s3

# Tables only
shuck report.pdf --tables-only

# Search within document
shuck report.pdf --grep "revenue"

# Token budget (smart compression)
shuck report.pdf --budget 4000

# Combinations work
shuck report.pdf --sections s2,s3 --budget 2000
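
If you drive the CLI from a Python script, a thin wrapper can assemble these flags for you. This is a minimal sketch that uses only the flags documented above; `build_shuck_command` and `run_shuck` are hypothetical helper names, not part of shuck-file, and the sketch assumes `shuck` is on your `PATH`.

```python
import subprocess
from typing import List, Optional, Sequence


def build_shuck_command(
    path: str,
    sections: Optional[Sequence[str]] = None,
    grep: Optional[str] = None,
    budget: Optional[int] = None,
    tables_only: bool = False,
    full: bool = False,
) -> List[str]:
    """Assemble a shuck command line from the documented extraction flags."""
    cmd = ["shuck", path]
    if full:
        cmd.append("--all")
    if sections:
        cmd += ["--sections", ",".join(sections)]
    if tables_only:
        cmd.append("--tables-only")
    if grep is not None:
        cmd += ["--grep", grep]
    if budget is not None:
        cmd += ["--budget", str(budget)]
    return cmd


def run_shuck(path: str, **options) -> str:
    """Run shuck and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(
        build_shuck_command(path, **options),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with search terms such as `"net revenue"`.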

Excel/CSV Specific

# Column headers and types
shuck data.xlsx --schema-only

# Headers + first N rows
shuck data.xlsx --sample 5
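
To make the "headers + first N rows" idea concrete, here is a rough stdlib-only sketch of rendering CSV text as a Markdown table with an optional row limit. The function name `csv_to_markdown` and its exact output layout are assumptions for illustration; shuck's real output may differ.

```python
import csv
import io
from typing import Optional


def csv_to_markdown(text: str, sample: Optional[int] = None) -> str:
    """Render CSV text as a Markdown table, keeping at most `sample` data rows."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    if sample is not None:
        body = body[:sample]
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "|".join("---" for _ in header) + "|",
    ]
    for row in body:
        # Pad short rows and trim long ones so every line matches the header width.
        cells = (row + [""] * len(header))[: len(header)]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

This sketch does not escape `|` characters inside cells, which a production converter would need to handle.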

Power User Subcommands

# Force map mode (even on small files)
shuck probe document.docx

# Force full extraction (alias for --all)
shuck pull document.docx

Output Control

# Write to file
shuck document.pdf -o output.md

# Write to directory (auto-named)
shuck document.pdf -d ./converted/

# Skip YAML frontmatter
shuck document.pdf --no-frontmatter

# List supported formats
shuck --formats

Map Mode Output

When a file is large, shuck returns a document map:

# Document Map: quarterly-report.pdf

**6 pages | ~12,400 tokens | 6 sections**

## Sections

| # | Title | Type | Tokens | Density |
|---|-------|------|--------|---------|
| s1 | Executive Summary | narrative | 450 | high |
| s2 | Q3 Financial Results | mixed | 2,800 | high |
| s3 | Revenue Breakdown | tabular | 3,200 | high |
| ...

## Next Steps

- `shuck quarterly-report.pdf --all` -- full document (~12,400 tokens)
- `shuck quarterly-report.pdf --sections s1,s2` -- high-density (~3,250 tokens)
- `shuck quarterly-report.pdf --grep "..."` -- search for keywords
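
An agent script can read the sections table and decide which sections fit its context window. The sketch below assumes the table layout shown in the example map above; `parse_sections` and `pick_sections` are hypothetical helpers, and the greedy in-order selection is one simple strategy, not shuck's own budgeting logic.

```python
import re
from typing import Dict, List

# Matches rows such as: | s2 | Q3 Financial Results | mixed | 2,800 | high |
ROW = re.compile(
    r"^\|\s*(s\d+)\s*\|\s*(.*?)\s*\|\s*(\w+)\s*\|\s*([\d,]+)\s*\|\s*(\w+)\s*\|\s*$"
)


def parse_sections(map_text: str) -> List[Dict]:
    """Extract section rows from a document map; other lines are ignored."""
    sections = []
    for line in map_text.splitlines():
        match = ROW.match(line.strip())
        if match:
            sid, title, kind, tokens, density = match.groups()
            sections.append({
                "id": sid,
                "title": title,
                "type": kind,
                "tokens": int(tokens.replace(",", "")),
                "density": density,
            })
    return sections


def pick_sections(sections: List[Dict], budget: int) -> List[str]:
    """Keep sections in document order while they fit the token budget."""
    chosen, used = [], 0
    for section in sections:
        if used + section["tokens"] <= budget:
            chosen.append(section["id"])
            used += section["tokens"]
    return chosen
```

The resulting IDs can be joined with commas and passed to `--sections`.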

MCP Server

shuck-file includes an MCP (Model Context Protocol) server, making it available to any MCP-compatible AI tool.

Claude Code

claude mcp add shuck-file -- shuck-file

Or add to your project's .mcp.json:

{
  "mcpServers": {
    "shuck-file": {
      "command": "shuck-file",
      "args": []
    }
  }
}

Cursor

Add to ~/.cursor/mcp.json:

{
  "mcpServers": {
    "shuck-file": {
      "command": "shuck-file",
      "args": []
    }
  }
}

Windsurf

Add to your MCP configuration:

{
  "mcpServers": {
    "shuck-file": {
      "command": "shuck-file",
      "args": []
    }
  }
}
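
The same `mcpServers` entry is used for all three clients above, so it can be added programmatically without overwriting servers already configured. A minimal sketch follows; `add_shuck_server` is a hypothetical helper, and the config path you pass depends on your client.

```python
import json
from pathlib import Path


def add_shuck_server(config_path: Path) -> dict:
    """Add the shuck-file entry to an MCP config file, keeping existing servers."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    servers = config.setdefault("mcpServers", {})
    servers["shuck-file"] = {"command": "shuck-file", "args": []}
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, `add_shuck_server(Path(".mcp.json"))` targets the project-level file described for Claude Code.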

Any MCP Client

shuck-file registers as an MCP server via the mcp.servers entry point. Tools exposed:

  • shuck: Convert a document to Markdown with all options (mode, sections, grep, budget, etc.)
  • list_formats: List supported document formats

Claude Code Plugin

Install as a Claude Code plugin for the /shuck skill:

claude plugin add /path/to/shuck-file

Architecture

src/shuck_file/
├── cli.py                # CLI entrypoint
├── server.py             # MCP Server (FastMCP)
├── core/
│   ├── router.py          # Auto-routing logic
│   ├── segmenter.py       # Document segmentation
│   ├── mapper.py          # Map mode renderer
│   ├── budget.py          # Smart compression
│   ├── grep.py            # In-document search
│   ├── frontmatter.py     # YAML frontmatter
│   └── models.py          # Data models
├── extractors/
│   ├── base.py            # Base extractor ABC
│   ├── docx_ext.py        # Word extractor
│   ├── pdf_ext.py         # PDF extractor
│   ├── xlsx_ext.py        # Excel extractor
│   ├── pptx_ext.py        # PowerPoint extractor
│   └── csv_ext.py         # CSV extractor
plugin/                    # Claude Code plugin wrapper
tests/
├── test_extractors.py
├── test_router.py
├── test_segmenter.py
├── test_budget.py
└── test_grep.py
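
The layout suggests one extractor class per format behind a shared abstract base. The sketch below shows how such a design commonly looks; every class, attribute, and function name here is hypothetical and not taken from shuck-file's source.

```python
import csv
from abc import ABC, abstractmethod
from pathlib import Path


class BaseExtractor(ABC):
    """Hypothetical base class: one subclass per supported format."""

    extensions: tuple = ()

    @abstractmethod
    def extract(self, path: Path) -> str:
        """Return the document's content as Markdown."""


class CsvExtractor(BaseExtractor):
    extensions = (".csv",)

    def extract(self, path: Path) -> str:
        # Emit each CSV row as one pipe-delimited line (header separator omitted).
        with path.open(newline="", encoding="utf-8") as handle:
            rows = list(csv.reader(handle))
        return "\n".join("| " + " | ".join(row) + " |" for row in rows)


# Map each file extension to the class that handles it.
REGISTRY = {ext: cls for cls in (CsvExtractor,) for ext in cls.extensions}


def get_extractor(path: Path) -> BaseExtractor:
    """Pick an extractor by file extension, case-insensitively."""
    try:
        return REGISTRY[path.suffix.lower()]()
    except KeyError:
        raise ValueError(f"Unsupported format: {path.suffix}") from None
```

With this shape, supporting a new format means adding one subclass and listing it in the registry.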

License

MIT

