AI doesn't eat shells

shuck-file

Feed any document to your AI agent in one command.

shuck-file converts documents to clean Markdown for AI agents and LLMs. Small files are output directly; large files return a document map with section summaries, token counts, and actionable next steps, so agents pull only what they need.

Why shuck-file?

AI agents can't read binary documents. They need a bridge that's context-aware:

  • Small file → shuck report.docx → full Markdown on stdout
  • Large file → shuck report.docx → document map with sections and extraction options
  • Targeted extraction → shuck report.docx --sections s1,s3 → only what you need
  • Search → shuck report.docx --grep "revenue" → find without reading everything

Supported Formats

| Format | Extension | Library | What's Preserved |
|--------|-----------|---------|------------------|
| Word | .docx | python-docx | Headings, bold/italic, lists, tables |
| PDF | .pdf | pdfplumber | Text content, page breaks |
| Excel | .xlsx | openpyxl | All sheets as Markdown tables |
| PowerPoint | .pptx | python-pptx | Titles, text, tables, speaker notes |
| CSV | .csv | stdlib | All rows/columns as a table |
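
To make the CSV row concrete, here is a minimal sketch of rendering CSV text as a Markdown table using only the standard library. The helper name `csv_to_markdown` and its padding and escaping rules are assumptions for illustration; this is not shuck-file's actual extractor.

```python
import csv
import io


def csv_to_markdown(text: str) -> str:
    """Render CSV text as a Markdown table (hypothetical helper, not shuck-file's code)."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ""
    header, *body = rows
    width = len(header)

    def fmt(cells):
        # Pad or truncate each row to the header width, and escape pipes
        # so cell content cannot break the table layout.
        cells = (list(cells) + [""] * width)[:width]
        return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |"

    lines = [fmt(header), "| " + " | ".join("---" for _ in header) + " |"]
    lines.extend(fmt(row) for row in body)
    return "\n".join(lines)
```

Calling `csv_to_markdown("name,qty\napple,3\n")` yields a three-line table: header, separator, and one data row.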

Installation

Via pip (recommended)

pip install shuck-file

This installs the shuck CLI command and the MCP server.

From source

git clone https://github.com/Shan-Zhu/shuck-file.git
cd shuck-file
pip install -e .

Quick Start

# Convert a document
shuck report.docx

# Force full output (bypass map mode)
shuck large-report.pdf --all

# Search within a document
shuck report.pdf --grep "revenue"

Usage

Auto-Routing (default)

Small files are output directly; large files return a document map.

# Small file → direct Markdown output
shuck document.pdf

# Large file → document map with sections table + next steps
shuck large-report.pdf
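
The routing decision can be pictured roughly as follows. This is a sketch under stated assumptions: the 4-characters-per-token estimate and the 4000-token threshold are illustrative guesses, and `estimate_tokens` and `route` are hypothetical names, not shuck-file's real router.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token for English text.
    return max(1, len(text) // 4)


def route(markdown: str, threshold: int = 4000) -> str:
    """Return "full" for small documents and "map" for large ones.

    The default threshold is an illustrative assumption,
    not shuck-file's documented cutoff.
    """
    return "full" if estimate_tokens(markdown) <= threshold else "map"
```

The point of the design is that the caller never has to guess the file size up front: one command, and the tool decides whether the output fits comfortably in context.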

Extraction Options

# Force full output (bypass map mode)
shuck report.pdf --all

# Extract specific sections
shuck report.pdf --sections s1,s3

# Tables only
shuck report.pdf --tables-only

# Search within document
shuck report.pdf --grep "revenue"

# Token budget (smart compression)
shuck report.pdf --budget 4000

# Combinations work
shuck report.pdf --sections s2,s3 --budget 2000
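
When driving shuck from a script, the options above can be assembled programmatically. The sketch below uses only the flags shown in this section; the wrapper names `build_shuck_command` and `run_shuck` are hypothetical, and whether `--all` combines meaningfully with the other flags is not stated here, so treat that combination as untested.

```python
import subprocess


def build_shuck_command(path, sections=None, grep=None, budget=None, full=False):
    """Assemble an argv list using only the flags documented above."""
    cmd = ["shuck", path]
    if full:
        cmd.append("--all")
    if sections:
        cmd += ["--sections", ",".join(sections)]
    if grep:
        cmd += ["--grep", grep]
    if budget is not None:
        cmd += ["--budget", str(budget)]
    return cmd


def run_shuck(*args, **kwargs) -> str:
    """Run shuck and return its stdout (requires shuck-file to be installed)."""
    result = subprocess.run(
        build_shuck_command(*args, **kwargs),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For example, `build_shuck_command("report.pdf", sections=["s2", "s3"], budget=2000)` produces the same arguments as the combination example above. Passing a list to `subprocess.run` avoids shell quoting problems with search terms.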

Excel/CSV Specific

# Column headers and types
shuck data.xlsx --schema-only

# Headers + first N rows
shuck data.xlsx --sample 5
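
As an illustration of what "column headers and types" plus a row sample could look like for tabular data, here is a small standard-library sketch. The type-inference rules (int, then float, then str) and the function names are assumptions; shuck-file's actual output format for these flags may differ. The sketch assumes the input has at least a header row.

```python
import csv
import io


def infer_type(values):
    """Guess a column type from its non-empty string values (illustrative rules)."""
    values = [v for v in values if v != ""]
    if not values:
        return "empty"
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in values:
                cast(v)
            return name
        except ValueError:
            continue
    return "str"


def schema_and_sample(text, n=5):
    """Return ({header: type}, first n data rows) for CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    # Transpose rows into columns; with no data rows every column is empty.
    columns = list(zip(*body)) if body else [()] * len(header)
    schema = {h: infer_type(col) for h, col in zip(header, columns)}
    return schema, body[:n]
```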

Power User Subcommands

# Force map mode (even on small files)
shuck probe document.docx

# Force full extraction (alias for --all)
shuck pull document.docx

Output Control

# Write to file
shuck document.pdf -o output.md

# Write to directory (auto-named)
shuck document.pdf -d ./converted/

# Skip YAML frontmatter
shuck document.pdf --no-frontmatter

# List supported formats
shuck --formats

Map Mode Output

When a file is large, shuck returns a document map:

# Document Map: quarterly-report.pdf

**6 pages | ~12,400 tokens | 6 sections**

## Sections

| # | Title | Type | Tokens | Density |
|---|-------|------|--------|---------|
| s1 | Executive Summary | narrative | 450 | high |
| s2 | Q3 Financial Results | mixed | 2,800 | high |
| s3 | Revenue Breakdown | tabular | 3,200 | high |
| ...

## Next Steps

- `shuck quarterly-report.pdf --all` -- full document (~12,400 tokens)
- `shuck quarterly-report.pdf --sections s1,s2` -- high-density (~3,250 tokens)
- `shuck quarterly-report.pdf --grep "..."` -- search for keywords
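
An agent can read the sections table and decide what to pull next. The following sketch parses rows in the format of the sample map above and greedily picks sections that fit a token budget. It assumes the five-column layout shown in the sample; `parse_sections` and `pick_sections` are hypothetical helpers, not part of shuck-file.

```python
def parse_sections(map_text: str):
    """Extract section rows (id, title, type, tokens, density) from a document map."""
    sections = []
    for line in map_text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Keep only data rows: five cells, an id like "s1", a numeric token count.
        if len(cells) != 5 or not cells[0].startswith("s"):
            continue
        tokens = cells[3].replace(",", "")
        if not tokens.isdigit():
            continue
        sections.append({"id": cells[0], "title": cells[1], "type": cells[2],
                         "tokens": int(tokens), "density": cells[4]})
    return sections


def pick_sections(sections, budget: int):
    """Take sections in document order while they still fit the token budget."""
    chosen, used = [], 0
    for s in sections:
        if used + s["tokens"] <= budget:
            chosen.append(s["id"])
            used += s["tokens"]
    return chosen, used
```

With the sample map and a budget of 3500, this selects s1 and s2 for a total of 3,250 tokens, which can then be passed as `--sections s1,s2`.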

MCP Server

shuck-file includes an MCP (Model Context Protocol) server, making it available to any MCP-compatible AI tool.

Claude Code

claude mcp add shuck-file -- shuck-file

Or add to your project's .mcp.json:

{
  "mcpServers": {
    "shuck-file": {
      "command": "shuck-file",
      "args": []
    }
  }
}
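
If the project already has a .mcp.json with other servers, the entry can be merged in rather than overwriting the file. A minimal sketch, assuming the file (when present) contains valid JSON; `add_shuck_server` is a hypothetical helper, not something shuck-file ships.

```python
import json
from pathlib import Path


def add_shuck_server(config_path: Path) -> dict:
    """Add the shuck-file entry to an MCP config file, preserving other servers."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["shuck-file"] = {"command": "shuck-file", "args": []}
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Usage: `add_shuck_server(Path(".mcp.json"))` from the project root.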

Cursor

Add to ~/.cursor/mcp.json:

{
  "mcpServers": {
    "shuck-file": {
      "command": "shuck-file",
      "args": []
    }
  }
}

Windsurf

Add to your MCP configuration:

{
  "mcpServers": {
    "shuck-file": {
      "command": "shuck-file",
      "args": []
    }
  }
}

Any MCP Client

shuck-file registers as an MCP server via the mcp.servers entry point. Tools exposed:

  • shuck: Convert a document to Markdown with all options (mode, sections, grep, budget, etc.)
  • list_formats: List supported document formats

Claude Code Plugin

Install as a Claude Code plugin for the /shuck skill:

claude plugin add /path/to/shuck-file

Architecture

src/shuck_file/
├── cli.py                # CLI entrypoint
├── server.py             # MCP Server (FastMCP)
├── core/
│   ├── router.py          # Auto-routing logic
│   ├── segmenter.py       # Document segmentation
│   ├── mapper.py          # Map mode renderer
│   ├── budget.py          # Smart compression
│   ├── grep.py            # In-document search
│   ├── frontmatter.py     # YAML frontmatter
│   └── models.py          # Data models
├── extractors/
│   ├── base.py            # Base extractor ABC
│   ├── docx_ext.py        # Word extractor
│   ├── pdf_ext.py         # PDF extractor
│   ├── xlsx_ext.py        # Excel extractor
│   ├── pptx_ext.py        # PowerPoint extractor
│   └── csv_ext.py         # CSV extractor
plugin/                    # Claude Code plugin wrapper
tests/
├── test_extractors.py
├── test_router.py
├── test_segmenter.py
├── test_budget.py
└── test_grep.py
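
The extractors/ layout (a base ABC plus one module per format) suggests a pattern along these lines. This is a guess at the shape, not the project's code: the class names, the `extensions` attribute, the `REGISTRY` dict, and `get_extractor` are all hypothetical.

```python
import csv
from abc import ABC, abstractmethod
from pathlib import Path


class BaseExtractor(ABC):
    """Hypothetical interface: each format implements extract()."""

    extensions: tuple[str, ...] = ()

    @abstractmethod
    def extract(self, path: Path) -> str:
        """Return the file's content as Markdown."""


class CsvExtractor(BaseExtractor):
    extensions = (".csv",)

    def extract(self, path: Path) -> str:
        with path.open(newline="", encoding="utf-8") as fh:
            rows = list(csv.reader(fh))
        # One pipe-delimited line per row; a fuller extractor would also
        # emit a header separator line.
        return "\n".join("| " + " | ".join(row) + " |" for row in rows)


# Registry keyed by lowercase file extension.
REGISTRY = {ext: cls for cls in (CsvExtractor,) for ext in cls.extensions}


def get_extractor(path: Path) -> BaseExtractor:
    """Pick an extractor by file extension, or raise ValueError if unsupported."""
    try:
        return REGISTRY[path.suffix.lower()]()
    except KeyError:
        raise ValueError(f"unsupported format: {path.suffix or path.name}") from None
```

Keeping dispatch in a registry means adding a format is one new subclass and one registry entry, with no changes to the calling code.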

License

MIT
