CLI for Reducto document processing

Reducto CLI

A command-line tool for document parsing, structured data extraction, and document editing — powered by Reducto's document intelligence API.

Parse PDFs, images, spreadsheets, and Office documents into clean Markdown. Extract structured JSON using schemas. Edit documents with natural language instructions. Process single files or entire directories.

Documentation | Reducto Studio | API Quickstart | Python SDK | Claude Code Plugin



Installation

pip install reducto-cli

Requires Python 3.11 or later.

Authentication

Authenticate using the built-in device code flow, which opens a browser to Reducto Studio:

reducto login

This saves your API key to ~/.reducto/config.yaml.

Alternatively, set the REDUCTO_API_KEY environment variable directly:

export REDUCTO_API_KEY="your_api_key_here"

Get an API key by signing up at studio.reducto.ai.

Quick Start

# Parse a PDF into Markdown
reducto parse invoice.pdf

# Parse an entire folder of documents
reducto parse ./contracts/

# Extract structured data using a JSON Schema
reducto extract invoice.pdf -s schema.json

# Edit a document with natural language
reducto edit form.pdf -i "Fill in the client name as 'Acme Corp'"

Commands

Parse Command

Converts documents into structured Markdown, preserving layout, tables, and figures. Uses Reducto's Parse API with agentic OCR and vision-language models.

reducto parse <path> [options]

Output is written to <filename>.parse.md with YAML front matter containing the job ID and processing duration.

Options

Flag               Description
--agentic          Enables agentic processing for tables, text, and figures. Higher accuracy, higher latency. Use for complex layouts or low-quality scans.
--change-tracking  Returns <s>, <u>, and <change> tags for strikethrough, underlined, and revised text. Useful for contracts and legal redlines.
--highlights       Include highlighted text in output.
--hyperlinks       Include embedded hyperlinks in output.
--comments         Include document comments in output.
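
The change-tracking tags can be picked out of parsed Markdown with a small script. The following is a minimal sketch, assuming the tags appear as simple paired elements without attributes (for example <s>old text</s>); the real output may differ, and find_tracked_changes is a hypothetical helper, not part of the CLI.

```python
import re

# Assumption: tags are simple, non-nested pairs without attributes,
# e.g. <s>old text</s>. Adjust the pattern if the real output differs.
TAG_PATTERN = re.compile(r"<(s|u|change)>(.*?)</\1>", re.DOTALL)


def find_tracked_changes(markdown: str) -> list[tuple[str, str]]:
    """Return (tag, text) pairs for every <s>, <u>, and <change> span."""
    return [(m.group(1), m.group(2).strip()) for m in TAG_PATTERN.finditer(markdown)]


sample = "The fee is <s>$500</s> <u>$650</u> per month. <change>Net 30</change> applies."
for tag, text in find_tracked_changes(sample):
    print(f"{tag}: {text}")
```

Nested tags are not handled by this pattern; a proper HTML parser would be the safer choice if they occur.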

Examples

# Basic parse
reducto parse document.pdf

# High-accuracy parse for complex layouts
reducto parse scanned_report.pdf --agentic

# Parse a contract with revision tracking
reducto parse contract.pdf --change-tracking

# Parse and keep hyperlinks, comments, and highlights in the output
reducto parse document.pdf --hyperlinks --comments --highlights

# Combine flags
reducto parse legal_doc.pdf --agentic --change-tracking --comments
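
When calling the CLI from a script, it can help to build the argument list in one place. The sketch below uses only the flags listed in the Options table; build_parse_command and run_parse are hypothetical helper names, and run_parse assumes the reducto executable is on PATH.

```python
import subprocess

# Flags documented in the Options table; nothing else is assumed to exist.
PARSE_FLAGS = {
    "agentic": "--agentic",
    "change_tracking": "--change-tracking",
    "highlights": "--highlights",
    "hyperlinks": "--hyperlinks",
    "comments": "--comments",
}


def build_parse_command(path: str, **options: bool) -> list[str]:
    """Build the argv list for `reducto parse`, rejecting unknown options."""
    unknown = set(options) - set(PARSE_FLAGS)
    if unknown:
        raise ValueError(f"unsupported parse options: {sorted(unknown)}")
    flags = [flag for name, flag in PARSE_FLAGS.items() if options.get(name)]
    return ["reducto", "parse", path, *flags]


def run_parse(path: str, **options: bool) -> int:
    """Run the CLI and return its exit code (requires reducto-cli on PATH)."""
    return subprocess.run(build_parse_command(path, **options)).returncode
```

For example, build_parse_command("contract.pdf", agentic=True, comments=True) yields the same arguments as typing reducto parse contract.pdf --agentic --comments.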

Extract Command

Pulls structured data from documents according to a JSON Schema you provide. Maps unstructured content — invoices, receipts, forms, contracts, financial statements — into machine-readable JSON.

reducto extract <path> --schema <schema>

The schema can be a path to a .json file or an inline JSON string; -s is the short form of --schema. Output is saved as <filename>.extract.json.

The CLI automatically reuses existing parse results: if a .parse.md file already exists for a document, the CLI reads the job ID recorded in its front matter and passes it as a jobid:// reference, so the document is not parsed again.

Schema Requirements

  • Must be a valid JSON Schema document.
  • The top-level type must be object — arrays and primitives are not permitted at the top level.
  • Schemas can be provided as file paths or inline JSON strings.
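
These requirements can be checked locally before sending a document. The sketch below is a minimal pre-flight check, not the CLI's own validation: it confirms the input is valid JSON with a top-level object type, but does not verify full JSON Schema conformance. load_schema is a hypothetical helper, and treating any argument that starts with { or [ as inline JSON is an assumption.

```python
import json
from pathlib import Path


def load_schema(schema_arg: str) -> dict:
    """Load a schema from inline JSON or a .json file and check its top level."""
    text = schema_arg.strip()
    # Assumption: anything that does not look like JSON is a file path.
    if not text.startswith(("{", "[")):
        text = Path(schema_arg).read_text(encoding="utf-8")
    schema = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(schema, dict) or schema.get("type") != "object":
        raise ValueError('top-level schema "type" must be "object"')
    return schema


schema = load_schema('{"type": "object", "properties": {"total": {"type": "number"}}}')
print(sorted(schema["properties"]))
```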

Example Schema

{
  "type": "object",
  "properties": {
    "vendor_name": { "type": "string" },
    "invoice_number": { "type": "string" },
    "date": { "type": "string" },
    "line_items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "description": { "type": "string" },
          "quantity": { "type": "number" },
          "unit_price": { "type": "number" },
          "total": { "type": "number" }
        },
        "required": ["description", "quantity", "unit_price", "total"]
      }
    },
    "total_amount": { "type": "number" }
  },
  "required": ["vendor_name", "invoice_number", "line_items", "total_amount"]
}

Examples

# Extract using a schema file
reducto extract invoice.pdf -s schemas/invoice.json

# Extract from a folder of invoices
reducto extract ./invoices/ -s schemas/invoice.json

# Extract with inline JSON schema
reducto extract receipt.pdf -s '{"type":"object","properties":{"total":{"type":"number"},"date":{"type":"string"}},"required":["total","date"]}'

Edit Command

Modifies documents using natural language instructions. Uploads the document, applies edits via the Reducto Edit API, and downloads the result.

reducto edit <path> --instructions "<instructions>"

Edited files are saved as <filename>.edited.<extension> (e.g., form.pdf becomes form.edited.pdf).

Parameter           Required  Description
path                Yes       Path to a file or directory.
--instructions, -i  Yes       Natural language instructions for the edits.
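
To locate an edited file from a script, the naming rule above (form.pdf becomes form.edited.pdf) can be sketched as follows. edited_path is a hypothetical helper; how the CLI names files that contain several dots is not documented, so the behavior for such names is an assumption.

```python
from pathlib import Path


def edited_path(source: Path) -> Path:
    """Map form.pdf to form.edited.pdf, keeping the original directory."""
    return source.with_name(f"{source.stem}.edited{source.suffix}")


print(edited_path(Path("forms/form.pdf")).name)
```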

Examples

# Fill out a PDF form
reducto edit application.pdf -i "Fill in: Name: Jane Smith, Date: 2025-03-15, check 'Agree to terms'"

# Update a contract
reducto edit contract.pdf -i "Fill in the client name as 'Acme Corporation' and set the effective date to January 15, 2025"

# Batch edit a folder of forms
reducto edit ./forms/ -i "Set the company name to 'Globex Inc' in all header fields"

Tips for Effective Instructions

  • Be specific about which elements to modify (headers, tables, specific fields).
  • Reference content by name or position when possible.
  • Describe the desired outcome, not the process.
  • For batch operations, write instructions that apply uniformly across all files.

Supported File Types

Category          Extensions
PDF               .pdf
Images            .png, .jpg, .jpeg
Office Documents  .doc, .docx, .ppt, .pptx
Spreadsheets      .xls, .xlsx, .numbers

All commands accept a single file or a directory. Directories are scanned recursively and only supported file types are processed. Generated output files (.parse.md, .extract.json) are automatically excluded from processing.
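
To preview which files a directory run would pick up, the scanning rule can be approximated locally. This is a sketch under assumptions, not the CLI's actual logic: find_documents is a hypothetical helper, and case-insensitive extension matching is assumed.

```python
from pathlib import Path

# Extensions from the Supported File Types table.
SUPPORTED = {".pdf", ".png", ".jpg", ".jpeg", ".doc", ".docx",
             ".ppt", ".pptx", ".xls", ".xlsx", ".numbers"}
# Generated outputs that the README says are skipped.
GENERATED_SUFFIXES = (".parse.md", ".extract.json")


def find_documents(root: Path) -> list[Path]:
    """Return supported files under root (or root itself if it is a file)."""
    candidates = [root] if root.is_file() else root.rglob("*")
    return sorted(
        p for p in candidates
        if p.is_file()
        and p.suffix.lower() in SUPPORTED  # assumption: case-insensitive match
        and not p.name.lower().endswith(GENERATED_SUFFIXES)
    )
```

Calling find_documents(Path("./contracts")) returns a sorted list of the files that match, which is handy for estimating the size of a batch before running it.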


Use Cases

Invoice and Receipt Processing

Parse invoices from any vendor format, then extract line items, totals, and payment details into structured JSON for your accounting pipeline.

reducto parse ./invoices/
reducto extract ./invoices/ -s schemas/invoice.json
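
Before extracted values enter an accounting pipeline, a simple arithmetic check can flag records for review. The sketch below assumes the extracted JSON is an object shaped like the example invoice schema; the actual layout of .extract.json files may wrap the data differently, and check_invoice is a hypothetical helper.

```python
import math


def check_invoice(data: dict, tolerance: float = 0.01) -> list[str]:
    """Return a list of arithmetic inconsistencies found in an extracted invoice."""
    problems = []
    for i, item in enumerate(data["line_items"]):
        expected = item["quantity"] * item["unit_price"]
        if not math.isclose(item["total"], expected, abs_tol=tolerance):
            problems.append(f"line {i}: total {item['total']} != {expected:.2f}")
    items_sum = sum(item["total"] for item in data["line_items"])
    if not math.isclose(data["total_amount"], items_sum, abs_tol=tolerance):
        problems.append(
            f"total_amount {data['total_amount']} != sum of lines {items_sum:.2f}"
        )
    return problems


# Made-up sample data for illustration only.
invoice = {
    "line_items": [
        {"description": "Widget", "quantity": 2, "unit_price": 10.0, "total": 20.0},
        {"description": "Gadget", "quantity": 1, "unit_price": 5.5, "total": 5.5},
    ],
    "total_amount": 25.5,
}
print(check_invoice(invoice))
```

A mismatch is a prompt for review rather than proof of an extraction error, since real invoices often add tax or discounts on top of the line items.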

Contract and Legal Document Review

Parse contracts with change tracking to surface redlines and revisions. Extract key clauses, dates, and party names for contract management systems.

reducto parse contract.pdf --agentic --change-tracking --comments
reducto extract contract.pdf -s schemas/contract_terms.json

Form Processing and Auto-Fill

Edit PDF and DOCX forms programmatically — fill fields, check boxes, and populate tables without manual data entry.

reducto edit onboarding_form.pdf -i "Fill in employee name: Alex Chen, start date: 2025-04-01, department: Engineering, select 'Full-time' for employment type"

Financial Statement Analysis

Extract tables and figures from bank statements, earnings reports, and tax documents into structured data for financial modeling.

reducto extract quarterly_report.pdf -s schemas/financial_statement.json

Medical and Insurance Document Processing

Parse lab reports, claims forms, and patient intake documents. Reducto is HIPAA compliant for healthcare workflows.

reducto parse lab_results.pdf --agentic
reducto extract claim_form.pdf -s schemas/insurance_claim.json

Batch Document Digitization

Convert entire folders of scanned documents, presentations, and spreadsheets into searchable Markdown for knowledge bases or RAG pipelines.

reducto parse ./legacy_docs/ --agentic

Feeding Data to LLM Pipelines

Parse documents into clean Markdown optimized for LLM consumption, then use the structured output as context for retrieval-augmented generation (RAG) systems.

# Parse into LLM-ready Markdown
reducto parse ./knowledge_base/

# Or extract specific fields for structured RAG
reducto extract ./knowledge_base/ -s schemas/document_metadata.json

How It Works

  1. Upload — The CLI uploads your document to Reducto's API.
  2. Process — Reducto applies agentic OCR, layout detection, and vision-language models to understand document structure.
  3. Return — Parsed Markdown, extracted JSON, or edited documents are downloaded to your local filesystem.

Files within a directory are processed concurrently. Parse results are cached locally (.parse.md files with job IDs), so subsequent extract commands skip re-parsing.


Configuration

Method                Details
Device code login     reducto login (opens browser, saves key to ~/.reducto/config.yaml)
Environment variable  export REDUCTO_API_KEY="your_key" (takes precedence over saved config)
Manual entry          The CLI prompts for manual key entry as a fallback

The config file is stored at ~/.reducto/config.yaml with 0600 permissions.
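
When debugging authentication, it can be useful to see which credential source applies and whether the config file has the expected permissions. The sketch below mirrors the documented precedence without reading the key itself, since the layout of config.yaml is not documented here. credential_source and has_private_permissions are hypothetical helpers, and the permission check is meaningful on POSIX systems only.

```python
import os
import stat
from pathlib import Path
from typing import Mapping

CONFIG_PATH = Path.home() / ".reducto" / "config.yaml"


def credential_source(env: Mapping[str, str] = os.environ,
                      config_path: Path = CONFIG_PATH) -> str:
    """Report which credential source applies: 'environment', 'config', or 'none'."""
    if env.get("REDUCTO_API_KEY"):
        return "environment"  # documented to take precedence over saved config
    if config_path.is_file():
        return "config"
    return "none"


def has_private_permissions(config_path: Path = CONFIG_PATH) -> bool:
    """True if the file mode is exactly 0600 (POSIX only)."""
    return stat.S_IMODE(config_path.stat().st_mode) == 0o600


print(credential_source())
```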


Related Projects

Project                      Description
Reducto Python SDK           Full Python client for the Reducto API (pip install reductoai)
Reducto Node.js SDK          Node.js client for the Reducto API (npm install reductoai)
Reducto Go SDK               Go client for the Reducto API
Reducto Claude Code Plugins  Official Reducto plugins for Claude Code
Reducto Studio               No-code web interface for document processing
