CLI for Reducto document processing
Reducto CLI
A command-line tool for document parsing, structured data extraction, and document editing — powered by Reducto's document intelligence API.
Parse PDFs, images, spreadsheets, and Office documents into clean Markdown. Extract structured JSON using schemas. Edit documents with natural language instructions. Process single files or entire directories.
Documentation | Reducto Studio | API Quickstart | Python SDK | Claude Code Plugin
Table of Contents
- Installation
- Authentication
- Quick Start
- Commands
- Supported File Types
- Use Cases
- How It Works
- Configuration
- Related Projects
Installation
pip install reducto-cli
Requires Python 3.11 or later.
Authentication
Authenticate using the built-in device code flow, which opens a browser to Reducto Studio:
reducto login
This saves your API key to ~/.reducto/config.yaml.
Alternatively, set the REDUCTO_API_KEY environment variable directly:
export REDUCTO_API_KEY="your_api_key_here"
Get an API key by signing up at studio.reducto.ai.
Quick Start
# Parse a PDF into Markdown
reducto parse invoice.pdf
# Parse an entire folder of documents
reducto parse ./contracts/
# Extract structured data using a JSON Schema
reducto extract invoice.pdf -s schema.json
# Edit a document with natural language
reducto edit form.pdf -i "Fill in the client name as 'Acme Corp'"
Commands
Parse Command
Converts documents into structured Markdown, preserving layout, tables, and figures. Uses Reducto's Parse API with agentic OCR and vision-language models.
reducto parse <path> [options]
Output is written to <filename>.parse.md with YAML front matter containing the job ID and processing duration.
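As an illustration of consuming that output, here is a minimal sketch that separates the front matter from the Markdown body. It treats the front matter as flat `key: value` lines between two `---` delimiters; the key names `job_id` and `duration` in the sample are hypothetical, since the exact field names are not listed here.

```python
def read_front_matter(markdown_text):
    """Return (metadata, body) for text that starts with a '---' delimited block."""
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, markdown_text  # no front matter present
    metadata = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Closing delimiter found: everything after it is the body.
            return metadata, "\n".join(lines[index + 1:])
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip().strip("\"'")
    return {}, markdown_text  # unterminated block: treat as plain Markdown


# Hypothetical sample; the real key names may differ.
sample = "---\njob_id: abc-123\nduration: 4.2\n---\n# Invoice\n\nTotal: $10\n"
meta, body = read_front_matter(sample)
print(meta)
```

Values are returned as strings; a full YAML parser would be the safer choice if the front matter turns out to contain nested data.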
Options
| Flag | Description |
|---|---|
| --agentic | Enables agentic processing for tables, text, and figures. Higher accuracy, higher latency. Use for complex layouts or low-quality scans. |
| --change-tracking | Returns <s>, <u>, and <change> tags for strikethrough, underlined, and revised text. Useful for contracts and legal redlines. |
| --highlights | Include highlighted text in output. |
| --hyperlinks | Include embedded hyperlinks in output. |
| --comments | Include document comments in output. |
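To show how the --change-tracking tags might be consumed downstream, the sketch below collects every `<s>`, `<u>`, and `<change>` span from parsed Markdown. It assumes the tags are plain paired tags without attributes and are not nested; the sample sentence is invented for illustration.

```python
import re

# Matches <s>...</s>, <u>...</u>, and <change>...</change>, including multi-line spans.
TAG_PATTERN = re.compile(r"<(s|u|change)>(.*?)</\1>", re.DOTALL)


def find_tracked_changes(markdown_text):
    """Return a list of (tag, text) pairs in document order."""
    return [(m.group(1), m.group(2)) for m in TAG_PATTERN.finditer(markdown_text)]


sample = "The fee is <s>$100</s> <u>$120</u> per month, <change>net 30</change>."
for tag, text in find_tracked_changes(sample):
    print(tag, "->", text)
```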
Examples
# Basic parse
reducto parse document.pdf
# High-accuracy parse for complex layouts
reducto parse scanned_report.pdf --agentic
# Parse a contract with revision tracking
reducto parse contract.pdf --change-tracking
# Parse with all metadata preserved
reducto parse document.pdf --hyperlinks --comments --highlights
# Combine flags
reducto parse legal_doc.pdf --agentic --change-tracking --comments
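When scripting the CLI, it can help to assemble the command from the documented flags and reject anything else before spawning a process. The helper below is a sketch, not part of the CLI; `build_parse_command` is a hypothetical name, and only the five flags listed in the table above are accepted.

```python
import shlex

# Flags documented for `reducto parse`, without the leading dashes.
PARSE_FLAGS = {"agentic", "change-tracking", "highlights", "hyperlinks", "comments"}


def build_parse_command(path, *flags):
    """Return an argv list for `reducto parse`, validating the flag names."""
    unknown = set(flags) - PARSE_FLAGS
    if unknown:
        raise ValueError(f"unsupported parse flags: {sorted(unknown)}")
    return ["reducto", "parse", str(path)] + [f"--{flag}" for flag in flags]


cmd = build_parse_command("legal_doc.pdf", "agentic", "change-tracking", "comments")
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once the CLI is installed and authenticated.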
Extract Command
Pulls structured data from documents according to a JSON Schema you provide. Maps unstructured content — invoices, receipts, forms, contracts, financial statements — into machine-readable JSON.
reducto extract <path> --schema <schema>
The schema can be a path to a .json file or an inline JSON string. Output is saved as <filename>.extract.json.
The CLI automatically reuses existing parse results: if a .parse.md file already exists for a document, the CLI passes the job ID recorded in it as a jobid:// reference, so the document is not parsed again.
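The reuse decision can be pictured roughly as follows. This is a sketch of the idea only: `document_reference` is a hypothetical helper, and the reference is assumed to be the literal prefix `jobid://` followed by the job ID.

```python
from typing import Optional


def document_reference(local_path: str, job_id: Optional[str]) -> str:
    """Prefer a cached parse job over re-uploading the local file."""
    if job_id:
        # Assumed format: the prefix followed directly by the recorded job ID.
        return f"jobid://{job_id}"
    return local_path


print(document_reference("invoice.pdf", "abc-123"))
print(document_reference("invoice.pdf", None))
```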
Schema Requirements
- Must be a valid JSON Schema document.
- The top-level type must be object; arrays and primitives are not permitted at the top level.
- Schemas can be provided as file paths or inline JSON strings.
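These requirements can be checked locally before calling the API. The sketch below accepts either form of schema argument and enforces the top-level object rule; how the CLI itself tells a path from an inline string is not documented, so the "existing file, otherwise JSON" rule here is an assumption.

```python
import json
from pathlib import Path


def load_schema(schema_arg):
    """Load a schema from a file path or an inline JSON string."""
    candidate = Path(schema_arg)
    try:
        is_file = candidate.is_file()
    except (OSError, ValueError):
        # Very long or odd inline strings are not valid paths.
        is_file = False
    text = candidate.read_text(encoding="utf-8") if is_file else schema_arg
    schema = json.loads(text)  # raises ValueError on malformed JSON
    if not isinstance(schema, dict) or schema.get("type") != "object":
        raise ValueError("top-level schema type must be 'object'")
    return schema


inline = '{"type": "object", "properties": {"total": {"type": "number"}}}'
print(load_schema(inline)["properties"])
```

This only checks the top-level type; full JSON Schema validation would need a dedicated validator library.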
Example Schema
{
  "type": "object",
  "properties": {
    "vendor_name": { "type": "string" },
    "invoice_number": { "type": "string" },
    "date": { "type": "string" },
    "line_items": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "description": { "type": "string" },
          "quantity": { "type": "number" },
          "unit_price": { "type": "number" },
          "total": { "type": "number" }
        },
        "required": ["description", "quantity", "unit_price", "total"]
      }
    },
    "total_amount": { "type": "number" }
  },
  "required": ["vendor_name", "invoice_number", "line_items", "total_amount"]
}
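Once data shaped like this schema is available, a simple consistency check is to compare the sum of the line item totals with total_amount. The sketch below assumes the extracted values are available as a plain dictionary matching the schema; the layout of the .extract.json file itself is not described here, and the sample values are invented.

```python
import math


def line_items_match_total(data, tolerance=0.01):
    """Return True if the line item totals add up to total_amount."""
    computed = sum(item["total"] for item in data["line_items"])
    return math.isclose(computed, data["total_amount"], abs_tol=tolerance)


# Invented sample that follows the example schema.
sample = {
    "vendor_name": "Example Vendor",
    "invoice_number": "INV-001",
    "line_items": [
        {"description": "Widget", "quantity": 2, "unit_price": 5.0, "total": 10.0},
        {"description": "Gadget", "quantity": 1, "unit_price": 7.5, "total": 7.5},
    ],
    "total_amount": 17.5,
}
print(line_items_match_total(sample))
```

Invoices that include tax, shipping, or discounts will legitimately fail this check, so treat a mismatch as a prompt for review rather than an error.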
Examples
# Extract using a schema file
reducto extract invoice.pdf -s schemas/invoice.json
# Extract from a folder of invoices
reducto extract ./invoices/ -s schemas/invoice.json
# Extract with inline JSON schema
reducto extract receipt.pdf -s '{"type":"object","properties":{"total":{"type":"number"},"date":{"type":"string"}},"required":["total","date"]}'
Edit Command
Modifies documents using natural language instructions. Uploads the document, applies edits via the Reducto Edit API, and downloads the result.
reducto edit <path> --instructions "<instructions>"
Edited files are saved as <filename>.edited.<extension> (e.g., form.pdf becomes form.edited.pdf).
| Parameter | Required | Description |
|---|---|---|
| path | Yes | Path to a file or directory. |
| --instructions, -i | Yes | Natural language instructions for the edits. |
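For scripts that need to locate the edited file afterwards, the naming rule described above can be reproduced with pathlib. This is a sketch based on the form.pdf example; `edited_path` is a hypothetical helper, and files with multiple suffixes may be named differently by the CLI.

```python
from pathlib import Path


def edited_path(source):
    """Map form.pdf to form.edited.pdf, keeping the parent directory."""
    source = Path(source)
    return source.with_name(f"{source.stem}.edited{source.suffix}")


print(edited_path("form.pdf"))
```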
Examples
# Fill out a PDF form
reducto edit application.pdf -i "Fill in: Name: Jane Smith, Date: 2025-03-15, check 'Agree to terms'"
# Update a contract
reducto edit contract.pdf -i "Fill in the client name as 'Acme Corporation' and set the effective date to January 15, 2025"
# Batch edit a folder of forms
reducto edit ./forms/ -i "Set the company name to 'Globex Inc' in all header fields"
Tips for Effective Instructions
- Be specific about which elements to modify (headers, tables, specific fields).
- Reference content by name or position when possible.
- Describe the desired outcome, not the process.
- For batch operations, write instructions that apply uniformly across all files.
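One way to keep instructions specific and uniform is to generate them from structured data. The sketch below builds a string in the same style as the form-filling example above; `fill_instruction` is a hypothetical helper, and the phrasing is modeled on that example rather than on any required format.

```python
def fill_instruction(fields):
    """Turn {"Name": "Jane Smith"} style data into a single instruction string."""
    pairs = ", ".join(f"{name}: {value}" for name, value in fields.items())
    return f"Fill in: {pairs}"


print(fill_instruction({"Name": "Jane Smith", "Date": "2025-03-15"}))
```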
Supported File Types
| Category | Extensions |
|---|---|
| PDF | .pdf |
| Images | .png, .jpg, .jpeg |
| Office Documents | .doc, .docx, .ppt, .pptx |
| Spreadsheets | .xls, .xlsx, .numbers |
All commands accept a single file or a directory. Directories are scanned recursively and only supported file types are processed. Generated output files (.parse.md, .extract.json) are automatically excluded from processing.
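The selection rule above can be approximated in a few lines, which is handy for previewing what a directory run would pick up. This is a sketch, not the CLI's own code: the case-insensitive extension match is an assumption, and the explicit skip of generated files is redundant with the extension filter but kept to mirror the description.

```python
from pathlib import Path

SUPPORTED = {".pdf", ".png", ".jpg", ".jpeg", ".doc", ".docx",
             ".ppt", ".pptx", ".xls", ".xlsx", ".numbers"}
GENERATED_SUFFIXES = (".parse.md", ".extract.json")


def find_documents(root):
    """Return supported files under root (or root itself if it is a file)."""
    root = Path(root)
    if root.is_file():
        candidates = [root]
    else:
        candidates = sorted(p for p in root.rglob("*") if p.is_file())
    return [
        p for p in candidates
        if p.suffix.lower() in SUPPORTED
        and not p.name.lower().endswith(GENERATED_SUFFIXES)
    ]
```

Calling `find_documents("./invoices")` returns a sorted list of paths that can be compared against what the CLI actually processes.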
Use Cases
Invoice and Receipt Processing
Parse invoices from any vendor format, then extract line items, totals, and payment details into structured JSON for your accounting pipeline.
reducto parse ./invoices/
reducto extract ./invoices/ -s schemas/invoice.json
Contract and Legal Document Review
Parse contracts with change tracking to surface redlines and revisions. Extract key clauses, dates, and party names for contract management systems.
reducto parse contract.pdf --agentic --change-tracking --comments
reducto extract contract.pdf -s schemas/contract_terms.json
Form Processing and Auto-Fill
Edit PDF and DOCX forms programmatically — fill fields, check boxes, and populate tables without manual data entry.
reducto edit onboarding_form.pdf -i "Fill in employee name: Alex Chen, start date: 2025-04-01, department: Engineering, select 'Full-time' for employment type"
Financial Statement Analysis
Extract tables and figures from bank statements, earnings reports, and tax documents into structured data for financial modeling.
reducto extract quarterly_report.pdf -s schemas/financial_statement.json
Medical and Insurance Document Processing
Parse lab reports, claims forms, and patient intake documents. Reducto is HIPAA compliant for healthcare workflows.
reducto parse lab_results.pdf --agentic
reducto extract claim_form.pdf -s schemas/insurance_claim.json
Batch Document Digitization
Convert entire folders of scanned documents, presentations, and spreadsheets into searchable Markdown for knowledge bases or RAG pipelines.
reducto parse ./legacy_docs/ --agentic
Feeding Data to LLM Pipelines
Parse documents into clean Markdown optimized for LLM consumption, then use the structured output as context for retrieval-augmented generation (RAG) systems.
# Parse into LLM-ready Markdown
reducto parse ./knowledge_base/
# Or extract specific fields for structured RAG
reducto extract ./knowledge_base/ -s schemas/document_metadata.json
How It Works
1. Upload: The CLI uploads your document to Reducto's API.
2. Process: Reducto applies agentic OCR, layout detection, and vision-language models to understand document structure.
3. Return: Parsed Markdown, extracted JSON, or edited documents are downloaded to your local filesystem.
Files within a directory are processed concurrently. Parse results are cached locally (.parse.md files with job IDs), so subsequent extract commands skip re-parsing.
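Concurrent processing of a directory can be pictured with a small thread pool. The mechanism the CLI actually uses is not documented here, so the thread pool, the worker count, and the `process_all` helper are all assumptions made for illustration; `worker` stands in for whatever uploads and processes one file.

```python
from concurrent.futures import ThreadPoolExecutor


def process_all(paths, worker, max_workers=4):
    """Run worker(path) for every path concurrently; return {path: result}."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with paths.
        results = pool.map(worker, paths)
        return dict(zip(paths, results))


# Stand-in worker that just upper-cases the file name.
print(process_all(["a.pdf", "b.pdf"], str.upper))
```

Threads suit this kind of workload because each task spends most of its time waiting on network I/O.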
Configuration
| Method | Details |
|---|---|
| Device code login | reducto login — opens browser, saves key to ~/.reducto/config.yaml |
| Environment variable | export REDUCTO_API_KEY="your_key" — takes precedence over saved config |
| Manual entry | The CLI prompts for manual key entry as a fallback |
The config file is stored at ~/.reducto/config.yaml with 0600 permissions.
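The precedence and permission rules above can be checked from a script. The sketch below reports which credential source would apply and whether the config file is restricted to its owner; it deliberately does not read the file's contents, because the key layout inside config.yaml is not documented here. Both function names are hypothetical, and the permission check assumes a POSIX system.

```python
import os
import stat
from pathlib import Path

CONFIG_PATH = Path.home() / ".reducto" / "config.yaml"


def credential_source(env=None, config_path=CONFIG_PATH):
    """Return 'environment', 'config', or 'none', with the environment taking precedence."""
    env = os.environ if env is None else env
    if env.get("REDUCTO_API_KEY"):
        return "environment"
    if Path(config_path).is_file():
        return "config"
    return "none"


def is_owner_only(path):
    """Return True if the file mode is exactly 0600 (owner read/write only)."""
    return stat.S_IMODE(os.stat(path).st_mode) == 0o600


print(credential_source())
```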
Related Projects
| Project | Description |
|---|---|
| Reducto Python SDK | Full Python client for the Reducto API (pip install reductoai) |
| Reducto Node.js SDK | Node.js client for the Reducto API (npm install reductoai) |
| Reducto Go SDK | Go client for the Reducto API |
| Reducto Claude Code Plugins | Official Reducto plugins for Claude Code |
| Reducto Studio | No-code web interface for document processing |
Resources
- Reducto Documentation — API reference, guides, and tutorials
- API Quickstart — Get started with the Reducto API
- Security & Compliance — SOC 2 Type II, HIPAA, and data handling policies
- Reducto Website — Product overview and company information
- PyPI Package — Package registry listing