
USPTO patent prosecution analysis CLI: document download · XML parsing · MD generation


uspto-oa-cli

A CLI tool that downloads USPTO patent prosecution documents via the ODP (Open Data Portal) API, parses the XML, and converts them into structured Markdown.

Supports a workflow where the generated MD file is passed to AI agents (Claude Code, Gemini CLI, etc.) for prosecution strategy analysis.

Requirements

Installation

# pip
pip install uspto-oa-cli

# uv (global install)
uv tool install uspto-oa-cli

# uv (add as project dependency)
uv add uspto-oa-cli

# local development
uv sync

API Key Setup

# Interactive setup (recommended) — saved to ~/.oa-cli.toml
# Prompts for: API key, HTTPS/HTTP proxy URL, CA bundle path
uspto-oa configure

# Show current configuration
uspto-oa configure --show

Or set via environment variable:

export USPTO_API_KEY=your_api_key_here

Proxy & SSL (Corporate Networks)

If you are behind a corporate proxy or need a custom CA bundle, enter these settings when running uspto-oa configure, or edit ~/.oa-cli.toml directly:

[auth]
api_key = "YOUR_KEY"

[proxy]
https = "http://proxy.example.com:8080"
http  = "http://proxy.example.com:8080"

[ssl]
ca_bundle = "/path/to/corporate-ca.pem"

These settings apply to every HTTP request the CLI makes. If they are omitted, the requests library falls back to its standard environment variables (HTTPS_PROXY, HTTP_PROXY, REQUESTS_CA_BUNDLE).

Usage

# 0. Check document list before downloading
uspto-oa list 16330077

# 1. Download documents (saved to file/{app_num}/)
uspto-oa download 16330077

# 2. Parse XML → generate prosecution.md
uspto-oa extract 16330077
# Output: file/16330077/16330077_prosecution.md

# Extract in JSON format
uspto-oa extract 16330077 --format json

# Filter by date range and sort newest-first
uspto-oa extract 16330077 --from 2022-01-01 --to 2022-12-31 --sort desc

# 3. (Optional) OCR image-based PDFs → searchable PDFs
uspto-oa ocr 16330077

# 4. (Optional) Embed OCR text into prosecution.md for AI analysis
#    Run after step 3. Selectively include high-value doc codes to
#    avoid filling up the AI context window.
uspto-oa extract 16330077 --with-ocr --ocr-codes CTNF,CTFR,REM,EXIN,CTAV

# Download specific document codes only
uspto-oa download 16330077 --doc-codes CTNF,CTFR,NOA

# Force re-download (overwrite existing files)
uspto-oa download 16330077 --force

# Preview what would be downloaded / generated — no network requests, no files written
uspto-oa download 16330077 --dry-run
uspto-oa extract 16330077 --dry-run

# Verbose logging
uspto-oa -v download 16330077

# One-time API key override
uspto-oa download 16330077 --api-key YOUR_KEY

Command Options

uspto-oa list <application>

| Option | Description |
|---|---|
| --all | Show all documents, not only prosecution-related ones |
| --format [table\|json] | Output format (default: table) |
| --api-key TEXT | API key |

uspto-oa download <application>

| Option | Description |
|---|---|
| --doc-codes CODES | Comma-separated document codes (e.g. CTNF,CTFR,NOA). All prosecution documents if omitted |
| --output-dir DIR | Output directory (default: file/{app_num}/) |
| --force | Re-download even if the file already exists |
| --dry-run | Preview what would be downloaded; no network requests, no files written |
| --api-key TEXT | API key (overrides config file and environment variable) |

uspto-oa extract <application>

| Option | Description |
|---|---|
| --format [md\|json] | Output format (default: md) |
| --output-dir DIR | Directory holding the downloaded files; output is written here too (default: file/{app_num}/) |
| --with-ocr | Embed OCR text from *_ocr.pdf files into prosecution.md (run uspto-oa ocr first) |
| --ocr-codes CODES | Comma-separated doc codes whose OCR text is embedded (default: CTNF,CTFR,NOA,NACT,EXIN,REM,CTAV, plus A*) |
| --from YYYY-MM-DD | Include only documents dated on or after this date |
| --to YYYY-MM-DD | Include only documents dated on or before this date |
| --sort [asc\|desc] | Sort the timeline by date (default: asc) |
| --dry-run | Render the output but do not write the prosecution.md / .json file |
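The --from / --to / --sort semantics described above (both bounds inclusive, ascending by default) can be sketched as follows. The Doc record, its mail_date field, and select_timeline are hypothetical names; which USPTO date field the tool actually filters on is not stated here.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class Doc:
    code: str        # USPTO document code, e.g. "CTNF"
    mail_date: date  # assumed date field used for filtering and sorting


def select_timeline(
    docs: Iterable[Doc],
    date_from: Optional[str] = None,  # "YYYY-MM-DD", like --from
    date_to: Optional[str] = None,    # "YYYY-MM-DD", like --to
    sort: str = "asc",                # "asc" or "desc", like --sort
) -> list[Doc]:
    """Keep documents within [date_from, date_to] (both inclusive), then sort by date."""
    if sort not in ("asc", "desc"):
        raise ValueError("sort must be 'asc' or 'desc'")
    lo = date.fromisoformat(date_from) if date_from else None
    hi = date.fromisoformat(date_to) if date_to else None
    kept = [
        d for d in docs
        if (lo is None or d.mail_date >= lo) and (hi is None or d.mail_date <= hi)
    ]
    return sorted(kept, key=lambda d: d.mail_date, reverse=(sort == "desc"))
```

With the range from the usage example (2022-01-01 to 2022-12-31, desc), a document dated exactly 2022-01-01 is kept because the lower bound is inclusive.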

Doc code guide for --ocr-codes: embedding only high-value codes keeps prosecution.md from overflowing the AI agent's context window.

| Code | Description | OCR value | Default included |
|---|---|---|---|
| CTNF | Non-Final Office Action | High: core rejection grounds | Yes |
| CTFR | Final Office Action | High: core rejection grounds | Yes |
| NOA / NACT | Notice of Allowance | Medium: reasons for allowance | Yes |
| EXIN | Examiner Interview Summary | High: often PDF-only in recent applications | Yes |
| REM | Remarks (applicant arguments) | High: often PDF-only | Yes |
| CTAV | Advisory Action | Medium: examiner's response to an after-final amendment | Yes |
| A* | All amendment variants | High: useful when XML parsing fails | Yes |
| ABN | Abandonment | Low | No |
| RCE / RCEX | Request for Continued Examination | Low | No |
| SRNT / SRFW | Search Report | Low: very long, little analytical value | No |
| 892 / 1449 / IDS | Prior Art / IDS | Low: very long reference lists | No |
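One way to read a --ocr-codes value that mixes exact codes with the A* wildcard is sketched below. This is an assumption, not the tool's documented matching rule: a plain glob A* would also match ABN, which the guide lists as not included by default, so the sketch excludes ABN from wildcard matches unless it is named explicitly. parse_codes, is_selected, and NOT_AMENDMENTS are hypothetical names, and "A.NE" is used only as an illustrative amendment code.

```python
from fnmatch import fnmatchcase

# Default value taken from the --ocr-codes option description.
DEFAULT_OCR_CODES = "CTNF,CTFR,NOA,NACT,EXIN,REM,CTAV,A*"

# Assumption: codes starting with "A" that are NOT amendments, so a
# wildcard like "A*" should not pull them in.
NOT_AMENDMENTS = {"ABN"}


def parse_codes(spec: str) -> list[str]:
    """Split a comma-separated code list into upper-case patterns, dropping blanks."""
    return [c.strip().upper() for c in spec.split(",") if c.strip()]


def is_selected(doc_code: str, patterns: list[str]) -> bool:
    """Return True if doc_code matches an explicit code or a glob pattern."""
    code = doc_code.upper()
    if code in patterns:
        return True   # an explicitly listed code always matches
    if code in NOT_AMENDMENTS:
        return False  # never let a wildcard select these
    return any(fnmatchcase(code, p) for p in patterns)
```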

uspto-oa ocr <application>

USPTO PDF documents are full-page image scans — standard text extraction fails. This command runs OCR on every PDF in the application directory and produces searchable PDFs alongside the originals.

Note: ocrmypdf is bundled as a default dependency, but requires system packages to function: Tesseract OCR and Ghostscript. Install them once with your OS package manager before running this command.

macOS

brew install tesseract ghostscript

Windows (choose one)

# winget (built-in, Windows 10/11)
winget install -e --id UB-Mannheim.TesseractOCR
winget install -e --id ArtifexSoftware.Ghostscript

# Chocolatey
choco install tesseract ghostscript

Linux (Debian/Ubuntu)

sudo apt install tesseract-ocr ghostscript

After installation, reopen your terminal so the new commands are on PATH.

| Option | Description |
|---|---|
| --force | Re-run OCR even if the output already exists |
| --in-place | Overwrite the original PDFs instead of creating *_ocr.pdf copies |
| --no-deskew | Skip deskew correction (faster) |
| --output-dir DIR | File directory (default: file/{app_num}/) |

# Run OCR (creates {original}_ocr.pdf next to each PDF)
uspto-oa ocr 16330077

# Overwrite originals in place
uspto-oa ocr 16330077 --in-place
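A minimal sketch of how the {original}_ocr.pdf naming, --force, and --in-place options could decide which files need OCR. plan_ocr and the skip rules are hypothetical; in particular, the sketch cannot tell whether an in-place file was already processed, so with in_place=True it always schedules every source. Each (source, target) pair could then be handed to an OCR step such as ocrmypdf's Python API, e.g. ocrmypdf.ocr(source, target, deskew=True).

```python
from pathlib import Path

OCR_SUFFIX = "_ocr"  # from the documented "{original}_ocr.pdf" naming


def plan_ocr(app_dir: Path, force: bool = False, in_place: bool = False) -> list[tuple[Path, Path]]:
    """Return (source, target) pairs for PDFs that still need OCR.

    Hypothetical helper for illustration; not the tool's actual implementation.
    """
    jobs = []
    for pdf in sorted(app_dir.glob("*.pdf")):
        if pdf.stem.endswith(OCR_SUFFIX):
            continue  # this is an OCR output, never treat it as a source
        if in_place:
            jobs.append((pdf, pdf))  # overwrite the original
            continue
        target = pdf.with_name(f"{pdf.stem}{OCR_SUFFIX}.pdf")
        if target.exists() and not force:
            continue  # output already exists and --force was not given
        jobs.append((pdf, target))
    return jobs
```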

Workflow

uspto-oa list {app_num}               # Check document list before downloading
    └─ Browse prosecution document codes and formats

uspto-oa download {app_num}
    └─ Save XML / PDF to file/{app_num}/

uspto-oa extract {app_num}            # XML-only (fast, default)
    └─ Generate file/{app_num}/{app_num}_prosecution.md
         └─ AI agent (Claude Code / Gemini CLI)
              └─ Prosecution strategy analysis, summaries, Q&A

# ── Optional: include PDF-only documents in prosecution.md ──────────────────

uspto-oa ocr {app_num}                # Step A: OCR all PDFs → *_ocr.pdf
    └─ Generate {original}_ocr.pdf next to each PDF

uspto-oa extract {app_num} \
    --with-ocr \
    --ocr-codes CTNF,CTFR,REM,EXIN,CTAV   # Step B: embed selected OCR text
    └─ prosecution.md now includes full text of selected PDF documents
         └─ AI agent reads one file, gets complete prosecution history

Collected Document Codes

| Code | Description |
|---|---|
| CTNF | Non-Final Office Action |
| CTFR | Final Office Action |
| NOA / NACT | Notice of Allowance |
| REM | Remarks |
| ABN | Abandonment |
| SRNT / SRFW | Search Report |
| EXIN | Examiner Interview |
| RCE / RCEX | Request for Continued Examination |
| CTAV | Advisory Action |
| 892 / 1449 / IDS | Prior Art / IDS |
| A* | All amendment variants |

Generated File Structure

file/{app_num}/{app_num}_prosecution.md:

| Section | Content |
|---|---|
| Timeline | All documents sorted by date, with format (XML/PDF) shown |
| Office Action Details | Full rejection grounds from CTNF/CTFR |
| Amendment Details | Amended claims (CLM) and Remarks (REM) |
| Examiner Interview Details | Full EXIN text |
| Notice of Allowance Details | Allowed claims and Examiner's Statement |
| PDF-only Documents | List of image-only PDFs, to hand directly to the AI agent |

Exit Codes

The list and download commands map USPTO API failures to distinct exit codes, so AI agents (or shell scripts) can branch on the result without parsing stderr:

| Code | Meaning | Suggested agent action |
|---|---|---|
| 0 | Success | Proceed |
| 1 | General / network error | Log and stop |
| 2 | Usage / validation error | Fix the input (application number, options) |
| 3 | Authentication failure (401/403, missing API key) | Check the API key with uspto-oa configure |
| 4 | Not found (404) | Verify the application number; skip |
| 5 | Rate limited (429) | Wait and retry |
| 6 | Server error (5xx) | Back off and retry |
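A minimal sketch of an agent-side wrapper that follows the suggested actions for list and download: retry only on exit codes 5 and 6 with exponential backoff, and return every other code unchanged so the caller can branch on it. run_with_retry and its parameters are hypothetical; the runner and sleep functions are injectable so the logic can be exercised without the real CLI, and the 2 s / 4 s / 8 s schedule is an arbitrary choice, not something the tool prescribes.

```python
import subprocess
import time
from typing import Callable, Sequence

# Exit codes documented for `uspto-oa list` and `uspto-oa download`.
RATE_LIMITED = 5
SERVER_ERROR = 6
RETRYABLE = {RATE_LIMITED, SERVER_ERROR}


def run_with_retry(
    argv: Sequence[str],
    max_attempts: int = 4,
    base_delay: float = 2.0,
    run: Callable[[Sequence[str]], int] = lambda a: subprocess.run(list(a)).returncode,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run a uspto-oa command, retrying only on rate-limit (5) and server (6) errors."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    attempt = 1
    while True:
        code = run(argv)
        if code not in RETRYABLE or attempt >= max_attempts:
            return code  # success, a non-retryable failure, or out of attempts
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
        attempt += 1
```

Usage: code = run_with_retry(["uspto-oa", "download", "16330077"]), then branch on code (for example, 3 means re-check the API key).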

PyPI Release

uv build
uv run twine upload dist/*



Download files


Source Distribution

uspto_oa_cli-0.1.10.tar.gz (95.0 kB)

Uploaded Source

Built Distribution


uspto_oa_cli-0.1.10-py3-none-any.whl (26.0 kB)

Uploaded Python 3

File details

Details for the file uspto_oa_cli-0.1.10.tar.gz.

File metadata

  • Download URL: uspto_oa_cli-0.1.10.tar.gz
  • Upload date:
  • Size: 95.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for uspto_oa_cli-0.1.10.tar.gz
Algorithm Hash digest
SHA256 4a7c382a3f107a2ab51454cb30be7dbd519e5426b815d1d7f95079e64d894310
MD5 c4d414319add2a0579cae56efea2b9a7
BLAKE2b-256 bf51adc37cc17a8450031d40c2069f7f091e3429e160d7059d6b12c7d6abda18


File details

Details for the file uspto_oa_cli-0.1.10-py3-none-any.whl.

File metadata

  • Download URL: uspto_oa_cli-0.1.10-py3-none-any.whl
  • Upload date:
  • Size: 26.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for uspto_oa_cli-0.1.10-py3-none-any.whl
Algorithm Hash digest
SHA256 2d8e77ecb9eae46f5324d3e2117dea9fcb00aa64f69fdda66bd550a06c0def02
MD5 67f38ac61157c2d693cc2bd43cc7081a
BLAKE2b-256 28d83ac92a52fa26cc3fc506e624502e53e34d17f1c5b1b83067619d11af7630
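A downloaded file can be checked against the SHA256 digests listed above with the standard-library hashlib. sha256_of and verify are illustrative helper names; the file is read in chunks so large archives do not need to fit in memory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the hex SHA256 digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, verify(Path("uspto_oa_cli-0.1.10-py3-none-any.whl"), "2d8e77ecb9eae46f5324d3e2117dea9fcb00aa64f69fdda66bd550a06c0def02") should return True for an intact wheel. pip can also enforce digests at install time with --hash=sha256:... entries in a requirements file combined with --require-hashes.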

