Extract recipes from websites, images, and documents.

Recipe Clipper

Recipe Clipper extracts structured recipe data from websites, images, and documents. It combines web scraping (via recipe-scrapers) with Claude for images, documents, and sites the scraper does not support.

Features

  • 🌐 Web Scraping: Extract recipes from 100+ websites using recipe-scrapers
  • 📸 Image OCR: Extract recipes from cookbook photos, recipe cards, or screenshots using Claude's vision API
  • 📄 Document Parsing: Extract recipes from PDFs, Word documents, text files, and markdown
  • 🤖 LLM Fallback: Automatically falls back to Claude for unsupported websites
  • 🎨 Multiple Output Formats: Export as text, JSON, or markdown
  • 🔧 CLI & Library: Use as a command-line tool or import as a Python library
  • Type-Safe: Full type hints with Pydantic models
  • 🔒 Immutable: Data models are frozen for safety

Installation

Basic Installation

pip install recipe-clipper

This includes:

  • Web scraping for 100+ recipe websites
  • CLI tool
  • All core functionality

With Claude Support

For image/document parsing and LLM fallback:

# Quote the extra so shells such as zsh do not treat the brackets as a glob
pip install "recipe-clipper[llm]"

Quick Start

CLI Usage

Extract from a website

# Basic usage
recipe-clipper clip-webpage https://www.allrecipes.com/recipe/10813/best-chocolate-chip-cookies/

# Save as JSON
recipe-clipper clip-webpage https://example.com/recipe --format json --output recipe.json

# Save as markdown
recipe-clipper clip-webpage https://example.com/recipe --format markdown --output recipe.md

Extract from an image

# Requires ANTHROPIC_API_KEY environment variable
export ANTHROPIC_API_KEY=your-api-key

recipe-clipper clip-image cookbook-photo.jpg

# Save as JSON
recipe-clipper clip-image recipe-card.png --format json --output recipe.json

Extract from a document

# Supports PDF, DOCX, TXT, MD
recipe-clipper clip-document recipe.pdf

# With custom model
recipe-clipper clip-document cookbook.docx --model claude-opus-4 --format markdown
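
For scripting, the CLI calls above can also be driven from Python. The sketch below only assembles the argument list from the subcommands and flags shown in this section (`clip-webpage`, `clip-image`, `clip-document`, `--format`, `--output`, `--model`). The helper names `build_clip_command` and `run_clip` are hypothetical and not part of the package, and it is an assumption that `--model` is accepted by every subcommand (the examples above only show it with `clip-document`).

```python
import subprocess
from typing import Optional

# Subcommands and flags come from the CLI examples above.
# build_clip_command and run_clip are illustrative helpers, not recipe-clipper APIs.
SUBCOMMANDS = {"clip-webpage", "clip-image", "clip-document"}


def build_clip_command(
    subcommand: str,
    source: str,
    fmt: Optional[str] = None,
    output: Optional[str] = None,
    model: Optional[str] = None,
) -> list[str]:
    """Return the argv list for one recipe-clipper invocation."""
    if subcommand not in SUBCOMMANDS:
        raise ValueError(f"unknown subcommand: {subcommand}")
    argv = ["recipe-clipper", subcommand, source]
    if fmt is not None:
        argv += ["--format", fmt]
    if output is not None:
        argv += ["--output", output]
    if model is not None:
        argv += ["--model", model]
    return argv


def run_clip(argv: list[str]) -> str:
    """Run the command and return its stdout (needs recipe-clipper on PATH)."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain `?` or `&`.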

Library Usage

Extract from a website

from recipe_clipper import clip_recipe

# Without LLM fallback (uses recipe-scrapers only)
recipe = clip_recipe(
    url="https://www.allrecipes.com/recipe/10813/best-chocolate-chip-cookies/",
    api_key=None,
    use_llm_fallback=False
)

print(recipe.title)
for ingredient in recipe.ingredients:
    print(f"- {ingredient.name}")

With LLM fallback

import os
from recipe_clipper import clip_recipe

api_key = os.getenv("ANTHROPIC_API_KEY")

# Automatically falls back to Claude if recipe-scrapers doesn't support the site
recipe = clip_recipe(
    url="https://unsupported-site.com/recipe",
    api_key=api_key,
    use_llm_fallback=True
)
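
The fallback behaviour described above follows a common "try the cheap extractor first" pattern. The sketch below shows that pattern in isolation with stand-in callables. It is not how `clip_recipe` is implemented internally; the name `extract_with_fallback` is hypothetical, and catching every `Exception` is a simplifying assumption, since this README does not say which errors the scraper raises.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def extract_with_fallback(
    url: str,
    primary: Callable[[str], T],
    fallback: Optional[Callable[[str], T]] = None,
) -> T:
    """Try the primary extractor; if it fails, use the fallback when one is given."""
    try:
        return primary(url)
    except Exception:
        if fallback is None:
            # No fallback configured: surface the original error unchanged.
            raise
        return fallback(url)


# Stand-ins for a scraper that rejects the site and an LLM-based extractor.
def unsupported_scraper(url: str) -> dict:
    raise NotImplementedError(f"no scraper for {url}")


def llm_extractor(url: str) -> dict:
    return {"title": "Recipe from fallback", "source_url": url}


recipe = extract_with_fallback(
    "https://unsupported-site.com/recipe", unsupported_scraper, llm_extractor
)
print(recipe["title"])
```

Leaving `fallback` as `None` mirrors `use_llm_fallback=False`: the scraper error propagates and no API key is needed.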

Extract from an image

import os
from recipe_clipper.parsers.llm_parser import parse_recipe_from_image

api_key = os.getenv("ANTHROPIC_API_KEY")

recipe = parse_recipe_from_image(
    image_path="cookbook-photo.jpg",
    api_key=api_key,
    model="claude-sonnet-4-5"
)

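print(recipe.title)
# metadata is optional in the data model, so guard before reading it
if recipe.metadata is not None:
    print(f"Servings: {recipe.metadata.servings}")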

Extract from a document

import os
from recipe_clipper.parsers.llm_parser import parse_recipe_from_document

api_key = os.getenv("ANTHROPIC_API_KEY")

# Supports .pdf, .docx, .txt, .md
recipe = parse_recipe_from_document(
    document_path="recipe.pdf",
    api_key=api_key
)

Format output

from recipe_clipper import clip_recipe
from recipe_clipper.formatters import (
    format_recipe_text,
    format_recipe_json,
    format_recipe_markdown
)

recipe = clip_recipe("https://example.com/recipe", use_llm_fallback=False)

# Plain text
print(format_recipe_text(recipe))

# JSON
json_str = format_recipe_json(recipe)

# Markdown
markdown_str = format_recipe_markdown(recipe)

Configuration

API Keys

For Claude features (image/document parsing, website fallback), set your API key:

export ANTHROPIC_API_KEY=your-api-key-here

Or create a .env file:

ANTHROPIC_API_KEY=your-api-key-here
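
When using the library rather than the CLI, the key still has to be passed as `api_key`. The sketch below resolves it from the environment first and then from a `.env` file, using only the standard library. The helper names `read_env_file` and `resolve_api_key` are hypothetical, and the parser handles only plain `KEY=value` lines; a package such as python-dotenv covers quoting and `export` prefixes more completely.

```python
import os
from pathlib import Path
from typing import Optional


def read_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=value lines; blank lines and # comments are skipped."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes so KEY="abc" and KEY=abc behave the same.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_api_key(env_file: str = ".env") -> Optional[str]:
    """Prefer the process environment, then fall back to the .env file."""
    from_env = os.environ.get("ANTHROPIC_API_KEY")
    if from_env:
        return from_env
    return read_env_file(env_file).get("ANTHROPIC_API_KEY")
```

A `None` result means no key was found, which is the case where only `use_llm_fallback=False` calls will work.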

Supported Models

  • claude-sonnet-4-5 (default, recommended)
  • claude-sonnet-4
  • claude-opus-4
  • claude-3-5-sonnet-20241022
  • claude-3-5-sonnet-20240620

Recipe Data Model

Extracted recipes use a structured Pydantic model:

class Recipe:
    title: str
    ingredients: list[Ingredient]
    instructions: list[str]
    source_url: Optional[AnyUrl]
    image: Optional[HttpUrl]
    metadata: Optional[RecipeMetadata]

class Ingredient:
    name: str
    amount: Optional[str]
    unit: Optional[str]
    preparation: Optional[str]
    display_text: Optional[str]

class RecipeMetadata:
    author: Optional[str]
    servings: Optional[str]
    prep_time: Optional[int]  # minutes
    cook_time: Optional[int]  # minutes
    total_time: Optional[int]  # minutes
    categories: Optional[list[str]]
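
Because most fields are optional and times are stored as minutes, display code needs a little care. The sketch below uses a standard-library frozen dataclass as a stand-in for `Ingredient` so it runs without the package installed. `IngredientSketch`, `ingredient_line`, and `format_minutes` are hypothetical names, and the `None` defaults are an assumption, since the listing above does not show defaults.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Optional


@dataclass(frozen=True)
class IngredientSketch:
    """Stand-in with the same field names as Ingredient (not the real class)."""
    name: str
    amount: Optional[str] = None
    unit: Optional[str] = None
    preparation: Optional[str] = None
    display_text: Optional[str] = None


def ingredient_line(ingredient: IngredientSketch) -> str:
    """Prefer display_text; otherwise join whichever parts are present."""
    if ingredient.display_text:
        return ingredient.display_text
    parts = [ingredient.amount, ingredient.unit, ingredient.name]
    line = " ".join(part for part in parts if part)
    if ingredient.preparation:
        line += f", {ingredient.preparation}"
    return line


def format_minutes(minutes: Optional[int]) -> str:
    """Render a minute count such as prep_time in hours and minutes."""
    if minutes is None:
        return "unknown"
    hours, rest = divmod(minutes, 60)
    if hours and rest:
        return f"{hours} h {rest} min"
    if hours:
        return f"{hours} h"
    return f"{rest} min"


flour = IngredientSketch(name="flour", amount="2", unit="cups", preparation="sifted")
print(ingredient_line(flour))  # 2 cups flour, sifted
print(format_minutes(75))      # 1 h 15 min

try:
    flour.name = "sugar"       # frozen: assignment is rejected
except FrozenInstanceError:
    print("ingredient is immutable")
```

The package's frozen Pydantic models reject assignment in a similar way, though Pydantic reports it with its own error type rather than `FrozenInstanceError`.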

Supported Input Sources

1. Websites (100+ sites)

Uses recipe-scrapers, which supports sites including:

  • AllRecipes
  • Food Network
  • Serious Eats
  • NYT Cooking
  • And 100+ more sites

For unsupported sites, enable LLM fallback.

2. Images

Extracts recipes from:

  • Cookbook photos
  • Handwritten recipe cards
  • Screenshots
  • Scanned documents

Supported formats: .jpg, .jpeg, .png, .gif, .webp

3. Documents

Extracts recipes from:

  • PDFs (recipe PDFs, cookbook PDFs)
  • Word documents (.docx)
  • Text files (.txt)
  • Markdown files (.md)
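
The three input sources above map onto the three entry points (`clip-webpage`, `clip-image`, `clip-document`). A caller handling mixed inputs could pick the right one from the URL scheme or file extension, as in the sketch below. The extension sets are copied from the lists in this section; `classify_input` is a hypothetical helper, and the package itself may detect input types differently.

```python
from pathlib import Path

# Extensions as listed under "Images" and "Documents" above.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".webp"}
DOCUMENT_SUFFIXES = {".pdf", ".docx", ".txt", ".md"}


def classify_input(source: str) -> str:
    """Return 'webpage', 'image', or 'document' for a URL or file path."""
    if source.startswith(("http://", "https://")):
        return "webpage"
    suffix = Path(source).suffix.lower()
    if suffix in IMAGE_SUFFIXES:
        return "image"
    if suffix in DOCUMENT_SUFFIXES:
        return "document"
    raise ValueError(f"unsupported input type: {source!r}")


for source in ["https://example.com/recipe", "recipe-card.PNG", "cookbook.docx"]:
    print(source, "->", classify_input(source))
```

Lowercasing the suffix keeps files such as `recipe-card.PNG` from being rejected.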

Development

Run tests

# Unit tests only
pytest

# Include integration tests (requires ANTHROPIC_API_KEY)
pytest -m integration

Run linting

ruff check src/ tests/
ruff format src/ tests/

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License - see LICENSE file for details.

Credits
