Recipe Clipper
Extract recipes from websites, images, and documents with ease. Recipe Clipper supports multiple input sources and uses both web scraping and Claude's vision capabilities to extract structured recipe data.
Features
- 🌐 Web Scraping: Extract recipes from 100+ websites using recipe-scrapers
- 📸 Image OCR: Extract recipes from cookbook photos, recipe cards, or screenshots using Claude's vision API
- 📄 Document Parsing: Extract recipes from PDFs, Word documents, text files, and markdown
- 🤖 LLM Fallback: Automatically falls back to Claude for unsupported websites
- 🎨 Multiple Output Formats: Export as text, JSON, or markdown
- 🔧 CLI & Library: Use as a command-line tool or import as a Python library
- ⚡ Type-Safe: Full type hints with Pydantic models
- 🔒 Immutable: Data models are frozen for safety
Installation
Basic Installation
```bash
pip install recipe-clipper
```
This includes:
- Web scraping for 100+ recipe websites
- CLI tool
- All core functionality
With Claude Support
For image/document parsing and LLM fallback:
```bash
# Quotes keep shells such as zsh from treating [llm] as a glob pattern
pip install "recipe-clipper[llm]"
```
Quick Start
CLI Usage
Extract from a website
```bash
# Basic usage
recipe-clipper clip-webpage https://www.allrecipes.com/recipe/10813/best-chocolate-chip-cookies/

# Save as JSON
recipe-clipper clip-webpage https://example.com/recipe --format json --output recipe.json

# Save as markdown
recipe-clipper clip-webpage https://example.com/recipe --format markdown --output recipe.md
```
Extract from an image
```bash
# Requires ANTHROPIC_API_KEY environment variable
export ANTHROPIC_API_KEY=your-api-key
recipe-clipper clip-image cookbook-photo.jpg

# Save as JSON
recipe-clipper clip-image recipe-card.png --format json --output recipe.json
```
Extract from a document
```bash
# Supports PDF, DOCX, TXT, MD
recipe-clipper clip-document recipe.pdf

# With custom model
recipe-clipper clip-document cookbook.docx --model claude-opus-4 --format markdown
```
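All three subcommands take a single source plus output options. As a rough picture of that command-line surface, here is an `argparse` sketch. It is illustrative only: the real `recipe-clipper` CLI may use a different framework, and the defaults (text output, `claude-sonnet-4-5`) and which subcommands accept `--model` are assumptions drawn from the examples in this README.

```python
import argparse

# Hypothetical parser mirroring the commands shown above; not the package's actual implementation.
FORMATS = ("text", "json", "markdown")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="recipe-clipper")
    subcommands = parser.add_subparsers(dest="command", required=True)

    # Each subcommand takes one positional source plus shared output options.
    sources = {
        "clip-webpage": "url",
        "clip-image": "image_path",
        "clip-document": "document_path",
    }
    for name, source in sources.items():
        sub = subcommands.add_parser(name)
        sub.add_argument(source)
        sub.add_argument("--format", choices=FORMATS, default="text")  # default assumed
        sub.add_argument("--output", help="write to this file instead of stdout")
        if name != "clip-webpage":
            # Claude-backed commands accept a model name (default assumed from Supported Models).
            sub.add_argument("--model", default="claude-sonnet-4-5")
    return parser
```

For example, `build_parser().parse_args(["clip-document", "cookbook.docx", "--model", "claude-opus-4"])` yields a namespace with `command`, `document_path`, `format`, `output`, and `model` attributes.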
Library Usage
Extract from a website
```python
from recipe_clipper import clip_recipe

# Without LLM fallback (uses recipe-scrapers only)
recipe = clip_recipe(
    url="https://www.allrecipes.com/recipe/10813/best-chocolate-chip-cookies/",
    api_key=None,
    use_llm_fallback=False,
)

print(recipe.title)
for ingredient in recipe.ingredients:
    print(f"- {ingredient.name}")
```
With LLM fallback
```python
import os

from recipe_clipper import clip_recipe

api_key = os.getenv("ANTHROPIC_API_KEY")

# Automatically falls back to Claude if recipe-scrapers doesn't support the site
recipe = clip_recipe(
    url="https://unsupported-site.com/recipe",
    api_key=api_key,
    use_llm_fallback=True,
)
```
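The fallback boils down to "try the site-specific scraper, and only call Claude if that fails and fallback is enabled." A minimal sketch of that control flow, with the two extraction steps passed in as plain callables; `scrape`, `llm_parse`, `clip_with_fallback`, and `UnsupportedSiteError` are hypothetical stand-ins, not names exported by `recipe_clipper`.

```python
from typing import Callable, Optional, TypeVar

R = TypeVar("R")


class UnsupportedSiteError(Exception):
    """Hypothetical error raised when the scraper has no parser for a site."""


def clip_with_fallback(
    url: str,
    scrape: Callable[[str], R],
    llm_parse: Callable[[str, str], R],
    api_key: Optional[str] = None,
    use_llm_fallback: bool = False,
) -> R:
    """Try the site-specific scraper first; optionally fall back to an LLM parser."""
    try:
        return scrape(url)
    except UnsupportedSiteError:
        # Without fallback enabled, surface the original failure unchanged.
        if not use_llm_fallback:
            raise
        if not api_key:
            raise ValueError("use_llm_fallback=True requires an API key")
        return llm_parse(url, api_key)
```

Catching only the "unsupported site" error, rather than every exception, keeps unrelated failures such as network errors from quietly turning into a paid API call.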
Extract from an image
```python
import os

from recipe_clipper.parsers.llm_parser import parse_recipe_from_image

api_key = os.getenv("ANTHROPIC_API_KEY")
recipe = parse_recipe_from_image(
    image_path="cookbook-photo.jpg",
    api_key=api_key,
    model="claude-sonnet-4-5",
)

print(recipe.title)
# metadata is optional, so guard before reading servings
if recipe.metadata is not None:
    print(f"Servings: {recipe.metadata.servings}")
```
Extract from a document
```python
import os

from recipe_clipper.parsers.llm_parser import parse_recipe_from_document

api_key = os.getenv("ANTHROPIC_API_KEY")

# Supports .pdf, .docx, .txt, .md
recipe = parse_recipe_from_document(
    document_path="recipe.pdf",
    api_key=api_key,
)
```
Format output
```python
from recipe_clipper import clip_recipe
from recipe_clipper.formatters import (
    format_recipe_text,
    format_recipe_json,
    format_recipe_markdown,
)

recipe = clip_recipe("https://example.com/recipe", use_llm_fallback=False)

# Plain text
print(format_recipe_text(recipe))

# JSON
json_str = format_recipe_json(recipe)

# Markdown
markdown_str = format_recipe_markdown(recipe)
```
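`format_recipe_markdown` does the real work in the package. To show the general shape of such a formatter, here is a sketch using a hypothetical `MiniRecipe` stand-in (a subset of the `Recipe` fields, with ingredients as plain strings) and a hypothetical `to_markdown` function; its layout is an assumption and not necessarily what recipe-clipper emits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MiniRecipe:
    """Hypothetical stand-in with a subset of Recipe's fields."""
    title: str
    ingredients: tuple[str, ...]
    instructions: tuple[str, ...]


def to_markdown(recipe: MiniRecipe) -> str:
    """Render a title, a bulleted ingredient list, and numbered steps."""
    lines = [f"# {recipe.title}", "", "## Ingredients", ""]
    lines += [f"- {item}" for item in recipe.ingredients]
    lines += ["", "## Instructions", ""]
    # Numbered steps, starting at 1.
    lines += [f"{n}. {step}" for n, step in enumerate(recipe.instructions, start=1)]
    return "\n".join(lines) + "\n"
```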
Configuration
API Keys
For Claude features (image/document parsing, website fallback), set your API key:
```bash
export ANTHROPIC_API_KEY=your-api-key-here
```
Or create a .env file:
```
ANTHROPIC_API_KEY=your-api-key-here
```
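How the package reads the .env file isn't documented here; python-dotenv's `load_dotenv()` is the usual tool. As a minimal stdlib sketch of the same idea, assuming simple `KEY=VALUE` lines and that variables already set in the environment should win over the file, here is a hypothetical `load_env_file` helper:

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Minimal .env reader: KEY=VALUE lines, '#' comments, never overrides real env vars."""
    env_path = Path(path)
    if not env_path.is_file():
        return
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        os.environ.setdefault(key.strip(), value)
```

Calling `load_env_file()` before `os.getenv("ANTHROPIC_API_KEY")` makes the two configuration routes above interchangeable.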
Supported Models
- claude-sonnet-4-5 (default, recommended)
- claude-sonnet-4
- claude-opus-4
- claude-3-5-sonnet-20241022
- claude-3-5-sonnet-20240620
Recipe Data Model
Extracted recipes use a structured Pydantic model:
```python
from typing import Optional

from pydantic import AnyUrl, BaseModel, HttpUrl


# Simplified view of the models (field names and types only)
class Ingredient(BaseModel):
    name: str
    amount: Optional[str]
    unit: Optional[str]
    preparation: Optional[str]
    display_text: Optional[str]


class RecipeMetadata(BaseModel):
    author: Optional[str]
    servings: Optional[str]
    prep_time: Optional[int]  # minutes
    cook_time: Optional[int]  # minutes
    total_time: Optional[int]  # minutes
    categories: Optional[list[str]]


class Recipe(BaseModel):
    title: str
    ingredients: list[Ingredient]
    instructions: list[str]
    source_url: Optional[AnyUrl]
    image: Optional[HttpUrl]
    metadata: Optional[RecipeMetadata]
```
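The models are frozen, so "editing" a recipe means building a modified copy. To show that behaviour without depending on Pydantic, here is a stdlib dataclass stand-in for part of `RecipeMetadata`, plus a hypothetical `effective_total_minutes` helper that derives a total from prep and cook time when `total_time` is missing (whether recipe-clipper fills this in itself is not documented). With Pydantic v2 the equivalents are `model_config = ConfigDict(frozen=True)` and `model.model_copy(update={...})`.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Metadata:
    """Stand-in for a subset of RecipeMetadata; times are in minutes."""
    servings: Optional[str] = None
    prep_time: Optional[int] = None
    cook_time: Optional[int] = None
    total_time: Optional[int] = None


def effective_total_minutes(meta: Metadata) -> Optional[int]:
    """Hypothetical helper: prefer total_time, else prep + cook when both are known."""
    if meta.total_time is not None:
        return meta.total_time
    if meta.prep_time is not None and meta.cook_time is not None:
        return meta.prep_time + meta.cook_time
    return None


meta = Metadata(servings="24 cookies", prep_time=15, cook_time=10)
try:
    meta.cook_time = 12  # frozen: direct assignment is rejected
except FrozenInstanceError:
    # Build a modified copy instead of mutating in place.
    meta = replace(meta, cook_time=12)
```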
Supported Input Sources
1. Websites (100+ sites)
Uses recipe-scrapers, which supports:
- AllRecipes
- Food Network
- Serious Eats
- NYT Cooking
- And 100+ more sites
For unsupported sites, enable LLM fallback.
2. Images
Extracts recipes from:
- Cookbook photos
- Handwritten recipe cards
- Screenshots
- Scanned documents
Supported formats: .jpg, .jpeg, .png, .gif, .webp
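Claude's Messages API accepts images as base64 content blocks whose `media_type` is one of `image/jpeg`, `image/png`, `image/gif`, or `image/webp`, which lines up with the extensions listed above. A sketch of turning a file path into such a block; the helper name `image_block` is hypothetical, and this is not necessarily how recipe-clipper builds its request.

```python
import base64
from pathlib import Path

# Extension -> media type accepted by the Messages API for image blocks.
MEDIA_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def image_block(path: str) -> dict:
    """Build a base64 image content block for a Claude messages request."""
    suffix = Path(path).suffix.lower()
    if suffix not in MEDIA_TYPES:
        raise ValueError(f"unsupported image type: {suffix or '(none)'}")
    data = base64.standard_b64encode(Path(path).read_bytes()).decode("ascii")
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": MEDIA_TYPES[suffix], "data": data},
    }
```

The resulting dict would sit in a user message's `content` list next to a text block asking Claude to extract the recipe.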
3. Documents
Extracts recipes from:
- PDFs (recipe PDFs, cookbook PDFs)
- Word documents (.docx)
- Text files (.txt)
- Markdown files (.md)
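Each document type needs a different route to Claude: text and Markdown can be sent as plain text, PDFs can go as a base64 `application/pdf` document block, and .docx files generally need converting to text first (for example with python-docx). The dispatch below is a hypothetical sketch of that choice, not recipe-clipper's actual code; `Route`, `ROUTES`, and `route_for` are illustrative names.

```python
from enum import Enum
from pathlib import Path


class Route(Enum):
    TEXT = "send file contents as text"
    PDF = "send as a base64 application/pdf document block"
    DOCX = "extract text first (e.g. with python-docx), then send as text"


ROUTES = {".txt": Route.TEXT, ".md": Route.TEXT, ".pdf": Route.PDF, ".docx": Route.DOCX}


def route_for(path: str) -> Route:
    """Pick a (hypothetical) handling strategy from the file extension."""
    suffix = Path(path).suffix.lower()
    try:
        return ROUTES[suffix]
    except KeyError:
        raise ValueError(f"unsupported document type: {suffix or '(none)'}") from None
```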
Development
Run tests
```bash
# Unit tests only
pytest

# Only tests marked "integration" (requires ANTHROPIC_API_KEY)
pytest -m integration
```
Run linting and formatting
```bash
ruff check src/ tests/
ruff format src/ tests/
```
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
MIT License - see LICENSE file for details.
Credits
- Built with recipe-scrapers
- LLM parsing powered by Anthropic Claude
Download files
File details
Details for the file recipe_clipper-0.1.0a0.tar.gz.
File metadata
- Download URL: recipe_clipper-0.1.0a0.tar.gz
- Upload date:
- Size: 17.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5cb9bf6665ca9130380997e99a43c3be122c30f1587cd8d24844b18e90212939 |
| MD5 | 472cdd8cfb58242d0f52680bfe957627 |
| BLAKE2b-256 | f9e3b2654c1e4c0b7e8da90f55d4ebb0248ab4a3cf4255ffd9e45b4075d7974e |
Provenance
The following attestation bundles were made for recipe_clipper-0.1.0a0.tar.gz:
- Publisher: publish.yml on zduey/kitchenmate
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: recipe_clipper-0.1.0a0.tar.gz
- Subject digest: 5cb9bf6665ca9130380997e99a43c3be122c30f1587cd8d24844b18e90212939
- Sigstore transparency entry: 808230557
- Sigstore integration time:
- Permalink: zduey/kitchenmate@e28fd243feb4f3efb5129b11c80a8bbbfa9b0346
- Branch / Tag: refs/tags/v0.1.0a0
- Owner: https://github.com/zduey
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@e28fd243feb4f3efb5129b11c80a8bbbfa9b0346
- Trigger Event: push
File details
Details for the file recipe_clipper-0.1.0a0-py3-none-any.whl.
File metadata
- Download URL: recipe_clipper-0.1.0a0-py3-none-any.whl
- Upload date:
- Size: 15.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4895ea7fe51ee117bac97bad72dda533de6ae77269ba29b78b79f18c6fc1fd13 |
| MD5 | 747666a74ce272f38d46be03d3da5840 |
| BLAKE2b-256 | 6b7a5cc6a2ae6a292ca0029b48fe3c9310ea20fa65889569dfc45300b72a2512 |
Provenance
The following attestation bundles were made for recipe_clipper-0.1.0a0-py3-none-any.whl:
- Publisher: publish.yml on zduey/kitchenmate
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: recipe_clipper-0.1.0a0-py3-none-any.whl
- Subject digest: 4895ea7fe51ee117bac97bad72dda533de6ae77269ba29b78b79f18c6fc1fd13
- Sigstore transparency entry: 808230585
- Sigstore integration time:
- Permalink: zduey/kitchenmate@e28fd243feb4f3efb5129b11c80a8bbbfa9b0346
- Branch / Tag: refs/tags/v0.1.0a0
- Owner: https://github.com/zduey
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@e28fd243feb4f3efb5129b11c80a8bbbfa9b0346
- Trigger Event: push