A Python package for scraping recipes from all over the internet
Project description
A simple web scraping tool for recipe sites.
pip install recipe-scrapers
then:
from recipe_scrapers import scrape_me
# give the URL as a string; it can be a URL from any site listed below
scraper = scrape_me('https://www.allrecipes.com/recipe/158968/spinach-and-feta-turkey-burgers/')
# Q: What if the recipe site I want to extract information from is not listed below?
# A: Give it a try with the wild_mode option! If Schema.org Recipe markup is available, it will work just fine.
scraper = scrape_me('https://www.feastingathome.com/tomato-risotto/', wild_mode=True)
scraper.title()
scraper.total_time()
scraper.yields()
scraper.ingredients()
scraper.instructions()
scraper.image()
scraper.host()
scraper.links()
scraper.nutrients() # if available
Notes:
Starting from v13.0.0 the package stopped suppressing scraper exceptions by default. If you want the previous behaviour:
import os
from recipe_scrapers import scrape_me
os.environ["RECIPE_SCRAPERS_SETTINGS"] = "recipe_scrapers.settings.v12_settings"
scraper = scrape_me(...) # etc.
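Rather than reverting to the v12 settings, another option is to handle exceptions at the call site. A minimal sketch (the `safe` helper and `boom` stand-in are hypothetical, not part of the package, and the exact exception classes scrapers raise vary by version, so this catches broadly):

```python
def safe(getter, default=None):
    """Call a zero-argument getter (e.g. scraper.nutrients) and return
    `default` instead of propagating any exception it raises."""
    try:
        return getter()
    except Exception:
        return default

def boom():
    # Stand-in for a scraper method unsupported on a given site
    raise NotImplementedError("nutrients not provided by this site")

print(safe(boom, default={}))           # {}
print(safe(lambda: "Tomato Risotto"))   # Tomato Risotto
```

In real code you would pass `scraper.nutrients` (the bound method, uncalled) as the getter.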
scraper.links() returns a list of dictionaries containing all of the <a> tag attributes. The attribute names are the dictionary keys.
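For example, to keep only the absolute URLs from that list (the sample data below is made up for illustration; real results depend on the page):

```python
# Hypothetical sample shaped like a scraper.links() result:
links = [
    {"href": "https://example.com/recipe/1", "class": "related"},
    {"href": "#comments"},
    {"rel": "nofollow", "href": "https://example.com/about"},
]

# Keep only entries whose href is an absolute http(s) URL
hrefs = [a["href"] for a in links if a.get("href", "").startswith("http")]
print(hrefs)  # ['https://example.com/recipe/1', 'https://example.com/about']
```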
Scrapers available for:
Contribute
Part of the reason I want this open sourced is that when a site makes a design change, the scraper for it needs to be modified.
If you spot a design change (or anything else) that breaks the scraper for a given site - please file an issue ASAP.
If you are a programmer, PRs with fixes are warmly welcomed and acknowledged with a virtual beer.
If you want a scraper for a new site added:
Open an issue providing the site name, as well as a link to a recipe from it.
If you are a developer and want to code the scraper yourself:
If Schema is available on the site - you can do this
Otherwise, scrape the HTML - like this
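To illustrate the HTML-scraping approach in the abstract, here is a rough sketch using only the standard library (the project's actual scrapers are built on its own base class and BeautifulSoup helpers, not this hypothetical `TitleParser`):

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Grabs the text of the first <h1> tag, a common spot for recipe titles."""
    def __init__(self):
        super().__init__()
        self._in_h1 = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and self.title is None:
            self._in_h1 = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1 and self.title is None:
            self.title = data.strip()

html = "<html><body><h1>Tomato Risotto</h1><p>...</p></body></html>"
parser = TitleParser()
parser.feed(html)
print(parser.title)  # Tomato Risotto
```

A real scraper wraps this kind of extraction in one method per field (title, ingredients, instructions, and so on).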
Generating a new scraper class:
python generate.py <ClassName> <URL>
ClassName: The name of the new scraper class.
URL: The URL of an example recipe from the target site. The content will be stored in test_data to be used with the test class.
For Devs / Contribute
Assuming you have python3 installed, navigate to the directory where you want this project to live and run these lines:
git clone git@github.com:hhursev/recipe-scrapers.git &&
cd recipe-scrapers &&
python3 -m venv .venv &&
source .venv/bin/activate &&
pip install -r requirements-dev.txt &&
pre-commit install &&
python -m coverage run -m unittest &&
python -m coverage report
To run a single unit test for a newly developed scraper:
python -m coverage run -m unittest tests.test_myscraper
FAQ
How do I know if a website has a Recipe Schema? Run in a Python shell:
from recipe_scrapers import scrape_me
scraper = scrape_me('<url of a recipe from the site>', wild_mode=True)
# if no error is raised - there's schema available:
scraper.title()
scraper.instructions() # etc.
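For an offline intuition of what that check involves, here is a rough stdlib-only sketch of detecting Schema.org Recipe markup in a page's JSON-LD (the `has_recipe_schema` function and sample page are made up for illustration; the package's own detection is more thorough):

```python
import json
from html.parser import HTMLParser

class JSONLDCollector(HTMLParser):
    """Collects the contents of <script type="application/ld+json"> tags."""
    def __init__(self):
        super().__init__()
        self._in_ldjson = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ldjson = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_ldjson = False

    def handle_data(self, data):
        if self._in_ldjson:
            self.blocks.append(data)

def has_recipe_schema(html):
    """True if any JSON-LD block declares @type Recipe (string or list form)."""
    collector = JSONLDCollector()
    collector.feed(html)
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except ValueError:
            continue
        for item in data if isinstance(data, list) else [data]:
            if not isinstance(item, dict):
                continue
            types = item.get("@type", [])
            if "Recipe" in ([types] if isinstance(types, str) else types):
                return True
    return False

page = ('<html><head><script type="application/ld+json">'
        '{"@context": "https://schema.org", "@type": "Recipe", '
        '"name": "Tomato Risotto"}</script></head></html>')
print(has_recipe_schema(page))              # True
print(has_recipe_schema("<html></html>"))   # False
```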
Special thanks to:
All the contributors who helped improve the package. You are awesome!
Hashes for recipe_scrapers_ap_fork-13.3.6.tar.gz

Algorithm | Hash digest
---|---
SHA256 | c27b04d8d01b460754377bd6adca3e0a7915e7e73e28878c6066c8d24e8b8c51
MD5 | e21e970e50019dd5ad444b87425ed11b
BLAKE2b-256 | 98da754b9d330b1a15b7018329eacefc0f57449e3b0f9ce2a4d045d39b7883ef
Hashes for recipe_scrapers_ap_fork-13.3.6-py3-none-any.whl

Algorithm | Hash digest
---|---
SHA256 | f15de81a7e30c96adda6a2d0d8a946ffbac58a5f8f6e31ec7e44303d1baebd58
MD5 | e780a93f3417fc25d40d00e10e527314
BLAKE2b-256 | 8d46a9bc337e1de07b3392ed26986d71b8d9852161e456b896ed575cc277467f