A small package to translate an Italian recipe and its units into English and imperial units using Google Translate
What is r2api?
recipe2api is a Python package aimed at converting recipes on blogs without an external API into a Python dictionary/JSON object. For now, it can reliably parse only the websites for which a converter exists.
What does r2api do?
Feed a URL into one of the Converters (only certain sites are supported, since each converter has to be coded by hand). By default, units are changed from metric to imperial, but this can be turned off. An optional module will translate the recipe into English using Google Cloud Translate. Read the details below.
How do I install it?
pip install r2api
Other dependencies
This package depends on several others. bs4, Beautiful Soup and Requests are included in requirements.txt. Because google-cloud-translate is much larger (and is only used for one part of the functionality that can be replaced via a separate API key), it isn't included. But it and all of its dependencies can be installed with the following command (note: you need gcc to run it properly, which is included with almost all operating systems):
pip install google-cloud-translate
How to use it generally:
import r2api
r_1 = r2api.GZConverter("https://ricette.giallozafferano.it/Zuppa-di-ceci.html")
r_2 = r2api.FCConverter("https://www.fattoincasadabenedetta.it/ricetta/pasta-al-forno-con-polpette-di-ricotta/")
translated_recipe_1 = r2api.translate_data(r_1.recipe)
translated_recipe_2 = r2api.translate_data(r_2.recipe)
Optionally, import the modules explicitly, which also decreases load times:
import r2api.converter.giallo_zafferano as gz
import r2api.converter.fatti_in_casa as fic
import r2api.translate.apply_translation as apply
r_1 = gz.GZConverter("https://ricette.giallozafferano.it/Zuppa-di-ceci.html")
r_2 = fic.FCConverter("https://www.fattoincasadabenedetta.it/ricetta/pasta-al-forno-con-polpette-di-ricotta/")
translated_recipe_1 = apply.translate_data(r_1.recipe)
translated_recipe_2 = apply.translate_data(r_2.recipe)
How does it work?
The Converter classes use BeautifulSoup and RegEx to parse an appropriate website into a dictionary of the following format:
recipe['name']: string
recipe['image']: string
recipe['ingredients']: list - [name: string, quantity: float | int | string*, unit: string]
recipe['preparation']: list - [step: string]
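To make the shape concrete, here is a hand-written sample in that format together with a small checker. This is only a sketch: the sample values and the helper name check_recipe_shape are illustrative and not part of r2api, and it assumes each ingredient is a three-item list, which is how the format above reads.

```python
from numbers import Real

# Hand-written sample in the documented shape (values are illustrative only).
sample_recipe = {
    "name": "Tiramisù",
    "image": "https://example.com/tiramisu.jpg",
    "ingredients": [
        ["Savoiardi", 10.56, "oz"],
        ["Uova", 3, "n/a"],
        ["Cacao amaro in polvere per la superficie", "to taste", "n/a"],
    ],
    "preparation": ["Step one.", "Step two."],
}


def check_recipe_shape(recipe: dict) -> bool:
    """Hypothetical helper: True if recipe matches the documented layout."""
    if not isinstance(recipe.get("name"), str):
        return False
    if not isinstance(recipe.get("image"), str):
        return False
    ingredients = recipe.get("ingredients")
    if not isinstance(ingredients, list):
        return False
    for item in ingredients:
        # Assumption: every ingredient is [name, quantity, unit].
        if not isinstance(item, (list, tuple)) or len(item) != 3:
            return False
        name, quantity, unit = item
        if not isinstance(name, str) or not isinstance(unit, str):
            return False
        # Quantity may be a number or a special string such as 'to taste'.
        if isinstance(quantity, bool) or not isinstance(quantity, (Real, str)):
            return False
    preparation = recipe.get("preparation")
    return isinstance(preparation, list) and all(
        isinstance(step, str) for step in preparation
    )
```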
Note the following. Converters have two optional parameters other than the URL, both keyword-only arguments:
- convert_units: a boolean set to True by default. If set to False, the units will not be converted from metric to imperial units.
- read_from_file: a boolean set to False by default. If set to True, the path is assumed to be a relative path to a file containing the appropriate bs4 soup (of the same style as created when the write_soup_to method is invoked).
- Note also that Converter objects behave like dictionaries in a limited way: you can get and set items on self.recipe directly through the converter, if you want to save yourself a few keystrokes.
- With the addition of the MZConverter in version 0.1.6, the ingredient expectations changed slightly. Before, they always came in one of three formats:
- Savoiardi (name: string), 10.56 (quantity: float), oz (unit: string)
- Uova (name: string), 3 (quantity: int), n/a (unit: string)
- Cacao amaro in polvere per la superficie (name: string), to taste (quantity: string), n/a (unit: string)
These recipes have not yet been translated but have already had their units converted, hence the odd mix of Italian and English. As you can see, the quantity is a string only when it was a special word. Three have been seen so far: q.b., q.s., and 'a piacere' (all roughly meaning, and translated to, 'to taste'). And if n/a came up, it was always the unit. With the addition of the MZConverter, ingredients can, in addition to the formats above, also take this format:
- Zucchero a velo (name: string), n/a (quantity: string), to taste (unit: string)
Here, the quantity has become n/a and the unit was q.b. This is because of a quirk in how the MZConverter is built. It could easily be corrected; however, the architecture would need to be changed, and I personally like the flexibility that has been granted. But, for anyone using the API, either keep in mind that the MZConverter is unique, or that 'to taste' and 'n/a' can show up in both units and quantities.
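If you consume the ingredients downstream, one way to cope with 'to taste' and 'n/a' appearing in either slot is to normalize each ingredient first. The following is a minimal sketch, not something r2api provides: the function name normalize_ingredient and the four-item return value are assumptions made for illustration.

```python
# Special strings described above; they may appear as quantity or as unit.
SPECIAL_VALUES = {"to taste", "n/a"}


def normalize_ingredient(name, quantity, unit):
    """Hypothetical helper: return (name, amount, unit, to_taste).

    amount is a number or None, unit is a real unit or None, and
    to_taste is True when 'to taste' showed up in either position.
    """
    to_taste = "to taste" in (quantity, unit)
    is_number = isinstance(quantity, (int, float)) and not isinstance(quantity, bool)
    amount = quantity if is_number else None
    clean_unit = None if unit in SPECIAL_VALUES else unit
    return name, amount, clean_unit, to_taste
```

With this, the GZConverter-style entry ('to taste', 'n/a') and the MZConverter-style entry ('n/a', 'to taste') both come out with no amount, no unit, and to_taste set to True.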
The converter class has five class methods:
write_soup_to(path: string): void
The method writes the bs4.prettify() object to a file
write_recipe_to(path: string, *, indent: integer = 4): void
The method writes the recipe as a JSON object with the indicated indentation
elaborate(): void
The method returns self.recipe as a string in a slightly nicer format
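For a rough idea of what writing the recipe as indented JSON involves, here is a standalone sketch using only the standard library. The function names recipe_to_json and dump_recipe are made up for this example, and the choice of ensure_ascii=False (to keep accented Italian characters readable) is an assumption; the package's own write_recipe_to may differ.

```python
import json


def recipe_to_json(recipe: dict, *, indent: int = 4) -> str:
    """Hypothetical helper: serialize a recipe dictionary to a JSON string."""
    # ensure_ascii=False keeps characters such as 'ù' as-is (an assumption).
    return json.dumps(recipe, indent=indent, ensure_ascii=False)


def dump_recipe(recipe: dict, path: str, *, indent: int = 4) -> None:
    """Hypothetical helper: write the JSON string to a file at path."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(recipe_to_json(recipe, indent=indent))
```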
Note: for the following two methods, the BeautifulSoup soup should be parsed with the lxml parser so that it is interpreted correctly. The html.parser can create errors and inconsistencies.
For example:
with open(file_path, 'r') as f:
    soup = bs4.BeautifulSoup(f, 'lxml')
get_ingredients(soup: bs4.BeautifulSoup, convert_units: bool)
The method will return a list of the following format: [ingredient name: string, ingredient quantity: float, ingredient unit: string]
The units and quantities will have been converted from metric to imperial units if convert_units is True
get_preparation(soup: bs4.BeautifulSoup, convert_units: bool)
The method will return a list of the preparation steps
The units and quantities will have been converted from metric to imperial units if convert_units is True
There are several utility methods, accessible either simply as:
import r2api
converted_units = r2api.convert_units_prep(instruction)
or explicitly (and to reduce load times):
import r2api.utilities.unit_conversion as uc
converted_units = uc.convert_units_prep(instruction)
The two most important methods are for converting units. The first is for the ingredients:
convert_units_ing(quantity: string, unit: string): float, string
This is the process called from within get_ingredients_g_z to convert the quantities and units. It returns the converted quantity and unit.
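The general idea can be sketched as a lookup table plus a fallback. This is not the package's code: the function name convert_quantity, the conversion factors, and the rounding to two decimals are all assumptions for illustration. The fallback mirrors the behaviour noted in the changelog for 0.1.6, where data that cannot be coerced is returned unconverted.

```python
# Illustrative metric-to-imperial factors; the package's own table may differ.
FACTORS = {
    "g": (0.035274, "oz"),
    "kg": (2.20462, "lb"),
    "ml": (0.033814, "fl oz"),
    "l": (33.814, "fl oz"),
}


def convert_quantity(quantity: str, unit: str):
    """Hypothetical helper: return (quantity, unit) in imperial units.

    If the quantity is not numeric or the unit is unknown, the inputs
    are returned unchanged instead of raising an error.
    """
    try:
        value = float(quantity)
    except (TypeError, ValueError):
        return quantity, unit
    factor = FACTORS.get(unit.lower()) if isinstance(unit, str) else None
    if factor is None:
        return quantity, unit
    multiplier, new_unit = factor
    return round(value * multiplier, 2), new_unit
```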
convert_units_prep(instruction: string): string
It will return the string with every occurrence of a metric quantity and unit converted into its imperial equivalent. The quantities and units are identified with RegEx.
These last two methods are called from within the converters if convert_units is True.
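As a rough picture of RegEx-based conversion inside free text, the sketch below finds a number followed by a metric unit and swaps in an imperial equivalent. The pattern, the factors, the name convert_text, and the output formatting are assumptions; the real convert_units_prep handles more cases (degrees, for instance, per the changelog).

```python
import re

# Illustrative factors only; not the package's own table.
TEXT_FACTORS = {
    "kg": (2.20462, "lb"),
    "g": (0.035274, "oz"),
    "ml": (0.033814, "fl oz"),
    "cm": (0.393701, "in"),
}

# A number (with '.' or ',' as decimal separator), optional spaces, then a
# unit that ends at a word boundary so 'grammi' is not mistaken for 'g'.
QUANTITY_PATTERN = re.compile(r"(\d+(?:[.,]\d+)?)\s*(kg|g|ml|cm)\b")


def convert_text(instruction: str) -> str:
    """Hypothetical helper: convert metric quantities found in a sentence."""

    def swap(match: re.Match) -> str:
        value = float(match.group(1).replace(",", "."))
        multiplier, new_unit = TEXT_FACTORS[match.group(2)]
        return f"{round(value * multiplier, 2)} {new_unit}"

    return QUANTITY_PATTERN.sub(swap, instruction)
```

Text without a recognized quantity and unit passes through unchanged.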
Translating the recipe can be accomplished in different ways, but the provided method uses Google Cloud Translations.
The method to call: translate_data(recipe: dict, source_language: string = 'it', target_language: string = 'en', client: bool = False, custom_dict: dict = None)
- The recipe expected will be of the format provided from the Converter class.
- Source and target languages are the two-letter language codes as documented on the Google Cloud docs at: https://cloud.google.com/translate/docs/languages
- client indicates how you authenticate: False (the default) uses an API key saved to the environment variable API_KEY, while True uses credentials whose path is given in an environment variable called GOOGLE_APPLICATION_CREDENTIALS. Further information can be found at https://cloud.google.com/translate/docs/setup and https://cloud.google.com/docs/authentication/api-keys
- A custom dictionary can be passed as a last-minute way to substitute certain words that are translated incorrectly for the context (e.g. spoons instead of spoonfuls).
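The custom dictionary idea amounts to a whole-word find and replace on the translated text. Here is a minimal sketch of that technique; the name apply_custom_dict is made up, and the direction of the example mapping (spoonfuls replaced by spoons) is an assumption, as is the way translate_data applies custom_dict internally.

```python
import re


def apply_custom_dict(text: str, custom_dict: dict) -> str:
    """Hypothetical helper: replace whole words in text using custom_dict."""
    for wrong, right in custom_dict.items():
        # \b keeps the match to whole words, so 'teaspoonfuls' is untouched.
        pattern = rf"\b{re.escape(wrong)}\b"
        # A function replacement avoids backslash handling in `right`.
        text = re.sub(pattern, lambda _match, repl=right: repl, text)
    return text
```

For example, apply_custom_dict("Add 2 spoonfuls of sugar", {"spoonfuls": "spoons"}) gives "Add 2 spoons of sugar".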
Known issues
- Occasionally words will not be translated correctly.
- Converters can sometimes insert extra spaces and tabs if reading from files
Ideas for improvement
- Rounding to sensible quantities, e.g. 1.5 lbs instead of 1.34 lbs
- Break apply_translation up into smaller functions (would also allow for better testing)
- Add functionality for other translators besides Google Cloud or write my own NLP model
- Refactoring the existing converters into smaller functions
- Add more Converters
Changelog
0.1.0: First release
0.1.3:
- Included requirements.txt and MANIFEST.in for test files
- Fixed an error in the GZConverter that failed to detect ingredients with both a vulgar fraction and a unit
- Increased subgroups of GZConverter RegEx parsing ingredients from 3 to 5 to allow capture of notes with units inside. In the case of unit conversion being enabled, these are converted from metric to imperial too.
- Created a redundant backup for empty units in the GZConverter
- Updated tests to include examples featuring each of the above ingredients
0.1.4:
- Added a BaseConverter class to keep the code more consistent and DRY
- Added a convert_units_name method to utilities.unit_conversion for the odd case in recipes that units are part of a note and therefore put in the name of an ingredient
0.1.5:
- Made BaseConverter and some of its methods abstract
- Giallo Zafferano changed how the images were found on its recipes, now using a link tag instead of a source tag - the GZConverter has been adjusted accordingly.
- A new converter! For one of the GZ Blogs, Molliche Di Zucchero. Tests have not been written, nor has it been seen for which other blogs it works.
- License updated for 2021
0.1.6:
- Changed convert_units_ing in the unit_conversion.py utility file from throwing errors to just returning the unconverted data if it could not be coerced correctly.
- Overhauled convert_units_prep in abovementioned file so it uses only two regex, and spaces are correctly accounted for.
- Changed a nested for loop into a nested list comprehension. Progress!
0.1.6b:
- I made ONE error, calling _get_ingredient_final instead of self._get_ingredient_final in the MZConverter inside of abovementioned nested list comprehensions. This is why I need tests! Gee willickers! They're coming when I have the time.
0.1.7:
- I wanted to just write a test suite for the MZConverter. Guess what? convert_units_prep was not passing its tests. After way too much work on regular expressions, I managed to fix it by simplifying it further to one regular expression (there were four in 0.1.5)
- Speaking of which, I added new tests to all classes to make sure they can parse a few recipes from the correct websites without errors. It uses the requests package, so an internet connection is required for the tests to run. This is to make sure I don't push any code that spontaneously combusts (hopefully).
- And yes, I did make a test suite for the MZConverter.
0.1.8:
- Added a new converter: for the Allacciate il Grembiule blog on the Giallo Zafferano site along with a test suite.
- Added a small alternate case for the convert_units_prep function in the unit_conversion util that checks for odd cases with degrees not being caught correctly because of the jankiness of getting other strangeness to fit.
0.1.9:
- Added a new converter (RMConverter): for Le Ricette di Max blog on the Giallo Zafferano site (tests forthcoming). It was by far the most challenging and frustrating I've done so far because it didn't have almost any of the conventions of modern websites.
- Added a new unit for conversion: deciliters.
0.1.10:
- Added unit tests for the RMConverter
- Fixed a few typos and copy/pasted things
0.2.0:
- Set up travis for ci/cd
- Fixed tests up a bit, such as adding tests for image identification to all converters. Also the Ricette di Max tests were hilariously set to just give the wrong output so the tests would pass. I think I'd become really frustrated when I was writing the tests. I managed to fix them up really easily by just using some regex.
- Dramatically reduced the size of the HTML files for the tests. Almost all of the tests were 100kb+ of bloatware. One was more than 600kb! Who needs that much for ONE RECIPE on their blog? I was worried about the approximately 200kb for my whole blog. Also I added tests for image identification for all the converters.
- Attached a third condition for GZConverter to look for the recipe image.
0.2.1:
- Add type hinting for all functions/methods and better docstrings for the utility methods
- Add GitHub Actions for CI/CD instead of travis
Why?
I made this originally as several modules I would find useful for myself because I am often translating Italian recipes into English and changing the metric quantities in the recipe into imperial units. I saw it as an opportunity to release my first Python package. I tried to document and comment my code as best possible, but this is among my first projects that I have made completely on my own from the ground up. Feel free to contact me for any reason or put the issue on github/pull request/etc.