
A Python package to parse structured information from recipe ingredient sentences


Ingredient Parser

The Ingredient Parser package is a Python package for parsing structured information out of recipe ingredient sentences.

Documentation

Documentation on using the package and training the model can be found at https://ingredient-parser.readthedocs.io/.

Quick Start

Install the package using pip:

$ python -m pip install ingredient-parser-nlp

Import the parse_ingredient function and pass it an ingredient sentence.

>>> from ingredient_parser import parse_ingredient
>>> parse_ingredient("3 pounds pork shoulder, cut into 2-inch chunks")
ParsedIngredient(
    name=[IngredientText(text='pork shoulder', confidence=0.996867, starting_index=2)],
    size=None,
    amount=[IngredientAmount(quantity=Fraction(3, 1),
                             quantity_max=Fraction(3, 1),
                             unit=<Unit('pound')>,
                             text='3 pounds',
                             confidence=0.999982,
                             starting_index=0,
                             unit_system=<UnitSystem.US_CUSTOMARY: 'us_customary'>,
                             APPROXIMATE=False,
                             SINGULAR=False,
                             RANGE=False,
                             MULTIPLIER=False,
                             PREPARED_INGREDIENT=False)],
    preparation=IngredientText(text='cut into 2 inch chunks',
                               confidence=0.999946,
                               starting_index=5),
    comment=None,
    purpose=None,
    foundation_foods=[],
    sentence='3 pounds pork shoulder, cut into 2-inch chunks'
)

Refer to the documentation at https://ingredient-parser.readthedocs.io/ for the optional parameters that can be used with parse_ingredient.
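The quantities in the output above are fractions.Fraction values (for example Fraction(3, 1)), so arithmetic on them stays exact. As a minimal sketch, the snippet below halves an amount. Amount and scale_amount are hypothetical names used only for illustration: the stand-in dataclass copies just three of the fields from the printed IngredientAmount and is not the library's own class (which, for instance, stores the unit as a Unit object rather than a string).

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Amount:
    """Hypothetical stand-in holding a few of the fields shown in the
    IngredientAmount output above; not the library's own class."""
    quantity: Fraction
    quantity_max: Fraction
    unit: str  # simplified to a plain string for this sketch


def scale_amount(amount: Amount, factor: Fraction) -> Amount:
    """Return a new Amount with both quantity bounds multiplied by factor."""
    return Amount(
        quantity=amount.quantity * factor,
        quantity_max=amount.quantity_max * factor,
        unit=amount.unit,
    )


three_pounds = Amount(Fraction(3, 1), Fraction(3, 1), "pound")
half = scale_amount(three_pounds, Fraction(1, 2))
print(half.quantity, float(half.quantity))  # 3/2 1.5
```

Keeping the values as Fraction until the final display step avoids the rounding drift that repeated float multiplication can introduce.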

Model

The core of the library is a sequence labelling model that is used to label each token in the sentence with the part of the sentence it belongs to. A data set of over 81,000 example sentences is used to train and evaluate the model. See the Explanation section of the documentation for more details.
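To make the idea of sequence labelling concrete: the model produces one label per token, and structured fields come from joining consecutive tokens that share a label. The sketch below shows only that final grouping step, applied to hand-written labels. The tokenisation, the label names (QTY, UNIT, NAME, PREP, PUNC) and the group_spans helper are illustrative assumptions, not the library's internals. The library's own post-processing is more involved; the output above, for instance, renders 2-inch as 2 inch.

```python
from itertools import groupby


def group_spans(tokens: list[str], labels: list[str]) -> dict[str, list[str]]:
    """Join runs of consecutive tokens sharing a label into text spans.

    Returns a mapping from label to the list of spans carrying that label,
    skipping punctuation (assumed here to be labelled "PUNC").
    """
    if len(tokens) != len(labels):
        raise ValueError("each token needs exactly one label")
    spans: dict[str, list[str]] = {}
    # groupby yields one group per run of identical consecutive labels.
    for label, run in groupby(zip(tokens, labels), key=lambda pair: pair[1]):
        if label == "PUNC":
            continue
        text = " ".join(token for token, _ in run)
        spans.setdefault(label, []).append(text)
    return spans


# Hand-written labels standing in for a model's per-token predictions.
tokens = ["3", "pounds", "pork", "shoulder", ",", "cut", "into", "2-inch", "chunks"]
labels = ["QTY", "UNIT", "NAME", "NAME", "PUNC", "PREP", "PREP", "PREP", "PREP"]
print(group_spans(tokens, labels))
# {'QTY': ['3'], 'UNIT': ['pounds'], 'NAME': ['pork shoulder'], 'PREP': ['cut into 2-inch chunks']}
```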

The model achieves the following results on a test set made up of 20% of the total data:

╒══════════════════════════╤══════════════════════════╕
│ Sentence-level results   │ Word-level results       │
╞══════════════════════════╪══════════════════════════╡
│ Accuracy: 95.62%         │ Accuracy: 98.26%         │
│                          │ Precision (micro) 98.25% │
│                          │ Recall (micro) 98.26%    │
│                          │ F1 score (micro) 98.25%  │
╘══════════════════════════╧══════════════════════════╛
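Sentence-level accuracy is usually defined so that a sentence only counts as correct when every one of its tokens is labelled correctly. Under that definition a single mislabelled token makes the whole sentence wrong, which is why it tends to be the stricter of the two figures. A minimal sketch of both accuracy measures, assuming gold and predicted labels are given as equal-length lists per sentence. The helper names and labels are hypothetical; this is not the project's evaluation code, and the micro-averaged precision, recall and F1 scores are not reproduced here.

```python
from typing import Sequence

Labels = Sequence[str]


def sentence_accuracy(gold: Sequence[Labels], pred: Sequence[Labels]) -> float:
    """Share of sentences in which every token's predicted label is correct."""
    correct = sum(1 for g, p in zip(gold, pred) if list(g) == list(p))
    return correct / len(gold)


def word_accuracy(gold: Sequence[Labels], pred: Sequence[Labels]) -> float:
    """Share of individual tokens whose predicted label is correct."""
    total = correct = 0
    for g, p in zip(gold, pred):
        for gold_label, pred_label in zip(g, p):
            total += 1
            correct += gold_label == pred_label  # bool counts as 0 or 1
    return correct / total


gold = [["QTY", "UNIT", "NAME", "NAME"], ["QTY", "NAME", "PUNC", "PREP", "PREP"]]
pred = [["QTY", "UNIT", "NAME", "NAME"], ["QTY", "NAME", "PUNC", "PREP", "COMMENT"]]
print(sentence_accuracy(gold, pred))  # 0.5
print(word_accuracy(gold, pred))      # 8 of 9 tokens correct
```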

Development

Basic

You can train or fine-tune the model on new ingredient datasets to extend it beyond the pre-trained model provided with the library. The development dependencies are listed in the requirements-dev.txt file. Details on the training process can be found in the Explanation section of the documentation.

Web App

The ingredient parser library provides a web interface that you can run locally to access most of the library's functionality, including using the parser, browsing the database, labelling entries, and training the model(s). See the README in the webtools directory for a detailed overview.

[Screenshots of the web app: Parser, Labeller, Trainer]

Documentation

The dependencies for building the documentation are in the requirements-doc.txt file.

Tests

The ingredient parser library has extensive test coverage. The pytest framework is used for testing, and coverage.py is used to measure test coverage.

# Run the test suite
$ pytest

# Evaluate test coverage
$ coverage run -m pytest
# Generate coverage report
$ coverage html

Contribution

Please target the develop branch for pull requests. The main branch is used for stable releases and hotfixes only.

Before committing anything, install pre-commit and run the following to install the hooks:

$ pre-commit install

Pre-commit hooks cover both the main Python library code and the web app (webtools) code.


Download files


Source Distribution

ingredient_parser_nlp-2.7.0.tar.gz (4.0 MB)

Uploaded Source

Built Distribution


ingredient_parser_nlp-2.7.0-py3-none-any.whl (4.0 MB)

Uploaded Python 3

File details

Details for the file ingredient_parser_nlp-2.7.0.tar.gz.

File metadata

  • Download URL: ingredient_parser_nlp-2.7.0.tar.gz
  • Size: 4.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for ingredient_parser_nlp-2.7.0.tar.gz
Algorithm Hash digest
SHA256 1ea3b8f95aae7e1b82542aa91c482fb2edff713e06bf4045dac78d3f7513e030
MD5 975612dd5411079d7c3c5ca7aa310ce6
BLAKE2b-256 a44ca1a7a8d724b2e12da6e32ca56f40142cc367b15bb9c9342743e9c701cfec
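The SHA256 digest listed above can be used to check that a downloaded archive is intact. A minimal sketch using only the standard library's hashlib module; sha256_of is a hypothetical helper name, and the commented usage line assumes the source distribution was saved to the current directory.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


EXPECTED_SDIST_SHA256 = (
    "1ea3b8f95aae7e1b82542aa91c482fb2edff713e06bf4045dac78d3f7513e030"
)

# Usage, assuming the sdist was downloaded to the current directory:
# assert sha256_of("ingredient_parser_nlp-2.7.0.tar.gz") == EXPECTED_SDIST_SHA256
```

Reading in chunks keeps memory use flat regardless of file size. pip can enforce the same check at install time through its hash-checking mode, for example by passing --require-hashes with a requirements file whose entries carry --hash=sha256:... options.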


File details

Details for the file ingredient_parser_nlp-2.7.0-py3-none-any.whl.


File hashes

Hashes for ingredient_parser_nlp-2.7.0-py3-none-any.whl
Algorithm Hash digest
SHA256 cdb287a1e43ab7429ea96c98638c094fd99fcd8d124dd2d485debf2b328d7b5b
MD5 333b0fb9b2288ae84bac8cf7492811c2
BLAKE2b-256 8c3f1f9bc3da4c266b43c7b50dff0ac1885700fd1556cb136d2cc972be02d0b6

