A Python package to parse structured information from recipe ingredient sentences

Ingredient Parser

Ingredient Parser is a Python package for parsing structured information out of recipe ingredient sentences. For example:

1 large onion, finely chopped

becomes

{
    "quantity": 1,
    "unit": "large",
    "name": "onion",
    "comment": "finely chopped"
}

Documentation

Documentation on using the package and training the model can be found at https://ingredient-parser.readthedocs.io/en/latest/.

Quick Start

Install the package using pip:

python -m pip install ingredient-parser-nlp

Import the `parse_ingredient` function and pass it an ingredient sentence.

>>> from ingredient_parser import parse_ingredient

>>> parse_ingredient("3 pounds pork shoulder, cut into 2-inch chunks")
{'sentence': '3 pounds pork shoulder, cut into 2-inch chunks',
 'quantity': '3',
 'unit': 'pound',
 'name': 'pork shoulder',
 'comment': ', cut into 2-inch chunks',
 'other': ''}

# Output confidence for each label
>>> parse_ingredient("3 pounds pork shoulder, cut into 2-inch chunks", confidence=True)
{'sentence': '3 pounds pork shoulder, cut into 2-inch chunks',
 'quantity': '3',
 'unit': 'pound',
 'name': 'pork shoulder',
 'comment': ', cut into 2-inch chunks',
 'other': '',
 'confidence': {'quantity': 0.9986,
  'unit': 0.9967,
  'name': 0.9535,
  'comment': 0.9967,
  'other': 0}}
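
The per-label confidence scores can be used to flag results that may need a manual check. The following is a minimal sketch, not part of the package: `low_confidence_labels` and the 0.99 threshold are hypothetical choices for illustration, and the dictionary is copied from the output shown above so the snippet runs without the package installed.

```python
# Result copied from the confidence=True example above.
result = {
    "sentence": "3 pounds pork shoulder, cut into 2-inch chunks",
    "quantity": "3",
    "unit": "pound",
    "name": "pork shoulder",
    "comment": ", cut into 2-inch chunks",
    "other": "",
    "confidence": {
        "quantity": 0.9986,
        "unit": 0.9967,
        "name": 0.9535,
        "comment": 0.9967,
        "other": 0,
    },
}


def low_confidence_labels(parsed, threshold=0.99):
    """Return labels that have a non-empty value and a score below threshold.

    Hypothetical helper: the threshold is an arbitrary illustration value.
    """
    return [
        label
        for label, score in parsed["confidence"].items()
        if parsed.get(label) and score < threshold
    ]


print(low_confidence_labels(result))  # ['name']
```

Labels with an empty value (here, `other`) are skipped, since their score of 0 only reflects that nothing was assigned to them.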

The returned dictionary has the following format (a "confidence" key is added when confidence=True is passed):

{
    "sentence": str,
    "quantity": str,
    "unit": str,
    "name": str,
    "comment": Union[List[str], str],
    "other": Union[List[str], str]
}
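
Because "comment" and "other" may be either a string or a list of strings, code that consumes the result may want to normalise them first. The following is a small sketch of one way to do that; `as_text` is a hypothetical helper, not a function provided by the package, and stripping the leading comma is a presentation choice rather than package behaviour.

```python
from typing import List, Union


def as_text(value: Union[List[str], str]) -> str:
    """Collapse a str-or-list field into one clean string.

    Hypothetical helper: strips stray commas and spaces from each part
    and joins the non-empty parts with ", ".
    """
    parts = value if isinstance(value, list) else [value]
    cleaned = [part.strip(" ,") for part in parts]
    return ", ".join(part for part in cleaned if part)


# Value taken from the Quick Start example output.
print(as_text(", cut into 2-inch chunks"))  # cut into 2-inch chunks
```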

Model accuracy

The model provided in the ingredient-parser/ directory has the following accuracy on a held-out test set of 25% of the data:

Sentence-level results:
	Total: 9277
	Correct: 7689
	-> 82.88%

Word-level results:
	Total: 52931
	Correct: 50051
	-> 94.56%
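
The percentages above are simply the correct count divided by the total. As a quick check, the figures can be reproduced from the reported counts; `accuracy_percent` is a hypothetical helper written for this illustration.

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Return correct/total as a percentage rounded to two decimals."""
    return round(100 * correct / total, 2)


# Counts copied from the results above.
print(accuracy_percent(7689, 9277))    # sentence-level: 82.88
print(accuracy_percent(50051, 52931))  # word-level: 94.56
```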

Development

The development dependencies are in the requirements-dev.txt file.

Note that development includes training the model.

  • Black is used for code formatting.

  • isort is used for import sorting.

  • flake8 is used for linting. Note that the line length rule (E501) is ignored.

  • pyright is used for static type analysis.

The documentation dependencies are in the requirements-doc.txt file.

Download files

Source Distribution

ingredient_parser_nlp-0.1.0a1.tar.gz (790.6 kB)

Uploaded Source

Built Distribution

ingredient_parser_nlp-0.1.0a1-py3-none-any.whl (787.7 kB)

Uploaded Python 3
