Skip to main content

Text comprehension library for Python

Project description

pyfathom

Text comprehension library for python

Blog

Follow the development of this project at http://jeremyorme.com

Example

Given a collection of input strings with varying syntax:

from pyfathom import *

in_strs = [
  '180g | 1 cup uncooked brown rice',
  '½ small butternut squash , cubed',
  '5½ tablespoons tahini (you can sub cashew butter)',
  'pecans 125g',
  'flat-leaf parsley a bunch, roughly chopped',
  'rocket 70g',
  'leftover marinade from the mushrooms',
  '15 oz (425 g) black beans, drained (reserve ¼ cup (60 ml) of the juice) and rinsed well',
  '1/4 teaspoon Garam Masala, for garnish',
  '2 tablespoons chopped cilantro, for garnish'
]

and a set of "knowledge" rules defining what is known about the inputs, e.g.:

knowledge = '''
/pinch/ is unit
/mls?|mL|cc|millilitres?|milliliters?/ is unit
/tsps?|t|teaspoons?/ is unit
/tbsps?|Tbsps?|T|tbl|tbs|tablespoons?/ is unit
/floz/ is unit
/fl/,/oz/ is unit
/fluid/,/ounces?/ is unit
/p|pts?|pints?/ is unit
/ls?|L|litres?|liters?/ is unit
/gals?|gallons?/ is unit
/dls?|dL|decilitre|deciliter/ is unit
/gs?|grams?|grammes?/ is unit
/oz|ounces?/ is unit
/lbs?|#|pounds?/ is unit
/kgs?|kilos?|kilograms?/ is unit
/\d+/?,/\d+\/\d+/ is number
/\d+(\.\d+)?/ is number
/\d*[½⅓⅔¼¾⅕⅖⅗⅘⅙⅚⅛⅜⅝⅞]/ is number
/a/ is number-word
number,/-|–/,number is range
/cups?/ is unit
range|number|number-word,/\-/?,unit?,/\./?,/of/? is amount
amount?,/plus/?,amount?,/[a-zA-Z\-]+/+,amount? is ,,,ingredient,
'''

PyFathom attempts to label each part of the string with a type name:

cls = classifier(knowledge)
for in_str in in_strs:
  print(cls.classify(in_str))

Output:

<amount><number>180</number><unit>g</unit></amount>|<amount><number>1</number><unit>cup</unit></amount><ingredient>uncooked brown rice</ingredient>
<number><amount>½</amount></number><ingredient>small butternut squash</ingredient>,<ingredient>cubed</ingredient>
<amount><number>5½</number><unit>tablespoons</unit></amount><ingredient>tahini</ingredient>(<ingredient>you can sub cashew butter</ingredient>)
<ingredient>pecans</ingredient><amount><number>125</number><unit>g</unit></amount>
<ingredient>flat-leaf parsley<number-word><amount>a</amount></number-word>bunch</ingredient>,<ingredient>roughly chopped</ingredient>
<ingredient>rocket</ingredient><amount><number>70</number><unit>g</unit></amount>
<ingredient>leftover marinade from the mushrooms</ingredient>
<amount><number>15</number><unit>oz</unit></amount>(<amount><number>425</number><unit>g</unit></amount>)<ingredient>black beans</ingredient>,<ingredient>drained</ingredient>(<ingredient>reserve</ingredient><amount><number>¼</number><unit>cup</unit></amount>(<amount><number>60</number><unit>ml</unit></amount>)<ingredient>of the juice</ingredient>)<ingredient>and rinsed well</ingredient>
<number><amount>1</amount></number>/<amount><number>4</number><unit>teaspoon</unit></amount><ingredient>Garam Masala</ingredient>,<ingredient>for garnish</ingredient>
<amount><number>2</number><unit>tablespoons</unit></amount><ingredient>chopped cilantro</ingredient>,<ingredient>for garnish</ingredient>

and can extract the parts of a particular type, e.g. ingredient:

for in_str in in_strs:
  print(cls.classify(in_str).extract_typed('ingredient')[0])

Output:

uncooked brown rice
small butternut squash
tahini
pecans
flat-leaf parsley a bunch
rocket
leftover marinade from the mushrooms
black beans
Garam Masala
chopped cilantro

Release notes

0.0.2

  • Lazy matcher
  • Bug fixes

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyfathom-0.0.2.tar.gz (5.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pyfathom-0.0.2-py3-none-any.whl (7.5 kB view details)

Uploaded Python 3

File details

Details for the file pyfathom-0.0.2.tar.gz.

File metadata

  • Download URL: pyfathom-0.0.2.tar.gz
  • Upload date:
  • Size: 5.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.35.0 CPython/3.7.4

File hashes

Hashes for pyfathom-0.0.2.tar.gz
Algorithm Hash digest
SHA256 8d95824ab7a23a38735690cf402bd2a19ed5c9cbf6af50382beb7b01dc0ac6d5
MD5 e51ecd803d41738d9e0e6a911f3f871c
BLAKE2b-256 7985c19f08beff7f5bf69fc0a969faf3926eebc1935e63c51cb48673a8c68530

See more details on using hashes here.

File details

Details for the file pyfathom-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: pyfathom-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 7.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.35.0 CPython/3.7.4

File hashes

Hashes for pyfathom-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 5068c607c85f2b002fed744fdbb835788637e46e4b100e7589ef8e49e5be6add
MD5 1101779d5cb9ae0d79454d85fe55b0ae
BLAKE2b-256 132b786c410b81f38a3566c17d97f233d4828955ae0087c23f3c0f3b8111dcb8

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page