Text comprehension library for Python
Project description
pyfathom
Blog
Follow the development of this project at http://jeremyorme.com
Example
Given a collection of input strings with varying syntax:
from pyfathom import *
in_strs = [
    '180g | 1 cup uncooked brown rice',
    '½ small butternut squash , cubed',
    '5½ tablespoons tahini (you can sub cashew butter)',
    'pecans 125g',
    'flat-leaf parsley a bunch, roughly chopped',
    'rocket 70g',
    'leftover marinade from the mushrooms',
    '15 oz (425 g) black beans, drained (reserve ¼ cup (60 ml) of the juice) and rinsed well',
    '1/4 teaspoon Garam Masala, for garnish',
    '2 tablespoons chopped cilantro, for garnish',
]
and a set of "knowledge" rules defining what is known about the inputs, e.g.:
knowledge = '''
/pinch/ is unit
/mls?|mL|cc|millilitres?|milliliters?/ is unit
/tsps?|t|teaspoons?/ is unit
/tbsps?|Tbsps?|T|tbl|tbs|tablespoons?/ is unit
/floz/ is unit
/fl/,/oz/ is unit
/fluid/,/ounces?/ is unit
/p|pts?|pints?/ is unit
/ls?|L|litres?|liters?/ is unit
/gals?|gallons?/ is unit
/dls?|dL|decilitre|deciliter/ is unit
/gs?|grams?|grammes?/ is unit
/oz|ounces?/ is unit
/lbs?|#|pounds?/ is unit
/kgs?|kilos?|kilograms?/ is unit
/\d+/?,/\d+\/\d+/ is number
/\d+(\.\d+)?/ is number
/\d*[½⅓⅔¼¾⅕⅖⅗⅘⅙⅚⅛⅜⅝⅞]/ is number
/a/ is number-word
number,/-|–/,number is range
/cups?/ is unit
range|number|number-word,/\-/?,unit?,/\./?,/of/? is amount
amount?,/plus/?,amount?,/[a-zA-Z\-]+/+,amount? is ,,,ingredient,
'''
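To get a feel for what the regular-expression rules accept, a few of them can be tried directly with Python's standard `re` module. This is only a sketch: the `PATTERNS` dict and the `label_token` helper are illustrative names, not part of pyfathom, and matching each pattern against a whole token is an assumption about how the rules are applied.

```python
import re

# A few patterns copied from the knowledge rules above. The dict and the
# helper below are illustrative only; they are not pyfathom's API.
PATTERNS = {
    "unit": [
        r"gs?|grams?|grammes?",
        r"oz|ounces?",
        r"cups?",
        r"tbsps?|Tbsps?|T|tbl|tbs|tablespoons?",
    ],
    "number": [
        r"\d+(\.\d+)?",
        r"\d*[½⅓⅔¼¾⅕⅖⅗⅘⅙⅚⅛⅜⅝⅞]",
    ],
}


def label_token(token):
    """Return the first type whose pattern matches the whole token, else None."""
    for type_name, patterns in PATTERNS.items():
        for pattern in patterns:
            # Assumption: a rule has to match the entire token.
            if re.fullmatch(pattern, token):
                return type_name
    return None


for tok in ["5½", "tablespoons", "tahini", "125", "g"]:
    print(tok, label_token(tok))
```

Tokens that match none of the patterns, such as `tahini`, come back as `None`; in the rules above such words are picked up by the final `/[a-zA-Z\-]+/+` rule instead.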
PyFathom attempts to label each part of the string with a type name:
cls = classifier(knowledge)
for in_str in in_strs:
    print(cls.classify(in_str))
Output:
<amount><number>180</number><unit>g</unit></amount>|<amount><number>1</number><unit>cup</unit></amount><ingredient>uncooked brown rice</ingredient>
<number><amount>½</amount></number><ingredient>small butternut squash</ingredient>,<ingredient>cubed</ingredient>
<amount><number>5½</number><unit>tablespoons</unit></amount><ingredient>tahini</ingredient>(<ingredient>you can sub cashew butter</ingredient>)
<ingredient>pecans</ingredient><amount><number>125</number><unit>g</unit></amount>
<ingredient>flat-leaf parsley<number-word><amount>a</amount></number-word>bunch</ingredient>,<ingredient>roughly chopped</ingredient>
<ingredient>rocket</ingredient><amount><number>70</number><unit>g</unit></amount>
<ingredient>leftover marinade from the mushrooms</ingredient>
<amount><number>15</number><unit>oz</unit></amount>(<amount><number>425</number><unit>g</unit></amount>)<ingredient>black beans</ingredient>,<ingredient>drained</ingredient>(<ingredient>reserve</ingredient><amount><number>¼</number><unit>cup</unit></amount>(<amount><number>60</number><unit>ml</unit></amount>)<ingredient>of the juice</ingredient>)<ingredient>and rinsed well</ingredient>
<number><amount>1</amount></number>/<amount><number>4</number><unit>teaspoon</unit></amount><ingredient>Garam Masala</ingredient>,<ingredient>for garnish</ingredient>
<amount><number>2</number><unit>tablespoons</unit></amount><ingredient>chopped cilantro</ingredient>,<ingredient>for garnish</ingredient>
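The number parts in the output above (for example `180`, `5½` and `¼`) are still text. If numeric values are needed, something along the following lines could convert them. This is a minimal sketch using only the standard library; `to_number` is a hypothetical helper and not part of pyfathom.

```python
import unicodedata
from fractions import Fraction


def to_number(text):
    """Convert strings such as '180', '1/4', '¼' or '5½' to a Fraction."""
    text = text.strip()
    if "/" in text:
        # ASCII fraction, optionally preceded by a whole part: '1/4', '1 1/4'
        *whole, frac = text.split()
        return sum((Fraction(w) for w in whole), Fraction(frac))
    last = text[-1]
    if not last.isdigit() and unicodedata.numeric(last, None) is not None:
        # Trailing vulgar fraction character: '¼', '5½'
        whole = Fraction(text[:-1]) if text[:-1] else Fraction(0)
        # unicodedata.numeric returns a float, so snap it to a small denominator
        part = Fraction(unicodedata.numeric(last)).limit_denominator(100)
        return whole + part
    return Fraction(text)


for s in ["180", "1/4", "¼", "5½"]:
    print(s, float(to_number(s)))
```

Returning a `Fraction` keeps values such as ⅓ exact; call `float()` on the result when a decimal is preferred.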
PyFathom can also extract the parts of a particular type. Here the first ingredient part of each string is printed:
for in_str in in_strs:
    print(cls.classify(in_str).extract_typed('ingredient')[0])
Output:
uncooked brown rice
small butternut squash
tahini
pecans
flat-leaf parsley a bunch
rocket
leftover marinade from the mushrooms
black beans
Garam Masala
chopped cilantro
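Indexing with `[0]` presumably raises an `IndexError` for an input that has no part of the requested type. A small wrapper can fall back to a default instead. The sketch below assumes that `extract_typed` returns a possibly empty sequence; `first_typed` and the `FakeClassified` stand-in are hypothetical and exist only so the example runs without pyfathom installed.

```python
def first_typed(classified, type_name, default=None):
    """Return the first part of the given type, or default if there is none."""
    # Assumption: extract_typed returns a (possibly empty) sequence.
    parts = classified.extract_typed(type_name)
    return parts[0] if parts else default


class FakeClassified:
    """Hypothetical stand-in for a classify() result, for demonstration only."""

    def __init__(self, parts_by_type):
        self._parts_by_type = parts_by_type

    def extract_typed(self, type_name):
        return self._parts_by_type.get(type_name, [])


result = FakeClassified({"ingredient": ["black beans", "drained"]})
print(first_typed(result, "ingredient"))
print(first_typed(result, "unit", default="(none)"))
```

With the real library, the same helper would be called as `first_typed(cls.classify(in_str), 'ingredient')`, provided the assumption about the return value holds.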
Release notes
0.0.2
- Lazy matcher
- Bug fixes
Download files
File details
Details for the file pyfathom-0.0.2.tar.gz.
File metadata
- Download URL: pyfathom-0.0.2.tar.gz
- Upload date:
- Size: 5.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.35.0 CPython/3.7.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8d95824ab7a23a38735690cf402bd2a19ed5c9cbf6af50382beb7b01dc0ac6d5 |
| MD5 | e51ecd803d41738d9e0e6a911f3f871c |
| BLAKE2b-256 | 7985c19f08beff7f5bf69fc0a969faf3926eebc1935e63c51cb48673a8c68530 |
File details
Details for the file pyfathom-0.0.2-py3-none-any.whl.
File metadata
- Download URL: pyfathom-0.0.2-py3-none-any.whl
- Upload date:
- Size: 7.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.35.0 CPython/3.7.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5068c607c85f2b002fed744fdbb835788637e46e4b100e7589ef8e49e5be6add |
| MD5 | 1101779d5cb9ae0d79454d85fe55b0ae |
| BLAKE2b-256 | 132b786c410b81f38a3566c17d97f233d4828955ae0087c23f3c0f3b8111dcb8 |