Parse units from strings. From mess to order!
Unit Parse (unit_parse)
Do you have strings/text that you want to turn into quantities?
Are you trying to clean scientific data you extracted from Wikipedia or some other sketchy website?
Try 'Unit_Parse' to clean everything up for you!
Description:
'Unit_Parse' is built on top of Pint. It was specifically designed to handle data extracted from scientific work. It has been rigorously tested against chemistry data extracted from Wikipedia (example: styrene; density, melting point, boiling point, etc.) and data from PubChem (example: styrene; density, melting point, flash point, etc.).
Installation
pip install unit_parse
Dependencies
Pint - Provides unit conversions of cleaned and parsed quantities.
Usage
Basics
Pass the string you want to parse to parser().
from unit_parse import parser
result = parser("37.34 kJ/mole (at 25 °C)")
print(result)
Logging
The logger can be used to track the parsing steps.
The default level is warning.
warning: only reports text that is ignored during parsing.
info: shows the major parsing steps.
debug: shows the fine-grained parsing steps.
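To see how the level threshold decides which messages appear, here is a minimal sketch that uses only the standard logging module. The logger name demo_parser and the three messages are hypothetical stand-ins; they are not unit_parse's own logger or its real log lines.

```python
import io
import logging


def emitted_at(level):
    """Return the messages a logger set to `level` lets through."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s: %(message)s"))

    # Hypothetical logger for illustration; unit_parse exposes its own
    # `logger` object, which is configured the same way with setLevel().
    demo = logging.getLogger("demo_parser")
    demo.handlers = [handler]
    demo.propagate = False
    demo.setLevel(level)

    demo.warning("ignored text: 'approx.'")   # visible at WARNING, INFO, DEBUG
    demo.info("substitution step finished")   # visible at INFO, DEBUG
    demo.debug("sub_power: no change")        # visible at DEBUG only

    return stream.getvalue().splitlines()


print(emitted_at(logging.WARNING))
print(emitted_at(logging.INFO))
print(emitted_at(logging.DEBUG))
```

Each lower level includes everything from the levels above it, so debug output is a superset of info output.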
Example: INFO
Code:
import logging
from unit_parse import parser, logger
logger.setLevel(logging.INFO)
result = parser("37.34 kJ/mole (at 25 °C)")
print(result)
Output:
INPUT: 37.34 kJ/mole (at 25 °C)
substitution: ('37.34 kJ/mole (at 25 °C)',) --> 37.34 kJ/mole ( @ 25 °C)
multiple_quantities_main: ('37.34 kJ/mole ( @ 25 °C)',) --> [['37.34 kJ/mole', '', '25 °C']]
text_list_to_quantity: ([['37.34 kJ/mole', '', '25 °C']],) --> [[<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]]
remove_duplicates: ([[<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]],) --> [[<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]]
OUTPUT: [<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]
[<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]
Example: DEBUG
Code:
import logging
from unit_parse import parser, logger
logger.setLevel(logging.DEBUG)
result = parser("37.34 kJ/mole (at 25 °C)")
print(result) # [<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>] or [37.34 kJ/mole, 25 °C]
Output:
INPUT: 37.34 kJ/mole (at 25 °C)
sub_general: ('37.34 kJ/mole (at 25 °C)',) --> 37.34 kJ/mole ( @ 25 °C)
sub_power: ('37.34 kJ/mole ( @ 25 °C)',) --> 37.34 kJ/mole ( @ 25 °C)
sub_sci_notation: ('37.34 kJ/mole ( @ 25 °C)',) --> 37.34 kJ/mole ( @ 25 °C)
reduce_ranges: ('37.34 kJ/mole ( @ 25 °C)',) --> 37.34 kJ/mole ( @ 25 °C)
substitution: ('37.34 kJ/mole (at 25 °C)',) --> 37.34 kJ/mole ( @ 25 °C)
multiple_quantities: ('37.34 kJ/mole ( @ 25 °C)',) --> ['37.34 kJ/mole ( @ 25 °C)']
reduce_parenthesis: ('37.34 kJ/mole ( @ 25 °C)',) --> ['37.34 kJ/mole ', ' @ 25 °C']
condition_finder: ('37.34 kJ/mole ( @ 25 °C)',) --> ['37.34 kJ/mole', '', '25 °C']
multiple_quantities_main: ('37.34 kJ/mole ( @ 25 °C)',) --> [['37.34 kJ/mole', '', '25 °C']]
get_quantity_and_cond: (['37.34 kJ/mole', '', '25 °C'],) --> [<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]
text_list_to_quantity: ([['37.34 kJ/mole', '', '25 °C']],) --> [[<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]]
remove_duplicates: ([[<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]],) --> [[<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]]
OUTPUT: [<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]
[<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]
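The debug log above shows the input being rewritten ("(at" becomes "( @") and then split into a quantity and a condition. The sketch below imitates those two steps with plain regular expressions to make the idea concrete. It is an illustration only, not unit_parse's actual implementation; the function name split_quantity_and_condition and both patterns are assumptions.

```python
import re


def split_quantity_and_condition(text):
    """Illustrative only: mimic two steps visible in the debug log."""
    # Step 1 (compare `substitution`): rewrite "(at" as "( @".
    text = re.sub(r"\(\s*at\b", "( @", text)

    # Step 2 (compare `reduce_parenthesis` / `condition_finder`):
    # pull the parenthesised "@ ..." condition away from the main quantity.
    match = re.search(r"\(\s*@\s*([^)]*)\)", text)
    if match is None:
        return text.strip(), None

    quantity = text[:match.start()].strip()
    condition = match.group(1).strip()
    return quantity, condition


print(split_quantity_and_condition("37.34 kJ/mole (at 25 °C)"))
# ('37.34 kJ/mole', '25 °C')
```

The two returned strings correspond to the '37.34 kJ/mole' and '25 °C' entries in the log; the library then converts each of them into a Pint quantity.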
Examples
Yep, there's a lot of them!
# Simple conversions
5 -> 5 dimensionless
5 g -> 5 gram
5 g/ml -> 5.0 gram / milliliter
1 K -> 1 kelvin
# stuff