Parse units from strings. From mess to order!
Unit Parse (unit_parse)
Do you have strings/text that you want to turn into quantities?
Are you trying to clean scientific data you extracted from Wikipedia or some other sketchy website?
Try 'Unit_Parse' to clean everything up for you!
Description:
'Unit_Parse' is built on top of Pint. It was specifically designed to handle data extracted from scientific work. It has been tested against chemistry data extracted from Wikipedia (example: styrene; density, melting point, boiling point, etc.) and data from PubChem (example: styrene; density, melting point, flash point, etc.).
Installation
pip install unit_parse
Dependencies
Pint - Provides unit conversions of cleaned and parsed quantities.
Usage
Basics
Pass the string you want to parse to parser().
from unit_parse import parser
result = parser("37.34 kJ/mole (at 25 °C)")
print(result)
Logging
The logger can be used to track the parsing steps.
Default level is warning.
warning: only reports text that is ignored during parsing.
info: shows the major parsing steps.
debug: shows the fine-grained parsing steps.
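How the level setting filters messages can be tried out without unit_parse installed. The sketch below assumes that unit_parse's logger is a standard logging.Logger (the setLevel calls in the examples suggest this, but it is an assumption). The logger name "demo_parser", the ListHandler class, and the log messages are invented for illustration.

```python
import logging

# Stand-in for unit_parse's logger; the name "demo_parser" is hypothetical.
demo_logger = logging.getLogger("demo_parser")
demo_logger.propagate = False  # keep the demo messages away from the root logger


class ListHandler(logging.Handler):
    """Collect log messages in a list so they can be inspected afterwards."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(f"{record.levelname}: {record.getMessage()}")


handler = ListHandler()
demo_logger.addHandler(handler)


def run_step():
    # Made-up messages, one per level described above.
    demo_logger.debug("sub_general: fine-grained step")
    demo_logger.info("substitution: major step")
    demo_logger.warning("ignored text: 'approx.'")


# Default-like setting: only warnings get through.
demo_logger.setLevel(logging.WARNING)
run_step()
warning_only = list(handler.messages)

# INFO lets the major steps through as well; DEBUG messages are still dropped.
handler.messages.clear()
demo_logger.setLevel(logging.INFO)
run_step()
info_and_up = list(handler.messages)

print(warning_only)
print(info_and_up)
```

With the level at WARNING only the warning is collected; at INFO the info message is collected too.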
Example: INFO
Code:
import logging
from unit_parse import parser, logger
logger.setLevel(logging.INFO)
result = parser("37.34 kJ/mole (at 25 °C)")
print(result)
Output:
INPUT: 37.34 kJ/mole (at 25 °C)
substitution: ('37.34 kJ/mole (at 25 °C)',) --> 37.34 kJ/mole ( @ 25 °C)
multiple_quantities_main: ('37.34 kJ/mole ( @ 25 °C)',) --> [['37.34 kJ/mole', '', '25 °C']]
text_list_to_quantity: ([['37.34 kJ/mole', '', '25 °C']],) --> [[<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]]
remove_duplicates: ([[<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]],) --> [[<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]]
OUTPUT: [<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]
[<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]
Example: DEBUG
Code:
import logging
from unit_parse import parser, logger
logger.setLevel(logging.DEBUG)
result = parser("37.34 kJ/mole (at 25 °C)")
print(result) # [<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>] or [37.34 kJ/mole, 25 °C]
Output:
INPUT: 37.34 kJ/mole (at 25 °C)
sub_general: ('37.34 kJ/mole (at 25 °C)',) --> 37.34 kJ/mole ( @ 25 °C)
sub_power: ('37.34 kJ/mole ( @ 25 °C)',) --> 37.34 kJ/mole ( @ 25 °C)
sub_sci_notation: ('37.34 kJ/mole ( @ 25 °C)',) --> 37.34 kJ/mole ( @ 25 °C)
reduce_ranges: ('37.34 kJ/mole ( @ 25 °C)',) --> 37.34 kJ/mole ( @ 25 °C)
substitution: ('37.34 kJ/mole (at 25 °C)',) --> 37.34 kJ/mole ( @ 25 °C)
multiple_quantities: ('37.34 kJ/mole ( @ 25 °C)',) --> ['37.34 kJ/mole ( @ 25 °C)']
reduce_parenthesis: ('37.34 kJ/mole ( @ 25 °C)',) --> ['37.34 kJ/mole ', ' @ 25 °C']
condition_finder: ('37.34 kJ/mole ( @ 25 °C)',) --> ['37.34 kJ/mole', '', '25 °C']
multiple_quantities_main: ('37.34 kJ/mole ( @ 25 °C)',) --> [['37.34 kJ/mole', '', '25 °C']]
get_quantity_and_cond: (['37.34 kJ/mole', '', '25 °C'],) --> [<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]
text_list_to_quantity: ([['37.34 kJ/mole', '', '25 °C']],) --> [[<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]]
remove_duplicates: ([[<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]],) --> [[<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]]
OUTPUT: [<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]
[<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]
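The DEBUG log shows the input passing through a substitution step, a parenthesis step, and a condition finder before any quantity is built. As a rough illustration of that idea only, here is a toy version of those three steps using the standard library. It is not unit_parse's code: the function name split_value_and_condition and both regular expressions are assumptions made up for this sketch, and it handles just the "(at ...)" pattern from the example.

```python
import re


def split_value_and_condition(text: str) -> tuple[str, str]:
    """Toy split of 'value (at condition)' into (value, condition)."""
    # Like the 'substitution' step in the log: turn "(at " into "( @ ".
    text = re.sub(r"\(\s*at\s+", "( @ ", text)
    # Like 'reduce_parenthesis': separate the main text from the parenthetical.
    match = re.fullmatch(r"(.*?)\(\s*@\s*(.*?)\)\s*", text)
    if match is None:
        # No "( @ ... )" part found: everything is the value, no condition.
        return text.strip(), ""
    # Like 'condition_finder': return the trimmed value and condition strings.
    return match.group(1).strip(), match.group(2).strip()


print(split_value_and_condition("37.34 kJ/mole (at 25 °C)"))
# ('37.34 kJ/mole', '25 °C')
```

The two strings it returns correspond to the '37.34 kJ/mole' and '25 °C' entries in the log; turning them into quantities is the part that unit_parse hands to Pint.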
Examples
Yep, there's a lot of them!
# Simple conversions
5 -> 5 dimensionless
5 g -> 5 gram
5 g/ml -> 5.0 gram / milliliter
1 K -> 1 kelvin
# stuff
Notes
Pint UnitRegistry
Pint requires a Unit Registry to be defined. However, Unit Registries are not interoperable, and Pint will throw an error if a unit from one registry is used with another. Unit_Parse will go looking to see if one has been created, and if it hasn't, it will make one!
So if your project uses Pint already, make sure you import Pint and define the UnitRegistry before importing unit_parse.
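The registry incompatibility can be reproduced with Pint alone. The sketch below is an illustration and not part of unit_parse. The helper name registries_conflict is made up here, and Pint is imported inside the function only so that the snippet still loads where Pint is not installed.

```python
def registries_conflict() -> bool:
    """Return True if adding quantities from two separate registries fails."""
    import pint  # imported lazily; see the note above

    first = pint.UnitRegistry()
    second = pint.UnitRegistry()  # a second, independent registry

    a = first.Quantity(1, "meter")
    b = second.Quantity(1, "meter")
    try:
        a + b  # Pint refuses to mix quantities from different registries
    except ValueError:
        return True
    return False


# In a project that already uses Pint, the order recommended above would be:
#   import pint
#   ureg = pint.UnitRegistry()
#   from unit_parse import parser   # imported only after the registry exists
```

Calling registries_conflict() with Pint installed should return True, which is why a project and unit_parse need to end up sharing one registry.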
Hashes for unit_parse-0.0.4-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | f867ced78f8ab3be140fcf21cf41362d4f276dec6df4e7b0ee03b9ab2184a88e
MD5 | 32afa11c2e74e541fc7ed545c43eb1f1
BLAKE2b-256 | b76dbd2db411f2948565c96d813cbb79585ad38c1d3af6c498925e3a26dde6e6