Parse units from strings. From mess to order!
Unit Parse (unit_parse)
Do you have strings/text that you want to turn into quantities?
Are you trying to clean scientific data you extracted from Wikipedia or some other sketchy website?
Try 'Unit_Parse' to clean everything up for you!
Description:
'Unit_Parse' is built on top of Pint. It was specifically designed to handle data extracted from scientific work. It has been rigorously tested against chemistry data extracted from Wikipedia (example: styrene; density, melting point, boiling point, etc.) and data from PubChem (example: styrene; density, melting point, flash point, etc.).
Installation
pip install unit_parse
Dependencies
Pint - Provides unit conversions of cleaned and parsed quantities.
Usage
Basics
from unit_parse import parser
result = parser("37.34 kJ/mole (at 25 °C)")
print(result)
Logging
The logger can be used to track how a text moves through the parsing steps. The default level is warning.
warning: only reports text that is being ignored in the parsing process.
info: shows the major parsing steps.
debug: shows fine-grained parsing steps.
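To make the three levels concrete, here is a small sketch that uses only Python's standard logging module. It does not touch unit_parse itself: the logger name demo_parser_... and the helper emitted_levels are made up for illustration, and the three messages simply stand in for the kinds of records described above.

```python
import logging


def emitted_levels(threshold):
    """Return the level names that a logger set to `threshold` lets through."""
    records = []

    class ListHandler(logging.Handler):
        # Collect level names in a list instead of printing them.
        def emit(self, record):
            records.append(record.levelname)

    # Hypothetical logger name; unit_parse's own logger is imported
    # with `from unit_parse import logger` as shown in this README.
    log = logging.getLogger(f"demo_parser_{threshold}")
    log.setLevel(threshold)
    log.propagate = False  # keep the demo output out of the root logger
    handler = ListHandler()
    log.addHandler(handler)

    log.debug("fine-grained parsing step")
    log.info("major parsing step")
    log.warning("text ignored during parsing")

    log.removeHandler(handler)
    return records


print(emitted_levels(logging.WARNING))  # ['WARNING']
print(emitted_levels(logging.INFO))     # ['INFO', 'WARNING']
print(emitted_levels(logging.DEBUG))    # ['DEBUG', 'INFO', 'WARNING']
```

Each lower threshold includes everything the higher ones report, which is why the DEBUG trace in this README repeats the lines of the INFO trace.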
Example INFO
Code:
import logging
from unit_parse import parser, logger
logger.setLevel(logging.INFO)
result = parser("37.34 kJ/mole (at 25 °C)")
print(result)
Output:
INPUT: 37.34 kJ/mole (at 25 °C)
substitution: ('37.34 kJ/mole (at 25 °C)',) --> 37.34 kJ/mole ( @ 25 °C)
multiple_quantities_main: ('37.34 kJ/mole ( @ 25 °C)',) --> [['37.34 kJ/mole', '', '25 °C']]
text_list_to_quantity: ([['37.34 kJ/mole', '', '25 °C']],) --> [[<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]]
remove_duplicates: ([[<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]],) --> [[<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]]
OUTPUT: [<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]
[<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]
Example DEBUG
Code:
import logging
from unit_parse import parser, logger
logger.setLevel(logging.DEBUG)
result = parser("37.34 kJ/mole (at 25 °C)")
print(result) # [<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>] or [37.34 kJ/mole, 25 °C]
Output:
INPUT: 37.34 kJ/mole (at 25 °C)
sub_general: ('37.34 kJ/mole (at 25 °C)',) --> 37.34 kJ/mole ( @ 25 °C)
sub_power: ('37.34 kJ/mole ( @ 25 °C)',) --> 37.34 kJ/mole ( @ 25 °C)
sub_sci_notation: ('37.34 kJ/mole ( @ 25 °C)',) --> 37.34 kJ/mole ( @ 25 °C)
reduce_ranges: ('37.34 kJ/mole ( @ 25 °C)',) --> 37.34 kJ/mole ( @ 25 °C)
substitution: ('37.34 kJ/mole (at 25 °C)',) --> 37.34 kJ/mole ( @ 25 °C)
multiple_quantities: ('37.34 kJ/mole ( @ 25 °C)',) --> ['37.34 kJ/mole ( @ 25 °C)']
reduce_parenthesis: ('37.34 kJ/mole ( @ 25 °C)',) --> ['37.34 kJ/mole ', ' @ 25 °C']
condition_finder: ('37.34 kJ/mole ( @ 25 °C)',) --> ['37.34 kJ/mole', '', '25 °C']
multiple_quantities_main: ('37.34 kJ/mole ( @ 25 °C)',) --> [['37.34 kJ/mole', '', '25 °C']]
get_quantity_and_cond: (['37.34 kJ/mole', '', '25 °C'],) --> [<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]
text_list_to_quantity: ([['37.34 kJ/mole', '', '25 °C']],) --> [[<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]]
remove_duplicates: ([[<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]],) --> [[<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]]
OUTPUT: [<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]
[<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]
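The DEBUG trace above suggests a pipeline of substitution, parenthesis and condition splitting, and conversion to quantities. The following is a rough, standard-library-only sketch of that idea for this one input shape. It is not unit_parse's implementation: the functions substitute, split_condition, split_number_unit, and sketch_parse are hypothetical, and the result is plain (number, unit text) tuples rather than Pint quantities.

```python
import re


def substitute(text):
    # Rewrite "(at 25 °C)" as "( @ 25 °C)", mirroring the logged substitution step.
    return re.sub(r"\(\s*at\s+", "( @ ", text)


def split_condition(text):
    # Split "value ( @ condition)" into (value, condition).
    # The condition is an empty string when no "( @ ...)" part is present.
    match = re.fullmatch(r"(.*?)\(\s*@\s*(.*?)\)\s*", text)
    if match is None:
        return text.strip(), ""
    return match.group(1).strip(), match.group(2).strip()


def split_number_unit(text):
    # Separate a leading decimal number from the unit text that follows it.
    match = re.fullmatch(r"\s*([-+]?\d+(?:\.\d+)?)\s*(.*)", text)
    if match is None:
        raise ValueError(f"no leading number in {text!r}")
    return float(match.group(1)), match.group(2).strip()


def sketch_parse(text):
    # Chain the three toy steps: substitute, split off the condition, split numbers.
    value_text, condition_text = split_condition(substitute(text))
    parts = [split_number_unit(value_text)]
    if condition_text:
        parts.append(split_number_unit(condition_text))
    return parts


print(substitute("37.34 kJ/mole (at 25 °C)"))    # 37.34 kJ/mole ( @ 25 °C)
print(sketch_parse("37.34 kJ/mole (at 25 °C)"))  # [(37.34, 'kJ/mole'), (25.0, '°C')]
```

The real parser handles far more than this (ranges, scientific notation, multiple quantities, duplicates), as the step names in the trace indicate; the sketch only shows how a value and its condition can be separated before unit conversion.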
Examples
Yep, there's a lot of them!
# Simple conversions
5 -> 5 dimensionless
5 g -> 5 gram
5 g/ml -> 5.0 gram / milliliter
1 K -> 1 kelvin
# stuff
Hashes for unit_parse-0.0.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 921e1c4c8b03f73d1bf231f86b0322f1d57252174f5b79f51574fa6a46719bb3
MD5 | 65fcd63f70728878111a3b8bfefdae81
BLAKE2b-256 | d2de5104f2c7d9f804aaf774efdf41f6b4da66cb63b719831d5b0fd2345f7cd0