Skip to main content

Parse units from strings. From mess to order!

Project description

Unit Parse (unit_parse)



PyPI tests coverage flake8 downloads license

Do you have strings/text that you want to turn into quantities?

Are you trying to clean scientific data you extracted from Wikipida or some other sketchy website?

Try 'Unit_Parse' to clean everything up for you!

Description:

'Unit_Parse' is built on top of Pint. It was specifically designed to handle data that was extracted from scientific work. It has been rigorously tested against chemistry data extracted from Wikipida (example: styrene; density, melting point, boiling point, etc.) and data from PubChem (example: styrene ; density, melting point, flash point, etc.).


Installation

pip install unit_parse

Dependencies

Pint - Provides unit conversions of cleaned and parsed quantities.



Usage

Basics

Pass string you want to parse to parser().

from unit_parse import parser

result = parser("37.34 kJ/mole (at 25 °C)")
print(result)

Logging

The logger can be used to track the parsing steps.

Default level is warning.

warning: will only let you know if there is any text that is being ignored in the parsing process. info: will show the major parsing steps. debug: will show fine grain parsing steps.

Example: INFO

Code:

import logging

from unit_parse import parser, logger

logger.setLevel(logging.INFO)

result = parser("37.34 kJ/mole (at 25 °C)")
print(result)

Output:

    INPUT: 37.34 kJ/mole (at 25 °C)
    substitution: ('37.34 kJ/mole (at 25 °C)',) --> 37.34 kJ/mole ( @ 25 °C)
    multiple_quantities_main: ('37.34 kJ/mole ( @ 25 °C)',) --> [['37.34 kJ/mole', '', '25 °C']]
    text_list_to_quantity: ([['37.34 kJ/mole', '', '25 °C']],) --> [[<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]]
    remove_duplicates: ([[<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]],) --> [[<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]]
    OUTPUT: [<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]
[<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]

Example: DEBUG

Code:

import logging

from unit_parse import parser, logger

logger.setLevel(logging.DEBUG)

result = parser("37.34 kJ/mole (at 25 °C)")
print(result)  # [<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>] or [37.34 kJ/mole, 25 °C]

Output:

    INPUT: 37.34 kJ/mole (at 25 °C)
        sub_general: ('37.34 kJ/mole (at 25 °C)',) --> 37.34 kJ/mole ( @ 25 °C)
        sub_power: ('37.34 kJ/mole ( @ 25 °C)',) --> 37.34 kJ/mole ( @ 25 °C)
        sub_sci_notation: ('37.34 kJ/mole ( @ 25 °C)',) --> 37.34 kJ/mole ( @ 25 °C)
        reduce_ranges: ('37.34 kJ/mole ( @ 25 °C)',) --> 37.34 kJ/mole ( @ 25 °C)
    substitution: ('37.34 kJ/mole (at 25 °C)',) --> 37.34 kJ/mole ( @ 25 °C)
        multiple_quantities: ('37.34 kJ/mole ( @ 25 °C)',) --> ['37.34 kJ/mole ( @ 25 °C)']
        reduce_parenthesis: ('37.34 kJ/mole ( @ 25 °C)',) --> ['37.34 kJ/mole ', ' @ 25 °C']
        condition_finder: ('37.34 kJ/mole ( @ 25 °C)',) --> ['37.34 kJ/mole', '', '25 °C']
    multiple_quantities_main: ('37.34 kJ/mole ( @ 25 °C)',) --> [['37.34 kJ/mole', '', '25 °C']]
        get_quantity_and_cond: (['37.34 kJ/mole', '', '25 °C'],) --> [<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]
    text_list_to_quantity: ([['37.34 kJ/mole', '', '25 °C']],) --> [[<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]]
    remove_duplicates: ([[<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]],) --> [[<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]]
    OUTPUT: [<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]
[<Quantity(37.34, 'kilojoule / mole')>, <Quantity(25, 'degree_Celsius')>]


Examples

Yep, there's alot of them!

    # Simple conversions
    5 -> 5 dimensionless
    5 g -> 5 gram
    5 g/ml -> 5.0 gram / milliliter
    1 K -> 1 kelvin

    # stuff

Notes

Pint UnitRegistry

Pint's requires a Unit Registry to be defined. However, Unit Registries are not interoperable and will throw errors if a unit from one registry is used in another. Unit_Parse will go looking to see if one has been created, and if it hasn't we will make one!

So if your project uses Pint already, make sure you import Pint and define the UnitRegistry before importing unit_parse.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

unit_parse-0.0.5.tar.gz (1.1 MB view hashes)

Uploaded Source

Built Distribution

unit_parse-0.0.5-py3-none-any.whl (1.1 MB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page