Skip to main content

Extract/Transform Light - a simple library for reading delimited files.

Project description

Build Status

ETlite

Extract/Transform Light - a simple library for reading delimited files.

Example

Given CSV file:

Area id,Male,Female,Area
A12345,34,45,0.25
A12346,108,99,0.32

Define a list of transformation:

transformations = [
    # Map existing fields into dictionary.
    # For nested dictionaries use dot.delimited.keys.
    # Optional "via" parameter takes a callable returning transformed value.
    { "from": "Area id", "to": "id" },
    { "from": "Male", "to": "population.male", "via": int },
    { "from": "Female", "to": "population.female", "via": int },
    { "from": "Area", "to": "area", "via": float },

    # You can also add computed values, not present in the original data source.
    # Computer values take transformed dictionary as argument
    # and they do not require "from" parameter:
    {
        "to": "population.total",
        "via": lambda x: x['population']['male'] + x['population']['female']
    },
    # Note that transformations are executed in the order they were defined.
    # This transformation uses population.total value computed in the previous step:
    {
        "to": 'population.density',
        "via": lambda x: round(x['population']['total'] / x['area']),
    }
]

Read the file:

from etlite import delim_reader

with open("mydatafile.csv") as csvfile:
  reader = delim_reader(csvfile, transformations)
  data = [row for row in reader]

This produces a list of dictionaries:

[
    {
        'id': 'A12345',
        'area': 0.25,
        'population': {
            'male': 34,
            'female': 45,
            'total': 79,
            'density': 316
        }
    },
    {
        'id': 'A12346',
        'area': 0.32,
        'population': {
            'male': 108,
            'female': 99,
            'total': 207,
            'density': 647
        }
    }
]

delim_reader options

ETlite is just a thin wrapper on top of Python built-in CSV module. Thus you can pass to delim_reader same options as you would pass to csv.reader. For example:

reader = delim_reader(csvfile, transformations, delimiter="\t")

Exception handling

If desired transtormation cannot be performed, ETLite will raise TransformationError. If you do not want to abort data loading, you can pass an error handler to delim_reader.

Error handler must be a function. It will be passed an instance of TransformationError. Note: on_error must be pased as keywod argument.

from etlite import delim_reader

transformations = [
    # ...
]

def error_handler(err):
    # err is an instance of TransformationError
    print(err) # prints error message
    print(err.record) # prints raw record, prior to transformation


with open('my-data.csv') as stream:
    reader = delim_reader(stream, transformations, on_error=error_handler)
    for row in reader:
        do_something(row)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

etlite-0.1.1.tar.gz (3.2 kB view hashes)

Uploaded Source

Built Distribution

etlite-0.1.1-py3-none-any.whl (3.6 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page