
Extract/Transform Light - a simple library for reading delimited files.



ETlite


Example

Given CSV file:

Area id,Male,Female,Area
A12345,34,45,0.25
A12346,108,99,0.32

Define a list of transformations:

transformations = [
    # Map existing fields into a dictionary.
    # For nested dictionaries use dot.delimited.keys.
    # The optional "via" parameter takes a callable that returns the transformed value.
    { "from": "Area id", "to": "id" },
    { "from": "Male", "to": "population.male", "via": int },
    { "from": "Female", "to": "population.female", "via": int },
    { "from": "Area", "to": "area", "via": float },

    # You can also add computed values that are not present in the original data source.
    # Computed values take the transformed dictionary as their argument
    # and do not require a "from" parameter:
    {
        "to": "population.total",
        "via": lambda x: x['population']['male'] + x['population']['female']
    },
    # Note that transformations are executed in the order they were defined.
    # This transformation uses the population.total value computed in the previous step:
    {
        "to": "population.density",
        "via": lambda x: round(x['population']['total'] / x['area']),
    }
]
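The dot-delimited "to" keys describe where each value lands in the nested output dictionary. The following is a minimal sketch of that mapping, assuming keys are split on "." and intermediate dictionaries are created on demand; the helper `set_dotted` is hypothetical and is not ETlite's actual code.

```python
from typing import Any, Dict


def set_dotted(target: Dict[str, Any], dotted_key: str, value: Any) -> None:
    """Store value under a dot-delimited key, creating nested dicts as needed.

    Hypothetical helper illustrating the "to": "population.male" convention;
    it is not part of ETlite's documented API.
    """
    *parents, leaf = dotted_key.split(".")
    node = target
    for key in parents:
        # Reuse an existing nested dict, or create an empty one on first use.
        node = node.setdefault(key, {})
    node[leaf] = value


record: Dict[str, Any] = {}
set_dotted(record, "id", "A12345")
set_dotted(record, "population.male", 34)
set_dotted(record, "population.female", 45)
print(record)  # {'id': 'A12345', 'population': {'male': 34, 'female': 45}}
```

Because `setdefault` reuses an existing nested dictionary, "population.male" and "population.female" end up side by side under a single "population" key.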

Read the file:

from etlite import delim_reader

with open("mydatafile.csv", newline="") as csvfile:
    reader = delim_reader(csvfile, transformations)
    data = [row for row in reader]

This produces a list of dictionaries:

[
    {
        'id': 'A12345',
        'area': 0.25,
        'population': {
            'male': 34,
            'female': 45,
            'total': 79,
            'density': 316
        }
    },
    {
        'id': 'A12346',
        'area': 0.32,
        'population': {
            'male': 108,
            'female': 99,
            'total': 207,
            'density': 647
        }
    }
]
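To make the behaviour described above concrete (mapped fields first, then computed fields evaluated in definition order against the partly built dictionary), here is a self-contained sketch that reproduces the same values using only the standard csv module. The function `transform_rows` and the helper `set_dotted` are hypothetical re-creations of the documented behaviour, not ETlite's source; in particular, reading rows with `csv.DictReader` is an assumption.

```python
import csv
import io
from typing import Any, Dict, Iterator, List, TextIO


def _identity(value: Any) -> Any:
    """Default "via" when none is given: return the value unchanged."""
    return value


def set_dotted(target: Dict[str, Any], dotted_key: str, value: Any) -> None:
    """Store value under a dot-delimited key, creating nested dicts as needed."""
    *parents, leaf = dotted_key.split(".")
    node = target
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def transform_rows(
    stream: TextIO, transformations: List[Dict[str, Any]], **csv_options: Any
) -> Iterator[Dict[str, Any]]:
    """Hypothetical re-creation of the documented behaviour, not ETlite's source."""
    for raw in csv.DictReader(stream, **csv_options):
        out: Dict[str, Any] = {}
        # Transformations run in definition order, so computed values can
        # read anything stored by earlier steps.
        for t in transformations:
            via = t.get("via", _identity)
            # Mapped fields receive the raw column value; computed fields
            # (no "from") receive the partially built output dictionary.
            source = raw[t["from"]] if "from" in t else out
            set_dotted(out, t["to"], via(source))
        yield out


transformations = [
    {"from": "Area id", "to": "id"},
    {"from": "Male", "to": "population.male", "via": int},
    {"from": "Female", "to": "population.female", "via": int},
    {"from": "Area", "to": "area", "via": float},
    {
        "to": "population.total",
        "via": lambda x: x["population"]["male"] + x["population"]["female"],
    },
    {
        "to": "population.density",
        "via": lambda x: round(x["population"]["total"] / x["area"]),
    },
]

csv_text = "Area id,Male,Female,Area\nA12345,34,45,0.25\nA12346,108,99,0.32\n"
data = list(transform_rows(io.StringIO(csv_text), transformations))
for row in data:
    print(row)
```

The resulting `data` holds the same values as the list shown above; only the key order inside each printed dictionary may differ, since this sketch inserts keys in transformation order.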

delim_reader options

ETlite is a thin wrapper around Python's built-in csv module, so delim_reader accepts the same options you would pass to csv.reader. For example:

reader = delim_reader(csvfile, transformations, delimiter="\t")
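Because these options are handed on to the csv module, the standard library alone is enough to see what an option such as delimiter="\t" does. The sketch below uses csv.DictReader, which accepts the same formatting parameters as csv.reader, on an in-memory tab-separated copy of the example data. It does not call ETlite.

```python
import csv
import io

# Tab-separated copy of the example data, held in memory instead of a file.
tsv_text = "Area id\tMale\tFemale\tArea\nA12345\t34\t45\t0.25\nA12346\t108\t99\t0.32\n"

# delimiter is a standard csv formatting parameter; others such as
# quotechar or skipinitialspace are passed the same way.
rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))

print(rows[0]["Area id"], rows[0]["Male"])  # A12345 34
print(len(rows))  # 2
```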

Exception handling

If a transformation cannot be performed, ETlite raises TransformationError. If you do not want a failed transformation to abort data loading, pass an error handler to delim_reader.

The error handler must be a function; it is called with an instance of TransformationError. Note: on_error must be passed as a keyword argument.

from etlite import delim_reader

transformations = [
    # ...
]

def error_handler(err):
    # err is an instance of TransformationError
    print(err)  # prints the error message
    print(err.record)  # prints the raw record, prior to transformation


with open('my-data.csv', newline='') as stream:
    reader = delim_reader(stream, transformations, on_error=error_handler)
    for row in reader:
        do_something(row)  # placeholder for your own processing
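The README describes what the handler receives but not how it is wired in. The following is a minimal, self-contained sketch of the same pattern built on the standard csv module. TransformationError here is a hypothetical stand-in for ETlite's class, and the function name read_with_handler, the skip-the-failed-record behaviour, and the flat output keys are all assumptions for illustration, not ETlite's implementation.

```python
import csv
import io
from typing import Any, Callable, Dict, Iterator, List, Optional, TextIO


class TransformationError(Exception):
    """Hypothetical stand-in for ETlite's TransformationError.

    Like the documented class, it carries the raw record in `.record`.
    """

    def __init__(self, message: str, record: Dict[str, str]) -> None:
        super().__init__(message)
        self.record = record


def read_with_handler(
    stream: TextIO,
    transformations: List[Dict[str, Any]],
    *,  # on_error is keyword-only, mirroring the note above
    on_error: Optional[Callable[[TransformationError], None]] = None,
) -> Iterator[Dict[str, Any]]:
    for raw in csv.DictReader(stream):
        out: Dict[str, Any] = {}
        try:
            for t in transformations:
                source = raw[t["from"]] if "from" in t else out
                via = t.get("via")
                # Flat "to" keys only, to keep the sketch short.
                out[t["to"]] = via(source) if via else source
        except Exception as exc:
            err = TransformationError(f"cannot transform record: {exc}", raw)
            if on_error is None:
                raise err from exc  # no handler: abort loading
            on_error(err)
            continue  # assumption: the failed record is skipped
        yield out


csv_text = "Area id,Male\nA12345,34\nA12347,n/a\nA12346,108\n"
transformations = [
    {"from": "Area id", "to": "id"},
    {"from": "Male", "to": "male", "via": int},
]

errors: List[TransformationError] = []
rows = list(read_with_handler(io.StringIO(csv_text), transformations, on_error=errors.append))
print(len(rows), len(errors))  # 2 1
print(errors[0].record["Male"])  # n/a
```

Passing errors.append as the handler collects failures for later inspection instead of printing them; leaving on_error out re-raises on the first bad record, which matches the documented default of aborting.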


Download files


Files for etlite, version 0.1.1:

etlite-0.1.1-py3-none-any.whl (3.6 kB), wheel, Python 3
etlite-0.1.1.tar.gz (3.2 kB), source distribution
