Extract/Transform Light - a simple library for reading delimited files.
ETlite
Example
Given a CSV file:
Area id,Male,Female,Area
A12345,34,45,0.25
A12346,108,99,0.32
Define a list of transformations:
transformations = [
    # Map existing fields into dictionary.
    # For nested dictionaries use dot.delimited.keys.
    # Optional "via" parameter takes a callable returning transformed value.
    { "from": "Area id", "to": "id" },
    { "from": "Male", "to": "population.male", "via": int },
    { "from": "Female", "to": "population.female", "via": int },
    { "from": "Area", "to": "area", "via": float },

    # You can also add computed values, not present in the original data source.
    # Computed values take transformed dictionary as argument
    # and they do not require "from" parameter:
    { "to": "population.total", "via": lambda x: x['population']['male'] + x['population']['female'] },

    # Note that transformations are executed in the order they were defined.
    # This transformation uses population.total value computed in the previous step:
    { "to": 'population.density', "via": lambda x: round(x['population']['total'] / x['area']), }
]
Read the file:
from etlite import delim_reader

with open("mydatafile.csv") as csvfile:
    reader = delim_reader(csvfile, transformations)
    data = [row for row in reader]
This produces a list of dictionaries:
[
    {
        'id': 'A12345',
        'area': 0.25,
        'population': {
            'male': 34,
            'female': 45,
            'total': 79,
            'density': 316
        }
    },
    {
        'id': 'A12346',
        'area': 0.32,
        'population': {
            'male': 108,
            'female': 99,
            'total': 207,
            'density': 647
        }
    }
]
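To make the dotted keys and the ordered execution described above more concrete, here is a minimal sketch of how such a transformation list could be applied to one raw record. It is not ETlite's actual implementation: `set_nested` and `apply_transformations` are hypothetical helpers written for illustration, and the raw record is assumed to be a dictionary keyed by column name.

```python
def set_nested(target, dotted_key, value):
    """Store value in target under a dot.delimited key, creating nested dicts."""
    *parents, leaf = dotted_key.split(".")
    for key in parents:
        target = target.setdefault(key, {})
    target[leaf] = value


def apply_transformations(record, transformations):
    """Build one output dictionary from a raw record (hypothetical helper)."""
    result = {}
    for rule in transformations:  # rules run in the order they were defined
        via = rule.get("via")
        if "from" in rule:
            # Mapped field: read the raw value and convert it if "via" is given.
            value = record[rule["from"]]
            if via is not None:
                value = via(value)
        else:
            # Computed field: "via" receives the dictionary built so far.
            value = via(result)
        set_nested(result, rule["to"], value)
    return result


transformations = [
    {"from": "Area id", "to": "id"},
    {"from": "Male", "to": "population.male", "via": int},
    {"from": "Female", "to": "population.female", "via": int},
    {"from": "Area", "to": "area", "via": float},
    {"to": "population.total",
     "via": lambda x: x["population"]["male"] + x["population"]["female"]},
    {"to": "population.density",
     "via": lambda x: round(x["population"]["total"] / x["area"])},
]

# First data row of the example CSV, as raw strings.
raw = {"Area id": "A12345", "Male": "34", "Female": "45", "Area": "0.25"}
row = apply_transformations(raw, transformations)
print(row)
```

Because each rule writes into the same growing dictionary, the `population.density` rule can only work if `population.total` and `area` were produced by earlier rules, which is why the order of the list matters.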
delim_reader options
ETlite is just a thin wrapper on top of Python's built-in csv module, so you can pass delim_reader the same options you would pass to csv.reader. For example:
reader = delim_reader(csvfile, transformations, delimiter="\t")
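For reference, this is what the `delimiter` option does in the standard library's `csv.reader` itself. The snippet uses only the standard library and an in-memory sample (the tab-separated data is made up for illustration); it assumes, per the statement above, that delim_reader forwards such options unchanged.

```python
import csv
import io

# A tab-separated version of the example file, held in memory.
tsv_data = "Area id\tMale\tFemale\tArea\nA12345\t34\t45\t0.25\n"

# csv.reader splits each line on the given delimiter and yields lists of strings.
rows = list(csv.reader(io.StringIO(tsv_data), delimiter="\t"))
print(rows)
```

Other csv.reader formatting parameters, such as `quotechar`, are passed as keyword arguments in the same way.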
Exception handling
If a desired transformation cannot be performed, ETlite will raise TransformationError. If you do not want to abort data loading, you can pass an error handler to delim_reader.
Error handler must be a function. It will be passed an instance of TransformationError. Note: on_error must be passed as a keyword argument.
from etlite import delim_reader

transformations = [
    # ...
]

def error_handler(err):
    # err is an instance of TransformationError
    print(err)         # prints error message
    print(err.record)  # prints raw record, prior to transformation

with open('my-data.csv') as stream:
    reader = delim_reader(stream, transformations, on_error=error_handler)
    for row in reader:
        do_something(row)
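Instead of printing, a handler can collect failures for later inspection. The sketch below is one possible pattern, not part of ETlite: `make_error_collector` is a hypothetical helper, and `FakeTransformationError` is a stand-in class so the snippet runs without the library installed. It relies only on what is stated above, namely that the error prints as a message and exposes the raw record as `err.record`; the dictionary shape of the record is an assumption.

```python
def make_error_collector():
    """Return (handler, failures); the handler appends (message, record) pairs."""
    failures = []

    def handler(err):
        # str(err) is the error message, err.record the raw record.
        failures.append((str(err), err.record))

    return handler, failures


class FakeTransformationError(Exception):
    """Stand-in for etlite's TransformationError (assumption, for demo only)."""

    def __init__(self, message, record):
        super().__init__(message)
        self.record = record


handler, failures = make_error_collector()

# Simulate the reader reporting one bad record to the handler.
bad_record = {"Area id": "A12347", "Male": "n/a"}
handler(FakeTransformationError("cannot convert 'n/a' to int", bad_record))
print(failures)
```

With the real library, the handler would be passed as `on_error=handler` to delim_reader, and `failures` could be inspected once the loop over the reader has finished.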