Extract/Transform Light - a simple library for reading delimited files.
Project description
ETlite
Extract/Transform Light - a simple library for reading delimited files.
Example
Given CSV file:
Area id,Male,Female,Area
A12345,34,45,0.25
A12346,108,99,0.32
Define a list of transformation:
transformations = [
# Map existing fields into dictionary.
# For nested dictionaries use dot.delimited.keys.
# Optional "via" parameter takes a callable returning transformed value.
{ "from": "Area id", "to": "id" },
{ "from": "Male", "to": "population.male", "via": int },
{ "from": "Female", "to": "population.female", "via": int },
{ "from": "Area", "to": "area", "via": float },
# You can also add computed values, not present in the original data source.
# Computer values take transformed dictionary as argument
# and they do not require "from" parameter:
{
"to": "population.total",
"via": lambda x: x['population']['male'] + x['population']['female']
},
# Note that transformations are executed in the order they were defined.
# This transformation uses population.total value computed in the previous step:
{
"to": 'population.density',
"via": lambda x: round(x['population']['total'] / x['area']),
}
]
Read the file:
from etlite import delim_reader
with open("mydatafile.csv") as csvfile:
reader = delim_reader(csvfile, transformations)
data = [row for row in reader]
This produces a list of dictionaries:
[
{
'id': 'A12345',
'area': 0.25,
'population': {
'male': 34,
'female': 45,
'total': 79,
'density': 316
}
},
{
'id': 'A12346',
'area': 0.32,
'population': {
'male': 108,
'female': 99,
'total': 207,
'density': 647
}
}
]
delim_reader
options
ETlite is just a thin wrapper on top of Python built-in CSV module. Thus you can pass to delim_reader
same options as you would pass to csv.reader
. For example:
reader = delim_reader(csvfile, transformations, delimiter="\t")
Exception handling
If desired transtormation cannot be performed, ETLite will raise TransformationError
. If you do not want to abort data loading, you can pass an error handler to delim_reader
.
Error handler must be a function. It will be passed an instance of TransformationError
. Note: on_error
must be pased as keywod argument.
from etlite import delim_reader
transformations = [
# ...
]
def error_handler(err):
# err is an instance of TransformationError
print(err) # prints error message
print(err.record) # prints raw record, prior to transformation
with open('my-data.csv') as stream:
reader = delim_reader(stream, transformations, on_error=error_handler)
for row in reader:
do_something(row)
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.