Extract/Transform Light - a simple library for reading delimited files.

ETlite

Example

Given CSV file:

Area id,Male,Female,Area
A12345,34,45,0.25
A12346,108,99,0.32

Define a list of transformations:

transformations = [
    # Map existing fields into dictionary.
    # For nested dictionaries use dot.delimited.keys.
    # Optional "via" parameter takes a callable returning transformed value.
    { "from": "Area id", "to": "id" },
    { "from": "Male", "to": "population.male", "via": int },
    { "from": "Female", "to": "population.female", "via": int },
    { "from": "Area", "to": "area", "via": float },

    # You can also add computed values that are not present in the original data source.
    # A computed value takes the transformed dictionary as its argument
    # and does not require a "from" parameter:
    {
        "to": "population.total",
        "via": lambda x: x['population']['male'] + x['population']['female']
    },
    # Note that transformations are executed in the order they were defined.
    # This transformation uses population.total value computed in the previous step:
    {
        "to": 'population.density',
        "via": lambda x: round(x['population']['total'] / x['area']),
    }
]
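
To make the dot.delimited.keys convention concrete, here is a minimal sketch of how a dotted "to" key can be expanded into nested dictionaries. The helper name set_nested is hypothetical and is not part of ETlite's API; it only illustrates the mapping, not how the library implements it.

```python
def set_nested(target, dotted_key, value):
    """Store value in target under a dot-delimited key, creating nested dicts as needed.

    Hypothetical helper for illustration only; not part of ETlite.
    """
    *parents, leaf = dotted_key.split(".")
    node = target
    for part in parents:
        # Reuse the nested dict if it already exists, otherwise create it.
        node = node.setdefault(part, {})
    node[leaf] = value
    return target


record = {}
set_nested(record, "id", "A12345")
set_nested(record, "population.male", 34)
set_nested(record, "population.female", 45)
print(record)
# {'id': 'A12345', 'population': {'male': 34, 'female': 45}}
```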

Read the file:

from etlite import delim_reader

with open("mydatafile.csv", newline="") as csvfile:
    reader = delim_reader(csvfile, transformations)
    data = list(reader)

This produces a list of dictionaries:

[
    {
        'id': 'A12345',
        'area': 0.25,
        'population': {
            'male': 34,
            'female': 45,
            'total': 79,
            'density': 316
        }
    },
    {
        'id': 'A12346',
        'area': 0.32,
        'population': {
            'male': 108,
            'female': 99,
            'total': 207,
            'density': 647
        }
    }
]
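
The density values above depend on population.total having been computed first. The following sketch shows why order matters. It assumes a simplified model in which each step is applied in sequence to a plain dictionary with flat keys; apply_in_order is a hypothetical function written for this illustration and is not ETlite's implementation.

```python
def apply_in_order(raw, transformations):
    """Apply steps sequentially; computed steps see only the values produced so far.

    Hypothetical, simplified model (flat keys only); not ETlite's code.
    """
    result = {}
    for step in transformations:
        if "from" in step:
            # Mapped field: read the raw value and convert it if "via" is given.
            convert = step.get("via", lambda v: v)
            value = convert(raw[step["from"]])
        else:
            # Computed field: "via" receives the dictionary built so far.
            value = step["via"](result)
        result[step["to"]] = value
    return result


raw = {"Male": "34", "Female": "45", "Area": "0.25"}
steps = [
    {"from": "Male", "to": "male", "via": int},
    {"from": "Female", "to": "female", "via": int},
    {"from": "Area", "to": "area", "via": float},
    {"to": "total", "via": lambda x: x["male"] + x["female"]},
    {"to": "density", "via": lambda x: round(x["total"] / x["area"])},
]
row = apply_in_order(raw, steps)
print(row)
# {'male': 34, 'female': 45, 'area': 0.25, 'total': 79, 'density': 316}

# Moving the density step ahead of the total step fails in this model,
# because "total" has not been computed yet.
reordered = steps[:3] + [steps[4], steps[3]]
missing = None
try:
    apply_in_order(raw, reordered)
except KeyError as exc:
    missing = exc.args[0]
print("missing key:", missing)
# missing key: total
```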

delim_reader options

ETlite is a thin wrapper around Python's built-in csv module, so delim_reader accepts the same options you would pass to csv.reader. For example:

reader = delim_reader(csvfile, transformations, delimiter="\t")
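
For reference, this is what such options do in the standard library itself. The sketch below uses only csv.reader with in-memory sample data invented for the illustration; it assumes nothing about ETlite beyond the statement that the options are passed through.

```python
import csv
import io

# Tab-separated input: delimiter="\t" splits fields on tabs instead of commas.
tsv = io.StringIO("Area id\tMale\tFemale\tArea\nA12345\t34\t45\t0.25\n")
tab_rows = list(csv.reader(tsv, delimiter="\t"))
print(tab_rows)
# [['Area id', 'Male', 'Female', 'Area'], ['A12345', '34', '45', '0.25']]

# Semicolon-separated input with single-quoted fields:
# quotechar="'" keeps the semicolon inside the quoted field intact.
ssv = io.StringIO("id;name\n1;'Smith; J'\n")
semi_rows = list(csv.reader(ssv, delimiter=";", quotechar="'"))
print(semi_rows)
# [['id', 'name'], ['1', 'Smith; J']]
```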

Exception handling

If a transformation cannot be performed, ETlite raises TransformationError. If you do not want a failed record to abort data loading, pass an error handler to delim_reader.

The error handler must be a function. It is called with an instance of TransformationError. Note: on_error must be passed as a keyword argument.

from etlite import delim_reader

transformations = [
    # ...
]

def error_handler(err):
    # err is an instance of TransformationError
    print(err) # prints error message
    print(err.record) # prints raw record, prior to transformation


with open('my-data.csv') as stream:
    reader = delim_reader(stream, transformations, on_error=error_handler)
    for row in reader:
        do_something(row)
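
Instead of printing, a handler can collect failures for later inspection. The sketch below is one possible pattern, not something ETlite provides: make_collecting_handler is a hypothetical helper, and FakeTransformationError is a stand-in class defined here only so the example runs without the library. It relies solely on the two things documented above, the error message and the record attribute; the shape of the raw record shown is an assumption.

```python
def make_collecting_handler():
    """Return an error handler and the list it appends failures to.

    Hypothetical helper for illustration; not part of ETlite.
    """
    failures = []

    def handler(err):
        # Keep the message and the raw record so bad rows can be reviewed later.
        failures.append({"message": str(err), "record": err.record})

    return handler, failures


class FakeTransformationError(Exception):
    """Stand-in for etlite's TransformationError, used only to exercise the handler."""

    def __init__(self, message, record):
        super().__init__(message)
        self.record = record


handler, failures = make_collecting_handler()
# The record shape (a list of raw strings) is assumed for this demonstration.
handler(FakeTransformationError("cannot convert 'n/a' to int", ["A12347", "n/a", "12", "0.5"]))
print(failures)
# [{'message': "cannot convert 'n/a' to int", 'record': ['A12347', 'n/a', '12', '0.5']}]
```

With the real library, the handler would be passed as on_error=handler in the delim_reader call, and failures could be inspected once the loop finishes.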
