Extract Transform Load (ETL) toolkit for Python

rdc.etl is an Extract Transform Load (ETL) toolkit for Python 2.7+ (Python 3 is not supported yet). It provides the tools needed to build complex data integration jobs from simple, atomic, I/O-connected transformation blocks.

Documentation

Full documentation for rdc.etl is available on Read the Docs.

This is a work in progress: the API may change until 1.0 is released.

Example usage

>>> # Sample data extract transformation.
>>> # Use hardcoded data here for sample purpose.
>>> from rdc.etl.transform.extract import Extract
>>> @Extract
... def sample_extract():
...     yield {'first_name': 'John', 'last_name': 'Doe', }
...     yield {'first_name': 'Jane', 'last_name': 'Dae', }
>>> # Sample data transformation.
>>> from rdc.etl.transform import Transform
>>> @Transform
... def sample_transform(hash, channel):
...     hash['last_name'] = hash['last_name'].upper()
...     hash['initials'] = '{0}.{1}.'.format(hash['first_name'][0], hash['last_name'][0]).upper()
...     yield hash
>>> # Sample load. This is only a screen log for sample purpose.
>>> from rdc.etl.transform.util import Log
>>> sample_load_to_screen = Log()
>>> # Tie everything together, then run!
>>> from rdc.etl.harness.threaded import ThreadedHarness
>>> job = ThreadedHarness()
>>> job.add_chain(sample_extract, sample_transform, sample_load_to_screen)
>>> job()
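
To follow the data flow without installing rdc.etl, the same three steps can be sketched with plain Python generators. This is only an illustration under stated assumptions: `extract`, `transform` and `load` below are hypothetical stand-ins, not part of the rdc.etl API, and they run sequentially in a single thread rather than under a harness.

```python
# Plain-Python sketch of the extract -> transform -> load chain above.
# It does not import rdc.etl; every name here is hypothetical.

def extract():
    """Yield hardcoded rows, mirroring sample_extract."""
    yield {'first_name': 'John', 'last_name': 'Doe'}
    yield {'first_name': 'Jane', 'last_name': 'Dae'}


def transform(rows):
    """Upper-case the last name and add initials, mirroring sample_transform."""
    for row in rows:
        row = dict(row)  # work on a copy so the input row is left untouched
        row['last_name'] = row['last_name'].upper()
        row['initials'] = '{0}.{1}.'.format(
            row['first_name'][0], row['last_name'][0]).upper()
        yield row


def load(rows):
    """Print each row and collect it in a list, standing in for Log()."""
    loaded = []
    for row in rows:
        print(sorted(row.items()))
        loaded.append(row)
    return loaded


# Chain the steps: each one consumes the output of the previous one.
result = load(transform(extract()))
```

Each step only sees the rows produced by the step before it, which is the idea behind chaining transformation blocks; how rdc.etl schedules and connects them internally is not shown here.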

Running the Test Suite

pip install nose
make test

Release Notes

1.0.0a5

  • Status: the console now has amazing ANSI output and detailed I/O statistics; added overall stats (memory, time), experimental HTTP status, and DB stats for database load.

  • API stabilization, cleanup and simplification towards 1.0.0.

  • Simple handling of KeyboardInterrupt: CTRL-C now exits the running job instead of leaving your process hanging.

  • Maps simplification.

  • Enhancements to various transform classes: load.database.DatabaseLoad, filter.Filter, map.xml.XmlMap, util.Log, join.database.DatabaseJoin

  • New transforms: util.Limit

  • Various bugfixes.

  • Minor enhancements: custom names in transforms, some more tests.

  • Moved repository to github.com/rdcli/etl.

Contributing

I’m Romain Dorgueil.

rdc.etl is on GitHub.

Get in touch, via GitHub or otherwise, if you’ve got something to contribute; it’d be most welcome!

If you feel overwhelmingly grateful, or want to support the project, you can tip me on Gittip.

Source Distribution

rdc.etl-1.0.0a5.tar.gz (31.8 kB view details)

Uploaded Source

File details

Details for the file rdc.etl-1.0.0a5.tar.gz.

File metadata

  • Download URL: rdc.etl-1.0.0a5.tar.gz
  • Upload date:
  • Size: 31.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for rdc.etl-1.0.0a5.tar.gz
Algorithm    Hash digest
SHA256       c25f981f9ce25af0172b80bba31a713bfbc9cb7c71cb76ef4968ed98b82aa864
MD5          d2a7d07353d8ca6576efe8d4bc2e4e3c
BLAKE2b-256  64951fb5ca7ddc562c8d3019009c69364c62b2690aecbe948c35a5efd86ab5cd
