Project Description

rdc.etl is an Extract, Transform, Load (ETL) toolkit for Python 2.7+ (no Python 3 support yet). It gives you the tools needed to build complex data integration jobs from simple, atomic, I/O-connected transformation blocks.

Documentation

Full documentation for rdc.etl is available on ReadTheDocs.

This is a work in progress: the API may change until 1.0 is released.

Example usage

>>> # Sample extract transformation.
>>> # Hardcoded data is used here for demonstration purposes.
>>> from rdc.etl.transform.extract import Extract
>>> @Extract
... def sample_extract():
...     yield {'first_name': 'John', 'last_name': 'Doe', }
...     yield {'first_name': 'Jane', 'last_name': 'Dae', }
>>> # Sample data transformation.
>>> from rdc.etl.transform import Transform
>>> @Transform
... def sample_transform(hash, channel):
...     hash['last_name'] = hash['last_name'].upper()
...     hash['initials'] = '{0}.{1}.'.format(hash['first_name'][0], hash['last_name'][0]).upper()
...     yield hash
>>> # Sample load. This only logs rows to the screen, for demonstration purposes.
>>> from rdc.etl.transform.util import Log
>>> sample_load_to_screen = Log()
>>> # Tie everything together, then run!
>>> from rdc.etl.harness.threaded import ThreadedHarness
>>> job = ThreadedHarness()
>>> job.add_chain(sample_extract, sample_transform, sample_load_to_screen)
>>> job()
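
For readers who want to see the shape of the data flow without installing anything, the chain above can be mimicked with plain Python generators. This is only a rough sketch of the extract, transform, load idea: the function names are hypothetical, nothing here uses or reproduces the rdc.etl API, and the real harness runs transformations in threads rather than as nested generators.

```python
# Plain-Python illustration only; none of these names come from rdc.etl.

def extract():
    """Yield hardcoded rows, one dict per record."""
    yield {'first_name': 'John', 'last_name': 'Doe'}
    yield {'first_name': 'Jane', 'last_name': 'Dae'}


def transform(rows):
    """Upper-case the last name and add an 'initials' field."""
    for row in rows:
        row = dict(row)  # copy so the input row is left untouched
        row['last_name'] = row['last_name'].upper()
        row['initials'] = '{0}.{1}.'.format(
            row['first_name'][0], row['last_name'][0]).upper()
        yield row


def load(rows):
    """Collect rows into a list (a stand-in for a real load step)."""
    return list(rows)


# Tie the three steps together: each step consumes the previous one's output.
result = load(transform(extract()))
```

Each step only sees the rows produced by the step before it, which is the property that makes the blocks easy to recombine.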

Running the Test Suite

pip install nose
make test

Release Notes

1.0.0a6

  • Database transformations: now live in rdc.etl.extra subpackages, to avoid mixing the core API with optional extras.
  • Database transformations: more flexibility in what is allowed (insert/update).
  • Better standards compliance (thanks #python).
  • Harness is now called Job, for simplicity's sake. The old name is kept too, for backward compatibility.
  • XMLMap enhancements.
  • HTTP status interface (early, minimalistic version).
  • Renamed the examples to avoid import hell.
  • Less hackish HTTP reader, better Unicode support (@jmorel).
  • PasteScript can now be used to generate an empty working ETL project.
  • FileProxy-based download manager.
  • Minor fixes.

1.0.0a5

  • Status: the console now has ANSI output and detailed I/O statistics; overall stats (memory, time) added; experimental HTTP status; database stats for database load.
  • API stabilization, cleanup and simplification towards 1.0.0.
  • Simple handling of KeyboardInterrupt: CTRL-C now exits the running job instead of leaving your process hanging.
  • Maps simplification.
  • Enhancements to various transform classes: load.database.DatabaseLoad, filter.Filter, map.xml.XmlMap, util.Log, join.database.DatabaseJoin.
  • New transforms: util.Limit.
  • Various bugfixes.
  • Minor enhancements: custom names in transforms, some more tests.
  • Moved the repository to github.com/rdcli/etl.

Contributing

I’m Romain Dorgueil.

rdc.etl is on GitHub.

Get in touch, via GitHub or otherwise, if you’ve got something to contribute. It’d be most welcome!

If you feel overwhelmingly grateful, or want to support the project, you can tip me on Gittip.

Release History

1.0.0a6

This version

1.0.0a5

1.0.0a4

1.0.0a3

1.0.0a2

1.0.0a1

Download Files

rdc.etl-1.0.0a6.tar.gz (37.1 kB), Source, uploaded Mar 6, 2014
