A general-purpose Python ETL/pipeline utility library, intended especially for use with Hive Streaming.

Project description

transformpy is a Python 2/3 module for applying transforms to “streams” of data. The transforms can be applied to any Python iterable, so they work on continuous real-time streams as well as static streams (such as the contents of a file). It is designed to use very little memory, except where clustering and/or aggregation routines require more. It was originally designed to allow Python transformations (maps and reductions) of data stored in Hive, using the Hadoop streaming paradigm.
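The streaming idea described above can be sketched with plain Python generators. This is an illustration only and does not use or reproduce the transformpy API; the helper names `parse_rows`, `map_rows`, `reduce_by_key`, and `run_pipeline` are hypothetical, as is the assumed input format of tab-delimited `key<TAB>integer` lines already clustered by key (as Hadoop streaming would typically deliver them to a reducer).

```python
import io
from itertools import groupby
from operator import itemgetter


def parse_rows(lines, sep="\t"):
    """Lazily split each delimited line into a list of fields."""
    for line in lines:
        yield line.rstrip("\n").split(sep)


def map_rows(rows):
    """Map step: turn [key, value] rows into (key, int(value)) pairs."""
    for key, value in rows:
        yield key, int(value)


def reduce_by_key(pairs):
    """Reduce step: sum values over runs of consecutive equal keys.

    Only one run is processed at a time, so memory use stays small,
    but the input must already be clustered (sorted) by key.
    """
    for key, group in groupby(pairs, key=itemgetter(0)):
        yield key, sum(value for _, value in group)


def run_pipeline(lines):
    """Chain the steps; nothing is read until the result is iterated."""
    return reduce_by_key(map_rows(parse_rows(lines)))


if __name__ == "__main__":
    # A static stream for demonstration; sys.stdin would work the same way.
    sample = io.StringIO("a\t1\na\t2\nb\t5\n")
    for key, total in run_pipeline(sample):
        print("%s\t%d" % (key, total))
```

Because every stage is a generator, the same pipeline accepts a file object, `sys.stdin`, or any other iterable of lines, and only the aggregation step needs to hold more than one record's worth of state.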

NOTE: TransformPy is not guaranteed to be API-stable before version 1.0, but any changes to the current version should be small.

Download files

Source Distribution

transformpy-0.3.1.tar.gz (5.7 kB)
