Lightweight framework for building ETL pipelines.

WaterGrid-Python

Watergrid is a lightweight, distributed framework for data stream processing.

Why Watergrid

  • Watergrid lets developers write their ETL pipelines as applications, not scripts or jobs. This lets you re-use your existing CI/CD infrastructure and deployment practices.
  • Watergrid encourages you to write your ETL operations as modular "steps", making it easy to isolate and test atomic parts of your pipelines.
  • Watergrid lets you scale up to multi-node clusters by changing only a few lines of Python code.
  • Watergrid is minimalistic and easy to use.
  • Watergrid does not depend on complicated software setups that execute jobs. Everything is self-contained in the library itself.
  • Watergrid lets you use your existing Redis infrastructure for distributed jobs instead of a proprietary data storage/transmission solution.
  • Watergrid includes an API for interfacing with an APM of your choice out of the box.
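
To make the "modular steps" idea above concrete, here is a minimal, standard-library-only sketch of why small steps are easy to test in isolation. It does not use Watergrid itself: MiniContext, MiniStep, ExtractStep, NormalizeStep and run_steps are hypothetical stand-ins invented for this illustration, not part of the Watergrid API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class MiniContext:
    """Hypothetical stand-in for a pipeline context: a bag of named values."""
    values: Dict[str, Any] = field(default_factory=dict)


class MiniStep:
    """Hypothetical stand-in for a step: one small unit of ETL work."""

    def run(self, context: MiniContext) -> None:
        raise NotImplementedError


class ExtractStep(MiniStep):
    def run(self, context: MiniContext) -> None:
        # Pretend these rows came from an external source.
        context.values["rows"] = [" alice ", "BOB", " Carol"]


class NormalizeStep(MiniStep):
    def run(self, context: MiniContext) -> None:
        # Trim whitespace and lowercase every row.
        context.values["rows"] = [r.strip().lower() for r in context.values["rows"]]


def run_steps(steps: List[MiniStep], context: MiniContext) -> MiniContext:
    """Run each step in order against the same context."""
    for step in steps:
        step.run(context)
    return context


# A single step can be exercised on its own, without the rest of the pipeline.
isolated = MiniContext({"rows": ["  X "]})
NormalizeStep().run(isolated)
assert isolated.values["rows"] == ["x"]

# The same steps compose into a full run.
result = run_steps([ExtractStep(), NormalizeStep()], MiniContext())
print(result.values["rows"])  # ['alice', 'bob', 'carol']
```

Because each step only reads from and writes to the context it is given, a unit test can build a context by hand, run one step, and assert on the result.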

Getting Started

Creating an ETL pipeline with Watergrid takes three steps.

  1. Install Python 3.6 or later (other versions may be supported, but are not tested regularly).
  2. Run pip install watergrid
  3. Paste the following code into a file named main.py:
from watergrid.pipelines.standalone_pipeline import StandalonePipeline
from watergrid.steps import Step
from watergrid.context import DataContext


class SampleStep(Step):
    def __init__(self):
        # Pass the class name to the base Step constructor.
        super().__init__(self.__class__.__name__)

    def run(self, context: DataContext):
        print("Hello World!")


def main():
    pipeline = StandalonePipeline('hello_world_pipeline')
    pipeline.add_step(SampleStep())
    # Run the pipeline repeatedly until the process is stopped.
    while True:
        pipeline.run()


if __name__ == '__main__':
    main()

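Then run python main.py to start the pipeline. You should see Hello World! printed to the console. Because main() calls pipeline.run() inside a while True loop, the message repeats until you stop the process (for example with Ctrl+C).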
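
The main.py example calls pipeline.run() in an endless loop. If you would rather stop cleanly on Ctrl+C, or cap the number of runs while experimenting, a small wrapper along these lines may help. This is only a sketch: run_until_stopped and FakePipeline are hypothetical names introduced here, and the only thing assumed about the pipeline object is the run() method used in main.py.

```python
def run_until_stopped(pipeline, max_runs=None):
    """Call pipeline.run() repeatedly and return the number of completed runs.

    max_runs=None means "loop until interrupted".
    """
    completed = 0
    try:
        while max_runs is None or completed < max_runs:
            pipeline.run()
            completed += 1
    except KeyboardInterrupt:
        # Ctrl+C lands here, so the loop ends without a traceback.
        pass
    return completed


class FakePipeline:
    """Hypothetical stand-in that exposes only run(), for demonstration."""

    def __init__(self):
        self.calls = 0

    def run(self):
        self.calls += 1
        print("Hello World!")


fake = FakePipeline()
print(run_until_stopped(fake, max_runs=3))  # prints "Hello World!" three times, then 3
```

In main.py, the while True loop could then be replaced with run_until_stopped(pipeline), leaving max_runs unset for normal operation.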

Check out the getting started section of the documentation site to build more advanced pipelines with multiple steps and high availability.

Getting Help

There are plenty of places to get help with Watergrid.

  • Check the in-line documentation in the Python source.
  • Read the online documentation on GitHub Pages.
  • Make a post in the Discussions tab.
  • Open an issue if you think the problem is with Watergrid, or if you have a feature to suggest.

Example Projects

  • RSSMQ - Forwards RSS feed items to various HTTP APIs.
  • atc-metrics-streamer - Streams metrics from Apache Traffic Control to Kafka.

