
Lightweight framework for building ETL pipelines.


WaterGrid-Python


Watergrid is a lightweight, distributed framework for data stream processing.

Why Watergrid

  • Watergrid lets developers write their ETL pipelines as applications, not scripts or jobs. This lets you re-use your existing CI/CD infrastructure and deployment practices.
  • Watergrid encourages you to write your ETL operations as modular "steps", making it easy to isolate and test atomic parts of your pipelines.
  • Watergrid lets you scale up to multi-node clusters by changing only a few lines of Python code.
  • Watergrid is minimalistic and easy to use.
  • Watergrid does not depend on complicated software setups that execute jobs. Everything is self-contained in the library itself.
  • Watergrid lets you use your existing Redis infrastructure for distributed jobs instead of a proprietary data storage/transmission solution.
  • Watergrid includes an API for interfacing with an APM of your choice out of the box.
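The point about modular "steps" can be illustrated without the library. The following is a minimal sketch, not Watergrid's API: the Step base class, the dict used as a context, and run_pipeline are hypothetical stand-ins written only for this example. It shows why small steps are easy to test: each one reads and writes a shared context, so a single step can be exercised on its own with hand-made input.

```python
# Illustrative stand-ins only: these classes are NOT Watergrid's API.
from typing import Dict, List, Optional


class Step:
    """Hypothetical base class: one named unit of work in a pipeline."""

    def __init__(self, name: str):
        self.name = name

    def run(self, context: Dict[str, object]) -> None:
        raise NotImplementedError


class ExtractStep(Step):
    """Puts raw rows into the context (hard-coded here for the sketch)."""

    def __init__(self):
        super().__init__("extract")

    def run(self, context):
        context["rows"] = ["1", "2", "x", "4"]


class TransformStep(Step):
    """Keeps only the rows that are digit strings and converts them to int."""

    def __init__(self):
        super().__init__("transform")

    def run(self, context):
        context["numbers"] = [int(r) for r in context["rows"] if r.isdigit()]


class LoadStep(Step):
    """Appends the transformed numbers to a caller-supplied list."""

    def __init__(self, sink: List[int]):
        super().__init__("load")
        self.sink = sink

    def run(self, context):
        self.sink.extend(context["numbers"])


def run_pipeline(steps: List[Step], context: Optional[dict] = None) -> dict:
    """Run each step in order against one shared context."""
    context = {} if context is None else context
    for step in steps:
        step.run(context)
    return context


# A single step can be tested in isolation with a hand-made context.
ctx = {"rows": ["10", "oops", "30"]}
TransformStep().run(ctx)
assert ctx["numbers"] == [10, 30]

# The same steps compose into a full extract-transform-load run.
sink: List[int] = []
run_pipeline([ExtractStep(), TransformStep(), LoadStep(sink)])
print(sink)  # [1, 2, 4]
```

In a real Watergrid project the steps would subclass the library's own Step class, as in the Getting Started example, rather than these stand-ins.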

Getting Started

Creating an ETL pipeline with Watergrid takes only a few steps:

  1. Install Python 3.6 or later (other versions may be supported, but are not tested regularly).
  2. Run pip install watergrid
  3. Paste the following code into a file named main.py:
from watergrid.pipelines.standalone_pipeline import StandalonePipeline
from watergrid.steps import Step
from watergrid.context import DataContext


class SampleStep(Step):
    """A single pipeline step that prints a greeting."""

    def __init__(self):
        # Register the step under its class name.
        super().__init__(self.__class__.__name__)

    def run(self, context: DataContext):
        print("Hello World!")


def main():
    pipeline = StandalonePipeline('hello_world_pipeline')
    pipeline.add_step(SampleStep())
    # Run the pipeline repeatedly until the process is stopped.
    while True:
        pipeline.run()


if __name__ == '__main__':
    main()

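Then run python main.py to start the pipeline. You should see Hello World! printed to the console. Because main() calls pipeline.run() in a loop, the message repeats until you stop the process (for example with Ctrl+C).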
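If an endless loop is inconvenient, for example in a test or a one-off run, the loop can be wrapped in a small helper. This is a sketch under stated assumptions: run_repeatedly, max_runs, and delay_seconds are hypothetical names introduced here and are not part of Watergrid. The helper accepts any zero-argument callable, so it does not depend on the library.

```python
import time
from typing import Callable, Optional


def run_repeatedly(run_once: Callable[[], None],
                   max_runs: Optional[int] = None,
                   delay_seconds: float = 0.0) -> int:
    """Call run_once until max_runs is reached or Ctrl+C is pressed.

    max_runs=None means "no limit". Returns the number of completed runs.
    """
    completed = 0
    try:
        while max_runs is None or completed < max_runs:
            run_once()
            completed += 1
            if delay_seconds > 0:
                time.sleep(delay_seconds)
    except KeyboardInterrupt:
        # Treat Ctrl+C as a normal request to stop instead of a crash.
        pass
    return completed


# Stand-in for pipeline.run so the sketch runs without Watergrid installed.
calls = []
count = run_repeatedly(lambda: calls.append("Hello World!"), max_runs=3)
print(count)  # 3
```

In main.py, passing pipeline.run as run_once (for instance run_repeatedly(pipeline.run, delay_seconds=1.0)) would take the place of the while True loop.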

Check out the getting started section of the documentation site to build more advanced pipelines with multiple steps and high-availability.

Getting Help

There are plenty of places to get help with Watergrid.

  • Check the in-line documentation in the Python source.
  • Read the online documentation on GitHub Pages.
  • Make a post in the Discussions Tab.
  • Open an issue if you think the problem is with Watergrid, or if you have a feature to suggest.

Example Projects

  • RSSMQ - Forwards RSS feed items to various HTTP APIs.
  • atc-metrics-streamer - Streams metrics from Apache Traffic Control to Kafka.


