Lightweight framework for building ETL pipelines.
Project description
WaterGrid-Python
Watergrid is a lightweight, distributed framework for data stream processing.
Why Watergrid
- Watergrid lets developers write their ETL pipelines as applications, not scripts or jobs. This lets you re-use your existing CI/CD infrastructure and deployment practices.
- Watergrid encourages you to write your ETL operations as modular "steps", making it easy to isolate and test atomic parts of your pipelines.
- Watergrid lets you scale up to multi-node clusters by changing only a few lines of Python code.
- Watergrid is minimalistic and easy to use.
- Watergrid does not depend on complicated software setups that execute jobs. Everything is self-contained in the library itself.
- Watergrid lets you use your existing Redis infrastructure for distributed jobs instead of a proprietary data storage/transmission solution.
- Watergrid includes an API for interfacing with an APM of your choice out of the box.
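To illustrate the modular "steps" idea from the list above, here is a minimal sketch of testing a single step in isolation. It deliberately does not import Watergrid: `FakeContext` and `UppercaseStep` are hypothetical stand-ins invented for this example, and the real `Step` and `DataContext` classes may expose a different interface.

```python
# Stand-in classes for illustration only; they are NOT Watergrid's API.
class FakeContext:
    """Hypothetical key/value container handed from step to step."""

    def __init__(self, **values):
        self._values = dict(values)

    def get(self, key):
        return self._values[key]

    def set(self, key, value):
        self._values[key] = value


class UppercaseStep:
    """Hypothetical step that performs one small transformation."""

    def run(self, context):
        context.set("title", context.get("title").upper())


def test_uppercase_step():
    # The step is exercised alone: no pipeline, no cluster, no Redis.
    context = FakeContext(title="hello world")
    UppercaseStep().run(context)
    assert context.get("title") == "HELLO WORLD"


test_uppercase_step()
```

Because the step only reads from and writes to the context it is given, the test needs nothing beyond a context object and the step itself.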
Getting Started
Creating a first ETL pipeline with Watergrid takes only a few steps:
- Install Python 3.6 or later (earlier versions may work, but are not tested regularly).
- Run pip install watergrid
- Paste the following code into a file named main.py:
```python
from watergrid.pipelines.standalone_pipeline import StandalonePipeline
from watergrid.steps import Step
from watergrid.context import DataContext


class SampleStep(Step):
    def __init__(self):
        # Register the step under its class name.
        super().__init__(self.__class__.__name__)

    def run(self, context: DataContext):
        print("Hello World!")


def main():
    pipeline = StandalonePipeline('hello_world_pipeline')
    pipeline.add_step(SampleStep())
    # Run the pipeline repeatedly until the process is stopped.
    while True:
        pipeline.run()


if __name__ == '__main__':
    main()
```
Then run python main.py to start the pipeline. You should see Hello World! printed to the console repeatedly until you stop the process (for example with Ctrl+C).
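The sample above loops forever with a bare `while True`. As a sketch only, the loop could be wrapped in a small helper that pauses between runs and exits cleanly on Ctrl+C. The helper `run_forever` and its parameters `run_once`, `delay_seconds`, and `max_iterations` are hypothetical names for this example and are not part of Watergrid; whether Watergrid already handles scheduling between runs is not covered here.

```python
import time


def run_forever(run_once, delay_seconds=1.0, max_iterations=None):
    """Call run_once repeatedly, pausing between runs.

    Hypothetical helper for illustration. Returns the number of
    completed runs. max_iterations=None means "loop until interrupted".
    """
    completed = 0
    try:
        while max_iterations is None or completed < max_iterations:
            run_once()
            completed += 1
            time.sleep(delay_seconds)
    except KeyboardInterrupt:
        # Ctrl+C lands here, so the process exits without a traceback.
        pass
    return completed


if __name__ == "__main__":
    # In main.py, pipeline.run would be passed in place of this lambda.
    run_forever(lambda: print("Hello World!"), delay_seconds=0.0, max_iterations=3)
```

Passing `max_iterations` keeps the loop bounded, which is convenient when trying a pipeline out locally.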
Check out the getting started section of the documentation site to build more advanced pipelines with multiple steps and high availability.
Getting Help
There are plenty of places to get help with Watergrid.
- Check the in-line documentation in the Python source.
- Read the online documentation on GitHub Pages.
- Make a post in the Discussions tab.
- Open an issue if you think the problem is with Watergrid, or if you have a feature to suggest.
Example Projects
- RSSMQ - Forwards RSS feed items to various HTTP APIs.
- atc-metrics-streamer - Streams metrics from Apache Traffic Control to Kafka.
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
- watergrid-1.1.1.tar.gz (43.7 kB)
Built Distribution
- watergrid-1.1.1-py3-none-any.whl (34.6 kB)
File details
Details for the file watergrid-1.1.1.tar.gz.
File metadata
- Download URL: watergrid-1.1.1.tar.gz
- Upload date:
- Size: 43.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.9.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 91f46d0bca650986fef5204f1aeb6247ccbbb8506ef7f9332cdffa2a29f69835
MD5 | a369043e0953905f4a61f08c9cb5e782
BLAKE2b-256 | 5992100db8e58bd67c17006715ec827c649fb7f55d931ed183cdddaa3703cc72
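A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the standard library; the helper names `sha256_of_file` and `matches_published_hash` are made up for this example, and the usage comment assumes the source distribution was saved to the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's digest with a published one, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Example usage (assumes watergrid-1.1.1.tar.gz is in the current directory):
#   expected = "91f46d0bca650986fef5204f1aeb6247ccbbb8506ef7f9332cdffa2a29f69835"
#   print(matches_published_hash("watergrid-1.1.1.tar.gz", expected))
```

Reading in chunks keeps memory use flat regardless of file size. The same approach applies to the built distribution and its own digest.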
File details
Details for the file watergrid-1.1.1-py3-none-any.whl.
File metadata
- Download URL: watergrid-1.1.1-py3-none-any.whl
- Upload date:
- Size: 34.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.9.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 65182b826781644dbb1ad86e75e5576aa81a28d8e9fca24c20259323f0d59df8
MD5 | 171cb901fd4a6272d0027f71a06295ad
BLAKE2b-256 | 23882275b665d4c1f984a4e61c859ee25bb79b07787caf0f50c78dbb3c37b7b4