Build modular data pipelines running inside the postgres database
Ralsei
Ralsei is a Python framework for building modular data pipelines that run inside a PostgreSQL database.
It was built with use cases such as web scraping in mind, following the philosophy that all artifacts should be stored in the database, from the downloaded HTML to the parsed results.
Features
- Based on the jinja-psycopg library, combining type-safe SQL formatting with Jinja's template language
- Declarative with minimal boilerplate
- Resumable pipelines, both at row-level and table-level granularity - no need to re-compute or re-download what has already been processed
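To illustrate what row-level resumability means in general, here is a minimal sketch. It is not Ralsei's implementation or API: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs anywhere, and the `pages` table, its columns, and the `process_pending` and `parse_title` helpers are all hypothetical names chosen for this example. The idea is that unprocessed rows are recognised by a NULL output column, so a rerun only touches what is still missing.

```python
import sqlite3


def process_pending(conn, parse):
    """Fill in `title` for every row whose `title` is still NULL."""
    pending = conn.execute(
        "SELECT id, html FROM pages WHERE title IS NULL ORDER BY id"
    ).fetchall()
    for row_id, html in pending:
        conn.execute(
            "UPDATE pages SET title = ? WHERE id = ?", (parse(html), row_id)
        )
        # Commit after each row, so an interruption loses at most one row of work
        conn.commit()
    return len(pending)


def parse_title(html):
    """Hypothetical parser: extract the text between <title> and </title>."""
    start = html.index("<title>") + len("<title>")
    return html[start:html.index("</title>")]


# The raw artifact (html) and the parsed result (title) live in the same table
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pages (id INTEGER PRIMARY KEY, html TEXT NOT NULL, title TEXT)"
)
conn.executemany(
    "INSERT INTO pages (html) VALUES (?)",
    [("<title>First</title>",), ("<title>Second</title>",)],
)
conn.commit()

first_run = process_pending(conn, parse_title)   # processes both rows
second_run = process_pending(conn, parse_title)  # finds nothing left to do
```

Because the progress marker is the data itself, no separate bookkeeping is needed: rerunning the same function after a crash simply continues with the rows that are still NULL.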
Installation
pip install ralsei
Tip: consider using PDM or Poetry for project-based dependency management.
Quick Start
First, create a script from the following template:
from ralsei import RalseiCli


def make_pipeline(args):
    return {}  # Declare your pipeline in the format "name": Task(...)


if __name__ == "__main__":
    cli = RalseiCli()  # Here you can add custom arguments
    cli.run(make_pipeline)
To see some example pipelines, take a look at the Builtin Tasks section of the documentation.
Alternatives