
Build modular data pipelines that run inside a PostgreSQL database


Ralsei

Ralsei is a Python framework for building modular data pipelines that run inside a PostgreSQL database.

It was built with use cases such as web scraping in mind, following the philosophy that all artifacts should be stored in the database, from the downloaded HTML to the parsed results.


Features

  • Based on the jinja-psycopg library, combining type-safe SQL formatting with Jinja's template language
  • Declarative, with minimal boilerplate
  • Resumable pipelines, at both row-level and table-level granularity - no need to re-compute or re-download what has already been processed
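
To make the idea of row-level resumability concrete, here is a minimal sketch that does not use Ralsei's API at all. It assumes a hypothetical `pages` table and a stand-in `fake_download` helper, and uses the standard library's `sqlite3` in place of PostgreSQL so that it runs anywhere. Both the downloaded HTML and the parsed result live in the same table, and each step only touches rows it has not processed yet:

```python
import sqlite3


def fake_download(url: str) -> str:
    """Hypothetical stand-in for a real HTTP request (not part of Ralsei)."""
    return f"<html><title>Page for {url}</title></html>"


def download_missing(conn: sqlite3.Connection, fetch) -> int:
    """Fetch HTML only for rows that have none yet; return how many rows were processed."""
    rows = conn.execute(
        "SELECT id, url FROM pages WHERE html IS NULL ORDER BY id"
    ).fetchall()
    for page_id, url in rows:
        conn.execute("UPDATE pages SET html = ? WHERE id = ?", (fetch(url), page_id))
        conn.commit()  # commit per row, so an interruption loses at most one row of work
    return len(rows)


def parse_missing(conn: sqlite3.Connection) -> int:
    """Extract the title only for downloaded rows that have not been parsed yet."""
    rows = conn.execute(
        "SELECT id, html FROM pages "
        "WHERE html IS NOT NULL AND title IS NULL ORDER BY id"
    ).fetchall()
    for page_id, html in rows:
        # Naive string splitting; good enough for the fixed markup used in this sketch
        title = html.split("<title>")[1].split("</title>")[0]
        conn.execute("UPDATE pages SET title = ? WHERE id = ?", (title, page_id))
        conn.commit()
    return len(rows)


# In-memory SQLite database as a stand-in for PostgreSQL (assumption for portability)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pages (id INTEGER PRIMARY KEY, url TEXT NOT NULL, html TEXT, title TEXT)"
)
conn.executemany(
    "INSERT INTO pages (url) VALUES (?)",
    [("https://example.com/a",), ("https://example.com/b",)],
)
conn.commit()

first_run = download_missing(conn, fake_download)   # processes both rows
second_run = download_missing(conn, fake_download)  # nothing left to download
parsed = parse_missing(conn)                        # parses both downloaded rows
```

Re-running either step after a crash simply picks up the rows that are still `NULL`, which is the behaviour the last feature above describes. Ralsei's own tasks may implement this differently.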

Installation

pip install ralsei

Tip: consider using PDM or Poetry for project-based dependency management.

Quick Start

First, create a script from the following template:

from ralsei import RalseiCli

def make_pipeline(args):
    return {}  # Declare your pipeline in the format "name": Task(...)

if __name__ == "__main__":
    cli = RalseiCli()  # Here you can add custom arguments
    cli.run(make_pipeline)

To see some example pipelines, take a look at the Builtin Tasks section of the documentation.

Alternatives

  • dbt - Jinja + SQL based, more suitable for processing data that you already have
  • Kedro - Python based, more suitable for processing data that you already have


