Common Library for ReOrc Data Platform

Recurve Libraries

This repository unifies the maintenance of the Recurve platform's shared components; the code can be used in both the Server and Executor (Worker) environments.

Only Python 3.11+ is supported.

Components

This code repository consists of the following core components:

Core

The foundation of the Recurve platform that provides:

  • Base classes and interfaces for platform components
  • Jinja2 templating engine integration
  • Core configuration management
  • Common platform abstractions
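
The configuration and templating pieces above can be illustrated with a small stdlib-only sketch. Note that string.Template stands in for the actual Jinja2 integration, and every name here (PlatformConfig, render) is hypothetical, not the library's real API:

```python
from dataclasses import dataclass, field
from string import Template


@dataclass
class PlatformConfig:
    # Hypothetical core configuration object; the library's real
    # config management classes are not documented here.
    env: str = "test"
    settings: dict = field(default_factory=dict)


def render(template_text: str, config: PlatformConfig) -> str:
    # The platform integrates Jinja2 for templating; string.Template
    # is a stdlib stand-in showing the same substitution idea.
    return Template(template_text).safe_substitute(env=config.env, **config.settings)


cfg = PlatformConfig(env="staging", settings={"owner": "data-team"})
print(render("deploy to $env for $owner", cfg))  # deploy to staging for data-team
```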

Utils

A comprehensive utility library offering:

  • Time handling and date manipulation
  • Concurrent processing tools
  • File system operations and path handling
  • String manipulation and text processing
  • Logging and error handling utilities
  • Data validation helpers
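
As one sketch of what the "concurrent processing tools" bullet might look like in practice (the helper name is hypothetical; this is plain concurrent.futures, not the library's API):

```python
from concurrent.futures import ThreadPoolExecutor


def map_concurrently(fn, items, max_workers=4):
    # Hypothetical utility: apply fn to each item on a thread pool,
    # preserving input order in the results.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fn, items))


print(map_concurrently(str.upper, ["a", "b", "c"]))  # ['A', 'B', 'C']
```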

Connectors

A robust data connectivity layer supporting:

  • Database connections (MySQL, PostgreSQL, Redshift, BigQuery, etc.)
  • Cloud storage (S3, GCS, Azure Blob Storage)
  • Messaging services and APIs
  • Custom connector development framework

Note: After updating a connector's config schema, run make update-connector-schema to regenerate config_schema.py.
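
A custom connector development framework of this kind typically centers on an abstract base class. The sketch below is hypothetical (the real base class and method names are not documented here) and uses sqlite3 as a stand-in backend:

```python
import sqlite3
from abc import ABC, abstractmethod


class BaseConnector(ABC):
    # Hypothetical shape of a connector base class: subclasses receive
    # a config dict and implement the connectivity check.
    def __init__(self, config: dict):
        self.config = config

    @abstractmethod
    def test_connection(self) -> bool: ...


class SqliteConnector(BaseConnector):
    def test_connection(self) -> bool:
        # Open the configured database (in-memory by default) and run
        # a trivial query to prove the connection works.
        with sqlite3.connect(self.config.get("path", ":memory:")) as conn:
            conn.execute("SELECT 1")
        return True


print(SqliteConnector({}).test_connection())  # True
```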

Schedulers

Airflow integration components including:

  • Custom Airflow operators and sensors
  • DAG generation utilities
  • Workflow scheduling interfaces
  • Task dependency management
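
The library builds real Airflow DAGs, but the task-dependency-management idea can be sketched with the stdlib graphlib module (the task names here are made up for illustration):

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks it depends on.
deps = {
    "load": {"extract"},
    "transform": {"load"},
    "report": {"transform"},
}

# static_order() yields tasks so every dependency runs first.
order = list(TopologicalSorter(deps).static_order())
print(order)  # ['extract', 'load', 'transform', 'report']
```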

Operators

Task-specific operators for:

  • Data extraction and loading
  • Data transformation and processing
  • Running Python code
  • Running SQL code
  • Building and running DBT jobs
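
A minimal sketch of what a "Running SQL code" operator could look like, assuming a hypothetical SqlOperator class (the library's actual operator classes are not shown here) and sqlite3 as the backend:

```python
import sqlite3


class SqlOperator:
    # Hypothetical operator: holds a SQL statement and executes it
    # against a provided DB-API connection.
    def __init__(self, sql: str):
        self.sql = sql

    def execute(self, conn) -> list:
        return conn.execute(self.sql).fetchall()


conn = sqlite3.connect(":memory:")
op = SqlOperator("SELECT 1 + 1")
print(op.execute(conn))  # [(2,)]
```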

Client

A flexible client interface providing:

  • Platform API abstractions
  • Authentication handling
  • Resource management
  • Extensible base classes for custom clients
  • Connection pooling and retry logic
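
The "retry logic" bullet can be sketched as a small helper with exponential backoff (the function name and defaults are hypothetical, not the client's real interface):

```python
import time


def with_retry(fn, attempts=3, base_delay=0.01):
    # Call fn up to `attempts` times, sleeping base_delay * 2**attempt
    # between failures; re-raise the last error if all attempts fail.
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)


calls = {"n": 0}

def flaky():
    # Fails twice, then succeeds -- simulates a transient API error.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient")
    return "ok"


print(with_retry(flaky))  # ok
```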

Executors

Core job execution engine that:

  • Manages job submissions and execution flows
  • Orchestrates task execution on infrastructure
  • Handles job lifecycle and state management
  • Provides infrastructure abstraction layer
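
Job lifecycle and state management can be pictured as a small state machine. The states and transition table below are hypothetical, used only to illustrate the idea:

```python
from enum import Enum


class JobState(Enum):
    # Hypothetical lifecycle states; the engine's real state model
    # is not documented here.
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


ALLOWED = {
    JobState.PENDING: {JobState.RUNNING},
    JobState.RUNNING: {JobState.SUCCEEDED, JobState.FAILED},
}


def transition(current: JobState, new: JobState) -> JobState:
    # Reject transitions not listed in the table (e.g. PENDING -> FAILED).
    if new not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {new}")
    return new


state = transition(JobState.PENDING, JobState.RUNNING)
state = transition(state, JobState.SUCCEEDED)
print(state.value)  # succeeded
```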

Development Workflow

Requirements management

We use uv to manage Python package dependencies. The workflow is:

  1. Update source requirements in the .in files.

  2. Compile locked requirements:

    make compile-requirements  # Compiles all requirements files
    

    Or compile individual files:

    make compile-worker  # Just worker requirements
    make compile-dbt    # Just DBT requirements
    
  3. After compiling requirements, update optional dependencies in pyproject.toml:

    make update-optional-deps
    

This ensures consistent dependencies across development and production environments.

Release Process

  1. Update version number in recurvedata/__version__.py
  2. Build and publish package:
    make publish
    
    This will clean build artifacts, build a new package, and publish it to the Recurve PyPI.

Available Commands

The following make commands are available:

Build and Publishing:

  • make clean - Remove build artifacts (dist directory)
  • make build - Clean and build the package
  • make publish - Build and publish package to Recurve PyPI

Requirements Management:

  • make upgrade-uv - Upgrade the uv package installer
  • make compile-worker - Compile worker-specific requirements
  • make compile-dbt - Compile DBT-specific requirements
  • make compile-requirements - Compile all requirements files and sync environment
  • make install-requirements - Install requirements files

Maintenance Scripts:

  • make update-optional-deps - Update optional dependencies in pyproject.toml
  • make update-connector-schema - Update connector configuration schemas

GitLab CI/CD Pipeline

Overview

This repository uses GitLab CI for build and release. The primary stages (in order) are:

  • python_internal: build and publish internal PyPI package (develop)
  • python_public: build and publish public PyPI package (main, manual)
  • copy_dockerfiles: prepare Docker build context and version metadata
  • docker_internal: build and push internal Docker images (develop / release/*)
  • docker_official: build and push public Docker images (main, manual)
  • scan: SAST/Sonar (disabled by default)

Configuration files:

  • .gitlab-ci.yml (stages, rules, job orchestration)
  • .gitlab/ci/templates.yml (shared templates and login configuration)
  • .gitlab/ci/variables.yml (shared variables)
  • dockerfiles/build_push_images.sh (image build and push script)

Branch rules and triggers

  • develop branch

    • Runs: check_version → package_python_internal → copy_dockerfiles → build_docker_internal
    • Environment: ENVIRONMENT=test
    • Targets: internal private registry + Aliyun internal namespace
  • release/* branches

    • Runs: copy_dockerfiles → build_docker_internal
    • Environment: ENVIRONMENT=staging
    • Targets: internal private registry + Aliyun internal namespace
  • main branch

    • Runs (manual): package_python_public, build_docker_public, tag
    • Environment: ENVIRONMENT=production
    • Targets: Docker Hub public registry + Aliyun public namespace

Note: the tag job creates an annotated tag after the public image build, including deployment metadata.

Image build and naming

Build script: dockerfiles/build_push_images.sh

  • Image names:
    • Production: recurve-<service> (current service = worker)
    • Non-production: recurve-<service>-<environment> (e.g., recurve-worker-test, recurve-worker-staging)
  • Tags:
    • version: ${VERSION_PACKAGE} (from recurvedata/__version__.py)
    • latest
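
The naming rule above (production images drop the environment suffix, all others append it) can be expressed directly; this helper is illustrative only, the actual logic lives in dockerfiles/build_push_images.sh:

```python
def image_name(service: str, environment: str) -> str:
    # Production: recurve-<service>; otherwise recurve-<service>-<environment>.
    if environment == "production":
        return f"recurve-{service}"
    return f"recurve-{service}-{environment}"


print(image_name("worker", "test"))        # recurve-worker-test
print(image_name("worker", "production"))  # recurve-worker
```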

Push targets

  • Internal private registry (login handled by job type)

    • Registry: $DOCKER_REPOSITORY_URL (internal jobs)
    • Example path: docker.tool.reorc.cloud/<image_name>:<tags>
  • Public Docker Hub (public jobs)

    • Namespace: recurvedata/<image_name>:<tags>
  • Aliyun Container Registry (pushed in addition for all Docker build jobs)

    • Registry: reorc-registry-cn-registry-vpc.cn-shenzhen.cr.aliyuncs.com
    • Namespace mapping:
      • Non-production (test/staging): internal
      • Production: public
    • Full path: <registry>/<namespace>/<image_name>:<tags>
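
The Aliyun namespace mapping and full-path scheme above can be sketched as follows (again illustrative; the real push logic is in the build script):

```python
def aliyun_image_path(registry: str, environment: str, image: str, tag: str) -> str:
    # Namespace mapping from the docs: test/staging -> internal,
    # production -> public; path is <registry>/<namespace>/<image>:<tag>.
    namespace = "public" if environment == "production" else "internal"
    return f"{registry}/{namespace}/{image}:{tag}"


print(aliyun_image_path(
    "reorc-registry-cn-registry-vpc.cn-shenzhen.cr.aliyuncs.com",
    "staging", "recurve-worker-staging", "latest"))
```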

Login and authentication

Templates in .gitlab/ci/templates.yml:

  • .docker_internal_configuration: login to internal private registry and also login to Aliyun
  • .docker_public_configuration: login to Docker Hub and also login to Aliyun

Both templates execute docker logout || true once in before_script, then log in to the respective registry and Aliyun so a single job can push to multiple registries.

Required CI variables (examples)

Configure the following as Masked/Protected variables in GitLab CI/CD settings:

# Nexus / PyPI
NEXUS_USERNAME, NEXUS_PASSWORD, NEXUS_REPOSITORY_URL, NEXUS_PACKAGE_URL
PYPI_USERNAME, PYPI_PASSWORD, PYPI_REPOSITORY_URL (optional)

# Private registry (internal jobs)
DOCKER_REPOSITORY_URL, DOCKER_USERNAME, DOCKER_PASSWORD

# Docker Hub (public jobs)
DOCKER_OFFICIAL_REPOSITORY_URL, DOCKER_OFFICIAL_USERNAME, DOCKER_OFFICIAL_PASSWORD

# Aliyun registry (additional push in all Docker jobs)
ALIYUN_REGISTRY_URL=reorc-registry-cn-registry-vpc.cn-shenzhen.cr.aliyuncs.com
ALIYUN_USERNAME, ALIYUN_PASSWORD

# Namespaces declared in repo variables (adjust if needed)
ALIYUN_NAMESPACE_INTERNAL=internal
ALIYUN_NAMESPACE_PUBLIC=public

Manual triggers and verification

  • On the main branch, package_python_public and build_docker_public are manual; click “Play” in the GitLab Pipeline UI.
  • Verify images:
    • Internal: docker.tool.reorc.cloud/<image_name>:<tag>
    • Docker Hub: recurvedata/<image_name>:<tag>
    • Aliyun: reorc-registry-cn-registry-vpc.cn-shenzhen.cr.aliyuncs.com/<namespace>/<image_name>:<tag>

