Common Library for ReOrc Data Platform
Recurve Libraries
This repository unifies the maintenance of the Recurve platform's shared components; the code can be used in both Server and Executor (Worker) environments.
Only Python 3.11+ is supported.
Components
This code repository consists of the following core components:
Core
The foundation of the Recurve platform that provides:
- Base classes and interfaces for platform components
- Jinja2 templating engine integration
- Core configuration management
- Common platform abstractions
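Since the Core component integrates the Jinja2 templating engine, a minimal sketch of template rendering may help; the template string and variable names here are illustrative only, not the library's actual API.

```python
from jinja2 import Environment

# Hypothetical illustration: rendering a parameterized SQL statement
# the way a Jinja2-backed core component might.
env = Environment()
template = env.from_string("SELECT * FROM {{ table }} WHERE ds = '{{ ds }}'")
sql = template.render(table="events", ds="2024-01-01")
print(sql)  # SELECT * FROM events WHERE ds = '2024-01-01'
```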
Utils
A comprehensive utility library offering:
- Time handling and date manipulation
- Concurrent processing tools
- File system operations and path handling
- String manipulation and text processing
- Logging and error handling utilities
- Data validation helpers
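As a flavor of the concurrent processing tools listed above, here is a minimal sketch of a parallel-map helper built on the standard library; `parallel_map` is a hypothetical name, not the utility library's real interface.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sketch of a "run in parallel" helper such a utils
# module might expose; results come back in input order.
def parallel_map(func, items, max_workers=4):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, items))

squares = parallel_map(lambda x: x * x, [1, 2, 3, 4])
print(squares)  # [1, 4, 9, 16]
```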
Connectors
A robust data connectivity layer supporting:
- Database connections (MySQL, PostgreSQL, Redshift, BigQuery, etc.)
- Cloud storage (S3, GCS, Azure Blob Storage)
- Messaging services and APIs
- Custom connector development framework
Note: Run make update-connector-schema after updating connector config schemas to regenerate config_schema.py
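To illustrate the custom connector development framework mentioned above, here is a hedged sketch of what a connector base class could look like; `BaseConnector`, `test_connection`, and the SQLite example are assumptions for illustration, not the framework's actual classes.

```python
import sqlite3
from abc import ABC, abstractmethod

# Hypothetical connector base class: each connector receives a config
# dict and must implement a connection check.
class BaseConnector(ABC):
    def __init__(self, config: dict):
        self.config = config

    @abstractmethod
    def test_connection(self) -> bool: ...

class SQLiteConnector(BaseConnector):
    def test_connection(self) -> bool:
        conn = sqlite3.connect(self.config.get("database", ":memory:"))
        try:
            conn.execute("SELECT 1")
        finally:
            conn.close()
        return True

ok = SQLiteConnector({"database": ":memory:"}).test_connection()
print(ok)  # True
```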
Schedulers
Airflow integration components including:
- Custom Airflow operators and sensors
- DAG generation utilities
- Workflow scheduling interfaces
- Task dependency management
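Task dependency management boils down to resolving a valid execution order from a dependency graph. A minimal stdlib sketch (the task names and dependency map are invented for illustration):

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each task lists the tasks it depends on.
deps = {"load": {"extract"}, "transform": {"load"}, "report": {"transform"}}

# Resolve a linear execution order that respects all dependencies.
order = list(TopologicalSorter(deps).static_order())
print(order)  # ['extract', 'load', 'transform', 'report']
```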
Operators
Task-specific operators for:
- Data extraction and loading
- Data transformation and processing
- Running Python code
- Running SQL code
- Building and running DBT jobs
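As a sketch of the "running SQL code" case, here is what a minimal SQL-running operator could look like; `RunSQLOperator` is a hypothetical name, and SQLite stands in for the warehouse connections the real operators target.

```python
import sqlite3

# Hypothetical operator shape: hold a connection and a statement,
# execute on demand, return the fetched rows.
class RunSQLOperator:
    def __init__(self, conn, sql: str):
        self.conn = conn
        self.sql = sql

    def execute(self):
        return self.conn.execute(self.sql).fetchall()

conn = sqlite3.connect(":memory:")
rows = RunSQLOperator(conn, "SELECT 1 + 1").execute()
print(rows)  # [(2,)]
```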
Client
A flexible client interface providing:
- Platform API abstractions
- Authentication handling
- Resource management
- Extensible base classes for custom clients
- Connection pooling and retry logic
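Retry logic for a client typically means retrying transient failures with exponential backoff. A minimal sketch, assuming nothing about the library's actual retry policy (`with_retries` and its parameters are invented names):

```python
import time

# Hypothetical retry helper: call func, retrying on any exception with
# exponentially growing delays, re-raising after the final attempt.
def with_retries(func, attempts=3, base_delay=0.01):
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

result = with_retries(flaky)
print(result)  # ok
```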
Executors
Core job execution engine that:
- Manages job submissions and execution flows
- Orchestrates task execution on infrastructure
- Handles job lifecycle and state management
- Provides infrastructure abstraction layer
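Job lifecycle and state management usually means enforcing legal state transitions. A tiny state-machine sketch (the state names and `Job` class are illustrative assumptions, not the executor's real model):

```python
from enum import Enum

class JobState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"

# Hypothetical transition table: which states each state may move to.
ALLOWED = {
    JobState.PENDING: {JobState.RUNNING},
    JobState.RUNNING: {JobState.SUCCEEDED, JobState.FAILED},
}

class Job:
    def __init__(self):
        self.state = JobState.PENDING

    def transition(self, new_state: JobState):
        if new_state not in ALLOWED.get(self.state, set()):
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state

job = Job()
job.transition(JobState.RUNNING)
job.transition(JobState.SUCCEEDED)
print(job.state)  # JobState.SUCCEEDED
```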
Development Workflow
Requirements management
We use uv to manage Python package dependencies. The workflow is:
1. Update source requirements in the .in files:
   - requirements.in - All dependencies
   - requirements/worker.in - Worker-specific dependencies
   - requirements/dbt.in - DBT-specific dependencies
   - requirements-dev.in - Development dependencies

2. Compile locked requirements:

   make compile-requirements  # Compiles all requirements files

   Or compile individual files:

   make compile-worker  # Just worker requirements
   make compile-dbt     # Just DBT requirements

3. After compiling requirements, update optional dependencies in pyproject.toml:

   make update-optional-deps
This ensures consistent dependencies across development and production environments.
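For orientation, a source requirements file is a plain list of top-level pins that the compile step resolves into a fully locked file. The packages and versions below are purely illustrative, not the repository's actual contents:

```
# requirements.in - illustrative top-level pins (not the real file)
httpx>=0.28
jinja2>=3.1

# requirements/worker.in - worker extras typically layer on the base set
# -r ../requirements.in
```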
Release Process
1. Update the version number in recurvedata/__version__.py
2. Build and publish the package:

   make publish

   This will clean build artifacts, build the new package, and publish it to the Recurve PyPI.
Available Commands
The following make commands are available:
Build and Publishing:
- make clean - Remove build artifacts (dist directory)
- make build - Clean and build the package
- make publish - Build and publish the package to the Recurve PyPI
Requirements Management:
- make upgrade-uv - Upgrade the uv package installer
- make compile-worker - Compile worker-specific requirements
- make compile-dbt - Compile DBT-specific requirements
- make compile-requirements - Compile all requirements files and sync the environment
- make install-requirements - Install requirements files
Maintenance Scripts:
- make update-optional-deps - Update optional dependencies in pyproject.toml
- make update-connector-schema - Update connector configuration schemas
GitLab CI/CD Pipeline
Overview
This repository uses GitLab CI for build and release. The primary stages (in order) are:
- python_internal: build and publish internal PyPI package (develop)
- python_public: build and publish public PyPI package (main, manual)
- copy_dockerfiles: prepare Docker build context and version metadata
- docker_internal: build and push internal Docker images (develop / release/*)
- docker_official: build and push public Docker images (main, manual)
- scan: SAST/Sonar (disabled by default)
Configuration files:
- .gitlab-ci.yml (stages, rules, job orchestration)
- .gitlab/ci/templates.yml (shared templates and login configuration)
- .gitlab/ci/variables.yml (shared variables)
- dockerfiles/build_push_images.sh (image build and push script)
Branch rules and triggers
- develop branch
  - Runs: check_version → package_python_internal → copy_dockerfiles → build_docker_internal
  - Environment: ENVIRONMENT=test
  - Targets: internal private registry + Aliyun internal namespace

- release/* branches
  - Runs: copy_dockerfiles → build_docker_internal
  - Environment: ENVIRONMENT=staging
  - Targets: internal private registry + Aliyun internal namespace

- main branch
  - Runs (manual): package_python_public, build_docker_public, tag
  - Environment: ENVIRONMENT=production
  - Targets: Docker Hub public registry + Aliyun public namespace
Note: the tag job creates an annotated tag after the public image build, including deployment metadata.
Image build and naming
Build script: dockerfiles/build_push_images.sh
- Image names:
  - Production: recurve-<service> (current service = worker)
  - Non-production: recurve-<service>-<environment> (e.g., recurve-worker-test, recurve-worker-staging)
- Tags:
  - version: ${VERSION_PACKAGE} (from recurvedata/__version__.py)
  - latest
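The naming rule above can be sketched in a few lines; the real logic lives in dockerfiles/build_push_images.sh, so this is only an illustration of the convention, not that script's code.

```python
# Hypothetical sketch of the image-naming convention: production images
# drop the environment suffix, all others append it.
def image_name(service: str, environment: str) -> str:
    if environment == "production":
        return f"recurve-{service}"
    return f"recurve-{service}-{environment}"

prod = image_name("worker", "production")
staging = image_name("worker", "staging")
print(prod)     # recurve-worker
print(staging)  # recurve-worker-staging
```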
Push targets
- Internal private registry (login handled by job type)
  - Registry: $DOCKER_REPOSITORY_URL (internal jobs)
  - Example path: docker.tool.reorc.cloud/<image_name>:<tags>

- Public Docker Hub (public jobs)
  - Namespace: recurvedata/<image_name>:<tags>

- Aliyun Container Registry (pushed in addition for all Docker build jobs)
  - Registry: reorc-registry-cn-registry-vpc.cn-shenzhen.cr.aliyuncs.com
  - Namespace mapping:
    - Non-production (test/staging): internal
    - Production: public
  - Full path: <registry>/<namespace>/<image_name>:<tags>
Login and authentication
Templates in .gitlab/ci/templates.yml:
- .docker_internal_configuration: login to the internal private registry and also to Aliyun
- .docker_public_configuration: login to Docker Hub and also to Aliyun
Both templates execute docker logout || true once in before_script, then log in to the respective registry and Aliyun so a single job can push to multiple registries.
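The shape of such a template might look like the fragment below; this is an illustration of the logout-then-dual-login pattern described above, not the actual contents of .gitlab/ci/templates.yml.

```yaml
# Illustrative only - the real templates live in .gitlab/ci/templates.yml
.docker_internal_configuration:
  before_script:
    - docker logout || true
    - echo "$DOCKER_PASSWORD" | docker login "$DOCKER_REPOSITORY_URL" -u "$DOCKER_USERNAME" --password-stdin
    - echo "$ALIYUN_PASSWORD" | docker login "$ALIYUN_REGISTRY_URL" -u "$ALIYUN_USERNAME" --password-stdin
```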
Required CI variables (examples)
Configure the following as Masked/Protected variables in GitLab CI/CD settings:
# Nexus / PyPI
NEXUS_USERNAME, NEXUS_PASSWORD, NEXUS_REPOSITORY_URL, NEXUS_PACKAGE_URL
PYPI_USERNAME, PYPI_PASSWORD, PYPI_REPOSITORY_URL (optional)
# Private registry (internal jobs)
DOCKER_REPOSITORY_URL, DOCKER_USERNAME, DOCKER_PASSWORD
# Docker Hub (public jobs)
DOCKER_OFFICIAL_REPOSITORY_URL, DOCKER_OFFICIAL_USERNAME, DOCKER_OFFICIAL_PASSWORD
# Aliyun registry (additional push in all Docker jobs)
ALIYUN_REGISTRY_URL=reorc-registry-cn-registry-vpc.cn-shenzhen.cr.aliyuncs.com
ALIYUN_USERNAME, ALIYUN_PASSWORD
# Namespaces declared in repo variables (adjust if needed)
ALIYUN_NAMESPACE_INTERNAL=internal
ALIYUN_NAMESPACE_PUBLIC=public
Manual triggers and verification
- On the main branch, package_python_public and build_docker_public are manual; click "Play" in the GitLab Pipeline UI.
- Verify images:
  - Internal: docker.tool.reorc.cloud/<image_name>:<tag>
  - Docker Hub: recurvedata/<image_name>:<tag>
  - Aliyun: reorc-registry-cn-registry-vpc.cn-shenzhen.cr.aliyuncs.com/<namespace>/<image_name>:<tag>