Automated data transfer system using rsync and cron
Landing Zones
Automated data transfer system using rsync with cron job generation.
Quick Start
# Install
pip install -e .
# Generate cron files, transfer scripts, and validation wrappers
landingzones --help
landingzones --config config/config.yaml build
landingzones build
# Check deployment readiness
landingzones validate deployment
# Run a hop-local validation
landingzones validate hop <flow_group> preflight
landingzones validate hop <flow_group>
# Run toy data through the configured flows
landingzones validate integration
Project Structure
landingzones/
├── src/landingzones/ # Main package
│ ├── cli.py # Top-level operator CLI
│ ├── generate_cron_files.py # Cron generation tool
│ ├── check_deployment_readiness.py
│ ├── plot_transfer_status.py
│ └── config/transfers.tsv # Default config
├── input/ # Default input directory
├── output/ # Default output directory
│ ├── crontab.d/ # Generated cron files
│ ├── scripts/ # Generated transfer scripts
│ └── validation_scripts/ # Generated validation wrappers
├── log/ # Default log directory
├── tests/ # Test suite
├── pyproject.toml # Package config
└── README.md
Configuration
The system is configured via a tab-separated transfers.tsv file:
| Column | Description | Example |
|---|---|---|
| identifiers | Unique transfer ID used for generated shell script names | transfer_001, server1_to_server2 |
| runtime_id | Required deploy/artifact identity used for cron grouping and filtering | server1_prod.user1 |
| system | Configured system key used for managed paths and flock settings | server1, localhost |
| users | Optional user/account context for review and generated headers | user1, local |
| source | Source directory path | /srv/data/src/ |
| source_port | SSH port for remote sources (optional) | 2222 |
| destination | Destination (local or remote) | user@host:/dest/ |
| destination_port | SSH port (optional) | 225 |
| rsync_options | Additional rsync flags | --chown=:group |
| io_nice | Optional ionice settings for rsync | -c2 -n7 |
| log_file | Log file name resolved under the system log folder | transfers.log |
| flock_file | Lock file name resolved under the system flock folder | transfer.lock |
Future todo: add an optional second, per-remote-host lock for cross-server
transfers. The existing flock_file prevents one transfer from overlapping
with itself; a host-level lock would limit concurrent SSH/rsync handshakes
against the same remote server when many transfer rows run on the same cron
schedule.
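To make the host-level lock idea more concrete, here is a minimal sketch of how a per-host lock name might be derived from the destination column. Nothing here exists in the package: `remote_host`, `host_lock_name`, and the `host_<name>.lock` naming are hypothetical, and the parsing only assumes the `user@host:/dest/` form shown in the table.

```python
import re

# Matches an optional "user@" prefix followed by "host:"; local paths have no
# colon-delimited host and therefore do not match.
_REMOTE = re.compile(r"^(?:[^@/:]+@)?([^@/:]+):")

def remote_host(destination):
    """Return the remote host of an rsync-style destination, or None if local."""
    match = _REMOTE.match(destination)
    return match.group(1) if match else None

def host_lock_name(destination):
    """Hypothetical per-host lock file name; None for local destinations."""
    host = remote_host(destination)
    return f"host_{host}.lock" if host else None
```

Transfers that resolve to the same lock name would then share one lock, while local destinations such as `output/` would need none.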
Example
identifiers runtime_id system users source source_port destination destination_port rsync_options io_nice log_file flock_file
local_copy localhost_test.testuser localhost testuser input/* output/ transfers.log landingzones.lock
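Because the file is tab-separated, the blank fields in the example row (the two ports, rsync_options, and io_nice) are easy to lose when the row is copied as plain text. The sketch below rebuilds the example with explicit tabs and reads it with Python's standard `csv` module. It is only an illustration of the file layout; `read_transfers` and `filter_by_runtime_id` are hypothetical helpers, not part of the landingzones API.

```python
import csv
import io

# Column names copied from the example header above.
COLUMNS = [
    "identifiers", "runtime_id", "system", "users", "source", "source_port",
    "destination", "destination_port", "rsync_options", "io_nice",
    "log_file", "flock_file",
]

# The example row, with the optional fields left as empty strings.
EXAMPLE_ROW = [
    "local_copy", "localhost_test.testuser", "localhost", "testuser",
    "input/*", "", "output/", "", "", "", "transfers.log", "landingzones.lock",
]

EXAMPLE_TSV = "\t".join(COLUMNS) + "\n" + "\t".join(EXAMPLE_ROW) + "\n"

def read_transfers(text):
    """Parse tab-separated transfer rows into dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

def filter_by_runtime_id(rows, runtime_ids):
    """Keep only the rows whose runtime_id is in the requested set."""
    wanted = set(runtime_ids)
    return [row for row in rows if row["runtime_id"] in wanted]

rows = read_transfers(EXAMPLE_TSV)
```

Each parsed row is a dict with all twelve keys, and the optional columns come back as empty strings rather than being dropped.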
CLI Commands
# Generate cron files with defaults
landingzones build
# Generate only selected runtime IDs from a shared transfers.tsv
landingzones build --runtime-id server1_prod.user1 --runtime-id server2_prod.user2
# Check deployment readiness
landingzones validate deployment
# Run a hop-local validation wrapper through the CLI
landingzones validate hop <flow_group>
# Seed toy data and run the real scripts/logs/locks
landingzones validate integration
# Generate an HTML health dashboard from a shared transfer TSV log
landingzones report transfers output/log/Landing_Zone_server1_prod.user1.transfers.tsv
Generated Cron Format
*/15 * * * * /bin/sh output/scripts/local_copy.sh
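The line above pairs a standard five-field cron schedule with the script generated for one transfer identifier. As a rough sketch of that relationship, the helper below assembles the same line from its parts. `cron_line` is a hypothetical function, and the `*/15 * * * *` default is taken from the example only; how the real generator picks a schedule is not described here.

```python
import posixpath

def cron_line(identifier, scripts_dir="output/scripts/", schedule="*/15 * * * *"):
    """Build a cron entry that runs the generated script for one transfer."""
    script = posixpath.join(scripts_dir, f"{identifier}.sh")
    return f"{schedule} /bin/sh {script}"

line = cron_line("local_copy")
```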
Installation
# Development mode (editable install with the reporting extra)
pip install -e ".[report]"
# With test dependencies
pip install -e ".[test]"
# Production
pip install .
Lab Sequencer Bundle
For lab machines where a managed Python environment is awkward, build a
relocatable bundle using a python-build-standalone runtime. Build it on a
machine that matches the lab sequencer OS, architecture, and libc family.
Download or provide a python-build-standalone install_only archive, then run:
cd app
python scripts/build_python_standalone_bundle.py --python-archive /path/to/cpython-*-install_only.tar.*
If you already extracted the runtime, point at its Python executable instead:
python scripts/build_python_standalone_bundle.py --python-bin /path/to/python/install/bin/python3
With Pixi, the app includes a packaging task that downloads a matching
python-build-standalone runtime using getpybs:
cd app
pixi run build-standalone
Build the lab Linux artifact on Linux. A bundle built on macOS contains a macOS
Python runtime and will fail on the sequencer with a "cannot execute binary file" error.
Before copying a tarball to the lab host, verify the bundled runtime:
file packaging/dist/landingzones-standalone/python/bin/python3
packaging/dist/landingzones-standalone/python/bin/python3 -c "import platform; print(platform.system(), platform.machine())"
On the current lab machines the second command should print Linux x86_64.
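If you prefer a pass/fail check over reading the output by eye, the same `platform` calls can be wrapped in a small script and run with the bundled interpreter. This is a sketch only; `is_expected_platform` is a hypothetical helper, and the Linux/x86_64 target is the one stated for the current lab machines.

```python
import platform

EXPECTED = ("Linux", "x86_64")  # target stated for the current lab machines

def is_expected_platform(system=None, machine=None):
    """Return True when the runtime matches the expected OS and architecture."""
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return (system, machine) == EXPECTED

if __name__ == "__main__":
    # Exit non-zero so a build script can stop before copying the tarball.
    raise SystemExit(0 if is_expected_platform() else 1)
```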
The standalone bundle installs the core operator CLI without pandas, so it is
intended for build, validate, and deploy on locked-down lab machines.
landingzones report transfers remains a reporting extra and should run from
an environment with landingzones[report] installed.
The same bundle can be produced by the GitHub Actions workflow
Build Standalone Bundle. Run it manually from Actions, or push a v* tag.
It uploads an artifact named landingzones-standalone-linux-x86_64 containing:
landingzones-standalone-linux-x86_64.tar.gz
For v* tags, the workflow also creates or updates the matching GitHub Release
and uploads landingzones-standalone-linux-x86_64.tar.gz as a release asset.
The bundle is written to:
app/packaging/dist/landingzones-standalone/
app/packaging/dist/landingzones-standalone.tar.gz
Copy the tarball to the lab machine, extract it, and run it like the normal CLI:
./landingzones --config config/config.yaml build
./landingzones --config config/config.yaml validate deployment
For offline builds, pass --wheelhouse /path/to/wheels so dependencies are
installed from local wheels. The legacy shell wrapper still works:
./scripts/build_python_standalone_bundle.sh --python-archive /path/to/cpython-*-install_only.tar.*
The bundle carries Python and Python packages only; the target machine still
needs system tools such as rsync, ssh, flock, curl, and cron.
Testing
# Run all tests
pytest
# Verbose
pytest -v
# With coverage
pytest --cov=landingzones --cov-report=html
# Specific test
pytest tests/test_generate_cron_files.py::TestClassName::test_method
Validation Modes
The operator-facing validation surface has three modes:
- landingzones validate deployment
- landingzones validate hop <flow_group> [preflight|run]
- landingzones validate integration
landingzones validate integration is the heavier integration-style test mode. It copies toy data into the configured starting locations, generates the real shell scripts, and runs the transfers using the normal log and flock paths.
Use landingzones validate integration --slow when you want the harness to print the result of each completed step and wait for Enter before running the next one.
Generated transfer scripts create portable .landing_zones sidecars for every enabled transfer. flow_group is optional sidecar metadata: when a transfer mints a new sidecar the value may be blank, and downstream transfers preserve the value already stored in the sidecar.
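The flow_group rule can be read as "set once when the sidecar is minted, then never overwritten". The sketch below models a sidecar as a plain dict to show that rule. The real .landing_zones format is not described here, so the dict shape and the `next_sidecar` helper are assumptions made for illustration.

```python
def next_sidecar(existing, row_flow_group):
    """Return the sidecar metadata a transfer would carry forward.

    existing: the sidecar dict found at the source, or None if there is none.
    row_flow_group: the flow_group configured on the current transfer row.
    """
    if existing is None:
        # Minting a new sidecar: the value may legitimately be blank.
        return {"flow_group": row_flow_group or ""}
    # Downstream transfer: keep whatever the sidecar already stores.
    return dict(existing)
```

For example, a sidecar minted with a blank flow_group stays blank further down the chain, even if a later row has a flow_group configured.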
Generated Validation Wrappers
Each flow_group with exactly one is_entry_point=TRUE row gets a generated wrapper in the configured validation-scripts directory:
output/validation_scripts/lz_run_validation_<flow_group>.sh
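As a sketch of the "exactly one entry point" rule, the snippet below counts is_entry_point=TRUE rows per flow_group and builds the wrapper path only for groups with a single entry row. The row dicts and the `wrapper_paths` helper are hypothetical; the path pattern is the one shown above.

```python
from collections import Counter

VALIDATION_SCRIPTS_DIR = "output/validation_scripts/"

def wrapper_paths(rows, scripts_dir=VALIDATION_SCRIPTS_DIR):
    """Map each flow_group with exactly one entry-point row to its wrapper path."""
    entry_counts = Counter(
        row["flow_group"]
        for row in rows
        if row.get("flow_group") and row.get("is_entry_point", "").upper() == "TRUE"
    )
    return {
        group: f"{scripts_dir}lz_run_validation_{group}.sh"
        for group, count in entry_counts.items()
        if count == 1
    }

rows = [
    {"flow_group": "a_to_b", "is_entry_point": "TRUE"},
    {"flow_group": "a_to_b", "is_entry_point": "FALSE"},
    {"flow_group": "c_to_d", "is_entry_point": "TRUE"},
    {"flow_group": "c_to_d", "is_entry_point": "TRUE"},  # two entries: skipped
]
paths = wrapper_paths(rows)
```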
Use landingzones validate hop <flow_group> as the main interface. The generated wrapper remains available directly and bakes in:
- the entry directory for that flow
- the immediate next hop for preflight checks
- the default fixture directory under test_data
- the flow_group and producer labels used in the LZTEST_... folder name
Typical usage:
# Regenerate scripts after changing config/transfers
landingzones --config config/config.yaml build
# Check only the current hop structure and immediate next-hop access
landingzones validate hop local_labnet_to_server1_data preflight
# Inject a validation run with the baked-in defaults
landingzones validate hop local_labnet_to_server1_data
# Inject a validation run with an explicit token suffix
landingzones validate hop local_labnet_to_server1_data --token ABCD
# Direct wrapper execution still works if needed
./output/validation_scripts/lz_run_validation_local_labnet_to_server1_data.sh
Wrapper/CLI behavior:
- no action defaults to run
- preflight checks only the current hop plus its immediate next hop
- options-only invocation such as --token ABCD also defaults to run
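The defaulting rules above amount to: treat the first argument as the action only if it is preflight or run, and otherwise fall back to run. A minimal sketch of that dispatch follows. `resolve_action` is a hypothetical helper written for illustration and does not reflect how the wrapper or CLI actually parses arguments.

```python
ACTIONS = ("preflight", "run")

def resolve_action(args):
    """Split wrapper arguments into (action, remaining_options)."""
    if args and args[0] in ACTIONS:
        return args[0], list(args[1:])
    # No action given, or the invocation starts with an option such as --token.
    return "run", list(args)
```

With this reading, an empty argument list and `["--token", "ABCD"]` both resolve to run, while `["preflight"]` resolves to preflight.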
Use landingzones validate hop for lightweight producer-side validation. Use landingzones validate integration when you want the heavier integration test that seeds toy data and executes the full generated transfer chain.
Required config in your deployment config.yaml:
transfers_file: input/transfers.tsv
test_data: tests/toy_data/
validation_scripts_dir: output/validation_scripts/
rit_managed_locations:
test_local: tests/test_local
flock_paths:
test_local: /opt/homebrew/bin/flock
rit_managed_folder_structure:
log: output/log/
flock: output/flock/
sh_output: output/scripts/
crontabs: output/crontab.d/
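The log_file and flock_file columns hold file names only, which are resolved under the managed folders configured here. The sketch below shows one plausible reading of that resolution as a simple path join. The dict mirrors the rit_managed_folder_structure block above, and `resolve_managed_path` is a hypothetical helper; any per-system handling in the real tool is not modelled.

```python
import posixpath

# Mirrors the rit_managed_folder_structure block from config.yaml.
MANAGED_FOLDERS = {
    "log": "output/log/",
    "flock": "output/flock/",
    "sh_output": "output/scripts/",
    "crontabs": "output/crontab.d/",
}

def resolve_managed_path(kind, file_name, folders=MANAGED_FOLDERS):
    """Join a bare file name onto the managed folder of the given kind."""
    return posixpath.join(folders[kind], file_name)

log_path = resolve_managed_path("log", "transfers.log")
lock_path = resolve_managed_path("flock", "landingzones.lock")
```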
Typical local fixture layout:
deploy/local/
├── config/config.yaml
├── input/transfers.tsv
├── tests/toy_data/
└── tests/test_local/
How to run it:
# Run from the deployment root that owns config/, input/, and tests/
cd deploy/local
# Generate deployment artifacts
landingzones --config config/config.yaml build
# Run the heavier integration test
landingzones --config config/config.yaml validate integration
What it does:
- Filters transfers.tsv to the current system and user
- Seeds each initial source root from test_data
- Generates scripts into the configured sh_output directory
- Uses the configured log and flock directories
- Executes the scripts in transfer order
- Validates that the seeded top-level directories reached the terminal destinations
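The final validation step can be pictured as a set comparison between the top-level directory names that were seeded and those present at a terminal destination. The sketch below shows that comparison on temporary directories. It is an assumption about the shape of the check, not the harness's actual code, and `missing_at_destination` and `demo` are hypothetical names.

```python
import os
import tempfile

def top_level_dirs(root):
    """Return the names of the directories directly under root."""
    return {entry.name for entry in os.scandir(root) if entry.is_dir()}

def missing_at_destination(seed_root, destination_root):
    """List seeded top-level directories that never reached the destination."""
    return sorted(top_level_dirs(seed_root) - top_level_dirs(destination_root))

def demo():
    """Seed two toy runs, 'transfer' only one, and report what is missing."""
    with tempfile.TemporaryDirectory() as tmp:
        seed = os.path.join(tmp, "toy_data")
        dest = os.path.join(tmp, "terminal")
        for name in ("run_a", "run_b"):
            os.makedirs(os.path.join(seed, name))
        os.makedirs(os.path.join(dest, "run_a"))
        return missing_at_destination(seed, dest)
```

An empty result means every seeded directory arrived; any names returned are the ones to look for in the transfer logs.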
After a successful run it asks whether you want cleanup. Answer y to remove the propagated test directories plus generated log and lock artifacts so the next run starts from the initial state. Answer n to inspect the final tree and logs.
Deployment
- Configure transfers.tsv with your routes
- Generate cron files: landingzones build
- Deploy:
  cp output/crontab.d/*.cron ~/crontab.d/
  cat ~/crontab.d/*.cron | crontab -
Check deployment readiness with:
landingzones validate deployment
Development
# Make changes in src/landingzones/
# Run tests
pytest
# Test CLI
landingzones --help
Requirements
- Python >= 3.8
- PyYAML >= 5.0.0
- pandas >= 1.0.0, only for landingzones report transfers (landingzones[report])
- System: rsync, ssh, flock