A tool to orchestrate rsyncing of data between multiple machines and instances of rsyncdirector
RsyncDirector
Automate and coordinate the running of rsync jobs on multiple Linux hosts with a cron syntax schedule.
Overview
rsyncdirector enables multiple rsync jobs, running locally or remotely, to synchronize on lock files, local or remote, to ensure that data is only mutated by one rsync process at a time. Arbitrary commands can also be interspersed between sync actions.
For example: every 24 hours, rsyncdirector runs on laptop01, backing up data to backup01. Both machines are in the same physical location. Every 7 days, an instance of rsyncdirector on backup01 rsyncs all of its data to backup-remote, which is at a remote site. Before the instance of rsyncdirector on laptop01 starts syncing data to backup01, it checks whether a lock file exists in a predefined location on backup01. If it exists, it indicates that backup01 is currently syncing data, and the laptop01 rsyncdirector waits a configured amount of time and then re-checks, until the lock file no longer exists or a configured timeout expires. Once the laptop01 rsyncdirector sees that the lock file on backup01 is gone, it writes a lock file to backup01 and starts rsyncing data from itself to backup01. In this way, multiple rsyncdirector instances can run across multiple hosts and coordinate so that data is not mutated at the same time.
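The lock-file handshake described above can be sketched as follows for the local case. The actual implementation performs these checks on remote hosts over SSH via Fabric; the function names here are illustrative only:

```python
import os
import tempfile
import time

def wait_for_lock(lock_path, poll_seconds=1.0, timeout_seconds=3600.0):
    """Block until lock_path no longer exists; raise TimeoutError if it
    is still present when the configured timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while os.path.exists(lock_path):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{lock_path} still present after {timeout_seconds}s")
        time.sleep(poll_seconds)

def acquire_lock(lock_path):
    """Write our own lock file to tell other instances a sync is running."""
    with open(lock_path, "w", encoding="utf-8") as f:
        f.write(str(os.getpid()))

# Demo with a local temp directory standing in for backup01:
lock = os.path.join(tempfile.mkdtemp(), "backup01.lock")
wait_for_lock(lock, poll_seconds=0.1)  # no other instance running: returns at once
acquire_lock(lock)                     # signal that our sync has started
print(os.path.exists(lock))            # → True
os.remove(lock)                        # release the lock when the sync completes
```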
rsyncdirector runs as a systemd service via a template unit file to enable running multiple rsyncdirector instances on a single host.
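A template unit for this pattern might look like the sketch below; the paths, user, and flag names are hypothetical, not the actual unit shipped by rsyncdirector_deploy. An instance named `backups` would then be started with `systemctl start rsyncdirector@backups`.

```ini
# /etc/systemd/system/rsyncdirector@.service (hypothetical)
[Unit]
Description=rsyncdirector instance %i
After=network-online.target

[Service]
User=backup
ExecStart=/usr/local/bin/rsyncdirector --config /etc/rsyncdirector/%i.yaml
Restart=on-failure

[Install]
WantedBy=multi-user.target
```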
rsyncdirector communicates to remote hosts over SSH using the Python Fabric library which depends on Paramiko for the core SSH protocol implementation. SSH connections are currently authenticated using passphrase-less SSH keys.
rsync functionality is implemented by the underlying rsync package installed on the host.
Each instance can be configured to run under any arbitrary user and connect to any remote host that it can reach with the supplied credentials. The user under which it runs must have read permissions for the files that it is configured to rsync. The administrator must distribute public SSH keys to the hosts to which data is to be rsynced and can optionally specify a private key other than the rsyncdirector user's default set of private keys.
rsyncdirector can also run arbitrary command actions interspersed between sync actions to enable dumping of databases, or any other commands to be run.
See the annotated, example config file for details on how to configure an instance of rsyncdirector.
There is a companion program, rsyncdirector_deploy, to assist in the installation and deployment of instances of rsyncdirector.
Forcing a Run Now
rsyncdirector listens for SIGHUP and, when it receives one, immediately schedules a run-once execution of the configured jobs.
1. Get the PID of the rsyncdirector process. If there are multiple instances of it running, you will need to adjust the way you search for the PID.

   ```
   pgrep rsyncdirector
   ```

2. Send the SIGHUP:

   ```
   kill -SIGHUP <PID>
   ```
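Internally, the hook can be sketched as a standard Python signal handler; the variable and handler names below are illustrative, not the actual implementation:

```python
import os
import signal

run_requested = False  # checked by the scheduler loop between waits

def handle_sighup(signum, frame):
    # Mark that a run-once execution of the configured jobs is wanted.
    global run_requested
    run_requested = True

signal.signal(signal.SIGHUP, handle_sighup)

# Equivalent to `kill -SIGHUP <PID>` from a shell:
os.kill(os.getpid(), signal.SIGHUP)
print(run_requested)  # → True
```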
Key Concepts
Most of the following concepts map directly to specific configuration parameters.
cron_schedule
Each rsyncdirector instance requires a cron expression in the config that defines how often it runs, using the same syntax as a standard Linux cron job.
job
Each job encapsulates any number of actions, each either a sync or a command. The job defines the specifics for connecting to a remote host for sync actions when the job type is remote.
action
An action is defined within the scope of a job and is either a sync or a command. All actions must succeed for a job to continue running. If one of the actions fails, the job is short-circuited and the next job is executed.
sync
A sync action defines the specific rsync command to be run. Each "inherits" the definitions of the job to enable the concatenation of the rsync command to include the specified user, host, and port information for remote jobs. The source and dest are self-explanatory, and the opts list enables the inclusion of any arbitrary options that the underlying rsync implementation on the host allows.
command
A command action defines a specific command to be run on the host on which rsyncdirector is running. It includes the command itself and an optional list of arguments.
lock_files
lock_files define an arbitrary number of files that will be created on either the localhost or remote host(s) to signal to other rsyncdirector instances that a given instance is running.
blocks_on
blocks_on define an arbitrary number of lock_files from other rsyncdirector instances that the job will wait on before continuing with the job.
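Putting these concepts together, a config might look roughly like the sketch below. The field names and layout are illustrative guesses only; consult the annotated example config file for the actual schema.

```yaml
cron_schedule: "0 2 * * *"       # run every day at 02:00
jobs:
  - name: laptop01-to-backup01
    type: remote
    host: backup01
    user: backup
    port: 22
    lock_files:                  # written while this job runs
      - /var/run/rsyncdirector/laptop01.lock
    blocks_on:                   # wait for these to disappear first
      - /var/run/rsyncdirector/backup01.lock
    actions:
      - command:                 # e.g. dump a database before syncing
          cmd: pg_dump
          args: ["-f", "/home/user/db.sql", "exampledb"]
      - sync:
          source: /home/user/
          dest: /backups/laptop01/
          opts: ["-a", "--delete"]
```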
Metrics
rsyncdirector exposes Prometheus metrics via prometheus_client at http://$host:$port/metrics. By default it listens on port 9090; the port can be overridden.
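prometheus_client serves such an endpoint with its `start_http_server` helper; a minimal sketch follows (the metric name here is hypothetical, not one rsyncdirector actually exports):

```python
from urllib.request import urlopen

from prometheus_client import Counter, start_http_server

# Hypothetical metric; rsyncdirector defines its own metric names.
runs_total = Counter("rsyncdirector_runs_total", "Completed scheduler runs")

start_http_server(9090)  # the default port mentioned above
runs_total.inc()

body = urlopen("http://localhost:9090/metrics").read().decode()
print("rsyncdirector_runs_total" in body)  # → True
```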
Installing
There is a companion program, rsyncdirector_deploy, that contains all of the configuration templates and automation to assist in the installation and deployment of configurations of rsyncdirector.
The easiest way to install and configure it is with the aforementioned program. If you want to install it by hand, you will need to get the configuration templates from the rsyncdirector_deploy repo.
Building, Developing, and Testing
This requires a compatible version of Python on your PATH, such that the path to the interpreter can be found by running `which python<version>`. See the `requires-python` entry in `pyproject.toml` at the root of the repo for the currently required version of Python.
Once that is installed, run ./run-tests.sh --dev-setup to create a development virtual environment.
Building a Distribution
python -m build
twine check dist/*
Uploading to PyPI

1. Generate an API token and add the requisite `[pypi]` entry to the `~/.pypirc` file:

   ```
   [pypi]
   username = __token__
   password = <API-token>
   ```

2. Then upload the artifacts to PyPI:

   ```
   python -m twine upload --repository pypi dist/*
   ```
Uploading to Nexus
In my case, I have an instance of Nexus running on my network with a ~/.pypirc file configured for it, and I publish the artifacts there with the following command:

```
twine upload --cert <path-to-nexus-cert> --verbose --repository pypi-[dev|release] dist/rsyncdirector-<version>.tar.gz
```
Running the Tests
The integration tests require that the user running the tests can ssh to root@localhost without having to enter a password. To do so, add a public ssh key of the user running the tests to the authorized keys of the root user on the localhost.
Then, from the root of the repo, run:

```
./run-tests.sh
```
Setting Up to Run and Debug Integration Tests in VSCode
Most of the requisite configs are already present in the provided .vscode/settings.json file.
1. Enable the following in your VSCode settings (Workspace or User settings):

   ```
   "python.analysis.typeCheckingMode": "basic"
   ```

2. Export the path to the required version of Python that you want to use. If you already have one that will suffice, skip this step.

   ```
   export RSYNCDIRECTORINTTEST_PYTHON=<path-to-python-binary>
   ```

3. Set up the test environment. The following will create a `.env` file at the root of the repository that VSCode will read while running the tests, and build the test containers.

   ```
   ./run-tests.sh --dev-setup
   ```

4. Configure VSCode:

   1. Press `CTRL+SHIFT+P` and select Python: Configure Tests.
   2. Select pytest. Even though we are using the `unittest` library for testing, this seems to be the only test configuration in VSCode that works for running and debugging the tests for the time being.
   3. Choose rsyncdirector as the root directory for the tests.
   4. Click on the Testing icon in the left-hand side-bar and you should see a list of all of the tests that you can now run or debug. If you do not see any tests listed, or see an error in the panel, check the Output Panel and select Python from the dropdown menu for any relevant error messages.

5. Update env vars so that existing test timeouts don't result in exceptions while you are stepping through the code. There are a number of env vars written to the `.env` file that determine timeouts for various waits in the test code. When running the whole test suite, the defaults in the `.env` file are fine. When stepping through the code in the IDE, these timeouts can expire and throw exceptions before anything has actually gone wrong, getting in the way of debugging. To avoid this, edit the `.env` file and set the following env var to the suggested value. Note that re-running `run-tests.sh --dev-setup` will overwrite your changes to the `.env` file.

   ```
   RSYNCDIRECTORINTTEST_WAITFOR_TIMEOUT_SECONDS=1000
   ```
Setting up to Develop in VSCode
- Install the Black Formatter extension from Microsoft.
- Open a Python file in VS Code.
- Right-click on the editor to display the context menu.
- Select Format Document With....
- Select Configure Default Formatter... from the drop-down menu.
- Select black as your preferred formatter extension from the list.
Dependency Management
First-order dependencies for requirements.txt and requirements_test.txt are defined in the respective .in files. If you update the first-order dependencies, you need to "compile" the full dependency list. First, ensure that pip-tools is installed in your dev virtual environment. Dev dependencies are defined directly in requirements_dev.txt.

```
pip install .[dev]
pip install .[test]
```

Then run the following:

```
pip-compile -v --no-emit-trusted-host --no-emit-index-url requirements.in
pip-compile -v --no-emit-trusted-host --no-emit-index-url requirements_test.in
```
Vulnerability Scanning
If you have a Safety account you can run safety scan to check for any dependency vulnerabilities.
Notes
On terminating a running rsync job: the approach is to run the rsync command in a separate child process that the parent can then kill when it receives a SIGTERM. This is the usual pattern for making a blocking subprocess cancellable in Python.
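That pattern can be sketched as follows; the function name is illustrative, not the actual implementation, and a long-running `sleep` stands in for rsync:

```python
import os
import signal
import subprocess
import sys
import threading

def run_cancellable(cmd):
    """Run cmd in a child process and kill the child if this (parent)
    process receives SIGTERM. Must be called from the main thread,
    since Python signal handlers can only be installed there."""
    proc = subprocess.Popen(cmd)

    def on_sigterm(signum, frame):
        proc.terminate()  # forward termination to the child (e.g. a running rsync)

    previous = signal.signal(signal.SIGTERM, on_sigterm)
    try:
        return proc.wait()
    finally:
        signal.signal(signal.SIGTERM, previous)

# Demo: start a long-running child, then deliver SIGTERM to ourselves from
# a timer thread; the child is killed and wait() returns promptly instead
# of blocking for the full 30 seconds.
threading.Timer(0.5, lambda: os.kill(os.getpid(), signal.SIGTERM)).start()
exit_code = run_cancellable([sys.executable, "-c", "import time; time.sleep(30)"])
print(exit_code)  # negative on POSIX when the child is killed by a signal
```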
File details
Details for the file rsyncdirector-0.1.3.tar.gz.
File metadata
- Download URL: rsyncdirector-0.1.3.tar.gz
- Upload date:
- Size: 25.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | aefbfa5db02072c022f9ae72080ae960788480fa11e96791f49bb57c4998d27e |
| MD5 | 44e1648f796b2395f2409e3514397580 |
| BLAKE2b-256 | 727c5e4b9acea7b327cd5b7b968e25f71639cc69dbde481d1eaeb38c2b8ed3b1 |
File details
Details for the file rsyncdirector-0.1.3-py3-none-any.whl.
File metadata
- Download URL: rsyncdirector-0.1.3-py3-none-any.whl
- Upload date:
- Size: 23.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 892383c995fb845afc42415e4b340ba5000b092e8b1599e49f2f0d13dbc6735d |
| MD5 | 3d894f9d7be03c932b151fd45f28485e |
| BLAKE2b-256 | 94c28fbb87bd44fd088860694ebf573e7b2245f5b2d6300dddb7e0d5917c95cb |