A one-eyed tool to copy files with.
oeleo
Python package / app that can be used for transferring files from an instrument-PC to a data server.
Features (or limitations)
- Transferring using an ssh connection should preferably be done with key-pairs. This might involve some setting up on your server (ACL) to prevent security issues (the oeleo user should only have access to the data folder on your server).
- Accessing ssh can be done using a password if you are not able to figure out how to set proper ownerships on your server.
- oeleo is one-eyed, meaning that tracking of the "state of the duplicates" is only performed on the local side (where oeleo is running).
- However, oeleo contains a check method that can help you figure out if starting copying is a good idea or not, and populate the database if you want.
- The db that stores information about the "state of the duplicates" is stored relative to the folder oeleo is running from. If you delete it (by accident?), oeleo will make a new empty one from scratch next time you run.
- Configuration is done using environment variables.
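The "one-eyed" idea can be illustrated with a minimal sketch. This is not oeleo's implementation: the `needs_copy` helper and the in-memory `seen` dict are hypothetical stand-ins for the bookkeeping database, and the choice of MD5 is an assumption. The point is only that the copy decision uses locally stored state and never inspects the destination.

```python
import hashlib


def md5_of(data: bytes) -> str:
    """Return the hex MD5 digest of some file content (assumed algorithm)."""
    return hashlib.md5(data).hexdigest()


def needs_copy(name: str, data: bytes, seen: dict) -> bool:
    """Return True if the file is new or changed compared to the local record.

    `seen` maps file names to the last recorded checksum. A real tool would
    update the record only after the copy has succeeded.
    """
    checksum = md5_of(data)
    if seen.get(name) == checksum:
        return False
    seen[name] = checksum
    return True


seen = {}
first = needs_copy("a.csv", b"1,2,3", seen)     # new file
second = needs_copy("a.csv", b"1,2,3", seen)    # unchanged
third = needs_copy("a.csv", b"1,2,3,4", seen)   # content changed
```

If the local record is lost, every file looks new again, which is why deleting the database leads to a fresh start.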
Usage
Install
$ pip install oeleo
Run
1. Create an oeleo worker instance.
2. Connect the worker's bookkeeper to a sqlite3 database.
3. Filter local files.
4. Run to copy files.
5. Repeat from step 3.
Examples and descriptions
Simple script for copying between local folders
import os
from pathlib import Path
import time

import dotenv

from oeleo.checkers import ChecksumChecker
from oeleo.models import SimpleDbHandler
from oeleo.connectors import LocalConnector
from oeleo.workers import Worker
from oeleo.utils import start_logger


def main():
    log = start_logger()
    # assuming you have made a .env file:
    dotenv.load_dotenv()

    db_name = os.environ["OELEO_DB_NAME"]
    base_directory_from = Path(os.environ["OELEO_BASE_DIR_FROM"])
    base_directory_to = Path(os.environ["OELEO_BASE_DIR_TO"])
    filter_extension = os.environ["OELEO_FILTER_EXTENSION"]

    # Making a worker using the Worker class.
    # You can also use the `factory` functions in `oeleo.workers`
    # (e.g. `ssh_worker` and `simple_worker`)
    bookkeeper = SimpleDbHandler(db_name)
    checker = ChecksumChecker()
    local_connector = LocalConnector(directory=base_directory_from)
    external_connector = LocalConnector(directory=base_directory_to)

    worker = Worker(
        checker=checker,
        local_connector=local_connector,
        external_connector=external_connector,
        bookkeeper=bookkeeper,
        extension=filter_extension,
    )

    # Running the worker with 5 minutes intervals.
    # You can also use an oeleo scheduler for this.
    worker.connect_to_db()
    while True:
        worker.filter_local()
        worker.run()
        time.sleep(300)


if __name__ == "__main__":
    main()
Environment .env file
OELEO_BASE_DIR_FROM=C:\data\local
OELEO_BASE_DIR_TO=C:\data\pub
OELEO_FILTER_EXTENSION=.csv
OELEO_DB_NAME=local2pub.db
OELEO_LOG_DIR=C:\oeleo\logs
## only needed for advanced connectors:
# OELEO_DB_HOST=<db host>
# OELEO_DB_PORT=<db port>
# OELEO_DB_USER=<db user>
# OELEO_DB_PASSWORD=<db password>
# OELEO_EXTERNAL_HOST=<ssh hostname>
# OELEO_USERNAME=<ssh username>
# OELEO_PASSWORD=<ssh password>
# OELEO_KEY_FILENAME=<ssh key-pair filename>
## only needed for SharePointConnector:
# OELEO_SHAREPOINT_USERNAME=<sharepoint username (falls back to ssh username if missing)>
# OELEO_SHAREPOINT_URL=<url to sharepoint>
# OELEO_SHAREPOINT_SITENAME=<name of sharepoint site>
# OELEO_SHAREPOINT_DOC_LIBRARY=<name of sharepoint library>
Environment variables reference
Core transfer settings:
- OELEO_BASE_DIR_FROM: local source directory.
- OELEO_BASE_DIR_TO: destination directory (local or remote, depending on connector).
- OELEO_FILTER_EXTENSION: file extension filter (include the dot, e.g. .csv).
- OELEO_DB_NAME: sqlite database filename used for bookkeeping.
- OELEO_LOG_DIR: directory for log files; defaults to the current working directory.
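As an illustration of how a script might gather the core settings in one place, here is a minimal sketch. The `CoreSettings` dataclass and `load_core_settings` function are hypothetical helpers and not part of oeleo; only the variable names and the OELEO_LOG_DIR default come from the list above.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass
class CoreSettings:
    """Hypothetical container for the core transfer settings."""
    base_dir_from: Path
    base_dir_to: str
    filter_extension: str
    db_name: str
    log_dir: Path


def load_core_settings(env=None) -> CoreSettings:
    """Read the core settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return CoreSettings(
        base_dir_from=Path(env["OELEO_BASE_DIR_FROM"]),
        # Kept as a string since the destination may be a remote path.
        base_dir_to=env["OELEO_BASE_DIR_TO"],
        filter_extension=env["OELEO_FILTER_EXTENSION"],
        db_name=env["OELEO_DB_NAME"],
        # Falls back to the current working directory when unset.
        log_dir=Path(env.get("OELEO_LOG_DIR", os.getcwd())),
    )


example_env = {
    "OELEO_BASE_DIR_FROM": "data/local",
    "OELEO_BASE_DIR_TO": "data/pub",
    "OELEO_FILTER_EXTENSION": ".csv",
    "OELEO_DB_NAME": "local2pub.db",
}
settings = load_core_settings(example_env)
```

Passing a plain dict instead of `os.environ` makes the helper easy to try out without touching the real environment.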
SSH connector settings:
- OELEO_EXTERNAL_HOST: SSH host (optionally with port, e.g. host:2222).
- OELEO_USERNAME: SSH username.
- OELEO_PASSWORD: SSH password (used when connecting with password).
- OELEO_KEY_FILENAME: SSH private key path (used when connecting with key-pair).
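The host:port form of OELEO_EXTERNAL_HOST could be split as in the sketch below. The `split_host_port` helper is hypothetical (how oeleo parses the value internally is not shown here); the default of 22 is the standard SSH port. IPv6 literals are not handled.

```python
def split_host_port(value: str, default_port: int = 22):
    """Split 'host' or 'host:port' into a (host, port) tuple."""
    host, sep, port = value.rpartition(":")
    if not sep:
        # No colon found: the whole value is the host name.
        return value, default_port
    return host, int(port)


plain = split_host_port("myserver.example.org")
with_port = split_host_port("myserver.example.org:2222")
```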
SharePoint connector settings:
- OELEO_SHAREPOINT_URL: SharePoint base URL (e.g. https://yourcompany.sharepoint.com).
- OELEO_SHAREPOINT_SITENAME: SharePoint site name.
- OELEO_SHAREPOINT_DOC_LIBRARY: SharePoint document library name.
- OELEO_SHAREPOINT_USERNAME: SharePoint username; falls back to OELEO_USERNAME if unset.
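The username fallback described above can be expressed in a couple of lines. This is a sketch with a hypothetical `sharepoint_username` helper, not oeleo's own code; it also treats an empty value as unset, which is an assumption.

```python
import os


def sharepoint_username(env=None):
    """Return the SharePoint username, falling back to the SSH username."""
    env = os.environ if env is None else env
    # `or` means an empty string also triggers the fallback (assumption).
    return env.get("OELEO_SHAREPOINT_USERNAME") or env.get("OELEO_USERNAME")


explicit = sharepoint_username(
    {"OELEO_SHAREPOINT_USERNAME": "sp_user", "OELEO_USERNAME": "ssh_user"}
)
fallback = sharepoint_username({"OELEO_USERNAME": "ssh_user"})
```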
App settings (app/oa.pyw):
- OA_SINGLE_RUN: run once and exit when true.
- OA_ADD_CHECK: run the check step before copying when true.
- OA_MAX_RUN_INTERVALS: number of scheduler runs before stopping.
- OA_HOURS_SLEEP: hours to sleep between runs.
- OA_FROM_YEAR: filter out files older than this year.
- OA_FROM_MONTH: filter out files older than this month.
- OA_FROM_DAY: filter out files older than this day.
- OA_STARTS_WITH: only include files starting with any of these prefixes; delimit with ;.
- OA_INCLUDE_SUBDIRS: include subdirectories in local search when true.
- OA_EXTERNAL_SUBDIRS: include subdirectories on the destination when true.
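Since environment variables are always strings, flags such as OA_SINGLE_RUN and lists such as OA_STARTS_WITH need to be parsed. The sketch below shows one way this could be done; the helper names are hypothetical, and details such as case-insensitive matching of true are assumptions rather than the app's documented behaviour.

```python
def parse_bool(value, default=False):
    """Interpret a string flag; only 'true' (any case) counts as True."""
    if value is None:
        return default
    return value.strip().lower() == "true"


def parse_prefixes(value):
    """Split a ';'-delimited string into a list of non-empty prefixes."""
    if not value:
        return []
    return [part.strip() for part in value.split(";") if part.strip()]


def matches_prefix(filename, prefixes):
    """Return True if no prefixes are given or the name starts with one."""
    # str.startswith accepts a tuple of candidate prefixes.
    return not prefixes or filename.startswith(tuple(prefixes))


single_run = parse_bool("True")
prefixes = parse_prefixes("2022;2023;")
keep = matches_prefix("2023_run_01.csv", prefixes)
skip = matches_prefix("old_run.csv", prefixes)
```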
Database
The database contains one table called filelist:
| id | processed_date | local_name | external_name | checksum | code |
|---|---|---|---|---|---|
| 1 | 2022-07-05 15:55:02.521154 | file_number_1.xyz | C:\oeleo\check\to\file_number_1.xyz | c976e564825667d7c11ba200457af263 | 1 |
| 2 | 2022-07-05 15:55:02.536152 | file_number_10.xyz | C:\oeleo\check\to\file_number_10.xyz | d502512c0d32d7503feb3fd3dd287376 | 1 |
| 3 | 2022-07-05 15:55:02.553157 | file_number_2.xyz | C:\oeleo\check\to\file_number_2.xyz | cb89d576f5bd57566c78247892baffa3 | 1 |
The processed_date is when the file was last updated (meaning last time oeleo found a new checksum for it).
The table below shows what the different values of code mean:
| code | meaning |
|---|---|
| 0 | should-be-copied |
| 1 | should-be-copied-if-changed |
| 2 | should-not-be-copied |
Hint! You can lock a file (choose to never copy it) by manually editing its code to 2.
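Locking a file can be done with any sqlite tool, or from Python with the standard sqlite3 module. The sketch below builds a throwaway in-memory table with the columns shown above so that it runs on its own; the column types are assumptions. Against a real database you would connect to the file named by OELEO_DB_NAME instead and skip the CREATE and INSERT statements.

```python
import sqlite3

# In-memory stand-in for the bookkeeping database (column types assumed).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE filelist (
        id INTEGER PRIMARY KEY,
        processed_date TEXT,
        local_name TEXT,
        external_name TEXT,
        checksum TEXT,
        code INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO filelist (processed_date, local_name, external_name, checksum, code) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("2022-07-05 15:55:02.521154", "file_number_1.xyz",
         r"C:\oeleo\check\to\file_number_1.xyz",
         "c976e564825667d7c11ba200457af263", 1),
        ("2022-07-05 15:55:02.536152", "file_number_10.xyz",
         r"C:\oeleo\check\to\file_number_10.xyz",
         "d502512c0d32d7503feb3fd3dd287376", 1),
    ],
)

# Lock one file by setting its code to 2 (should-not-be-copied).
conn.execute(
    "UPDATE filelist SET code = 2 WHERE local_name = ?",
    ("file_number_1.xyz",),
)
conn.commit()

locked = conn.execute(
    "SELECT local_name FROM filelist WHERE code = 2"
).fetchall()
conn.close()
```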
Using an oeleo scheduler
Instead of using, for example, a while loop to keep oeleo running continuously or at selected intervals,
you can use a scheduler (e.g. rocketry, watchdog, schedule, or more advanced options such as Airflow).
oeleo also includes its own schedulers. This is an example of how to use oeleo.schedulers.SimpleScheduler:
import dotenv

from oeleo.schedulers import SimpleScheduler
from oeleo.workers import simple_worker

# assuming you have created an appropriate .env file
dotenv.load_dotenv()

worker = simple_worker()
s = SimpleScheduler(
    worker,
    run_interval_time=4,  # seconds
    max_run_intervals=4,
)
s.start()
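To make the two parameters above more concrete, here is a generic sketch of an interval loop with a maximum number of runs. It is not the code of oeleo's SimpleScheduler; `run_at_intervals` and its arguments are hypothetical and only illustrate the general pattern of running a task, sleeping, and stopping after a fixed number of rounds.

```python
import time


def run_at_intervals(task, interval_seconds, max_runs, sleep=time.sleep):
    """Call `task` up to `max_runs` times, sleeping between the calls."""
    results = []
    for i in range(max_runs):
        results.append(task())
        # No need to sleep after the final run.
        if i < max_runs - 1:
            sleep(interval_seconds)
    return results


calls = []
# The sleep function is replaced by a no-op so the example finishes instantly.
results = run_at_intervals(
    lambda: calls.append("ran"),
    interval_seconds=4,
    max_runs=4,
    sleep=lambda seconds: None,
)
```

Injecting the sleep function keeps the loop easy to test; in normal use the default `time.sleep` applies.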
Copy files from a Windows PC to a Linux server through ssh
import logging
import os
from pathlib import Path

import dotenv

from oeleo.connectors import register_password
from oeleo.utils import start_logger
from oeleo.workers import ssh_worker

log = start_logger()

print(" ssh ".center(80, "-"))
log.setLevel(logging.DEBUG)
log.info("Starting oeleo!")
dotenv.load_dotenv()

external_dir = "/srv/data"
filter_extension = ".res"

# make the ssh password available to the connector
register_password(os.environ["OELEO_PASSWORD"])

worker = ssh_worker(
    db_name="ssh_to_server.db",
    base_directory_from=Path(r"data\raw"),
    base_directory_to=external_dir,
    extension=filter_extension,
)
worker.connect_to_db()
try:
    # compare local and external files and update the database before copying
    worker.check(update_db=True)
    worker.filter_local()
    worker.run()
finally:
    # always close the connection, also if something fails
    worker.close()
Future planned improvements
Just plans, no promises given.
- make even nicer printing and logging.
- create CLI.
- create an executable (partly done -- see the app folder).
- create a web-app.
- create a GUI (not likely).
Status
- Works on my PC → PC
- Works on my PC → my server
- Works on my server → my server
- Works on my instrument PC → my instrument PC
- Works on my instrument PC → my server
- Works OK
- Deployable
- On testpypi
- On pypi
- Code understandable for others
- Looking good
- Fairly easy to use
- Easy to use
- Easy to debug runs (e.g. editing sql)
Licence
MIT
Hints
You can silence the log output from paramiko (and Fabric/Invoke) like this:
import logging

for name in ("paramiko", "fabric", "invoke"):
    logging.getLogger(name).setLevel(logging.WARNING)
Development
- Developed using uv on Python 3.11.
- For version 0.6 and newer, Python 3.8 support is no longer required.
Some useful commands
Update version
# update version in pyproject.toml, e.g. from 0.5.3 to 0.6.0
Testing
Unit tests:
uv run pytest
SSH integration tests (requires a local SSH server):
uv run pytest -m ssh
See tests/README.md for Docker setup, environment variables, and helper scripts.
Build
python -m build
Publish
If you are using 2-factor authentication, you need to create a token on pypi.org and run:
python -m twine upload -u __token__ -p <token> dist/*
Next
- Improve logging
Development lead
- Jan Petter Maehlen, IFE