A package for low- and high-level high-bandwidth asynchronous data transfer
pyRemoteData
pyRemoteData is a module developed for scientific computation using the remote storage platform ERDA (Electronic Research Data Archive), provided by Aarhus University IT, as part of my PhD at the Department of Ecoscience at Aarhus University.
It can be used with any passwordless SSH-enabled storage facility that supports SFTP and LFTP. However, it has only been tested on a minimal SFTP server (atmoz/sftp) and on the live AU ERDA service, which runs on MiG (Minimum intrusion Grid, available on SourceForge and GitHub) developed by the SCIENCE HPC Centre at the University of Copenhagen.
If your facility requires a password, it should be straightforward to modify the code to support this; password authentication is in fact already implemented, but not exposed to the user. Simply change line 76 in src/remote_data/implicit_mount.py to fetch the password from an environment variable of your choice, or hardcode it. Do this at your own risk, however, as I have not assessed the security implications.
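If you do enable password support, reading the secret from an environment variable keeps it out of the source file. The sketch below is a hypothetical helper, not part of pyRemoteData: the function name get_remote_password and the default variable name MY_SFTP_PASSWORD are placeholders, and how the returned value is passed on depends on the code at line 76, which is not reproduced here.

```python
import os


def get_remote_password(var_name: str = "MY_SFTP_PASSWORD") -> str:
    """Return the SFTP password stored in an environment variable.

    MY_SFTP_PASSWORD is a hypothetical placeholder name; use whichever
    variable you chose for your own setup.
    """
    password = os.environ.get(var_name)
    if not password:
        # Fail loudly rather than silently attempting an empty password
        raise RuntimeError(f"Environment variable {var_name!r} is not set or empty")
    return password
```

Raising an error on a missing variable makes a misconfigured environment obvious immediately instead of surfacing later as an authentication failure.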
Capabilities
To facilitate high-throughput computation in a cross-platform setting, pyRemoteData handles data transfer with multithreading and asynchronous data streaming through thread-safe buffers.
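pyRemoteData's internals are not reproduced here, but the general pattern described above (several transfer threads feeding a bounded, thread-safe buffer that a consumer drains as a stream) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: stream_files and _fake_transfer_worker are hypothetical names, and the "download" is simulated by encoding the file name, where a real implementation would perform the SFTP/LFTP transfer.

```python
import queue
import threading
from typing import Iterator, List, Tuple

Item = Tuple[str, bytes]


def _fake_transfer_worker(names: List[str], buffer: "queue.Queue[Item]") -> None:
    """Hypothetical worker: 'downloads' each name and pushes it into the buffer."""
    for name in names:
        data = name.encode("utf-8")  # stand-in for bytes fetched over SFTP/LFTP
        buffer.put((name, data))  # blocks while the buffer is full (back-pressure)


def stream_files(names: List[str], n_workers: int = 4, buffer_size: int = 8) -> Iterator[Item]:
    """Yield (name, data) pairs as soon as any worker thread has produced them."""
    if n_workers < 1 or buffer_size < 1:
        raise ValueError("n_workers and buffer_size must be at least 1")
    # queue.Queue is thread-safe; maxsize bounds how much data is held in memory
    buffer: "queue.Queue[Item]" = queue.Queue(maxsize=buffer_size)
    # Distribute the file names round-robin over the worker threads
    chunks = [names[i::n_workers] for i in range(n_workers)]
    threads = [
        threading.Thread(target=_fake_transfer_worker, args=(chunk, buffer), daemon=True)
        for chunk in chunks
    ]
    for thread in threads:
        thread.start()
    # Each name is produced exactly once, so consume exactly len(names) items
    for _ in range(len(names)):
        yield buffer.get()
    for thread in threads:
        thread.join()
```

For example, iterating over stream_files(["a.jpg", "b.jpg", "c.jpg"], n_workers=2) yields each file as soon as it is ready. Because the queue is bounded, fast workers block on put() when the consumer falls behind, which caps memory use; note that items arrive in completion order, not request order.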
Use-cases
If your storage facility supports SFTP and LFTP, and you need high-bandwidth data streaming for analysis, data migration or other purposes such as model training, this module may be useful to you. Experience with SFTP or LFTP is not necessary, but you must be able to set up the required SSH configuration.
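Since the module relies on passwordless (key-based) SSH and on the sftp and lftp clients, a quick pre-flight check can confirm both before you configure the package. The helpers below (find_tools, sftp_check_command, passwordless_sftp_works) are hypothetical and not part of pyRemoteData; the check assumes the OpenSSH sftp client, using its -o option to pass BatchMode=yes (so it fails instead of prompting for a password) and -b - to read the batch command "bye" from standard input.

```python
import shutil
import subprocess
from typing import Dict, List, Sequence


def find_tools(tools: Sequence[str] = ("ssh", "sftp", "lftp")) -> Dict[str, bool]:
    """Report which command-line clients are available on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


def sftp_check_command(user: str, host: str, timeout_s: int = 10) -> List[str]:
    # BatchMode=yes makes the connection fail rather than prompt for a password,
    # so a successful run means key-based login works.
    return [
        "sftp",
        "-o", "BatchMode=yes",
        "-o", f"ConnectTimeout={timeout_s}",
        "-b", "-",  # read sftp commands from standard input
        f"{user}@{host}",
    ]


def passwordless_sftp_works(user: str, host: str, timeout_s: int = 10) -> bool:
    """Try to open an SFTP session and immediately quit with 'bye'."""
    try:
        result = subprocess.run(
            sftp_check_command(user, host, timeout_s),
            input=b"bye\n",
            capture_output=True,
            timeout=timeout_s + 5,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False  # sftp not installed, or the host did not respond in time
    return result.returncode == 0
```

Using sftp rather than a plain ssh command matters here: SFTP-only servers typically refuse shell commands, so an ssh-based check could report a failure even when SFTP access works.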
Setup
A more user-friendly setup process, supporting both automated and interactive setup, is currently in development. (TODO: Finish and describe the setup process)
Installation
The package is available on PyPI, and can be installed using pip:
pip install pyremotedata
Interactive
Simply follow the prompts that appear the first time you import the package.
Automated
The automatic configuration setup relies on setting the correct environment variables BEFORE LOADING THE PACKAGE:
- PYREMOTEDATA_REMOTE_USERNAME: Should be set to your username on your remote service.
- PYREMOTEDATA_REMOTE_URI: Should be set to the URI of the endpoint for your remote service (e.g. for ERDA it is "io.erda.au.dk").
- PYREMOTEDATA_REMOTE_DIRECTORY: If you would like a default working directory other than the root of your remote storage, set this to that directory (e.g. "/MY_PROJECT/DATASETS"); otherwise set it to "/".
- PYREMOTEDATA_AUTO: Should be set to "yes" to disable interactive mode. If this is not set, or is set to anything other than "yes" (case-insensitive), while any of the variables above are unset, an error will be raised.
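Because a missing variable only surfaces once the package is loaded, it can help to validate the environment first. The sketch below is a hypothetical pre-import check, not part of pyRemoteData: check_automated_config and its messages are illustrative, and it simply mirrors the rules listed above (the three variables must be set, and PYREMOTEDATA_AUTO must equal "yes", ignoring case).

```python
import os
from typing import List

# The variables the automated setup expects (as listed above)
REQUIRED_VARS = (
    "PYREMOTEDATA_REMOTE_USERNAME",
    "PYREMOTEDATA_REMOTE_URI",
    "PYREMOTEDATA_REMOTE_DIRECTORY",
)


def check_automated_config() -> List[str]:
    """Return a list of problems with the automated-setup variables (empty if none)."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not os.environ.get(name)]
    # The comparison is case-insensitive, so "YES" and "Yes" are accepted too
    if os.environ.get("PYREMOTEDATA_AUTO", "").lower() != "yes":
        problems.append('PYREMOTEDATA_AUTO is not "yes", so interactive mode stays enabled')
    return problems
```

Running this before the import and stopping on a non-empty result makes a batch job fail early with a readable message.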
Example
To test against a mock server, follow the instructions in tests/README.
If you have a remote storage facility that supports SFTP and LFTP, you can use the following example to test the module's functionality:
# Set the environment variables (only necessary in a non-interactive setting)
# If you are simply running this as a Python script,
# you can omit these lines and you will be prompted to set them interactively
import os
os.environ["PYREMOTEDATA_REMOTE_USERNAME"] = "username"
os.environ["PYREMOTEDATA_REMOTE_URI"] = "storage.example.com"
os.environ["PYREMOTEDATA_REMOTE_DIRECTORY"] = "/MY_PROJECT/DATASETS"
os.environ["PYREMOTEDATA_AUTO"] = "yes"
from pyremotedata.implicit_mount import IOHandler
handler = IOHandler()
with handler as io:
    print(io.ls())
# The configuration is persistent, but can be removed using the following:
from pyremotedata.config import remove_config
remove_config()
Issues
This module is certainly not maximally efficient, and you may run into network- or OS-specific issues. Any and all feedback and contributions are highly appreciated.
Source Distribution
File details
Details for the file pyremotedata-0.0.14.tar.gz.
File metadata
- Download URL: pyremotedata-0.0.14.tar.gz
- Size: 25.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 81a9c751f2b3f84da1f1ab837e149d3651117a28293ab79dfc9918d147d3b49f
MD5 | 54f5cdac8e0a9bca4b17013067d4eeef
BLAKE2b-256 | 8e8b9e940259fcd440e42902a1f83ef92033a3b9d965d85152b74a32b9a974ad
Built Distribution
File details
Details for the file pyremotedata-0.0.14-py3-none-any.whl.
File metadata
- Download URL: pyremotedata-0.0.14-py3-none-any.whl
- Size: 23.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 91b31da1fdfdced94e2bb4223c328200c5fffe7cdf2123ddbadb7f4ee71b9862
MD5 | 47baa851d88bdf1ec61b06082022aba2
BLAKE2b-256 | 73662d48517f5f00a8a58450343ff41dba4d738d74bb77f18689c01f03613fe6
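To check a downloaded file against the SHA256 digests listed for pyremotedata 0.0.14, a short standard-library sketch is enough. The names sha256_of, verify_download and EXPECTED_SHA256 are illustrative and not part of pyRemoteData, and the example assumes the downloaded files sit in the current directory.

```python
import hashlib

# SHA256 digests copied from the file details for version 0.0.14
EXPECTED_SHA256 = {
    "pyremotedata-0.0.14.tar.gz": "81a9c751f2b3f84da1f1ab837e149d3651117a28293ab79dfc9918d147d3b49f",
    "pyremotedata-0.0.14-py3-none-any.whl": "91b31da1fdfdced94e2bb4223c328200c5fffe7cdf2123ddbadb7f4ee71b9862",
}


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, filename: str) -> bool:
    """Compare a local file against the published digest for that filename."""
    return sha256_of(path) == EXPECTED_SHA256[filename]
```

For example, verify_download("pyremotedata-0.0.14.tar.gz", "pyremotedata-0.0.14.tar.gz") returns True only if the archive matches the published digest. pip can also enforce this at install time through its hash-checking mode, using a requirements line of the form pyremotedata==0.0.14 --hash=sha256:<digest>.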