
A package for low- and high-level high-bandwidth asynchronous data transfer

pyRemoteData

License: MIT


pyRemoteData is a module developed for scientific computation using the remote storage platform ERDA (Electronic Research Data Archive) provided by Aarhus University IT, as part of my PhD at the Department of Ecoscience at Aarhus University.

It can be used with any storage facility that supports SFTP and LFTP, but it has only been tested against a minimal SFTP server (atmoz/sftp) and the live AU ERDA service, which runs on MiG (Minimum intrusion Grid, available on SourceForge and GitHub), developed by the SCIENCE HPC Centre at Copenhagen University.

Capabilities

To facilitate high-throughput computation in a cross-platform setting, pyRemoteData handles data transfer with multithreading and asynchronous data streaming using thread-safe buffers.
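The idea of streaming through a thread-safe buffer can be illustrated with a generic producer/consumer sketch built on the standard library. This is not pyRemoteData's implementation; the function name `stream_chunks` and the `buffer_size` parameter are hypothetical and exist only for this illustration.

```python
import queue
import threading


def stream_chunks(chunks, buffer_size=4):
    """Yield items from `chunks` while a background thread fills a bounded buffer.

    Hypothetical helper for illustration only; not part of pyRemoteData.
    """
    buffer = queue.Queue(maxsize=buffer_size)  # thread-safe and bounded
    sentinel = object()  # marks the end of the stream

    def producer():
        try:
            for chunk in chunks:
                buffer.put(chunk)  # blocks while the buffer is full
        finally:
            buffer.put(sentinel)  # always signal completion to the consumer

    thread = threading.Thread(target=producer, daemon=True)
    thread.start()
    while True:
        item = buffer.get()  # blocks while the buffer is empty
        if item is sentinel:
            break
        yield item
    thread.join()


# The consumer can process one chunk while the producer fetches the next.
received = list(stream_chunks(f"chunk-{i}".encode() for i in range(10)))
print(len(received))
```

The bounded queue is the design choice that matters here: it lets the producer run ahead of the consumer without letting memory use grow without limit.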

Use-cases

If your storage facility supports SFTP and LFTP, and you need high-bandwidth data streaming for analysis, data migration or other purposes such as model training, then this module may be of use to you. Experience with SFTP or LFTP is not necessary, but you must be able to set up the required SSH configuration.

See Automated for details on how to avoid having to set up SSH configuration.

Setup

A more user-friendly setup process, which facilitates both automated and interactive setup, is currently in development. (TODO: Finish and describe the setup process)

Installation

The package is available on PyPI. The recommended way to install and manage dependencies is using the lightning-fast uv package manager:

# Add to your current project
uv add pyremotedata

Alternatively, you can use the uv pip interface or standard pip:

uv pip install pyremotedata
# or just
pip install pyremotedata
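To confirm that the installation succeeded, you can query the installed distribution from Python. The following is a small sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of pyRemoteData.

```python
from importlib import metadata


def installed_version(distribution="pyremotedata"):
    """Return the installed version string, or None if the distribution is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version())
```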

Interactive

Simply follow the popup instructions that appear once you load the package for the first time.

Automated

The automatic configuration setup relies on setting the correct environment variables BEFORE LOADING THE PACKAGE:

  • PYREMOTEDATA_REMOTE_USERNAME : Should be set to your username on your remote service.
  • PYREMOTEDATA_REMOTE_URI : Should be set to the URI of the endpoint for your remote service (e.g. for ERDA it is "io.erda.au.dk").
  • PYREMOTEDATA_REMOTE_DIRECTORY : Should be set to your default working directory if it is not the root of your remote storage (e.g. "/MY_PROJECT/DATASETS"); otherwise simply set this to "/".
  • PYREMOTEDATA_AUTO : Should be set to "yes" (not case-sensitive) to disable interactive mode. If this is unset, or set to anything other than "yes", while any of the preceding environment variables are unset, an error will be thrown.
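The variables above can be set from Python before the package is imported. The sketch below groups this into two helpers; `configure_automated`, `missing_variables` and `REQUIRED_VARIABLES` are hypothetical names introduced for this example and are not part of pyRemoteData. Only the environment variable names themselves come from the list above.

```python
import os

# Variable names taken from the list above.
REQUIRED_VARIABLES = (
    "PYREMOTEDATA_REMOTE_USERNAME",
    "PYREMOTEDATA_REMOTE_URI",
    "PYREMOTEDATA_REMOTE_DIRECTORY",
)


def configure_automated(username, uri, directory="/", environ=os.environ):
    """Set the variables for non-interactive use (hypothetical helper)."""
    environ["PYREMOTEDATA_REMOTE_USERNAME"] = username
    environ["PYREMOTEDATA_REMOTE_URI"] = uri
    environ["PYREMOTEDATA_REMOTE_DIRECTORY"] = directory
    environ["PYREMOTEDATA_AUTO"] = "yes"


def missing_variables(environ=os.environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]


# Demonstrated on a plain dict so the real environment is left untouched;
# in practice, call configure_automated(...) with the default os.environ
# before importing pyremotedata.
env = {}
configure_automated("username", "io.erda.au.dk", "/MY_PROJECT/DATASETS", environ=env)
print(missing_variables(env))
```

Checking for missing variables up front gives a clearer failure than discovering the problem when the package is first loaded.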

The recommended way to avoid setting up SSH configuration or environment variables is to pass the connection details directly:

from pyremotedata.implicit_mount import IOHandler

# Replace <keyfile>, <USER> and <REMOTE> with your own values
lftp_settings = {"sftp:connect-program": "ssh -a -x -i <keyfile>"}
with IOHandler(lftp_settings=lftp_settings, user="<USER>", remote="<REMOTE>") as io:
    ...

Here <keyfile> is the path to your private SSH key, typically something like ~/.ssh/id_rsa.
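If you build the settings dictionary programmatically, it can help to expand "~" in Python rather than rely on how the connect program is launched. The helper below is a hypothetical sketch (`ssh_connect_settings` is not part of pyRemoteData); it only constructs the same dictionary shown above and assumes the key path contains no spaces.

```python
import os


def ssh_connect_settings(keyfile):
    """Build the lftp settings dict for a given private key (hypothetical helper)."""
    key_path = os.path.expanduser(keyfile)  # turns "~/.ssh/id_rsa" into an absolute path
    return {"sftp:connect-program": f"ssh -a -x -i {key_path}"}


settings = ssh_connect_settings("~/.ssh/id_rsa")
print(settings["sftp:connect-program"])
```

The resulting dictionary can then be passed as the `lftp_settings` argument of `IOHandler`.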

Example

If you want to test against a mock server, follow the instructions in tests/README.

If you have a remote storage facility that supports SFTP and LFTP, then you can use the following example to test the functionality of the module:

# Set the environment variables (only necessary in a non-interactive setting)
# If you are simply running this as a Python script, 
# you can omit these lines and you will be prompted to set them interactively
import os
os.environ["PYREMOTEDATA_REMOTE_USERNAME"] = "username"
os.environ["PYREMOTEDATA_REMOTE_URI"] = "storage.example.com"
os.environ["PYREMOTEDATA_REMOTE_DIRECTORY"] = "/MY_PROJECT/DATASETS"
os.environ["PYREMOTEDATA_AUTO"] = "yes"

from pyremotedata.implicit_mount import IOHandler

handler = IOHandler()

with handler as io:
    print(io.ls())
    local_file = io.download("/remote/file/or/directory")

# The configuration is persistent, but can be removed using the following:
from pyremotedata.config import remove_config
remove_config()

Issues

This module is certainly not maximally efficient, and you may run into network- or OS-specific issues. Any and all feedback and contributions are highly appreciated.


