Skip to main content

Python wrapper for the datamover tools that enable moving data between the ZIH fileserver and the taurus cluster at TU Dresden

Project description

Taurus Datamover

Python wrapper for the datamover tools that enable moving data between the ZIH fileserver and the taurus cluster at TU Dresden

Before you can start working on the taurus cluster, you need to file a project request. You also need a group share on the ZIH fileserver.

Getting started

If you own a project space on taurus and want to transfer files to it, you can ask HPC support to enable a connection to your fileserver.

Afterwards, you can use TUD ZIH datamover to transfer files from the fileserver to the cluster.

This python package provides a python wrapper for the datamover tools.

Usage

First import taurus_datamover and create a Datamover object

from taurus_datamover import Datamover, waitfor
dm = Datamover()

The object will have all datamover commands as methods. The most commonly used methods are:

  • dtls list directory contents. Equivalent of the the linux command ls.
  • dtcp copy files like cp
  • dtmv move files like mv
  • dtrm delete files like rm

All commands take the same arguments like the linux equivalents. Arguments are passed as arguments to the method. For example, the --help option tells you how to use them:

out, _ = dm.dtls('--help').communicate()
print(out.decode('utf-8'))

returns (truncated):

Usage: ls [OPTION]... [FILE]...
List information about the FILEs (the current directory by default).
Sort entries alphabetically if none of -cftuvSUX nor --sort is specified.

Mandatory arguments to long options are mandatory for short options too.
  -a, --all                  do not ignore entries starting with .
  -A, --almost-all           do not list implied . and ..
      --author               with -l, print the author of each file
[...]

The --help argument is a little special, because it returns immediately. Normally, the commands run on a separate node in the background.

For example, listing the contents of the directory /grp/g_biapol

proc = dm.dtls('-lh','/grp/g_biapol')

initially just prints some messages from the clusters queuing system:

srun: job 27549105 queued and waiting for resources

(some time passing here, until the job gets assigned to a node...)

srun: job 27549105 has been allocated resources
srun: error: ioctl(TIOCGWINSZ): Inappropriate ioctl for device
srun: error: Not using a pseudo-terminal, disregarding --pty option

the error messages are normal. they are caused by the fact that we did not run the command from an interactive terminal.

The whole process takes a while. therefore Datamover executes the commands in the background and returns immediately, so that you can do other stuff in the background. You can check the status with poll(). As long as that returns None, the process is still in progress:

proc.poll()
None

And if the process finished it will return the exit code (0 in case of success, an integer larger than 0 in case of an error):

proc.poll()
0

If your code needs to wait for the result, use the waitfor helper function, which waits for the result and returns the exit code of the command:

proc = dm.dtls('-lh','/grp/g_biapol')
waitfor(proc)
Waiting ..
srun: job 27549647 queued and waiting for resources
......................................................................................................................
srun: job 27549647 has been allocated resources
.
srun: error: ioctl(TIOCGWINSZ): Inappropriate ioctl for device
srun: error: Not using a pseudo-terminal, disregarding --pty option
.
0

Only error messages are printed immediately, the normal output of the function is captured and can be retrieved like this:

out, _ = proc.communicate()
print(out.decode('utf-8'))
total 12K
drwx------ 20 roha044c 1111111 4.0K Jul  8 13:59 data
drwx------  3 mazo260d 1111111 4.0K Jun 20 11:49 presentations
drwx------  3 johamuel 1111111 4.0K Mar 10 14:53 projects

Examples

Recursively (-r) copy the directory data/test from the fileserver to the project space:

waitfor(dm.dtcp('-r','/grp/g_biapol/data/test', '/projects/p_bioimage/'))

once it is done, we can see (and read) the files on the project space:

!ls -la /projects/p_bioimage/test/
total 12
drwxrwsr-x 3 tkorten p_bioimage 4096 Aug  3 16:31 .
drwxrws--T 6 root    p_bioimage 4096 Aug  1 13:52 ..
drwx--S--- 2 tkorten p_bioimage 4096 Aug  3 16:31 test

However, from a normal node, we don't have write access:

!rm -r /projects/p_bioimage/test/test
rm: cannot remove '/projects/p_bioimage/test/test': Read-only file system

But using the datamover, we can write to the project space:

waitfor(dm.dtrm('-r', '/projects/p_bioimage/test'))
!ls -la /projects/p_bioimage/test/
ls: cannot access '/projects/p_bioimage/test/': No such file or directory

Contributing

Contributions are very welcome. Tests can be run with tox, please ensure the coverage at least stays the same before you submit a pull request.

License

Distributed under the terms of the BSD-3 license, "biapol-taurus" is free and open source software.

Support

If you need support with the tools listed here, please open an issue.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

taurus-datamover-0.0.16.tar.gz (10.1 kB view hashes)

Uploaded Source

Built Distribution

taurus_datamover-0.0.16-py3-none-any.whl (8.9 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page