Simple dataset management
shelephant
Command-line arguments with a memory (stored in YAML-files).
Documentation: https://shelephant.readthedocs.io
Overview
Hallmark feature: Copy with restart
shelephant offers a way to copy files (optionally from a remote host, using SSH) in two steps:
- Collect the list of files to copy in a YAML file. This allows you to review and customise the copy operation (e.g. by changing the order or making last-minute manual changes).
- Perform the copy, efficiently skipping files that are identical.
Typical workflow:
# Collect files to copy & compute their checksums (e.g. on the remote system)
# - creates "shelephant_dump.yaml"
shelephant_dump *.hdf5
# - reads "shelephant_dump.yaml"
# - creates "shelephant_checksum.yaml"
shelephant_checksum
# Combine all needed info (locally)
# - reads "shelephant_dump.yaml" and "shelephant_checksum.yaml"
# - creates "shelephant_hostinfo.yaml"
shelephant_hostinfo --host myhost --prefix /some/path --files --checksum
# Copy from remote (can be restarted at any time; existing files are skipped)
# - reads "shelephant_hostinfo.yaml"
shelephant_get
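The first step, shelephant_dump, records the files matching the given patterns as a YAML list (like files_to_copy.yaml in the detailed examples). A minimal sketch of that idea using only the Python standard library; `dump_files` and `to_yaml_list` are hypothetical helpers, not shelephant's own API, and sorting the matches is an assumption made here for reproducible output:

```python
import glob
import os


def dump_files(patterns, root="."):
    """Expand glob patterns relative to `root` and return matching files.

    Hypothetical helper: mimics what `shelephant_dump *.hdf5` records,
    not shelephant's actual implementation.
    """
    files = []
    for pattern in patterns:
        # sorted() only makes the output reproducible (an assumption)
        for path in sorted(glob.glob(os.path.join(root, pattern))):
            if os.path.isfile(path):  # skip directories that match
                rel = os.path.relpath(path, root)
                if rel not in files:  # avoid duplicates across patterns
                    files.append(rel)
    return files


def to_yaml_list(items):
    """Serialise a list of plain strings as a YAML block sequence."""
    return "".join(f"- {item}\n" for item in items)
```

For example, `print(to_yaml_list(dump_files(["*.hdf5"])))` prints one `- name` line per matching file.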
- The filenames can be customised.
- To copy to a remote system, use shelephant_send.
- Details are given in the help of the respective commands, e.g. shelephant_dump --help.
- shelephant works for both local and remote copy actions.
Command-line tools
File information
- shelephant_dump: list filenames in a YAML file.
- shelephant_checksum: get the checksums of files listed in a YAML file.
- shelephant_hostinfo: collect host information (from a remote system).
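What shelephant_checksum computes can be sketched as one checksum per listed file, in the same order as the list. The sketch below assumes SHA-256 (the example checksums in the detailed examples are 64 hexadecimal characters, consistent with that, but the algorithm is not named on this page) and reads files in chunks so that large files are never loaded into memory at once. `sha256_of` and `file_checksums` are hypothetical names:

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops at the first empty read (EOF)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_checksums(files, root="."):
    """Checksums in the same order as `files` (paths relative to `root`)."""
    return [sha256_of(os.path.join(root, name)) for name in files]
```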
File operations
- shelephant_get: copy from remote, based on earlier stored information.
- shelephant_send: copy to remote, based on earlier stored information.
- shelephant_rm: remove files listed in a YAML file.
- shelephant_cp: copy files listed in a YAML file.
- shelephant_mv: move files listed in a YAML file.
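The list-based file operations all follow the same pattern: filenames in the YAML file are relative paths, resolved against a source (and, for copying, a destination) directory. A hypothetical sketch of the copy and remove cases with `shutil` and `os`; it illustrates the idea only and is not how shelephant_cp or shelephant_rm are implemented:

```python
import os
import shutil


def copy_listed(files, source, destination):
    """Copy each file in `files` from `source` to `destination`.

    Paths in `files` are relative; the directory structure is preserved.
    Hypothetical helper in the spirit of `shelephant_cp`.
    """
    for name in files:
        target = os.path.join(destination, name)
        # create intermediate directories (e.g. "sub/") if needed
        os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
        shutil.copy2(os.path.join(source, name), target)  # keeps metadata


def remove_listed(files, root="."):
    """Remove each listed file (relative to `root`), like `shelephant_rm`."""
    for name in files:
        os.remove(os.path.join(root, name))
```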
YAML file operations
- shelephant_extract: isolate one or more fields in a (new) YAML file.
- shelephant_merge: merge two YAML files.
- shelephant_parse: parse a YAML file and print it to the screen.
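Once loaded (for instance with PyYAML's `yaml.safe_load`), the YAML files used by shelephant are plain mappings and lists. A hypothetical sketch of what extracting and merging fields could look like on such data; the merge rule (nested mappings merged recursively, other values from the second file winning) is an assumption, not documented shelephant_merge behaviour:

```python
def extract(data, keys):
    """Return a new mapping containing only `keys` from `data`."""
    return {key: data[key] for key in keys}


def merge(a, b):
    """Recursively merge mapping `b` into a copy of `a`.

    Assumed rule: nested mappings are merged, any other value in `b`
    replaces the one in `a`. Neither input is modified.
    """
    result = dict(a)
    for key, value in b.items():
        if isinstance(result.get(key), dict) and isinstance(value, dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result
```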
Disclaimer
This library is free to use under the MIT license. Any additions are very much appreciated, in terms of suggested functionality, code, documentation, testimonials, word-of-mouth advertisement, etc. Bug reports or feature requests can be filed on GitHub. As always, the code comes with no guarantee. None of the developers can be held responsible for possible mistakes.
(c - MIT) T.W.J. de Geus (Tom) | tom@geus.me | www.geus.me | github.com/tdegeus/shelephant
Getting shelephant
Using conda
conda install -c conda-forge shelephant
This will also download and install all necessary dependencies.
Using PyPi
pip install shelephant
This will also download and install the necessary Python modules.
From source
# Download shelephant
git clone https://github.com/tdegeus/shelephant.git
cd shelephant
# Install
python -m pip install .
This will also download and install the necessary Python modules.
Detailed examples
Get files from remote, allowing restarts
Suppose that we want to copy all *.txt files from a directory /path/where/files/are/stored/on/remote on a remote host hostname.
First step, collect information on the host:
# connect to the host
ssh hostname
# go to the relevant location on the host
cd "/path/where/files/are/stored/on/remote"
# list files to copy
shelephant_dump -o files_to_copy.yaml *.txt
# optional but useful, get the checksum of the files to copy
shelephant_checksum -o files_checksum.yaml files_to_copy.yaml
# disconnect
exit # or press Ctrl + D
Second step, copy files to the local system, collecting everything in a single place:
# go to the relevant location on the local system
# (often this is a new directory)
cd "/path/where/to/copy/to"
# get the file-information compiled on the host
# and store in a (temporary) local file
# note that all paths refer to the remote system;
# the YAML files are retrieved from there using secure copy (scp)
shelephant_hostinfo \
-o remote_info.yaml \
--host "hostname" \
--prefix "/path/where/files/are/stored/on/remote" \
--files "files_to_copy.yaml" \
--checksum "files_checksum.yaml"
# finally, get the files using secure copy
# (the files are stored relative to the path of 'remote_info.yaml',
# identically to how they are relative to 'files_to_copy.yaml' on remote)
shelephant_get remote_info.yaml
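The comments above imply a simple path mapping: each entry in `files` is joined to the remote `root` to form the source, and to the directory containing remote_info.yaml to form the local destination. A sketch of that mapping under this reading; `plan_get` is hypothetical and only builds the (source, destination) pairs, it does not run scp:

```python
import os
import posixpath


def plan_get(info, info_path):
    """Return (remote source, local destination) pairs for each file.

    `info` mirrors remote_info.yaml: keys "host", "root" and "files".
    Local destinations are relative to the directory of `info_path`.
    """
    local_dir = os.path.dirname(os.path.abspath(info_path))
    pairs = []
    for name in info["files"]:
        # remote paths are POSIX paths on the host
        source = f'{info["host"]}:{posixpath.join(info["root"], name)}'
        pairs.append((source, os.path.join(local_dir, name)))
    return pairs
```

Each pair could then be passed to, e.g., `subprocess.run(["scp", source, destination], check=True)` after creating the destination's parent directory.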
If you use the default filenames for shelephant_dump (shelephant_dump.yaml) and shelephant_checksum (shelephant_checksum.yaml) remotely, you can also specify --files and --checksum without an argument.
Because the checksums were computed on the host, shelephant_get can be stopped and restarted:
only files that do not exist locally, or that were only partially copied
(i.e. whose checksum does not match the remotely computed checksum), are copied;
all fully copied files are skipped.
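The skip rule just described can be sketched as follows, assuming SHA-256 checksums and a remote_info.yaml loaded into a dict with parallel `files` and `checksum` lists. `needs_copy` and `files_to_get` are hypothetical helpers; a partially copied file simply yields a different checksum and is therefore copied again:

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_copy(local_path, remote_checksum):
    """True if the file is missing locally or its checksum differs."""
    if not os.path.isfile(local_path):
        return True
    return sha256_of(local_path) != remote_checksum


def files_to_get(info, local_dir):
    """Reduce remote_info-style data to the files that still need copying."""
    return [
        name
        for name, checksum in zip(info["files"], info["checksum"])
        if needs_copy(os.path.join(local_dir, name), checksum)
    ]
```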
Let's further illustrate with a complete example. On the host, suppose that we have
/path/where/files/are/stored/on/remote
- foo.txt
- bar.txt
This will give files_to_copy.yaml:
- foo.txt
- bar.txt
and files_checksum.yaml (for example):
- 2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae
- fcde2b2edba56bf408601fb721fe9b5c338d10ee429ea04fae5511b68fbf8fb9
This information is collected in remote_info.yaml:
host: hostname
root: /path/where/files/are/stored/on/remote
files:
- foo.txt
- bar.txt
checksum:
- 2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae
- fcde2b2edba56bf408601fb721fe9b5c338d10ee429ea04fae5511b68fbf8fb9
shelephant_get will now copy foo.txt and bar.txt relative to the directory of remote_info.yaml (in this case into the same folder as remote_info.yaml).
It will skip any file that already exists locally with a matching checksum.
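To make the structure of remote_info.yaml explicit, here is a sketch that combines the host, the root, and the two lists into one mapping and writes it out in the layout shown above. `hostinfo` and `to_yaml` are hypothetical; the hand-written serialiser only suits plain strings like these, so a YAML library (e.g. PyYAML) is the safer choice for real data:

```python
def hostinfo(host, root, files, checksum=None):
    """Combine the collected pieces into one mapping (hypothetical helper)."""
    if checksum is not None and len(checksum) != len(files):
        raise ValueError("need exactly one checksum per file")
    info = {"host": host, "root": root, "files": list(files)}
    if checksum is not None:
        info["checksum"] = list(checksum)
    return info


def to_yaml(info):
    """Write scalars as `key: value` and lists as block sequences."""
    lines = []
    for key, value in info.items():
        if isinstance(value, list):
            lines.append(f"{key}:")
            lines.extend(f"- {item}" for item in value)
        else:
            lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"
```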
Avoid recomputing checksums
Suppose that we want to restart multiple times, or that we update the files present on the remote after copying them initially. In that case, we can use previously computed checksums to avoid recomputing them (which can be costly for large files).
First step, update information on the host:
# connect to the host
ssh hostname
# go to the relevant location on the host
cd "/path/where/files/are/stored/on/remote"
# collect the previously computed information
shelephant_hostinfo -o precomputed_checksums.yaml -f files_to_copy.yaml -c files_checksum.yaml
# list files to copy
shelephant_dump -o files_to_copy.yaml *.txt
# get the checksum of the files to copy, where possible reading precomputed values
shelephant_checksum -o files_checksum.yaml files_to_copy.yaml -l precomputed_checksums.yaml
# disconnect
exit # or press Ctrl + D
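The reuse of precomputed checksums can be sketched as a lookup table keyed by filename: files found in the table keep their stored checksum, and only the remaining ones are hashed. This is an assumption about the mechanism, not shelephant's code (`precomputed_table` and `checksums_with_cache` are hypothetical, SHA-256 is assumed). Keying on the filename alone would keep a stale checksum for a file whose content changed under the same name; whether and how shelephant guards against that is not covered here:

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def precomputed_table(info):
    """Map filename -> checksum from an earlier files/checksum pair."""
    return dict(zip(info["files"], info["checksum"]))


def checksums_with_cache(files, precomputed, root="."):
    """Return checksums for `files`, hashing only those not in `precomputed`."""
    result = []
    for name in files:
        if name in precomputed:
            result.append(precomputed[name])  # reuse, no file read
        else:
            result.append(sha256_of(os.path.join(root, name)))
    return result
```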
Second step, copy files to the local system, collecting everything in a single place:
# go to the relevant location on the local system
# (often this is a new directory)
cd "/path/where/to/copy/to"
# collect the previously computed information
shelephant_hostinfo -o precomputed_checksums.yaml -f files_present.yaml -c files_checksum.yaml
# list files currently present locally
shelephant_dump -o files_present.yaml *.txt
# get the checksum of the files present locally, where possible reading precomputed values
shelephant_checksum -o files_checksum.yaml files_present.yaml -l precomputed_checksums.yaml
# combine local files and checksums
shelephant_hostinfo -o precomputed_checksums.yaml -f files_present.yaml -c files_checksum.yaml
# get the file-information compiled on the host [as before]
shelephant_hostinfo \
-o remote_info.yaml \
--host "hostname" \
--prefix "/path/where/files/are/stored/on/remote" \
--files "files_to_copy.yaml" \
--checksum "files_checksum.yaml"
# get the files using secure copy
# use the precomputed checksums instead of computing them
shelephant_get remote_info.yaml --local "precomputed_checksums.yaml"
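With checksums available on both sides, deciding what still has to be copied needs no hashing at all: it is enough to compare the two filename-to-checksum tables. A hypothetical sketch of that comparison (`to_table` and `stale_files` are illustrative names; only equality of checksums is tested, so the hash algorithm does not matter here):

```python
def to_table(info):
    """Map filename -> checksum from parallel "files"/"checksum" lists."""
    return dict(zip(info["files"], info["checksum"]))


def stale_files(remote, local):
    """Remote files that are absent locally or whose checksums differ.

    Both arguments mirror the YAML data: parallel "files" and "checksum"
    lists. Nothing is hashed; everything comes from precomputed values.
    """
    local_table = to_table(local)
    return [
        name
        for name, checksum in to_table(remote).items()
        if local_table.get(name) != checksum
    ]
```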
Send files to host
Basic copy
Suppose that we want to copy all *.txt files from a local directory /path/where/files/are/stored/locally to a remote host hostname.
First, we will collect information locally:
# go to the relevant location (locally)
cd /path/where/files/are/stored/locally
# list files to copy
shelephant_dump -o files_to_copy.yaml *.txt
Then, we specify some basic information about the host:
# specify basic information about the host
# and store in a (temporary) local file
shelephant_hostinfo \
-o remote_info.yaml \
--host "hostname" \
--prefix "/path/where/to/copy/to/on/remote"
Now we can copy the files:
shelephant_send files_to_copy.yaml remote_info.yaml
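Sending mirrors getting, with the roles of source and destination swapped: a local relative path maps to the same relative path under the remote root. A sketch that only builds the corresponding scp argument lists, assuming scp is the transport as in the get example; `send_commands` is hypothetical:

```python
import posixpath


def send_commands(files, info):
    """Build one scp command (as an argument list) per file to send.

    `files` are paths relative to the current directory; `info` mirrors
    remote_info.yaml with keys "host" and "root". Nothing is executed.
    """
    commands = []
    for name in files:
        # the remote side uses POSIX paths under the given root
        destination = posixpath.join(info["root"], name)
        commands.append(["scp", name, f'{info["host"]}:{destination}'])
    return commands
```

Each list could be run with `subprocess.run(cmd, check=True)`; note that scp does not create missing parent directories on the remote side.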
Restart
Suppose that copying was interrupted before completing. We can avoid recopying files by again using the checksums. We therefore need to know which files are already present remotely and what their checksums are. To that end:
# connect to the host
ssh hostname
# go to the relevant location on the host
cd "/path/where/to/copy/to/on/remote"
# list the files already present remotely
shelephant_dump -o files_to_copy.yaml *.txt
# get the checksums of these files
shelephant_checksum -o files_checksum.yaml files_to_copy.yaml
# disconnect
exit # or press Ctrl + D
Now, locally, complement the basic host information:
shelephant_hostinfo \
-o remote_info.yaml \
--host "hostname" \
--prefix "/path/where/to/copy/to/on/remote" \
--files "files_to_copy.yaml" \
--checksum "files_checksum.yaml"
And restart the partial copy:
shelephant_send files_to_copy.yaml remote_info.yaml
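For the restart, the local files can be compared against the checksums now recorded in remote_info.yaml. A hypothetical sketch (assuming SHA-256, with `classify_for_send` an illustrative name) that sorts the local files into new, changed, and identical; only the first two groups would need to be sent:

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify_for_send(files, remote, root="."):
    """Split local `files` into new, changed and identical w.r.t. the remote.

    `remote` mirrors the complemented remote_info.yaml: parallel
    "files" and "checksum" lists describing what is already on the host.
    """
    remote_table = dict(zip(remote.get("files", []), remote.get("checksum", [])))
    result = {"new": [], "changed": [], "identical": []}
    for name in files:
        if name not in remote_table:
            result["new"].append(name)
        elif sha256_of(os.path.join(root, name)) != remote_table[name]:
            result["changed"].append(name)  # e.g. partially sent earlier
        else:
            result["identical"].append(name)
    return result
```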
File details
Details for the file shelephant-0.24.2.tar.gz.
File metadata
- Download URL: shelephant-0.24.2.tar.gz
- Upload date:
- Size: 58.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.2 CPython/3.11.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3e94187ed1b5f28907af62f14585eb037e1b5cdf90f25a45e6fed2edeff3d129
MD5 | 7014cf4a5f50b5580cca4c3bce2fe26c
BLAKE2b-256 | 7ed458fb0b70301d635572b4ae31911a8c2a7ae9af8e4644699ae9aed5090ab9
File details
Details for the file shelephant-0.24.2-py3-none-any.whl.
File metadata
- Download URL: shelephant-0.24.2-py3-none-any.whl
- Upload date:
- Size: 41.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.2 CPython/3.11.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | 84c6d73d8a1bc231177fae3dbd6c2e95d32aa8b3912a17fb7e4385f476a774e4
MD5 | 54cdfbf4239613e0ee415485bd703655
BLAKE2b-256 | ec027b1c14538ffa7608041b3bdf47fad3c5bf9b3868bc5582d5063e8220e66e