s5cmd-python
Python bindings for s5cmd, for downloading files from and uploading files to S3 efficiently.
The S5CmdRunner class provides a Python interface for interacting with s5cmd, a command-line tool designed for efficient data transfer to and from Amazon S3.
For more information about s5cmd, please refer to the original s5cmd repository.
Features
- Check for the presence of s5cmd and download it if necessary.
- Execute the s5cmd commands cp, mv, and run.
- Handle file downloads from URLs and S3 URIs.
- Generate command files for batch operations with s5cmd.
- Simplify operations like copying and moving files between local paths and S3 URIs.
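The first feature, checking whether s5cmd is present, can be illustrated with the standard library alone. This is a minimal sketch and not the package's actual implementation: `find_s5cmd` is a hypothetical helper, and it only looks on PATH; it does not download anything.

```python
import shutil
from typing import Optional


def find_s5cmd(binary_name: str = "s5cmd") -> Optional[str]:
    """Return the absolute path of the binary if it is on PATH, else None.

    Hypothetical helper for illustration; S5CmdRunner may locate
    (or download) s5cmd differently.
    """
    return shutil.which(binary_name)


if __name__ == "__main__":
    path = find_s5cmd()
    if path is None:
        print("s5cmd not found on PATH; it would need to be downloaded")
    else:
        print(f"s5cmd found at {path}")
```

A wrapper would typically run a check like this once at start-up and fall back to downloading the binary only when the lookup returns None.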
Installation
To use S5CmdRunner, ensure that Python 3.10 or higher is installed. The project can be installed with pip:
pip install s5cmdpy
or from source:
git clone <repo url>
cd s5cmd-python
pip install -e .
Usage
Here are some examples of how to use the S5CmdRunner class:
Initialize S5CmdRunner
from s5cmdpy import S5CmdRunner
runner = S5CmdRunner()
Run s5cmd with a Local Command File
# s5cmd_test.txt holds the s5cmd commands to run, one per line, for example:
# cp s3://dataset-artstation-uw2/artists/__andrey__/1841730##GZGgW.json .
local_txt_path = "s5cmd_test.txt"
runner.run(local_txt_path)
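Conceptually, running a command file corresponds to invoking `s5cmd run <file>`. The sketch below builds and executes that invocation with `subprocess`. The names `run_argv` and `run_command_file` are hypothetical and exist only for illustration; the real `S5CmdRunner.run` may do more, such as locating or downloading the binary first.

```python
import subprocess
from typing import List


def run_argv(command_file: str, binary: str = "s5cmd") -> List[str]:
    """Build the argument list for `s5cmd run <command_file>`."""
    return [binary, "run", command_file]


def run_command_file(command_file: str, binary: str = "s5cmd") -> int:
    """Execute the command file and return the exit code.

    Assumes the s5cmd binary is installed and on PATH;
    check=True raises CalledProcessError on a non-zero exit.
    """
    completed = subprocess.run(run_argv(command_file, binary), check=True)
    return completed.returncode
```

Passing the arguments as a list rather than a single shell string avoids shell quoting problems with unusual file names.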
Run s5cmd with a Command File from S3
# Useful in environments like SageMaker, or for reproducibility.
# Extends `s5cmd run something.txt` to support command files stored in S3.
txt_s3_uri = "s3://dataset-artstation-uw2/s5cmd_test.txt"
runner.run(txt_s3_uri)
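To accept both local paths and S3 URIs, a wrapper has to tell the two apart and, for S3, split the URI into bucket and key. The following is a sketch under that assumption; `is_s3_uri` and `split_s3_uri` are hypothetical helpers, not part of the documented s5cmdpy API. Plain string splitting is used instead of `urllib.parse.urlparse` because keys such as `1841730##GZGgW.json` contain `#`, which a URL parser would treat as a fragment.

```python
from typing import Tuple

S3_PREFIX = "s3://"


def is_s3_uri(path: str) -> bool:
    """Return True if the string looks like an S3 URI."""
    return path.startswith(S3_PREFIX)


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    if not is_s3_uri(uri):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len(S3_PREFIX):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in: {uri!r}")
    return bucket, key
```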
Download Multiple Files from S3
# Pass a list of S3 URIs; the runner generates the commands.txt that `s5cmd run` needs,
# then executes `s5cmd run <commands.txt>`
s3_uris = [
's3://dataset-artstation-uw2/artists/__andrey__/1841730##GZGgW.json',
's3://dataset-artstation-uw2/artists/__andrey__/2249992##q5Y22.json'
]
destination_dir = '/home/ubuntu/datasets/s5cmd_test'
runner.download_from_s3_list(s3_uris, destination_dir)
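The command file generated for a batch download can be pictured as one `cp` line per URI. The sketch below shows one possible way to build it; `build_cp_commands` and `write_command_file` are hypothetical names, and the exact file that `download_from_s3_list` produces is an assumption here. URIs are written unquoted, as in the command-file example earlier, so keys containing whitespace would need extra handling.

```python
from pathlib import Path
from typing import Iterable, List


def build_cp_commands(s3_uris: Iterable[str], destination_dir: str) -> List[str]:
    """Return one `cp <uri> <destination_dir>/` line per S3 URI."""
    # A trailing slash marks the destination as a directory.
    dest = destination_dir.rstrip("/") + "/"
    return [f"cp {uri} {dest}" for uri in s3_uris]


def write_command_file(lines: List[str], path: str = "commands.txt") -> Path:
    """Write the command lines to a file suitable for `s5cmd run`."""
    target = Path(path)
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return target
```

Keeping the line-building step separate from the file write makes the generated commands easy to inspect or log before anything is executed.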
Download a File from the Internet and Upload to S3
The cp command also works with a file from the internet:
# Download a file from the internet and upload it to S3
target_url = "https://huggingface.co/kiriyamaX/mld-caformer/resolve/main/ml_caformer_m36_dec-5-97527.onnx"
dst_s3_uri = "s3://dataset-artstation-uw2/_dev/"
runner.cp(target_url, dst_s3_uri)
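One way to picture a URL-to-S3 copy is as two steps: fetch the file to a local path, then hand that path to `s5cmd cp`. The sketch below follows that assumption and is not taken from the package source; `filename_from_url`, `upload_argv`, and `copy_url_to_s3` are hypothetical names.

```python
import os
import posixpath
import subprocess
import tempfile
import urllib.request
from typing import List
from urllib.parse import unquote, urlparse


def filename_from_url(url: str) -> str:
    """Derive a local file name from the last path segment of a URL."""
    return posixpath.basename(unquote(urlparse(url).path))


def upload_argv(local_path: str, dst_s3_uri: str, binary: str = "s5cmd") -> List[str]:
    """Build the argument list for `s5cmd cp <local_path> <dst_s3_uri>`."""
    return [binary, "cp", local_path, dst_s3_uri]


def copy_url_to_s3(url: str, dst_s3_uri: str) -> None:
    """Download `url` into a temporary directory, then upload it with s5cmd.

    Assumes network access, valid AWS credentials, and s5cmd on PATH.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, filename_from_url(url))
        urllib.request.urlretrieve(url, local_path)
        subprocess.run(upload_argv(local_path, dst_s3_uri), check=True)
```

The temporary directory is removed once the upload finishes, so nothing is left behind on the local disk.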
License
s5cmd itself is MIT licensed. This project is also MIT licensed.