
Python bindings for s5cmd, for downloading and uploading files to S3 efficiently


s5cmd-python

Updated documentation is available on Lark: https://mufo9rl7c6.larksuite.com/wiki/DvgPwF3gKifmOWk2hNnuTkXXsVh?from=from_copylink


The S5CmdRunner class provides a Python interface for interacting with s5cmd, a command-line tool designed for efficient data transfer to and from Amazon S3.

For more information about s5cmd, please refer to the original s5cmd repository.

Features

  • Check for the presence of s5cmd and download it if necessary.
  • Execute s5cmd commands cp, mv, and run.
  • Handle file downloads from URLs and S3 URIs.
  • Generate command files for batch operations with s5cmd.
  • Simplify operations like copying and moving files between local paths and S3 URIs.
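
The first feature, checking whether s5cmd is present, can be approximated with the standard library alone. This is a minimal sketch, not the way S5CmdRunner is known to do it: the helper name `find_s5cmd` is hypothetical, and the download step is left out because the source does not say where the binary is fetched from.

```python
import shutil
from typing import Optional


def find_s5cmd(binary_name: str = "s5cmd") -> Optional[str]:
    """Return the absolute path of the binary if it is on PATH, else None.

    Hypothetical helper for illustration only; S5CmdRunner may locate
    (or download) the binary in a different way.
    """
    return shutil.which(binary_name)


path = find_s5cmd()
print(path if path else "s5cmd not found on PATH")
```

A caller could use the `None` result as the signal to download the binary before running any command.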

Installation

S5CmdRunner requires Python 3.10 or higher. Install the package from PyPI with pip:

pip install s5cmdpy

or from source:

git clone <repo url>
cd s5cmd-python
pip install -e .

Usage

Here are some examples of how to use the S5CmdRunner class:

Initialize S5CmdRunner

from s5cmdpy import S5CmdRunner
runner = S5CmdRunner()

Run s5cmd with a Local Command File

# Contents of s5cmd_test.txt:
#   cp s3://dataset-artstation-uw2/artists/__andrey__/1841730##GZGgW.json .
local_txt_path = "s5cmd_test.txt"
runner.run(local_txt_path)

Run s5cmd with a Command File from S3

# Useful in environments like SageMaker, or for reproducibility.
# Extends `s5cmd run something.txt` to support command files stored in S3.
txt_s3_uri = "s3://dataset-artstation-uw2/s5cmd_test.txt"
runner.run(txt_s3_uri)
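
Because `run` accepts both a local path and an S3 URI, a wrapper has to tell the two apart and, for the S3 case, work out the bucket and key. The sketch below shows one way to do that; `split_s3_uri` is a hypothetical name and this is not necessarily how S5CmdRunner handles it. Plain string handling is used on purpose: the object keys in this README contain `#`, which `urllib.parse.urlparse` would treat as the start of a URL fragment.

```python
from typing import Tuple


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/key' into (bucket, key).

    Hypothetical helper for illustration. String partitioning is used
    instead of urlparse so that '#' characters in the key are preserved.
    """
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {uri!r}")
    return bucket, key


bucket, key = split_s3_uri("s3://dataset-artstation-uw2/s5cmd_test.txt")
print(bucket, key)
```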

Download Multiple Files from S3

# Pass a list of S3 URIs; the runner generates the commands.txt needed by `s5cmd run`,
# then executes `s5cmd run <commands.txt>`.

s3_uris = [
    's3://dataset-artstation-uw2/artists/__andrey__/1841730##GZGgW.json', 
    's3://dataset-artstation-uw2/artists/__andrey__/2249992##q5Y22.json'
]
destination_dir = '/home/ubuntu/datasets/s5cmd_test'
runner.download_from_s3_list(s3_uris, destination_dir)
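
The command-file generation described in the comments above can be pictured roughly as follows. This is a sketch under stated assumptions, not the library's actual code: `build_cp_commands` and `write_command_file` are hypothetical names, and the sketch assumes that neither the URIs nor the destination contain spaces.

```python
import os
import tempfile
from typing import Iterable, List


def build_cp_commands(s3_uris: Iterable[str], destination_dir: str) -> List[str]:
    """Build one `cp <source> <destination>` line per S3 URI.

    Hypothetical helper, not part of s5cmdpy. The destination is given a
    trailing slash so that it reads as a directory.
    """
    dest = destination_dir.rstrip("/") + "/"
    return [f"cp {uri} {dest}" for uri in s3_uris]


def write_command_file(commands: Iterable[str], path: str) -> None:
    """Write the commands to a text file, one per line."""
    with open(path, "w", encoding="utf-8") as fh:
        for line in commands:
            fh.write(line + "\n")


s3_uris = [
    "s3://dataset-artstation-uw2/artists/__andrey__/1841730##GZGgW.json",
    "s3://dataset-artstation-uw2/artists/__andrey__/2249992##q5Y22.json",
]
commands = build_cp_commands(s3_uris, "/home/ubuntu/datasets/s5cmd_test")

# Write to a temporary directory so the example does not touch real paths.
commands_path = os.path.join(tempfile.mkdtemp(), "commands.txt")
write_command_file(commands, commands_path)

with open(commands_path, encoding="utf-8") as fh:
    print(fh.read())
```

The resulting file is the kind of input that `s5cmd run <commands.txt>` expects.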

Download a File from the Internet and Upload It to S3

The cp command also works with a file from the internet:

# Download a file from the internet and upload it to S3
target_url = "https://huggingface.co/kiriyamaX/mld-caformer/resolve/main/ml_caformer_m36_dec-5-97527.onnx"
dst_s3_uri = "s3://dataset-artstation-uw2/_dev/"

runner.cp(target_url, dst_s3_uri)
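
Since `cp` accepts an HTTP(S) URL as well as S3 URIs and local paths, the wrapper presumably has to decide what kind of source it was given before handing work to s5cmd. A minimal sketch of that decision, with the hypothetical helper name `classify_source` (how S5CmdRunner actually dispatches is not documented here):

```python
def classify_source(src: str) -> str:
    """Return 's3', 'url', or 'local' based on the source prefix.

    Hypothetical helper for illustration only.
    """
    lowered = src.lower()
    if lowered.startswith("s3://"):
        return "s3"
    if lowered.startswith(("http://", "https://")):
        return "url"
    return "local"


target_url = "https://huggingface.co/kiriyamaX/mld-caformer/resolve/main/ml_caformer_m36_dec-5-97527.onnx"
print(classify_source(target_url))
print(classify_source("s3://dataset-artstation-uw2/_dev/"))
```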

License

S5cmd itself is MIT licensed. This project is also MIT licensed.

