Concat files in S3

Project description

Python S3 Concat

S3 Concat is used to concatenate many small files in an S3 bucket into fewer larger files.

Install

pip install s3-concat

Usage

Command Line

$ s3-concat -h

Import

from s3_concat import S3Concat

bucket = "YOUR_BUCKET_NAME"
path_to_concat = "PATH_TO_FILES_TO_CONCAT"
concatenated_file = "FILE_TO_SAVE_TO.json"
# Setting this to a size will always add a part number at the end of the file name
min_file_size = "50MB"  # ex: FILE_TO_SAVE_TO-1.json, FILE_TO_SAVE_TO-2.json, ...
# Setting this to None will concat all files into a single file
# min_file_size = None  # ex: FILE_TO_SAVE_TO.json

# Init the job
job = S3Concat(
    bucket, concatenated_file, min_file_size,
    content_type="application/json",
    # source_bucket="SOURCE_BUCKET_NAME",  # For copying files from another bucket
    # session=boto3.session.Session(),  # For a custom AWS session (requires `import boto3`)
    # s3_client_kwargs={},  # Pass arguments allowed by the S3 client:
    #     https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html
    # delimiter="\n",  # Inserted between each file when concatenating.
    #     Warning: this requires downloading every file, regardless of size,
    #     in order to add the delimiter.
)
# Add files; can be called multiple times to add files from other directories
job.add_files(path_to_concat)
# Add a single file at a time
job.add_file("some/file_key.json")
# Only use small_parts_threads if you need to. See the Advanced Usage section below.
job.concat(small_parts_threads=4, main_threads=2)
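
As the comments above note, passing None as min_file_size concats everything into a single file with no part number appended. A minimal sketch of that variant, reusing the placeholders from the example above:

# min_file_size=None -> one output file, named exactly FILE_TO_SAVE_TO.json
single_job = S3Concat(bucket, concatenated_file, None,
                      content_type="application/json")
single_job.add_files(path_to_concat)
single_job.concat()  # thread counts default to 1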

Advanced Usage

Depending on your use case, you may want to use more than one thread.

  • main_threads is the number of threads used to upload files to S3. This helps when many of the files are already over the configured min_file_size.

  • small_parts_threads is only used when the files being concatenated are smaller than 5MB; these threads are spawned from within the main_threads. Due to the limits of the S3 multipart upload API (see Limitations below), any file smaller than 5MB must be downloaded locally, concatenated with others, and re-uploaded. Raising this thread count downloads those parts in parallel, speeding up the concatenation process.

The right values for these arguments depend on your use case and the system you are running on, as shown in the sketch below.
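
A minimal sketch (bucket name, key names, and paths here are placeholders) of tuning these thread counts for two common workloads:

from s3_concat import S3Concat

# Workload A: most files are already over min_file_size, so uploads
# dominate; raise main_threads and leave small_parts_threads at its default.
big_job = S3Concat("YOUR_BUCKET_NAME", "big-merged.json", "50MB",
                   content_type="application/json")
big_job.add_files("logs/large/")
big_job.concat(main_threads=4)

# Workload B: thousands of files under 5MB. Each must be downloaded and
# concatenated locally (see Limitations below), so raise small_parts_threads
# to parallelize those downloads.
small_job = S3Concat("YOUR_BUCKET_NAME", "small-merged.json", "50MB",
                     content_type="application/json")
small_job.add_files("logs/small/")
small_job.concat(main_threads=2, small_parts_threads=8)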

Limitations

This uses S3's multipart upload, so its limits apply: https://docs.aws.amazon.com/AmazonS3/latest/dev/qfacts.html. The most relevant one here is the 5MB minimum part size (for every part except the last), which is why files under 5MB must be downloaded and combined locally.
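
To see why that limit forces the local-download path for small files, here is a minimal boto3 sketch of the mechanism (placeholder bucket and keys; an illustration of the technique, not this library's actual implementation):

import boto3

MIN_PART = 5 * 1024 * 1024  # S3 minimum size for every part except the last

s3 = boto3.client("s3")
bucket = "YOUR_BUCKET_NAME"            # placeholder
dest_key = "merged.json"               # placeholder
source_keys = ["a.json", "b.json"]     # hypothetical inputs, in order

mpu = s3.create_multipart_upload(Bucket=bucket, Key=dest_key)
upload_id = mpu["UploadId"]
parts, part_no, buffer = [], 1, b""

for key in source_keys:
    size = s3.head_object(Bucket=bucket, Key=key)["ContentLength"]
    if size >= MIN_PART and not buffer:
        # Large enough to be its own part: server-side copy, nothing downloaded
        resp = s3.upload_part_copy(
            Bucket=bucket, Key=dest_key, PartNumber=part_no,
            UploadId=upload_id, CopySource={"Bucket": bucket, "Key": key})
        parts.append({"PartNumber": part_no,
                      "ETag": resp["CopyPartResult"]["ETag"]})
        part_no += 1
    else:
        # Under 5MB (or queued behind buffered data): download and buffer
        # locally until the buffer itself is big enough to be a part
        buffer += s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        if len(buffer) >= MIN_PART:
            resp = s3.upload_part(Bucket=bucket, Key=dest_key,
                                  PartNumber=part_no, UploadId=upload_id,
                                  Body=buffer)
            parts.append({"PartNumber": part_no, "ETag": resp["ETag"]})
            part_no, buffer = part_no + 1, b""

if buffer:  # only the final part may be under 5MB
    resp = s3.upload_part(Bucket=bucket, Key=dest_key,
                          PartNumber=part_no, UploadId=upload_id, Body=buffer)
    parts.append({"PartNumber": part_no, "ETag": resp["ETag"]})

s3.complete_multipart_upload(Bucket=bucket, Key=dest_key, UploadId=upload_id,
                             MultipartUpload={"Parts": parts})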

Download files

Download the file for your platform.

Source Distribution

s3_concat-0.3.0.tar.gz (10.6 kB)

Built Distribution

s3_concat-0.3.0-py3-none-any.whl (9.7 kB)

File details

Details for the file s3_concat-0.3.0.tar.gz.

File metadata

  • Download URL: s3_concat-0.3.0.tar.gz
  • Size: 10.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.3

File hashes

Hashes for s3_concat-0.3.0.tar.gz:

  • SHA256: 287af84d4020d8ac5241abfa6a20ae3e7e94c3721bb7659e3f6ea45562117ac1
  • MD5: dae6c31d0062a17d434b9b7720fd43b1
  • BLAKE2b-256: d6d27e361e7046a16cb9f6f16bcc311107d3fcb73220637b4d78ba8fb30c756e

File details

Details for the file s3_concat-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: s3_concat-0.3.0-py3-none-any.whl
  • Size: 9.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.3

File hashes

Hashes for s3_concat-0.3.0-py3-none-any.whl:

  • SHA256: 544f7bc7c1016a3aa98179ab39068640d4c003eca11060fc2629fdaf8c2ed054
  • MD5: b55c89f448e7a7d5dc895bca41a30b92
  • BLAKE2b-256: 63ccc2fabe743b13a70e8a75d3088564899b4eb807577bf9148989cc00a51ee0
