
A Python package for managing AWS S3 buckets

Project description

Installation

pip install s3namic


Import module

from s3namic import s3namic
s3 = s3namic(
  bucket="bucket_name",
  access_key="access_key",
  secret_key="secrey_key",
  region="region",
)
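
To avoid hardcoding credentials, they can be read from environment variables before constructing the client. A minimal sketch, assuming the variables below are already set (the names are illustrative, not required by s3namic):

import os

from s3namic import s3namic

s3 = s3namic(
    bucket=os.environ["S3_BUCKET"],                # illustrative variable names
    access_key=os.environ["AWS_ACCESS_KEY_ID"],
    secret_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    region=os.environ["AWS_DEFAULT_REGION"],
)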

Check S3 structure in tree form

s3_tree = s3.make_tree(
    # with_file_name=True,  # set to True to include file names in the tree
)

import json
s3_tree = json.dumps(s3_tree, indent=4, ensure_ascii=False)
print(s3_tree)

output:

{
    "assets/": {
        "assets/backup/": {},
        "assets/batch_raw/": {
            "assets/batch_raw/batchData": {}
        },
        ...
}
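
To include file names as leaves of the tree, enable the with_file_name argument that is commented out above; this sketch only changes that flag.

s3_tree_with_files = s3.make_tree(with_file_name=True)
print(json.dumps(s3_tree_with_files, indent=4, ensure_ascii=False))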

Check S3 structure in list form

s3_list = s3.list_files()
print(s3_list[:5], "\n...\n", s3_list[-5:])

output:

['first_file.json', 'second_file.json', ... ]
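
Because list_files returns plain key strings, the result can be filtered with ordinary Python; the extension filter below is just an illustration.

# keep only the CSV keys from the listing
csv_keys = [key for key in s3_list if key.endswith(".csv")]
print(f"CSV files in the bucket: {len(csv_keys)}")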

Find a specific file in S3

  • find_file

test_path = s3.find_file(file_name="2023-04-30", str_contains=True)
print(f"File path whose name contains '2023-04-30': '{test_path}'")

output:

"2023-04-30 File path containing filename: 'assets/csv/2023-04-30.csv'"
  • find_files

prefix_path = test_path[:-len(test_path.split("/")[-1])]
test_files = s3.find_files(prefix=prefix_path)
print(f"Number of files under '{prefix_path}': {len(test_files)}")

output:

"Number of files under 'assets/csv/': 112"

Get a URL for a specific file in S3

print(s3.get_file_url(file_name=test_path, expires_in=3600)) # Expires in 3600 seconds (1 hour)

output:

"https://bucket_name.s3.amazonaws.com/assets/csv/test.csv"

CRUD from S3

C, U (upload_file, _write_file)

  • Upload files to S3.
  • The upload_file method reads a local file into memory before uploading it, while the _write_file method writes content directly without that intermediate read, so it uses less memory.
  • Use upload_file to send a file from your local machine to the S3 bucket, and use _write_file to save a variable from your code directly to S3.
  • The write_csv, write_json, write_pkl, write_txt, and write_parquet methods call _write_file to save a file according to its extension (a short contrast sketch follows this list).
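
A minimal contrast of the two write paths, using the same calls as the examples further down (paths are illustrative):

import pandas as pd

# upload_file: copy an existing local file into the bucket (illustrative local path)
s3.upload_file(file_name="assets/test/local_copy.csv", file_path="./local_copy.csv")

# write_csv (backed by _write_file): save an in-memory object straight to S3
df = pd.DataFrame({"col": [1, 2, 3]})
s3.write_csv(file_name="assets/test/from_memory.csv", file_content=df, index=False)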

R (_read_file)

  • Read files from S3.
  • The read_csv, read_json, read_pkl, read_txt, and read_parquet methods call _read_file to read a file according to its extension.
  • The read_auto method picks the appropriate reader above based on the extension.
  • The read_thread method speeds up read_auto by running it across multiple threads.

D (delete_file)

  • Delete files from S3.

Examples

  • This example uses the csv extension, but the json, pkl, txt, and parquet extensions work the same way (see the methods above for usage).
import os

import pandas as pd

# Save variable to file (write_csv)
test_write_csv = pd.DataFrame({
    "test": [
        "ํ•œ๊ธ€",
        "English",
        1234,
        "!@#$%^&*()_+",
        "๐Ÿ˜€๐Ÿ‘๐Ÿ‘๐Ÿป๐Ÿ‘๐Ÿผ"
    ]
})
# directly save the variable (dataframe)
s3.write_csv(file_name="assets/test/test_write.csv", file_content=test_write_csv, encoding="utf-8", index=False)
# Compress and save in gzip or bzip2 format
s3.write_csv(file_name="assets/test/test_write.csv.gz", file_content=test_write_csv, compression="gzip", encoding="utf-8", index=False)
s3.write_csv(file_name="assets/test/test_write.csv.bz2", file_content=test_write_csv, compression="bz2", encoding="utf-8", index=False)
# Read the saved file (read_csv)
print(s3.read_csv.__doc__, end="\n====================\n")
pd.concat([
    s3.read_csv(file_name="assets/test/test_write.csv", encoding="utf-8").rename(columns={"test": "Basic format"}),
    # Read compressed files in gzip or bzip2 format
    s3.read_csv(file_name="assets/test/test_write.csv.gz", encoding="utf-8").rename(columns={"test": "gzip format"}),
    s3.read_csv(file_name="assets/test/test_write.csv.bz2", encoding="utf-8").rename(columns={"test": "bzip2 format"})
], axis=1)

output:

Basic format gzip format bzip2 format
0 ํ•œ๊ธ€ ํ•œ๊ธ€ ํ•œ๊ธ€
1 English English English
2 1234 1234 1234
3 !@#$%^&*()_+ !@#$%^&*()_+ !@#$%^&*()_+
4 ๐Ÿ˜€๐Ÿ‘๐Ÿ‘๐Ÿป๐Ÿ‘๐Ÿผ ๐Ÿ˜€๐Ÿ‘๐Ÿ‘๐Ÿป๐Ÿ‘๐Ÿผ ๐Ÿ˜€๐Ÿ‘๐Ÿ‘๐Ÿป๐Ÿ‘๐Ÿผ

# Download the saved file locally (download_file)
print(s3.download_file.__doc__, end="\n====================\n")
load_path = os.getcwd()
s3.download_file(file_name="assets/test/test_write.csv", load_path=load_path+"/test_write.csv")
s3.download_file(file_name="assets/test/test_write.csv.gz", load_path=load_path+"/test_write.csv.gz")
s3.download_file(file_name="assets/test/test_write.csv.bz2", load_path=load_path+"/test_write.csv.bz2")
# Delete a file on s3 (delete_file)
print(s3.delete_file.__doc__, end="\n====================\n")
print(f"List of files before deletion: {s3.find_files(prefix='assets/test/')}")
s3.delete_file(file_name="assets/test/test_write.csv")
s3.delete_file(file_name="assets/test/test_write.csv.gz")
s3.delete_file(file_name="assets/test/test_write.csv.bz2")
print(f"List of files after deletion: {s3.find_files(prefix='assets/test/')}")

output:

"List of files before deletion: ['assets/test/', 'assets/test/test.csv', 'assets/test/test.json', 'assets/test/test.parquet', 'assets/test/test.pickle', 'assets/test/test.pkl', 'assets/test/test.txt', 'assets/test/test_write.csv', 'assets/test/test_write.csv.bz2', 'assets/test/test_write.csv.gz']"
"List of files after deletion: ['assets/test/', 'assets/test/test.csv', 'assets/test/test.json', 'assets/test/test.parquet', 'assets/test/test.pickle', 'assets/test/test.pkl', 'assets/test/test.txt']"

# Upload a file stored locally (upload_file)
print(s3.upload_file.__doc__, end="\n====================\n")
print(f"List of files before upload: {s3.find_files(prefix='assets/test/')}")
s3.upload_file(file_name="assets/test/test_write.csv", file_path=load_path+"/test_write.csv")
s3.upload_file(file_name="assets/test/test_write.csv.gz", file_path=load_path+"/test_write.csv.gz")
s3.upload_file(file_name="assets/test/test_write.csv.bz2", file_path=load_path+"/test_write.csv.bz2")
print(f"List of files after upload: {s3.find_files(prefix='assets/test/')}")

output:

"List of files before upload: ['assets/test/', 'assets/test/test.csv', 'assets/test/test.json', 'assets/test/test.parquet', 'assets/test/test.pickle', 'assets/test/test.pkl', 'assets/test/test.txt']"
"List of files after upload: ['assets/test/', 'assets/test/test.csv', 'assets/test/test.json', 'assets/test/test.parquet', 'assets/test/test.pickle', 'assets/test/test.pkl', 'assets/test/test.txt', 'assets/test/test_write.csv', 'assets/test/test_write.csv.bz2', 'assets/test/test_write.csv.gz']"
# Delete local files
os.remove(load_path+"/test_write.csv")
os.remove(load_path+"/test_write.csv.gz")
os.remove(load_path+"/test_write.csv.bz2")

Methods that use CRUD in various ways

  • read_auto
    • Executes one of read_csv, read_excel, read_json, read_parquet, or read_pkl depending on the file extension
    • The extension can be detected automatically from the file name or specified explicitly with the extension argument (see the sketch after this list)

  • read_thread
    • Executes the read_auto method with multi-threading

  • compress, decompress
    • Compress and decompress files in the S3 bucket and save the results as new files
    • Built on the _read_file() and _write_file() methods
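
A minimal read_auto sketch on a single key (the path is illustrative; the extension argument is only needed when the key itself does not reveal it):

df = s3.read_auto(file_name="assets/csv/2023-04-30.csv", encoding="cp949")
# equivalent call with the extension given explicitly
df = s3.read_auto(file_name="assets/csv/2023-04-30.csv", extension="csv", encoding="cp949")

read_thread applies the same reading logic to every key under a prefix, as the example below shows.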

auto_path = s3.find_file(file_name="2023-04-30", str_contains=True)  # file path whose name contains 2023-04-30
print(f"File path with filename containing 2023-04-30: {auto_path}")
# Use the folder path as the prefix
folder_path = auto_path[:auto_path.rfind('/')] + '/'
print(f"Folder path of the file path: {folder_path}")
print(f"Number of files in the folder: {len(s3.find_files(prefix=folder_path))}")
auto_file = s3.read_thread(prefix=folder_path, encoding="cp949", workers=os.cpu_count(), extension="csv")
print(f"Number of data frames of files in the folder (list type): {len(auto_file)}")

output:

"File path with filename containing 2023-04-30: assets/csv/2023-04-30.csv"
"Folder path of the file path: assets/csv/"
"Number of files in the folder: 112"
"Number of data frames of files in the folder (list type): 112"

s3.compress(file_name="assets/test/test_write.csv", compression="gzip")
s3.compress(file_name="assets/test/test_write.csv", compression="bz2")
s3.decompress(file_name="assets/test/test_write.csv.gz")
s3.decompress(file_name="assets/test/test_write.csv.bz2")

output:

"The file assets/test/test_write.csv was compressed using gzip and saved as assets/test/test_write.csv.gz."
"The file assets/test/test_write.csv was compressed using bz2 and saved as assets/test/test_write.csv.bz2."
"The file assets/test/test_write.csv.gz was unzipped and saved as assets/test/test_write.csv."
"The file assets/test/test_write.csv.bz2 was unzipped and saved as assets/test/test_write.csv"
