
A Python package for managing AWS S3 buckets

Project description

Installation

pip install s3namic


Import module

from s3namic import s3namic
s3 = s3namic(
  bucket="bucket_name",
  access_key="access_key",
  secret_key="secret_key",
  region="region",
)
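
To avoid hardcoding credentials, the same constructor can also be fed from environment variables. A minimal sketch, assuming the variable names below (they are only illustrative, not something s3namic requires):

import os
from s3namic import s3namic

# Hypothetical environment variable names; use whatever your deployment defines
s3 = s3namic(
  bucket=os.environ["S3_BUCKET"],
  access_key=os.environ["AWS_ACCESS_KEY_ID"],
  secret_key=os.environ["AWS_SECRET_ACCESS_KEY"],
  region=os.environ["AWS_REGION"],
)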

Check S3 structure in tree form

s3_tree = s3.make_tree(
    # with_file_name=True,  # set to True to include file names in the tree
)

import json
s3_tree = json.dumps(s3_tree, indent=4, ensure_ascii=False)
print(s3_tree)

output:

{
    "assets/": {
        "assets/backup/": {},
        "assets/batch_raw/": {
            "assets/batch_raw/batchData": {}
        },
        ...
}
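
Because make_tree returns ordinary nested dictionaries, the result can be post-processed with plain Python. As an example, a small helper (not part of s3namic) that flattens the tree back into a flat list of prefixes:

def flatten_tree(tree: dict) -> list:
    """Recursively collect every prefix in the nested tree dict."""
    prefixes = []
    for prefix, children in tree.items():
        prefixes.append(prefix)
        prefixes.extend(flatten_tree(children))
    return prefixes

print(flatten_tree(s3.make_tree())[:5])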

Check S3 structure in list form

s3_list = s3.list_files()
print(s3_list[:5], "\n...\n", s3_list[-5:])

output:

['first_file.json', 'second_file.json', ... ]
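
The result is a plain Python list, so it can be filtered directly; for example, keeping only CSV keys (the extension here is just an example):

csv_files = [f for f in s3.list_files() if f.endswith(".csv")]
print(f"{len(csv_files)} csv files")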

Find a specific file in S3

  • find_file

test_path = s3.find_file(file_name="2023-04-30", str_contains=True)
print(f"File path with a filename containing '2023-04-30': '{test_path}'")

output:

"2023-04-30 File path containing filename: 'assets/csv/2023-04-30.csv'"
  • find_files

prefix_path = test_path[:-len(test_path.split("/")[-1])]
test_files = s3.find_files(prefix=prefix_path)
print(f"Number of files under '{prefix_path}': {len(test_files)}")

output:

"Number of files under 'assets/csv/': 112"

Get a URL for a specific file in S3

print(s3.get_file_url(file_name=test_path, expires_in=3600)) # Expires in 3600 seconds (1 hour)

output:

"https://bucket_name.s3.amazonaws.com/assets/csv/test.csv"

CRUD from S3

C, U (upload_file, _write_file)

  • Upload files to S3
  • The upload_file method reads a file into memory before uploading it, while the _write_file method writes the content directly without reading it first, so its memory usage is lower.
  • Use upload_file to upload a local file to the S3 bucket, and use _write_file to save a variable from your code directly to S3 (see the sketch below).
  • The write_csv, write_json, write_pkl, write_txt, and write_parquet methods call _write_file to save the file according to its extension.
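
A minimal sketch of the two paths, reusing the signatures shown in the examples below (the keys and the DataFrame here are illustrative):

import pandas as pd

# In-memory variable -> S3: the write_* helpers go through _write_file
df = pd.DataFrame({"col": [1, 2, 3]})
s3.write_csv(file_name="assets/test/from_memory.csv", file_content=df, encoding="utf-8", index=False)

# Local file -> S3: upload_file reads a file from disk and uploads it
s3.upload_file(file_name="assets/test/from_disk.csv", file_path="./local_copy.csv")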

R (_read_file)

  • Read files from S3
  • The read_csv, read_json, read_pkl, read_txt, and read_parquet methods call the _read_file method to read files according to their extension.
  • The read_auto method calls one of the above methods according to the file extension (a short sketch follows this list).
  • The read_thread method speeds up the read_auto method by executing it with multiple threads.
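
A minimal sketch of read_auto; the key is illustrative, and extension is the same argument used in the read_thread example later in this document:

# Extension inferred from the key name
df = s3.read_auto(file_name="assets/csv/2023-04-30.csv")

# Or force the parser explicitly
df = s3.read_auto(file_name="assets/csv/2023-04-30.csv", extension="csv")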

D (delete_file)

  • Delete files from S3

Examples

  • This example uses the csv extension, but the json, pkl, txt, and parquet extensions can be used in the same way (see the methods above for usage).
import pandas as pd

# Save variable to file (write_csv)
test_write_csv = pd.DataFrame({
    "test": [
        "ํ•œ๊ธ€",
        "English",
        1234,
        "!@#$%^&*()_+",
        "๐Ÿ˜€๐Ÿ‘๐Ÿ‘๐Ÿป๐Ÿ‘๐Ÿผ"
    ]
})
# directly save the variable (dataframe)
s3.write_csv(file_name="assets/test/test_write.csv", file_content=test_write_csv, encoding="utf-8", index=False)
# Compress and save in gzip or bzip2 format
s3.write_csv(file_name="assets/test/test_write.csv.gz", file_content=test_write_csv, compression="gzip", encoding="utf-8", index=False)
s3.write_csv(file_name="assets/test/test_write.csv.bz2", file_content=test_write_csv, compression="bz2", encoding="utf-8", index=False)
# Read the saved file (read_csv)
print(s3.read_csv.__doc__, end="\n====================\n")
pd.concat([
    s3.read_csv(file_name="assets/test/test_write.csv", encoding="utf-8").rename(columns={"test": "Basic format"}),
    # Read compressed files in gzip or bzip2 format
    s3.read_csv(file_name="assets/test/test_write.csv.gz", encoding="utf-8").rename(columns={"test": "gzip format"}),
    s3.read_csv(file_name="assets/test/test_write.csv.bz2", encoding="utf-8").rename(columns={"test": "bzip2 format"})
], axis=1)

output:

   Basic format   gzip format    bzip2 format
0  ํ•œ๊ธ€             ํ•œ๊ธ€             ํ•œ๊ธ€
1  English        English        English
2  1234           1234           1234
3  !@#$%^&*()_+   !@#$%^&*()_+   !@#$%^&*()_+
4  ๐Ÿ˜€๐Ÿ‘๐Ÿ‘๐Ÿป๐Ÿ‘๐Ÿผ     ๐Ÿ˜€๐Ÿ‘๐Ÿ‘๐Ÿป๐Ÿ‘๐Ÿผ     ๐Ÿ˜€๐Ÿ‘๐Ÿ‘๐Ÿป๐Ÿ‘๐Ÿผ

# Download the saved file locally (download_file)
import os
print(s3.download_file.__doc__, end="\n====================\n")
load_path = os.getcwd()
s3.download_file(file_name="assets/test/test_write.csv", load_path=load_path+"/test_write.csv")
s3.download_file(file_name="assets/test/test_write.csv.gz", load_path=load_path+"/test_write.csv.gz")
s3.download_file(file_name="assets/test/test_write.csv.bz2", load_path=load_path+"/test_write.csv.bz2")
# Delete a file on s3 (delete_file)
print(s3.delete_file.__doc__, end="\n====================\n")
print(f"List of files before deletion: {s3.find_files(prefix='assets/test/')}")
s3.delete_file(file_name="assets/test/test_write.csv")
s3.delete_file(file_name="assets/test/test_write.csv.gz")
s3.delete_file(file_name="assets/test/test_write.csv.bz2")
print(f"List of files after deletion: {s3.find_files(prefix='assets/test/')}")

output:

"List of files before deletion: ['assets/test/', 'assets/test/test.csv', 'assets/test/test.json', 'assets/test/test.parquet', 'assets/test/test.pickle', 'assets/test/test.pkl', 'assets/test/test.txt', 'assets/test/test_write.csv', 'assets/test/test_write.csv.bz2', 'assets/test/test_write.csv.gz']"
"List of files after deletion: ['assets/test/', 'assets/test/test.csv', 'assets/test/test.json', 'assets/test/test.parquet', 'assets/test/test.pickle', 'assets/test/test.pkl', 'assets/test/test.txt']"

# Upload a file stored locally (upload_file)
print(s3.upload_file.__doc__, end="\n====================\n")
print(f"List of files before upload: {s3.find_files(prefix='assets/test/')}")
s3.upload_file(file_name="assets/test/test_write.csv", file_path=load_path+"/test_write.csv")
s3.upload_file(file_name="assets/test/test_write.csv.gz", file_path=load_path+"/test_write.csv.gz")
s3.upload_file(file_name="assets/test/test_write.csv.bz2", file_path=load_path+"/test_write.csv.bz2")
print(f"List of files after upload: {s3.find_files(prefix='assets/test/')}")

output:

"List of files before upload: ['assets/test/', 'assets/test/test.csv', 'assets/test/test.json', 'assets/test/test.parquet', 'assets/test/test.pickle', 'assets/test/test.pkl', 'assets/test/test.txt']"
"List of files after upload: ['assets/test/', 'assets/test/test.csv', 'assets/test/test.json', 'assets/test/test.parquet', 'assets/test/test.pickle', 'assets/test/test.pkl', 'assets/test/test.txt', 'assets/test/test_write.csv', 'assets/test/test_write.csv.bz2', 'assets/test/test_write.csv.gz']"
# Delete local files
os.remove(load_path+"/test_write.csv")
os.remove(load_path+"/test_write.csv.gz")
os.remove(load_path+"/test_write.csv.bz2")

Methods that use CRUD in various ways

  • read_auto
    • A method that executes one of read_csv, read_excel, read_json, read_parquet, and read_pkl depending on the file extension
    • The extension can be detected automatically from the file name, or specified explicitly with the extension argument.

  • read_thread
    • Executes the read_auto method with multi-threading

  • compress, decompress
    • Compress and decompress files in the S3 bucket and save the results as files
    • Implemented using the _read_file() and _write_file() methods

auto_path = s3.find_file(file_name="2023-04-30", str_contains=True)  # file path with a filename containing 2023-04-30
print(f"File path with filename containing 2023-04-30: {auto_path}")
# Use the folder path of the file as the prefix
folder_path = auto_path[:auto_path.rfind('/')] + '/'
print(f"Folder path of the file path: {folder_path}")
print(f"Number of files in the folder: {len(s3.find_files(prefix=folder_path))}")
auto_file = s3.read_thread(prefix=folder_path, encoding="cp949", workers=os.cpu_count(), extension="csv")
print(f"Number of data frames of files in the folder (list type): {len(auto_file)}")

output:

"File path with filename containing 2023-04-30: assets/csv/2023-04-30.csv"
"Folder path of the file path: assets/csv/"
"Number of files in the folder: 112"
"Number of data frames of files in the folder (list type): 112"

s3.compress(file_name="assets/test/test_write.csv", compression="gzip")
s3.compress(file_name="assets/test/test_write.csv", compression="bz2")
s3.decompress(file_name="assets/test/test_write.csv.gz")
s3.decompress(file_name="assets/test/test_write.csv.bz2")

output:

"The file assets/test/test_write.csv was compressed using gzip and saved as assets/test/test_write.csv.gz."
"The file assets/test/test_write.csv was compressed using bz2 and saved as assets/test/test_write.csv.bz2."
"The file assets/test/test_write.csv.gz was unzipped and saved as assets/test/test_write.csv."
"The file assets/test/test_write.csv.bz2 was unzipped and saved as assets/test/test_write.csv"

