
A Python package for managing AWS S3 buckets

Project description

Installation

pip install s3namic


Import module

from s3namic import s3namic
s3 = s3namic(
  bucket="bucket_name",
  access_key="access_key",
  secret_key="secrey_key",
  region="region",
)
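
To avoid hardcoding credentials, they can be read from environment variables before constructing the client. A minimal sketch, assuming the variables below are already set (the names are illustrative, not required by s3namic):

import os

from s3namic import s3namic

s3 = s3namic(
    bucket=os.environ["S3_BUCKET"],                # illustrative variable names
    access_key=os.environ["AWS_ACCESS_KEY_ID"],
    secret_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    region=os.environ["AWS_DEFAULT_REGION"],
)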

Check S3 structure in tree form

s3_tree = s3.make_tree(
    # with_file_name=True,  # set to True to include file names in the tree
)

import json
s3_tree = json.dumps(s3_tree, indent=4, ensure_ascii=False)
print(s3_tree)

output:

{
    "assets/": {
        "assets/backup/": {},
        "assets/batch_raw/": {
            "assets/batch_raw/batchData": {}
        },
        ...
}
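
To include file names as leaves of the tree, enable the with_file_name argument that is commented out above; this sketch only changes that flag.

s3_tree_with_files = s3.make_tree(with_file_name=True)
print(json.dumps(s3_tree_with_files, indent=4, ensure_ascii=False))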

Check S3 structure in list form

s3_list = s3.list_files()
print(s3_list[:5], "\n...\n", s3_list[-5:])

output:

['first_file.json', 'second_file.json', ... ]
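
Because list_files returns plain key strings, the result can be filtered with ordinary Python; the extension filter below is just an illustration.

# keep only the CSV keys from the listing
csv_keys = [key for key in s3_list if key.endswith(".csv")]
print(f"CSV files in the bucket: {len(csv_keys)}")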

Find a specific file in S3

  • find_file

test_path = s3.find_file(file_name="2023-04-30", str_contains=True)
print(f"File path whose name contains '2023-04-30': '{test_path}'")

output:

"2023-04-30 File path containing filename: 'assets/csv/2023-04-30.csv'"
  • find_files

prefix_path = test_path[:-len(test_path.split("/")[-1])]
test_files = s3.find_files(prefix=prefix_path)
print(f"Number of files under '{prefix_path}': {len(test_files)}")

output:

"Number of files under 'assets/csv/': 112"

Get a URL for a specific file in S3

print(s3.get_file_url(file_name=test_path, expires_in=3600)) # Expires in 3600 seconds (1 hour)

output:

"https://bucket_name.s3.amazonaws.com/assets/csv/test.csv"

CRUD from S3

C, U (upload_file, _write_file)

  • Upload files to S3.
  • The upload_file method reads a local file into memory before uploading it, while the _write_file method writes content directly without that intermediate read, so it uses less memory.
  • Use upload_file to send a file from your local machine to the S3 bucket, and use _write_file to save a variable from your code directly to S3.
  • The write_csv, write_json, write_pkl, write_txt, and write_parquet methods call _write_file to save a file according to its extension (a short contrast sketch follows this list).
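
A minimal contrast of the two write paths, using the same calls as the examples further down (paths are illustrative):

import pandas as pd

# upload_file: copy an existing local file into the bucket (illustrative local path)
s3.upload_file(file_name="assets/test/local_copy.csv", file_path="./local_copy.csv")

# write_csv (backed by _write_file): save an in-memory object straight to S3
df = pd.DataFrame({"col": [1, 2, 3]})
s3.write_csv(file_name="assets/test/from_memory.csv", file_content=df, index=False)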

R (_read_file)

  • Read files from S3.
  • The read_csv, read_json, read_pkl, read_txt, and read_parquet methods call _read_file to read a file according to its extension.
  • The read_auto method picks the appropriate reader above based on the extension.
  • The read_thread method speeds up read_auto by running it across multiple threads.

D (delete_file)

  • Delete files from S3.

Examples

  • This example uses the csv extension, but the json, pkl, txt, and parquet extensions work the same way (see the methods above for usage).
import os

import pandas as pd

# Save variable to file (write_csv)
test_write_csv = pd.DataFrame({
    "test": [
        "ํ•œ๊ธ€",
        "English",
        1234,
        "!@#$%^&*()_+",
        "๐Ÿ˜€๐Ÿ‘๐Ÿ‘๐Ÿป๐Ÿ‘๐Ÿผ"
    ]
})
# directly save the variable (dataframe)
s3.write_csv(file_name="assets/test/test_write.csv", file_content=test_write_csv, encoding="utf-8", index=False)
# Compress and save in gzip or bzip2 format
s3.write_csv(file_name="assets/test/test_write.csv.gz", file_content=test_write_csv, compression="gzip", encoding="utf-8", index=False)
s3.write_csv(file_name="assets/test/test_write.csv.bz2", file_content=test_write_csv, compression="bz2", encoding="utf-8", index=False)
# Read the saved file (read_csv)
print(s3.read_csv.__doc__, end="\n====================\n")
pd.concat([
    s3.read_csv(file_name="assets/test/test_write.csv", encoding="utf-8").rename(columns={"test": "Basic format"}),
    # Read compressed files in gzip or bzip2 format
    s3.read_csv(file_name="assets/test/test_write.csv.gz", encoding="utf-8").rename(columns={"test": "gzip format"}),
    s3.read_csv(file_name="assets/test/test_write.csv.bz2", encoding="utf-8").rename(columns={"test": "bzip2 format"})
], axis=1)

output:

Basic format gzip format bzip2 format
0 ํ•œ๊ธ€ ํ•œ๊ธ€ ํ•œ๊ธ€
1 English English English
2 1234 1234 1234
3 !@#$%^&*()_+ !@#$%^&*()_+ !@#$%^&*()_+
4 ๐Ÿ˜€๐Ÿ‘๐Ÿ‘๐Ÿป๐Ÿ‘๐Ÿผ ๐Ÿ˜€๐Ÿ‘๐Ÿ‘๐Ÿป๐Ÿ‘๐Ÿผ ๐Ÿ˜€๐Ÿ‘๐Ÿ‘๐Ÿป๐Ÿ‘๐Ÿผ

# Download the saved file locally (download_file)
print(s3.download_file.__doc__, end="\n====================\n")
load_path = os.getcwd()
s3.download_file(file_name="assets/test/test_write.csv", load_path=load_path+"/test_write.csv")
s3.download_file(file_name="assets/test/test_write.csv.gz", load_path=load_path+"/test_write.csv.gz")
s3.download_file(file_name="assets/test/test_write.csv.bz2", load_path=load_path+"/test_write.csv.bz2")
# Delete a file on s3 (delete_file)
print(s3.delete_file.__doc__, end="\n====================\n")
print(f"List of files before deletion: {s3.find_files(prefix='assets/test/')}")
s3.delete_file(file_name="assets/test/test_write.csv")
s3.delete_file(file_name="assets/test/test_write.csv.gz")
s3.delete_file(file_name="assets/test/test_write.csv.bz2")
print(f"List of files after deletion: {s3.find_files(prefix='assets/test/')}")

output:

"List of files before deletion: ['assets/test/', 'assets/test/test.csv', 'assets/test/test.json', 'assets/test/test.parquet', 'assets/test/test.pickle', 'assets/test/test.pkl', 'assets/test/test.txt', 'assets/test/test_write.csv', 'assets/test/test_write.csv.bz2', 'assets/test/test_write.csv.gz']"
"List of files after deletion: ['assets/test/', 'assets/test/test.csv', 'assets/test/test.json', 'assets/test/test.parquet', 'assets/test/test.pickle', 'assets/test/test.pkl', 'assets/test/test.txt']"

# Upload a file stored locally (upload_file)
print(s3.upload_file.__doc__, end="\n====================\n")
print(f"List of files before upload: {s3.find_files(prefix='assets/test/')}")
s3.upload_file(file_name="assets/test/test_write.csv", file_path=load_path+"/test_write.csv")
s3.upload_file(file_name="assets/test/test_write.csv.gz", file_path=load_path+"/test_write.csv.gz")
s3.upload_file(file_name="assets/test/test_write.csv.bz2", file_path=load_path+"/test_write.csv.bz2")
print(f"List of files after upload: {s3.find_files(prefix='assets/test/')}")

output:

"List of files before upload: ['assets/test/', 'assets/test/test.csv', 'assets/test/test.json', 'assets/test/test.parquet', 'assets/test/test.pickle', 'assets/test/test.pkl', 'assets/test/test.txt']"
"List of files after upload: ['assets/test/', 'assets/test/test.csv', 'assets/test/test.json', 'assets/test/test.parquet', 'assets/test/test.pickle', 'assets/test/test.pkl', 'assets/test/test.txt', 'assets/test/test_write.csv', 'assets/test/test_write.csv.bz2', 'assets/test/test_write.csv.gz']"
# Delete local files
os.remove(load_path+"/test_write.csv")
os.remove(load_path+"/test_write.csv.gz")
os.remove(load_path+"/test_write.csv.bz2")

Methods that use CRUD in various ways

  • read_auto
    • Executes one of read_csv, read_excel, read_json, read_parquet, or read_pkl depending on the file extension
    • The extension can be detected automatically from the file name or specified explicitly with the extension argument (see the sketch after this list)

  • read_thread
    • Executes the read_auto method with multi-threading

  • compress, decompress
    • Compress and decompress files in the S3 bucket and save the results as new files
    • Built on the _read_file() and _write_file() methods
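
A minimal read_auto sketch on a single key (the path is illustrative; the extension argument is only needed when the key itself does not reveal it):

df = s3.read_auto(file_name="assets/csv/2023-04-30.csv", encoding="cp949")
# equivalent call with the extension given explicitly
df = s3.read_auto(file_name="assets/csv/2023-04-30.csv", extension="csv", encoding="cp949")

read_thread applies the same reading logic to every key under a prefix, as the example below shows.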

auto_path = s3.find_file(file_name="2023-04-30", str_contains=True)  # file path whose name contains 2023-04-30
print(f"File path with filename containing 2023-04-30: {auto_path}")
# Use the folder path as the prefix
folder_path = auto_path[:auto_path.rfind('/')] + '/'
print(f"Folder path of the file path: {folder_path}")
print(f"Number of files in the folder: {len(s3.find_files(prefix=folder_path))}")
auto_file = s3.read_thread(prefix=folder_path, encoding="cp949", workers=os.cpu_count(), extension="csv")
print(f"Number of data frames of files in the folder (list type): {len(auto_file)}")

output:

"File path with filename containing 2023-04-30: assets/csv/2023-04-30.csv"
"Folder path of the file path: assets/csv/"
"Number of files in the folder: 112"
"Number of data frames of files in the folder (list type): 112"

s3.compress(file_name="assets/test/test_write.csv", compression="gzip")
s3.compress(file_name="assets/test/test_write.csv", compression="bz2")
s3.decompress(file_name="assets/test/test_write.csv.gz")
s3.decompress(file_name="assets/test/test_write.csv.bz2")

output:

"The file assets/test/test_write.csv was compressed using gzip and saved as assets/test/test_write.csv.gz."
"The file assets/test/test_write.csv was compressed using bz2 and saved as assets/test/test_write.csv.bz2."
"The file assets/test/test_write.csv.gz was unzipped and saved as assets/test/test_write.csv."
"The file assets/test/test_write.csv.bz2 was unzipped and saved as assets/test/test_write.csv"
