A Python package for managing AWS S3 buckets
Project description
Installation
pip install s3namic
Import module
from s3namic import s3namic
s3 = s3namic(
    bucket="bucket_name",
    access_key="access_key",
    secret_key="secret_key",
    region="region",
)
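The credentials can also come from environment variables instead of being hard-coded. A minimal sketch (the variable names below are illustrative, not required by s3namic):
import os
from s3namic import s3namic

# Illustrative environment variable names; s3namic itself only needs the four arguments.
s3 = s3namic(
    bucket=os.environ["S3_BUCKET"],
    access_key=os.environ["AWS_ACCESS_KEY_ID"],
    secret_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    region=os.environ["AWS_DEFAULT_REGION"],
)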
Check S3 structure in tree form
s3_tree = s3.make_tree(
    # with_file_name=True,  # set to True to include file names in the tree as well
)
import json
s3_tree = json.dumps(s3_tree, indent=4, ensure_ascii=False)
print(s3_tree)
output:
{
    "assets/": {
        "assets/backup/": {},
        "assets/batch_raw/": {
            "assets/batch_raw/batchData": {}
        },
        ...
    }
}
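Since make_tree returns a nested dict of prefixes (as printed above), it can be traversed with plain Python. A small sketch that counts every key in the tree (the helper is illustrative, not part of s3namic):
def count_keys(tree: dict) -> int:
    # Count this level's keys plus everything nested below them.
    return len(tree) + sum(count_keys(subtree) for subtree in tree.values())

print(count_keys(s3.make_tree()))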
Check S3 structure in list form
s3_list = s3.list_files()
print(s3_list[:5], "\n...\n", s3_list[-5:])
output:
['first_file.json', 'second_file.json', ... ]
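Because list_files returns plain key strings, the result can be post-processed with ordinary Python, for example to count files per extension (a sketch, not part of s3namic):
import os
from collections import Counter

# Group the keys returned by list_files by file extension.
ext_counts = Counter(os.path.splitext(key)[1] or "<folder>" for key in s3_list)
print(ext_counts.most_common())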
Find a specific file in S3
- find_file
test_path = s3.find_file(file_name="2023-04-30", str_contains=True)
print(f"File path containing '2023-04-30' in the file name: '{test_path}'")
output:
"File path containing '2023-04-30' in the file name: 'assets/csv/2023-04-30.csv'"
- find_files
prefix_path = test_path[:-len(test_path.split("/")[-1])]  # strip the file name, leaving the "assets/csv/" prefix
test_files = s3.find_files(prefix=prefix_path)
print(f"Number of files under '{prefix_path}': {len(test_files)}")
output:
"Number of files under 'assets/csv/': 112"
Get a URL for a specific file in S3
print(s3.get_file_url(file_name=test_path, expires_in=3600)) # Expires in 3600 seconds (1 hour)
output:
"https://bucket_name.s3.amazonaws.com/assets/csv/test.csv"
CRUD from S3
C, U (upload_file, _write_file)
- Upload files to S3.
- The upload_file method reads a local file into memory before uploading it, whereas the _write_file method writes the content directly without that read step, so its memory usage is small.
- Use upload_file to upload a file from your local machine to the S3 bucket, and use _write_file to save a variable from your code directly to S3.
- The write_csv, write_json, write_pkl, write_txt, and write_parquet methods call the _write_file method to save files according to their extension.
R (_read_file)
- Read files from S3.
- The read_csv, read_json, read_pkl, read_txt, and read_parquet methods call the _read_file method to read files according to their extension.
- The read_auto method calls one of the above methods based on the file extension.
- The read_thread method speeds up read_auto by running it with multiple threads.
D (delete_file)
- Delete files from S3.
examples
- These examples use the csv extension, but the json, pkl, txt, and parquet extensions work in the same way (see the methods above for usage; a short JSON sketch follows the csv walkthrough below).
import pandas as pd
# Save a variable to a file (write_csv)
test_write_csv = pd.DataFrame({
    "test": [
        "한글",
        "English",
        1234,
        "!@#$%^&*()_+",
        "😀😂👍🏻👍🏼"
    ]
})
# directly save the variable (dataframe)
s3.write_csv(file_name="assets/test/test_write.csv", file_content=test_write_csv, encoding="utf-8", index=False)
# Compress and save in gzip or bzip2 format
s3.write_csv(file_name="assets/test/test_write.csv.gz", file_content=test_write_csv, compression="gzip", encoding="utf-8", index=False)
s3.write_csv(file_name="assets/test/test_write.csv.bz2", file_content=test_write_csv, compression="bz2", encoding="utf-8", index=False)
# Read the saved file (read_csv)
print(s3.read_csv.__doc__, end="\n====================\n")
pd.concat([
    s3.read_csv(file_name="assets/test/test_write.csv", encoding="utf-8").rename(columns={"test": "Basic format"}),
    # Read compressed files in gzip or bzip2 format
    s3.read_csv(file_name="assets/test/test_write.csv.gz", encoding="utf-8").rename(columns={"test": "gzip format"}),
    s3.read_csv(file_name="assets/test/test_write.csv.bz2", encoding="utf-8").rename(columns={"test": "bzip2 format"})
], axis=1)
output:
| | Basic format | gzip format | bzip2 format |
|---|---|---|---|
| 0 | 한글 | 한글 | 한글 |
| 1 | English | English | English |
| 2 | 1234 | 1234 | 1234 |
| 3 | !@#$%^&*()_+ | !@#$%^&*()_+ | !@#$%^&*()_+ |
| 4 | 😀😂👍🏻👍🏼 | 😀😂👍🏻👍🏼 | 😀😂👍🏻👍🏼 |
# Download the saved file locally (download_file)
import os
print(s3.download_file.__doc__, end="\n====================\n")
load_path = os.getcwd()
s3.download_file(file_name="assets/test/test_write.csv", load_path=load_path+"/test_write.csv")
s3.download_file(file_name="assets/test/test_write.csv.gz", load_path=load_path+"/test_write.csv.gz")
s3.download_file(file_name="assets/test/test_write.csv.bz2", load_path=load_path+"/test_write.csv.bz2")
# Delete a file on s3 (delete_file)
print(s3.delete_file.__doc__, end="\n====================\n")
print(f"List of files before deletion: {s3.find_files(prefix='assets/test/')}")
s3.delete_file(file_name="assets/test/test_write.csv")
s3.delete_file(file_name="assets/test/test_write.csv.gz")
s3.delete_file(file_name="assets/test/test_write.csv.bz2")
print(f"List of files after deletion: {s3.find_files(prefix='assets/test/')}")
output:
"List of files before deletion: ['assets/test/', 'assets/test/test.csv', 'assets/test/test.json', 'assets/test/test.parquet', 'assets/test/test.pickle', 'assets/test/test.pkl', 'assets/test/test.txt', 'assets/test/test_write.csv', 'assets/test/test_write.csv.bz2', 'assets/test/test_write.csv.gz']"
"List of files after deletion: ['assets/test/', 'assets/test/test.csv', 'assets/test/test.json', 'assets/test/test.parquet', 'assets/test/test.pickle', 'assets/test/test.pkl', 'assets/test/test.txt']"
# Upload a file stored locally (upload_file)
print(s3.upload_file.__doc__, end="\n====================\n")
print(f"List of files before upload: {s3.find_files(prefix='assets/test/')}")
s3.upload_file(file_name="assets/test/test_write.csv", file_path=load_path+"/test_write.csv")
s3.upload_file(file_name="assets/test/test_write.csv.gz", file_path=load_path+"/test_write.csv.gz")
s3.upload_file(file_name="assets/test/test_write.csv.bz2", file_path=load_path+"/test_write.csv.bz2")
print(f"List of files after upload: {s3.find_files(prefix='assets/test/')}")
output:
"List of files before upload: ['assets/test/', 'assets/test/test.csv', 'assets/test/test.json', 'assets/test/test.parquet', 'assets/test/test.pickle', 'assets/test/test.pkl', 'assets/test/test.txt']"
"List of files after upload: ['assets/test/', 'assets/test/test.csv', 'assets/test/test.json', 'assets/test/test.parquet', 'assets/test/test.pickle', 'assets/test/test.pkl', 'assets/test/test.txt', 'assets/test/test_write.csv', 'assets/test/test_write.csv.bz2', 'assets/test/test_write.csv.gz']"
# Delete local files
os.remove(load_path+"/test_write.csv")
os.remove(load_path+"/test_write.csv.gz")
os.remove(load_path+"/test_write.csv.bz2")
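The same write / read / delete flow applies to the other extensions. A minimal JSON sketch (the file name is illustrative, and it is assumed here that write_json accepts any JSON-serializable object as file_content):
test_write_json = {"name": "s3namic", "values": [1, 2, 3]}
s3.write_json(file_name="assets/test/test_write.json", file_content=test_write_json)
print(s3.read_json(file_name="assets/test/test_write.json"))
s3.delete_file(file_name="assets/test/test_write.json")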
Methods that use CRUD in various ways
read_auto
- A method that executes one of read_csv, read_excel, read_json, read_parquet, and read_pkl depending on the file extension
- The extension can be detected automatically from the file name, or specified explicitly with the extension argument (see the sketch after this list).
read_thread
- Executes the read_auto method with multi-threading
compress, decompress
- Compress and decompress files in the S3 bucket and save the results back as files
- Implemented using the _read_file() and _write_file() methods
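A minimal read_auto sketch, reusing test_path from the find_file example (the file_name and encoding arguments are assumed to mirror the read_csv signature shown earlier):
# Let read_auto pick the reader from the ".csv" suffix of the key.
df = s3.read_auto(file_name=test_path, encoding="cp949")
print(df.shape)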
auto_path = s3.find_file(file_name="2023-04-30", str_contains=True)  # file path whose name contains 2023-04-30
print(f"File path with filename containing 2023-04-30: {auto_path}")
# Just put the folder path as prefix
folder_path = auto_path[:auto_path.rfind('/')] + '/'
print(f"Folder path of the file path: {folder_path}")
print(f"Number of files in the folder: {len(s3.find_files(prefix=folder_path))}")
auto_file = s3.read_thread(prefix=folder_path, encoding="cp949", workers=os.cpu_count(), extension="csv")
print(f"Number of data frames of files in the folder (list type): {len(auto_file)}")
output:
"File path with filename containing 2023-04-30: assets/csv/2023-04-30.csv"
"Folder path of the file path: assets/csv/"
"Number of files in the folder: 112"
"Number of data frames of files in the folder (list type): 112"
s3.compress(file_name="assets/test/test_write.csv", compression="gzip")
s3.compress(file_name="assets/test/test_write.csv", compression="bz2")
s3.decompress(file_name="assets/test/test_write.csv.gz")
s3.decompress(file_name="assets/test/test_write.csv.bz2")
output:
"The file assets/test/test_write.csv was compressed using gzip and saved as assets/test/test_write.csv.gz."
"The file assets/test/test_write.csv was compressed using bz2 and saved as assets/test/test_write.csv.bz2."
"The file assets/test/test_write.csv.gz was unzipped and saved as assets/test/test_write.csv."
"The file assets/test/test_write.csv.bz2 was unzipped and saved as assets/test/test_write.csv"