s3pathlib is a Python package that provides a Pythonic, object-oriented programming (OOP) interface for manipulating AWS S3 objects and directories. The API is similar to the pathlib standard library and is intuitive to use.

Project description

Welcome to s3pathlib Documentation

s3pathlib is a Python package that offers an object-oriented programming (OOP) interface to work with AWS S3 objects and directories. Its API is designed to be similar to the standard library pathlib and is user-friendly. The package also supports versioning in AWS S3.

Quick Start

Import the library, declare an S3Path object

# import
>>> from s3pathlib import S3Path

# construct from string, auto join parts
>>> p = S3Path("bucket", "folder", "file.txt")
# construct from S3 URI works too
>>> p = S3Path("s3://bucket/folder/file.txt")
# construct from S3 ARN works too
>>> p = S3Path("arn:aws:s3:::bucket/folder/file.txt")
>>> p.bucket
'bucket'
>>> p.key
'folder/file.txt'
>>> p.uri
's3://bucket/folder/file.txt'
>>> p.console_url # click to preview it in AWS console
'https://s3.console.aws.amazon.com/s3/object/bucket?prefix=folder/file.txt'
>>> p.arn
'arn:aws:s3:::bucket/folder/file.txt'
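
To make the relationship between the three spellings concrete, here is a minimal sketch of how an S3 URI or ARN can be split into a bucket and a key. The helper name `split_s3_location` is hypothetical and is not part of s3pathlib; it only handles the `s3://` scheme and ARNs in the standard `aws` partition, and says nothing about how s3pathlib parses its arguments internally.

```python
from typing import Tuple


def split_s3_location(location: str) -> Tuple[str, str]:
    """Split an S3 URI or S3 ARN into (bucket, key).

    Hypothetical helper for illustration only; not s3pathlib's parser.
    """
    for prefix in ("s3://", "arn:aws:s3:::"):
        if location.startswith(prefix):
            rest = location[len(prefix):]
            break
    else:
        raise ValueError(f"not an S3 URI or ARN: {location!r}")
    # Everything before the first "/" is the bucket, the remainder is the key.
    bucket, _, key = rest.partition("/")
    return bucket, key


print(split_s3_location("s3://bucket/folder/file.txt"))
print(split_s3_location("arn:aws:s3:::bucket/folder/file.txt"))
# both print ('bucket', 'folder/file.txt')
```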

Talk to AWS S3 and get some information

# s3pathlib maintains a "context" object that holds the AWS authentication information
# you just need to build your own boto session object and attach to it
>>> import boto3
>>> from s3pathlib import context
>>> context.attach_boto_session(
...     boto3.session.Session(
...         region_name="us-east-1",
...         profile_name="my_aws_profile",
...     )
... )

>>> p = S3Path("bucket", "folder", "file.txt")
>>> p.write_text("a lot of data ...")
>>> p.etag
'3e20b77868d1a39a587e280b99cec4a8'
>>> p.size
56789000
>>> p.size_for_human
'54.16 MB'

# folders work too; just use a trailing "/" to mark the path as a folder
>>> p = S3Path("bucket", "datalake/")
>>> p.count_objects()
7164 # number of files under this prefix
>>> p.calculate_total_size()
(7164, 236483701963) # 7164 objects, 220.24 GB
>>> p.calculate_total_size(for_human=True)
(7164, '220.24 GB') # 7164 objects, 220.24 GB
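
The human-readable figure above is consistent with binary units (1 GB = 1024³ bytes). The following sketch shows that arithmetic; `format_size` is a hypothetical name, and the unit list and two-decimal rounding are assumptions chosen to match the sample output, not a description of s3pathlib's own formatter.

```python
def format_size(n_bytes: int) -> str:
    """Render a byte count using 1024-based units and two decimals.

    Hypothetical helper; the rounding and unit labels are assumptions.
    """
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    value = float(n_bytes)
    for unit in units[:-1]:
        if value < 1024:
            return f"{value:.2f} {unit}"
        value /= 1024
    # Anything this large is reported in the biggest unit we know.
    return f"{value:.2f} {units[-1]}"


print(format_size(236483701963))  # prints 220.24 GB
```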

Manipulate Folder in S3

The native S3 write APIs (the operations that change the state of S3) operate only at the object level, and the list_objects API returns at most 1,000 objects per call. Manipulating objects recursively therefore takes extra effort. s3pathlib takes care of that for you.
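
For comparison, this is roughly what that extra effort looks like without s3pathlib: a minimal sketch that follows continuation tokens to list every key under a prefix, then groups the keys into batches of 1,000. The function names `iter_keys` and `chunked` are hypothetical. The `client` argument is assumed to be a boto3 S3 client (for example `boto3.client("s3")`), but the code only relies on its `list_objects_v2` method, so it can be exercised with a stub.

```python
from typing import Any, Iterable, Iterator, List


def iter_keys(client: Any, bucket: str, prefix: str) -> Iterator[str]:
    """Yield every key under ``prefix``, one page of results at a time."""
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        response = client.list_objects_v2(**kwargs)
        # "Contents" is absent when a page has no objects.
        for item in response.get("Contents", []):
            yield item["Key"]
        if not response.get("IsTruncated"):
            break
        # Ask for the next page on the following request.
        kwargs["ContinuationToken"] = response["NextContinuationToken"]


def chunked(items: Iterable[Any], size: int = 1000) -> Iterator[List[Any]]:
    """Group items into lists of at most ``size`` elements."""
    batch: List[Any] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
```

Each batch could then be handed to an object-level call such as `client.delete_objects`, which accepts up to 1,000 keys per request. The examples that follow show the same kind of work done in one line each.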

# define an S3 folder path
>>> p = S3Path("bucket", "github", "repos", "my-repo/")

# upload all Python files from /my-repo to s3://bucket/github/repos/my-repo/
>>> p.upload_dir("/my-repo", pattern="**/*.py", overwrite=False)

# copy entire s3 folder to another s3 folder
>>> p2 = S3Path("bucket", "github", "repos", "another-repo/")
>>> p.copy_to(p2, overwrite=True)

# delete all objects in the folder, recursively, to clean up your test bucket
>>> p.delete()
>>> p2.delete()
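
To illustrate what a pattern-based directory upload has to work out, the sketch below pairs each local file matching a glob pattern with the S3 key it would land on. `plan_upload` is a hypothetical helper, and the mapping (key prefix plus the POSIX-style relative path) is an assumption about a reasonable layout, not a statement of how `upload_dir` is implemented.

```python
from pathlib import Path
from typing import Iterator, Tuple


def plan_upload(
    local_dir: str,
    key_prefix: str,
    pattern: str = "**/*",
) -> Iterator[Tuple[Path, str]]:
    """Yield (local_path, s3_key) pairs for files matching ``pattern``.

    ``key_prefix`` is assumed to end with "/", like an S3 folder.
    """
    root = Path(local_dir)
    for path in sorted(root.glob(pattern)):
        if path.is_file():
            # S3 keys always use "/" regardless of the local OS separator.
            relative = path.relative_to(root).as_posix()
            yield path, key_prefix + relative
```

For a directory containing `setup.py` and `pkg/mod.py`, calling `plan_upload(directory, "github/repos/my-repo/", "**/*.py")` would pair them with the keys `github/repos/my-repo/pkg/mod.py` and `github/repos/my-repo/setup.py`.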

S3 Path Filter

Ever wanted to filter S3 objects by their attributes, such as dirname, basename, file extension, etag, size, or modified time? It should be simple in Python:

>>> s3bkt = S3Path("bucket") # assume you have lots of files in this bucket
>>> iterproxy = s3bkt.iter_objects().filter(
...     S3Path.size >= 10_000_000, S3Path.ext == ".csv" # add filter
... )

>>> iterproxy.one() # fetch one
S3Path('s3://bucket/larger-than-10MB-1.csv')

>>> iterproxy.many(3) # fetch three
[
    S3Path('s3://bucket/larger-than-10MB-1.csv'),
    S3Path('s3://bucket/larger-than-10MB-2.csv'),
    S3Path('s3://bucket/larger-than-10MB-3.csv'),
]

>>> for p in iterproxy: # iter the rest
...     print(p)
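
An expression such as `S3Path.size >= 10_000_000` reads like a comparison but acts as a filter criterion. One common way to get that behavior is operator overloading on class-level attributes. The sketch below shows the general technique with hypothetical names (`Field`, `F`, `FakeObject`, `IterProxy`); it is an illustration under those assumptions, not s3pathlib's implementation, and the behavior of `one()` on an exhausted iterator here (raising `StopIteration`) is a choice made for this sketch.

```python
import os
from dataclasses import dataclass
from itertools import islice
from typing import Any, Callable, Iterable, List

Predicate = Callable[[Any], bool]


class Field:
    """Attribute placeholder whose comparisons return predicates."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __ge__(self, other: Any) -> Predicate:
        return lambda obj: getattr(obj, self.name) >= other

    def __eq__(self, other: Any) -> Predicate:  # type: ignore[override]
        return lambda obj: getattr(obj, self.name) == other


class F:
    """Namespace of filterable fields (hypothetical)."""

    size = Field("size")
    ext = Field("ext")


@dataclass
class FakeObject:
    """Stand-in for an S3 object; no AWS access involved."""

    key: str
    size: int

    @property
    def ext(self) -> str:
        return os.path.splitext(self.key)[1]


class IterProxy:
    """Lazy iterator that applies every predicate to each item."""

    def __init__(self, items: Iterable[Any]) -> None:
        self._it = iter(items)
        self._predicates: List[Predicate] = []

    def filter(self, *predicates: Predicate) -> "IterProxy":
        self._predicates.extend(predicates)
        return self

    def __iter__(self) -> "IterProxy":
        return self

    def __next__(self) -> Any:
        # Resume the underlying iterator and skip non-matching items.
        for item in self._it:
            if all(pred(item) for pred in self._predicates):
                return item
        raise StopIteration

    def one(self) -> Any:
        return next(self)

    def many(self, n: int) -> List[Any]:
        return list(islice(self, n))


objects = [
    FakeObject("a.csv", 20_000_000),
    FakeObject("b.txt", 30_000_000),
    FakeObject("c.csv", 5),
    FakeObject("d.csv", 10_000_000),
]
proxy = IterProxy(objects).filter(F.size >= 10_000_000, F.ext == ".csv")
print(proxy.one().key)                  # prints a.csv
print([o.key for o in proxy.many(3)])   # prints ['d.csv']
```

Because the proxy consumes a single underlying iterator, `one()`, `many()`, and a later `for` loop each continue where the previous call stopped.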

File Like Object for Simple IO

S3Path is a file-like object. It supports open and the context manager syntax out of the box. Here are a few highlights:

# Stream big file by line
>>> p = S3Path("bucket", "log.txt")
>>> with p.open("r") as f:
...     for line in f:
...         print(line)  # do whatever you want with each line

# JSON io
>>> import json
>>> p = S3Path("bucket", "config.json")
>>> with p.open("w") as f:
...     json.dump({"password": "mypass"}, f)

# pandas IO
>>> import pandas as pd
>>> p = S3Path("bucket", "dataset.csv")
>>> df = pd.DataFrame(...)
>>> with p.open("w") as f:
...     df.to_csv(f)
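
Since the examples above use the same `open("r")` / `open("w")` shape as `pathlib.Path`, helpers can be written against that shape rather than against a concrete class. The sketch below defines two hypothetical helpers, `save_json` and `load_json`, and exercises them with a local `pathlib.Path` so it runs without AWS credentials. Passing an `S3Path` instead is assumed to work only to the extent that its `open` returns text-mode file objects, as the examples above suggest.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def save_json(path: Any, data: Any) -> None:
    """Write ``data`` as JSON to any object exposing ``open("w")``."""
    with path.open("w") as f:
        json.dump(data, f)


def load_json(path: Any) -> Any:
    """Read JSON from any object exposing ``open("r")``."""
    with path.open("r") as f:
        return json.load(f)


# Local demonstration with pathlib.Path standing in for a path-like object.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "config.json"
    save_json(config_path, {"retries": 3})
    print(load_json(config_path))  # prints {'retries': 3}
```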

Now that you have a basic understanding of s3pathlib, read the full documentation to explore its capabilities in greater depth.

Getting Help

Please use the python-s3pathlib tag on Stack Overflow to get help.

You can also submit an "I want help" issue ticket on GitHub Issues.

Contributing

Please see the Contribution Guidelines.
