Project description

Our goal is to be the meringue of file management libraries: the subtle sweetness of pathlib working in harmony with the ethereal lightness of the cloud.

A library that implements (nearly all of) the pathlib.Path methods for URIs on different cloud storage services.

from cloudpathlib import CloudPath

with CloudPath("s3://bucket/filename.txt").open("w+") as f:
    f.write("Send my changes to the cloud!")

Why use cloudpathlib?

  • Familiar: If you know how to interact with Path, you know how to interact with CloudPath. All of the cloud-relevant Path methods are implemented.
  • Supported clouds: AWS S3 and Azure Blob Storage are implemented. Google Cloud Storage and FTP are on the way.
  • Extensible: The base classes do most of the work generically, so implementing two small classes MyPath and MyClient is all you need to add support for a new cloud storage service.
  • Read/write support: Reading just works. Using the write_text, write_bytes or .open('w') methods will all upload your changes to cloud storage without any additional file management on your part.
  • Seamless caching: Files are downloaded locally only when necessary. You can also easily pass a persistent cache folder so that across processes and sessions you only re-download what is necessary.
  • Tested: Comprehensive test suite and code coverage.
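
To make the "Familiar" and "Read/write support" points concrete, here is a minimal sketch that uses only the standard library's pathlib.Path on a temporary local directory. The method names (write_text, read_text, open) are the ones named in the list above; the temporary directory and the file name notes.txt are illustrative assumptions. Per the list above, the same calls on a CloudPath would also upload the changes to cloud storage.

```python
import tempfile
from pathlib import Path

# Local stand-in for illustration; with cloudpathlib the path would be a cloud URI.
with tempfile.TemporaryDirectory() as tmp_dir:
    local_file = Path(tmp_dir) / "notes.txt"  # hypothetical file name

    # write_text returns the number of characters written
    chars_written = local_file.write_text("Send my changes to the cloud!")

    # open("a") gives an ordinary file handle for appending
    with local_file.open("a") as f:
        f.write("\nAnd one more line.")

    # read_text returns the whole file as a string
    contents = local_file.read_text()

print(chars_written)
print(contents)
```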

Installation

cloudpathlib depends on the cloud services' SDKs (e.g., boto3, azure-storage-blob) to communicate with their respective storage services. If you try to use cloud paths for a cloud service whose dependencies are not installed, cloudpathlib will raise an error that tells you what you need to install.

To install a cloud service's SDK dependency when installing cloudpathlib, you need to specify it using pip's "extras" specification. For example:

pip install "cloudpathlib[s3,azure]"

Currently supported cloud storage services are: azure, s3. You can also use all to install all available services' dependencies.

If you do not specify any extras or separately install any cloud SDKs, you will only be able to develop with the base classes for rolling your own cloud path class.
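
If you are unsure which SDKs are present in an environment, a check along these lines may help. It is a minimal sketch using only the standard library, not something cloudpathlib provides: the helper names is_installed and missing_extras are hypothetical, and the mapping assumes the importable module names boto3 (for the s3 extra) and azure.storage.blob (for the azure extra, from the azure-storage-blob package).

```python
from importlib.util import find_spec

# Assumed mapping from each extra to the module its SDK makes importable.
SDK_MODULES = {
    "s3": "boto3",
    "azure": "azure.storage.blob",
}


def is_installed(module_name):
    """Return True if module_name can be located without fully importing it."""
    try:
        return find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises if a parent package (e.g. "azure") is missing
        return False


def missing_extras(extras):
    """Return the extras whose SDK module cannot be found."""
    return [extra for extra in extras if not is_installed(SDK_MODULES[extra])]


missing = missing_extras(["s3", "azure"])
if missing:
    extras_spec = ",".join(missing)
    print(f'Missing SDKs; try: pip install "cloudpathlib[{extras_spec}]"')
else:
    print("All requested cloud SDKs are installed.")
```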

conda

cloudpathlib is also available using conda from conda-forge. Note that to install the necessary cloud service SDK dependency, you should include the appropriate suffix in the package name. For example:

conda install cloudpathlib-s3 -c conda-forge

If no suffix is used, only the base classes will be usable. See the conda-forge/cloudpathlib-feedstock for all installation options.

Development version

You can get the latest development version from GitHub:

pip install "cloudpathlib[all] @ git+https://github.com/drivendataorg/cloudpathlib.git"

Note that you similarly need to specify cloud service dependencies, such as all in the above example command.

Quick usage

Here's an example to get the gist of using the package. By default, cloudpathlib authenticates with the environment variables supported by each respective cloud service SDK. For more details and advanced authentication options, see the "Authentication" documentation.

from cloudpathlib import CloudPath

# dispatches to S3Path based on prefix
root_dir = CloudPath("s3://drivendata-public-assets/")
root_dir
#> S3Path('s3://drivendata-public-assets/')

# there's only one file, but globbing works in nested folders
for f in root_dir.glob('**/*.txt'):
    text_data = f.read_text()
    print(f)
    print(text_data)
#> s3://drivendata-public-assets/odsc-west-2019/DATA_DICTIONARY.txt
#> Eviction Lab Data Dictionary
#>
#> Additional information in our FAQ evictionlab.org/help-faq/
#> Full methodology evictionlab.org/methods/
#>
#> ... (additional text output truncated)

# use / to join paths (and, in this case, build the path for a new file)
new_file_copy = root_dir / "nested_dir/copy_file.txt"
new_file_copy
#> S3Path('s3://drivendata-public-assets/nested_dir/copy_file.txt')

# show things work and the file does not exist yet
new_file_copy.exists()
#> False

# writing text data to the new file in the cloud
new_file_copy.write_text(text_data)
#> 6933

# file now listed
list(root_dir.glob('**/*.txt'))
#> [S3Path('s3://drivendata-public-assets/nested_dir/copy_file.txt'),
#>  S3Path('s3://drivendata-public-assets/odsc-west-2019/DATA_DICTIONARY.txt')]

# but, we can remove it
new_file_copy.unlink()

# no longer there
list(root_dir.glob('**/*.txt'))
#> [S3Path('s3://drivendata-public-assets/odsc-west-2019/DATA_DICTIONARY.txt')]
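
The first comment in the example above notes that CloudPath dispatches to S3Path based on the URI prefix. The following is a minimal sketch of how prefix-based dispatch can work in general. It is not cloudpathlib's actual implementation: ToyPath, ToyS3Path, ToyAzurePath, and the registry are hypothetical names, and the az:// prefix is assumed purely for illustration.

```python
class ToyPath:
    """Hypothetical base class that picks a subclass from the URI prefix."""

    prefix = ""     # overridden by each subclass
    _registry = []  # subclasses, recorded by __init_subclass__

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        ToyPath._registry.append(cls)

    def __new__(cls, uri):
        if cls is ToyPath:
            # Called on the base class: find the subclass that owns this prefix.
            for subclass in cls._registry:
                if uri.startswith(subclass.prefix):
                    return super().__new__(subclass)
            raise ValueError(f"No registered path class handles {uri!r}")
        # Called on a concrete subclass: construct it directly.
        return super().__new__(cls)

    def __init__(self, uri):
        self.uri = uri

    def __repr__(self):
        return f"{type(self).__name__}({self.uri!r})"


class ToyS3Path(ToyPath):
    prefix = "s3://"


class ToyAzurePath(ToyPath):
    prefix = "az://"  # assumed prefix, for illustration only


p = ToyPath("s3://drivendata-public-assets/")
print(repr(p))
#> ToyS3Path('s3://drivendata-public-assets/')
```

Registering subclasses in __init_subclass__ means a new service only needs to declare its prefix; the base class does not have to be edited.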

Supported methods and properties

Most methods and properties from pathlib.Path are supported except for the ones that don't make sense in a cloud context. There are a few additional methods or properties that relate to specific cloud services or specifically for cloud paths.

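Methods + properties    AzureBlobPath    S3Path
anchor                  Yes              Yes
as_uri                  Yes              Yes
drive                   Yes              Yes
exists                  Yes              Yes
glob                    Yes              Yes
is_dir                  Yes              Yes
is_file                 Yes              Yes
iterdir                 Yes              Yes
joinpath                Yes              Yes
match                   Yes              Yes
mkdir                   Yes              Yes
name                    Yes              Yes
open                    Yes              Yes
parent                  Yes              Yes
parents                 Yes              Yes
parts                   Yes              Yes
read_bytes              Yes              Yes
read_text               Yes              Yes
rename                  Yes              Yes
replace                 Yes              Yes
rglob                   Yes              Yes
rmdir                   Yes              Yes
samefile                Yes              Yes
stat                    Yes              Yes
stem                    Yes              Yes
suffix                  Yes              Yes
suffixes                Yes              Yes
touch                   Yes              Yes
unlink                  Yes              Yes
with_name               Yes              Yes
with_suffix             Yes              Yes
write_bytes             Yes              Yes
write_text              Yes              Yes
absolute                No               No
as_posix                No               No
chmod                   No               No
cwd                     No               No
expanduser              No               No
group                   No               No
home                    No               No
is_absolute             No               No
is_block_device         No               No
is_char_device          No               No
is_fifo                 No               No
is_mount                No               No
is_reserved             No               No
is_socket               No               No
is_symlink              No               No
lchmod                  No               No
link_to                 No               No
lstat                   No               No
owner                   No               No
relative_to             No               No
resolve                 No               No
root                    No               No
symlink_to              No               No
cloud_prefix            Yes              Yes
download_to             Yes              Yes
etag                    Yes              Yes
is_valid_cloudpath      Yes              Yes
blob                    Yes              No
bucket                  No               Yes
container               Yes              No
key                     No               Yes
md5                     Yes              No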
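
Several of the properties listed above (name, stem, suffix, parent, parts, with_suffix, match) are pure string manipulations in pathlib. As a rough illustration of their semantics, the sketch below applies them with the standard library's PurePosixPath to the key portion of the URI from the Quick usage example. This is a stand-in under stated assumptions, not cloudpathlib output: values returned by a CloudPath, particularly for parent and parts, may differ because they also involve the cloud prefix and bucket.

```python
from pathlib import PurePosixPath

# Key portion of s3://drivendata-public-assets/odsc-west-2019/DATA_DICTIONARY.txt
key = PurePosixPath("odsc-west-2019/DATA_DICTIONARY.txt")

print(key.name)                # DATA_DICTIONARY.txt
print(key.stem)                # DATA_DICTIONARY
print(key.suffix)              # .txt
print(key.parent)              # odsc-west-2019
print(key.parts)               # ('odsc-west-2019', 'DATA_DICTIONARY.txt')
print(key.with_suffix(".md"))  # odsc-west-2019/DATA_DICTIONARY.md
print(key.match("*.txt"))      # True
```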

Icon made by srip from www.flaticon.com.
Sample code block generated using the reprexpy package.
