pathlib API extended to use fsspec backends

Universal Pathlib

Universal Pathlib is a Python library that aims to extend Python's built-in pathlib.Path API to a variety of backend filesystems through fsspec.

Installation

PyPI

python -m pip install universal_pathlib

conda

conda install -c conda-forge universal_pathlib

Basic Usage

# pip install universal_pathlib s3fs
>>> from upath import UPath
>>>
>>> s3path = UPath("s3://test_bucket") / "example.txt"
>>> s3path.name
'example.txt'
>>> s3path.stem
'example'
>>> s3path.suffix
'.txt'
>>> s3path.exists()
True
>>> s3path.read_text()
'Hello World'

For more examples, see the example notebook.

Currently supported filesystems (and schemes)

  • file: Local filesystem
  • memory: Ephemeral filesystem in RAM
  • az:, adl:, abfs: and abfss: Azure Storage (requires adlfs to be installed)
  • http: and https: HTTP(S)-based filesystem
  • hdfs: Hadoop distributed filesystem
  • gs: and gcs: Google Cloud Storage (requires gcsfs to be installed)
  • s3: and s3a: AWS S3 (requires s3fs to be installed)
  • webdav+http: and webdav+https: WebDAV-based filesystem on top of HTTP(S) (requires webdav4[fsspec] to be installed)

Other fsspec-compatible filesystems may also work, but are neither supported nor tested. Contributions for new filesystems are welcome!
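Several of the schemes listed above need an extra package. The following is a minimal sketch, using only the standard library, of how one might check whether that package is importable before constructing a path. The names EXTRA_MODULE_FOR_SCHEME and missing_extra are hypothetical helpers and not part of universal_pathlib; the sketch also assumes each package's importable module name matches the package name given in the list.

```python
from importlib.util import find_spec
from typing import Optional

# Scheme -> importable module of the extra package, taken from the list above.
# Assumption: the module name equals the package name (for WebDAV the pip
# requirement is "webdav4[fsspec]", the importable module is "webdav4").
EXTRA_MODULE_FOR_SCHEME = {
    "az": "adlfs",
    "adl": "adlfs",
    "abfs": "adlfs",
    "abfss": "adlfs",
    "gs": "gcsfs",
    "gcs": "gcsfs",
    "s3": "s3fs",
    "s3a": "s3fs",
    "webdav+http": "webdav4",
    "webdav+https": "webdav4",
}


def missing_extra(scheme: str) -> Optional[str]:
    """Return the module that still needs installing for `scheme`, or None."""
    module = EXTRA_MODULE_FOR_SCHEME.get(scheme.rstrip(":").lower())
    if module is None:
        # No extra package is listed for this scheme (e.g. file, memory).
        return None
    return None if find_spec(module) is not None else module


if __name__ == "__main__":
    for scheme in ("file", "memory", "s3", "gs", "abfss"):
        print(scheme, "->", missing_extra(scheme) or "nothing missing")
```

Schemes with no listed extra, such as file: and memory:, always report nothing missing in this sketch.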

Class hierarchy

The individual UPath subclasses relate to the pathlib classes as follows:

flowchart TB
  subgraph s0[pathlib]
    A---> B
    A--> AP
    A--> AW

    B--> BP
    AP---> BP
    B--> BW
    AW---> BW
  end
  subgraph s1[upath]
    B ---> U
    U --> UP
    U --> UW
    BP --> UP
    BW --> UW
    U --> UL
    U --> US3
    U --> UH
    U -.-> UO
  end

  A(PurePath)
  AP(PurePosixPath)
  AW(PureWindowsPath)
  B(Path)
  BP(PosixPath)
  BW(WindowsPath)

  U(UPath)
  UP(PosixUPath)
  UW(WindowsUPath)
  UL(LocalPath)
  US3(S3Path)
  UH(HttpPath)
  UO(...Path)

  classDef np fill:#f7f7f7,stroke:#2166ac,stroke-width:2px,color:#333
  classDef nu fill:#f7f7f7,stroke:#b2182b,stroke-width:2px,color:#333

  class A,AP,AW,B,BP,BW,UP,UW np
  class U,UL,US3,UH,UO nu

  style UO stroke-dasharray: 3 3

  style s0 fill:none,stroke:#0571b0,stroke-width:3px,stroke-dasharray: 3 3,color:#0571b0
  style s1 fill:none,stroke:#ca0020,stroke-width:3px,stroke-dasharray: 3 3,color:#ca0020

When instantiating UPath, the returned instance type depends on the path that was provided to the constructor. For "URI"-style paths, UPath returns a subclass instance corresponding to the supported fsspec protocol, defined by the URI scheme. If there is no specialized subclass implementation available, UPath will return a UPath instance and emit a warning that the protocol is currently not being tested in the test suite, and correct behavior is not guaranteed. If a local path is provided, UPath will return a PosixUPath or WindowsUPath instance. These two subclasses are 100% compatible with the PosixPath and WindowsPath classes of their specific Python version, and are tested against all relevant tests of the CPython pathlib test suite.
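The dispatch described above can be illustrated with a small standalone sketch. This is not universal_pathlib's actual implementation: every class and function name below (SketchPath, SketchNativePath, SketchS3Path, SketchHttpPath, make_path) is hypothetical, and the scheme detection is deliberately simplified.

```python
import warnings
from urllib.parse import urlsplit


class SketchPath:
    """Generic fallback, standing in for a plain UPath instance."""

    def __init__(self, raw: str) -> None:
        self.raw = raw


class SketchNativePath(SketchPath):
    """Stands in for PosixUPath / WindowsUPath (paths without a scheme)."""


class SketchS3Path(SketchPath):
    """Stands in for a specialized subclass such as S3Path."""


class SketchHttpPath(SketchPath):
    """Stands in for a specialized subclass such as HttpPath."""


# Hypothetical registry: URI scheme -> specialized class.
_REGISTRY = {
    "s3": SketchS3Path,
    "s3a": SketchS3Path,
    "http": SketchHttpPath,
    "https": SketchHttpPath,
}


def make_path(raw: str) -> SketchPath:
    """Pick a class from the URI scheme, mirroring the rules described above."""
    # Simplification: a real implementation must also cope with Windows
    # drive letters, which urlsplit would report as a one-letter scheme.
    scheme = urlsplit(raw).scheme
    if not scheme:
        return SketchNativePath(raw)
    cls = _REGISTRY.get(scheme)
    if cls is None:
        warnings.warn(
            f"protocol {scheme!r} has no specialized class; "
            "behavior is not guaranteed",
            UserWarning,
            stacklevel=2,
        )
        return SketchPath(raw)
    return cls(raw)
```

Calling make_path("s3://bucket/key.txt") yields the specialized class, make_path("/tmp/data.txt") yields the native one, and an unregistered scheme falls back to the generic class with a warning.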

UPath public class API

UPath's public class interface is identical to pathlib.Path with the addition of the following attributes:

  • UPath(...).protocol: str the filesystem_spec protocol (note: for PosixUPath and WindowsUPath it's an empty string)
  • UPath(...).storage_options: dict[str, Any] the storage options for instantiating the filesystem_spec class
  • UPath(...).path: str the filesystem_spec compatible path for use with filesystem instances
  • UPath(...).fs: AbstractFileSystem convenience attribute to access an instantiated filesystem

The first three provide a public interface for accessing a file via fsspec, as follows:

from upath import UPath
from fsspec import filesystem

p = UPath("s3://bucket/file.txt", anon=True)

fs = filesystem(p.protocol, **p.storage_options)  # equivalent to p.fs
with fs.open(p.path) as f:
    data = f.read()

Register custom UPath implementations

In case you develop a custom UPath implementation, feel free to open an issue to discuss integrating it in universal_pathlib. You can dynamically register your implementation too! Here are your options:

Dynamic registration from Python

# for example: my_module/submodule.py
from upath import UPath
from upath.registry import register_implementation

my_protocol = "myproto"
class MyPath(UPath):
    ...  # your custom implementation

register_implementation(my_protocol, MyPath)

Registration via entry points

# pyproject.toml
[project.entry-points."universal_pathlib.implementations"]
myproto = "my_module.submodule:MyPath"

# setup.cfg
[options.entry_points]
universal_pathlib.implementations =
    myproto = my_module.submodule:MyPath
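
Once a package declaring such an entry point is installed, the entry point should be visible in the installed distributions' metadata. The following is a minimal sketch for checking this with the standard library only, assuming the group name universal_pathlib.implementations as used in the setup.cfg example; registered_implementations is a hypothetical helper, not part of universal_pathlib.

```python
from importlib.metadata import entry_points

# Assumption: the entry point group name from the setup.cfg example above.
GROUP = "universal_pathlib.implementations"


def registered_implementations(group: str = GROUP) -> dict:
    """Map each entry point name (the protocol) to its 'module:Class' target."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry points are selectable by group.
        selected = eps.select(group=group)
    else:
        # Python 3.8 / 3.9: entry_points() returns a dict keyed by group.
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}


if __name__ == "__main__":
    # Prints an empty dict when no package registers an implementation.
    print(registered_implementations())
```

With the example package above installed, the result would be expected to contain a "myproto" key pointing at "my_module.submodule:MyPath".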

Contributing

Contributions are very welcome. To learn more, see the Contributor Guide.

License

Distributed under the terms of the MIT license, universal_pathlib is free and open source software.

Issues

If you encounter any problems, please file an issue along with a detailed description.
