pathlib API extended to use fsspec backends
Universal Pathlib
Universal Pathlib is a Python library that extends Python's built-in pathlib.Path
API to work with a variety of backend filesystems via fsspec.
Installation
PyPI
python -m pip install universal_pathlib
conda
conda install -c conda-forge universal_pathlib
Basic Usage
# pip install universal_pathlib s3fs
>>> from upath import UPath
>>>
>>> s3path = UPath("s3://test_bucket") / "example.txt"
>>> s3path.name
'example.txt'
>>> s3path.stem
'example'
>>> s3path.suffix
'.txt'
>>> s3path.exists()
True
>>> s3path.read_text()
'Hello World'
For more examples, see the example notebook.
Currently supported filesystems (and schemes)
- file: Local filesystem
- memory: Ephemeral filesystem in RAM
- az:, adl:, abfs: and abfss: Azure Storage (requires adlfs to be installed)
- http: and https: HTTP(S)-based filesystem
- hdfs: Hadoop distributed filesystem
- gs: and gcs: Google Cloud Storage (requires gcsfs to be installed)
- s3: and s3a: AWS S3 (requires s3fs to be installed)
- webdav+http: and webdav+https: WebDAV-based filesystem on top of HTTP(S) (requires webdav4[fsspec] to be installed)
Other fsspec-compatible filesystems may also work, but are neither supported nor tested. Contributions for new filesystems are welcome!
Class hierarchy
The individual UPath subclasses relate to the pathlib classes in the following way:
flowchart TB
subgraph s0[pathlib]
A---> B
A--> AP
A--> AW
B--> BP
AP---> BP
B--> BW
AW---> BW
end
subgraph s1[upath]
B ---> U
U --> UP
U --> UW
BP --> UP
BW --> UW
U --> UL
U --> US3
U --> UH
U -.-> UO
end
A(PurePath)
AP(PurePosixPath)
AW(PureWindowsPath)
B(Path)
BP(PosixPath)
BW(WindowsPath)
U(UPath)
UP(PosixUPath)
UW(WindowsUPath)
UL(LocalPath)
US3(S3Path)
UH(HttpPath)
UO(...Path)
classDef np fill:#f7f7f7,stroke:#2166ac,stroke-width:2px,color:#333
classDef nu fill:#f7f7f7,stroke:#b2182b,stroke-width:2px,color:#333
class A,AP,AW,B,BP,BW,UP,UW np
class U,UL,US3,UH,UO nu
style UO stroke-dasharray: 3 3
style s0 fill:none,stroke:#0571b0,stroke-width:3px,stroke-dasharray: 3 3,color:#0571b0
style s1 fill:none,stroke:#ca0020,stroke-width:3px,stroke-dasharray: 3 3,color:#ca0020
When instantiating UPath, the returned instance type depends on the path that was provided to the constructor.
For "URI"-style paths, UPath returns a subclass instance corresponding to the supported fsspec protocol, defined
by the URI scheme. If there is no specialized subclass implementation available, UPath will return a UPath instance
and emit a warning that the protocol is currently not being tested in the test suite and that correct behavior is not
guaranteed.
If a local path is provided, UPath will return a PosixUPath or WindowsUPath instance.
These two subclasses are 100% compatible with the PosixPath and WindowsPath classes of their
specific Python version, and are tested against all relevant tests of the CPython pathlib test suite.
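The dispatch rule described above can be illustrated with a small standard-library sketch. This is not universal_pathlib's actual implementation: the classes BasePath, S3LikePath and LocalLikePath, the _REGISTRY dict and the make_path function are all hypothetical names chosen only to show the idea of picking a class from the URI scheme and warning on an unknown one.

```python
import warnings
from urllib.parse import urlsplit


class BasePath:
    """Hypothetical generic fallback, standing in for a plain UPath."""

    def __init__(self, url):
        self.url = url


class S3LikePath(BasePath):
    """Hypothetical specialized subclass for the s3 and s3a schemes."""


class LocalLikePath(BasePath):
    """Hypothetical stand-in for PosixUPath / WindowsUPath."""


# Hypothetical scheme -> class table; the real library keeps its own registry.
_REGISTRY = {"s3": S3LikePath, "s3a": S3LikePath}


def make_path(url):
    """Pick a class from the URI scheme, warning when none is registered."""
    scheme = urlsplit(url).scheme
    if not scheme:
        # No scheme: treat the input as a local path.
        # (Windows drive letters would need extra handling, omitted here.)
        return LocalLikePath(url)
    cls = _REGISTRY.get(scheme)
    if cls is None:
        warnings.warn(f"protocol {scheme!r} is not tested", UserWarning, stacklevel=2)
        cls = BasePath
    return cls(url)


print(type(make_path("s3://bucket/key")).__name__)
print(type(make_path("/tmp/data.csv")).__name__)
```

An unregistered scheme such as ftp:// would fall through to BasePath and trigger the warning, mirroring the fallback behavior described in the text.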
UPath public class API
UPath's public class interface is identical to pathlib.Path with the addition of the following attributes:
- UPath(...).protocol: str
  the filesystem_spec protocol (note: for PosixUPath and WindowsUPath it's an empty string)
- UPath(...).storage_options: dict[str, Any]
  the storage options for instantiating the filesystem_spec class
- UPath(...).path: str
  the filesystem_spec compatible path for use with filesystem instances
- UPath(...).fs: AbstractFileSystem
  convenience attribute to access an instantiated filesystem
The first three provide a public interface to access a file via fsspec as follows:
from upath import UPath
from fsspec import filesystem
p = UPath("s3://bucket/file.txt", anon=True)
fs = filesystem(p.protocol, **p.storage_options) # equivalent to p.fs
with fs.open(p.path) as f:
    data = f.read()
Register custom UPath implementations
In case you develop a custom UPath implementation, feel free to open an issue to discuss integrating it
in universal_pathlib. You can dynamically register your implementation too! Here are your options:
Dynamic registration from Python
# for example: mymodule/submodule.py
from upath import UPath
from upath.registry import register_implementation
my_protocol = "myproto"
class MyPath(UPath):
    ...  # your custom implementation
register_implementation(my_protocol, MyPath)
Registration via entry points
# pyproject.toml
[project.entry-points."universal_pathlib.implementations"]
myproto = "my_module.submodule:MyPath"
# setup.cfg
[options.entry_points]
universal_pathlib.implementations =
    myproto = my_module.submodule:MyPath
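After installing a package that declares such an entry point, it can be useful to confirm that the entry point is visible to Python. The sketch below uses only the standard library's importlib.metadata; the helper name discover_implementations is hypothetical, and the group name is taken from the configuration above.

```python
from importlib.metadata import entry_points


def discover_implementations(group="universal_pathlib.implementations"):
    """Return {protocol: "module:Class"} for entry points declared in `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ exposes a selectable collection.
        selected = eps.select(group=group)
    else:
        # Python 3.8 / 3.9 return a plain dict keyed by group name.
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}


found = discover_implementations()
print(found)
```

If the package from the example above were installed, the returned mapping would be expected to contain a myproto key; an empty mapping suggests the entry point was not picked up.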
Known issues solvable by installing newer upstream dependencies
Some issues in UPath's behavior with specific filesystems can be fixed by installing newer versions of the dependencies. The following list will be kept up to date whenever we encounter more:
- UPath().glob(): fsspec fixed its glob behavior when handling ** patterns in versions fsspec>=2023.9.0
- GCSPath().mkdir(): a few mkdir quirks are solved by installing gcsfs>=2022.7.1
- fsspec.filesystem(WebdavPath().protocol): the webdav protocol was added to fsspec in version fsspec>=2022.5.0
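To check whether an environment already meets the minimum versions listed above, a small standard-library helper can be used. This is a rough sketch: the names MINIMUMS, release_tuple, meets_minimum and check_installed are hypothetical, and the version parsing is deliberately naive (it compares only the leading numeric components and ignores pre-release markers).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions taken from the list above (highest fsspec bound used).
MINIMUMS = {"fsspec": "2023.9.0", "gcsfs": "2022.7.1"}


def release_tuple(version_string):
    """Naively parse leading numeric components, e.g. "2023.9.0" -> (2023, 9, 0)."""
    parts = []
    for piece in version_string.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at suffixes such as "post1" or "dev0"
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings numerically rather than lexicographically."""
    return release_tuple(installed) >= release_tuple(minimum)


def check_installed(minimums=MINIMUMS):
    """Return {package: True/False, or None when the package is not installed}."""
    report = {}
    for name, minimum in minimums.items():
        try:
            report[name] = meets_minimum(version(name), minimum)
        except PackageNotFoundError:
            report[name] = None
    return report


print(check_installed())
```

For anything beyond a quick check, a proper version parser is the safer choice, since the tuple comparison here does not understand pre-releases or local version labels.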
Contributing
Contributions are very welcome. To learn more, see the Contributor Guide.
License
Distributed under the terms of the MIT license, universal_pathlib is free and open source software.
Issues
If you encounter any problems, please file an issue along with a detailed description.