Simple data access layer over fsspec
Simple data catalog library for Python
This project is a minimal attempt at offering basic catalog functionality for structured datasets stored in local or remote folders. The library uses universal_pathlib to access remote storage locations such as S3 and Google Cloud Storage. It reads a config file called fsdata.ini, which defines a list of collections, one per section. Each collection corresponds to a local or remote folder containing files that all share the same format and extension. Currently, as a prototype, the library supports only pandas dataframes saved as .parquet files. The library caches data locally to avoid fetching the same data multiple times.
Warning: This project is for exploration only.
Configuration
The configuration file fsdata.ini has one section per collection. The section name is the collection name, and the path key in that section points to the collection's location.
The config file should be located in the standard XDG config directory, XDG_CONFIG_HOME (or ~/.config if that variable is not set).
# fsdata.ini
[samples]
path = s3://my-bucket/samples
[datasets]
path = s3://my-bucket/datasets
[testdata]
path = s3://my-bucket/testdata
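To make the layout concrete, here is a minimal sketch of how such a file could be located and parsed with the standard library. It is not fsdata's own implementation: the helper names config_file and read_collections are hypothetical, and the sketch only assumes what is described above (one section per collection, a path key, and the XDG config directory).

```python
import configparser
import os
from pathlib import Path


def config_file(name: str = "fsdata.ini") -> Path:
    """Hypothetical helper: expected config location under the XDG config dir."""
    # Fall back to ~/.config when XDG_CONFIG_HOME is unset or empty.
    base = os.environ.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
    return Path(base) / name


def read_collections(text: str) -> dict[str, str]:
    """Hypothetical helper: map each section name to the value of its path key."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {name: parser[name]["path"] for name in parser.sections()}


# Sample content mirroring the fsdata.ini shown above.
SAMPLE = """
[samples]
path = s3://my-bucket/samples

[datasets]
path = s3://my-bucket/datasets
"""

collections = read_collections(SAMPLE)
print(collections["samples"])
print(config_file())
```

Reading the real file would amount to passing config_file().read_text() to read_collections, assuming the file exists.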
Usage
To access a given collection, use the collection method.
import fsdata
samples = fsdata.collection("samples")
To list the items in a collection, use the items method.
samples.items()
Please note that item names are bare names without extension.
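As an illustration of what "bare names" means, the sketch below derives item names from a local folder by dropping the .parquet extension. This is an assumption about the naming convention only, not the library's code: item_names is a hypothetical helper, and it works on a local temporary folder rather than a remote collection.

```python
import tempfile
from pathlib import Path


def item_names(folder: Path, extension: str = ".parquet") -> list[str]:
    """Hypothetical helper: file names in folder with the extension removed."""
    return sorted(p.stem for p in folder.iterdir() if p.suffix == extension)


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    # Two files in the collection format and one unrelated file.
    (folder / "my-sample.parquet").touch()
    (folder / "other-sample.parquet").touch()
    (folder / "notes.txt").touch()
    names = item_names(folder)

print(names)  # ['my-sample', 'other-sample']
```

The name "my-sample" is therefore what would be passed to load or save, not "my-sample.parquet".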
To load data use the load method.
samples.load("my-sample")
To save data use the save method.
samples.save("my-sample", data)
You can also load an item directly with the fsdata.load function.
fsdata.load("samples", "my-sample")
Installation
You can install the package with pip:
pip install fsdata
You can also install any of the optional extras s3, gcs, or adl:
pip install "fsdata[s3]"
Requirements
- pandas
- pyarrow
- universal_pathlib
- fsspec backends such as s3fs, as applicable
Related Projects and Resources
- intake - Lightweight package for finding, investigating, loading and disseminating data.
- quilt - Quilt is a data mesh for connecting people with actionable data
- pystore - Fast data store for Pandas time-series data
- pandas - Flexible and powerful data analysis / manipulation library for Python
- pyarrow - Universal columnar format and multi-language toolbox
- parquet - Apache Parquet Format
- fsspec - Filesystem interfaces for Python
- universal_pathlib - pathlib api extended to use fsspec backends