
Simple data access layer over fsspec

Project description

Simple data catalog library for Python

This project is a trivial attempt at offering basic catalog functionality for structured datasets stored in local or remote folders. The library uses universal_pathlib to access remote storage locations such as S3 and Google Cloud Storage.

The library reads a config file called fsdata.ini, which defines a list of collections, one per section. Each collection corresponds to a local or remote folder containing files that all share the same format and extension. Currently, as a prototype, the library supports only pandas dataframes saved as .parquet files. The library uses local caching to avoid fetching the same data multiple times.

Warning: this project is for exploration only.
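The local caching mentioned in the description can be pictured with a small standard-library sketch. This is not fsdata's actual cache, which is not documented here; the helper name cached_copy and the cache layout are hypothetical, and a local folder stands in for remote storage.

```python
import shutil
import tempfile
from pathlib import Path


def cached_copy(source: Path, cache_dir: Path) -> Path:
    """Return a local copy of `source`, copying it only on the first request."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / source.name
    if not target.exists():
        shutil.copyfile(source, target)  # cache miss: fetch once
    return target


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # A local file plays the role of the remote object (assumption for the demo).
    source = root / "remote" / "my-sample.parquet"
    source.parent.mkdir()
    source.write_bytes(b"placeholder bytes")

    first = cached_copy(source, root / "cache")
    source.write_bytes(b"changed upstream")
    second = cached_copy(source, root / "cache")
    # The second call is served from the cache, so it still holds the original bytes.
    cached_bytes = second.read_bytes()
```

A cache this naive never notices upstream changes, which is the usual trade-off to keep in mind with any fetch-once scheme.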

Configuration

The configuration file fsdata.ini has one section per collection: the section name is the collection name, and its path key points to the collection's location. The config file should be located in the standard XDG config directory, XDG_CONFIG_HOME (by default ~/.config).

# fsdata.ini
[samples]
path = s3://my-bucket/samples

[datasets]
path = s3://my-bucket/datasets

[testdata]
path = s3://my-bucket/testdata
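To make the layout concrete, here is a minimal sketch of how a file like this could be located and parsed with the standard library's configparser. The helper names config_path and read_collections are hypothetical and not part of fsdata; the library's own lookup logic may differ.

```python
import configparser
import os
from pathlib import Path


def config_path(filename: str = "fsdata.ini") -> Path:
    """Return the expected config location under XDG_CONFIG_HOME (or ~/.config)."""
    base = os.environ.get("XDG_CONFIG_HOME") or str(Path.home() / ".config")
    return Path(base) / filename


def read_collections(text: str) -> dict[str, str]:
    """Map each section name to the value of its `path` key."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {name: parser[name]["path"] for name in parser.sections()}


# Inline copy of part of the example config, so the sketch runs without a file.
EXAMPLE = """
[samples]
path = s3://my-bucket/samples

[datasets]
path = s3://my-bucket/datasets
"""

collections = read_collections(EXAMPLE)
print(collections["samples"])
```

In real use the text would come from config_path().read_text() rather than an inline string.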

Usage

To access a given collection, use the collection function.

import fsdata

samples = fsdata.collection("samples")

To list the items in a collection, use the items method.

samples.items()

Please note that item names are bare names, without the file extension.
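As an illustration of what "bare names" means, the sketch below lists a local folder and strips the extension with pathlib. The function item_names is a hypothetical stand-in, not fsdata's implementation, and it assumes a local folder of .parquet files.

```python
import tempfile
from pathlib import Path


def item_names(folder: Path, extension: str = ".parquet") -> list[str]:
    """Return sorted bare names of files in `folder` that end with `extension`."""
    return sorted(p.stem for p in folder.iterdir() if p.suffix == extension)


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    # Create empty placeholder files; only the names matter for listing.
    (folder / "my-sample.parquet").touch()
    (folder / "other-sample.parquet").touch()
    (folder / "notes.txt").touch()
    names = item_names(folder)

print(names)  # ['my-sample', 'other-sample']
```

The same bare name, for example "my-sample", is what load and save expect as the item argument.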

To load data use the load method.

samples.load("my-sample")

To save data use the save method.

samples.save("my-sample", data)

You can also load an item directly with the fsdata.load function.

fsdata.load("samples", "my-sample")

Installation

You can install the package with pip

pip install fsdata

You can also specify any of the extra dependencies: s3, gcs, or adl.

pip install "fsdata[s3]"
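After installing, it can be handy to check which storage backends are importable. The sketch below assumes that the extras s3, gcs, and adl correspond to the fsspec backend packages s3fs, gcsfs, and adlfs; that mapping is a guess, not something the package metadata shown here confirms.

```python
from importlib.util import find_spec

# Assumed mapping from extra name to the fsspec backend package it would pull in.
ASSUMED_BACKENDS = {"s3": "s3fs", "gcs": "gcsfs", "adl": "adlfs"}


def installed_backends(backends: dict[str, str] = ASSUMED_BACKENDS) -> dict[str, bool]:
    """Report, per extra, whether its assumed backend package is importable."""
    return {extra: find_spec(package) is not None for extra, package in backends.items()}


status = installed_backends()
for extra, available in status.items():
    print(f"{extra}: {'installed' if available else 'missing'}")
```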

Requirements

  • pandas
  • pyarrow
  • universal_pathlib
  • fsspec backends such as s3fs, as applicable

Related Projects and Resources

  • intake - Lightweight package for finding, investigating, loading and disseminating data.
  • quilt - Quilt is a data mesh for connecting people with actionable data
  • pystore - Fast data store for Pandas time-series data
  • pandas - Flexible and powerful data analysis / manipulation library for Python
  • pyarrow - Universal columnar format and multi-language toolbox
  • parquet - Apache Parquet Format
  • fsspec - Filesystem interfaces for Python
  • universal_pathlib - pathlib api extended to use fsspec backends

Project details


Release file: fsdata-0.0.6-py3-none-any.whl (built distribution, Python 3 wheel, 5.0 kB). No source distribution is available for this release.

File hashes

  • SHA256: 00577d55d0a06343c37223cd8eb9741336dfc33019606b175100131dd58747cc
  • MD5: d5658f6f0503c691fdf6e63a526839ac
  • BLAKE2b-256: 4316caf9a5bb7e1a50ff3008151857a8c25d044651f6a3421bf4f948ff58812f
