Pelican Platform backed data loader
pelican-data-loader
Pelican-backed data loader prototype: demo
Quickstart
- Install `pelican-data-loader` and `pytorch` from PyPI:

      pip install pelican-data-loader torch

- Consume data with `datasets`:

      from datasets import load_dataset

      dataset = load_dataset("csv", data_files="pelican://uwdf-director.chtc.wisc.edu/wisc.edu/dsi/pytorch/bird_migration_data.csv")
      torch_dataset = dataset.with_format("torch")

For a more detailed example, see this notebook.
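To make the `pelican://` URL in the quickstart easier to read, here is a minimal sketch that splits it into its two parts using only the standard library. It assumes the usual Pelican convention that the host portion names the federation and the path names the object; `split_pelican_url` is a hypothetical helper written for illustration and is not part of `pelican-data-loader` or `pelicanfs`.

```python
from urllib.parse import urlsplit


def split_pelican_url(url: str) -> tuple[str, str]:
    """Split a pelican:// URL into (federation host, object path).

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    if parts.scheme != "pelican":
        raise ValueError(f"expected a pelican:// URL, got scheme {parts.scheme!r}")
    if not parts.netloc:
        raise ValueError("pelican URL is missing the federation host")
    return parts.netloc, parts.path


# The URL used in the quickstart above.
url = "pelican://uwdf-director.chtc.wisc.edu/wisc.edu/dsi/pytorch/bird_migration_data.csv"
federation, object_path = split_pelican_url(url)
print(federation)
print(object_path)
```

The same string is what gets passed to `load_dataset` as `data_files`; nothing in this sketch touches the network.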
Features
- Uses Croissant to store and validate metadata
- Uses `pelicanfs` to locate and cache datasets
- Uses `datasets` to convert to different ML data formats (e.g., PyTorch, TensorFlow, JAX, Polars, PyArrow)
- Provides dataset storage via UW-Madison's S3
Future features (Pending)
- DOI minting via DataCite
- better frontend for dataset discovery and publishing
- backup
- data prefetching? (at pelican layer?)
- private datasets
- telemetry?
Backend
- WISC-S3, storing
  - Actual datasets
  - Croissant JSONLD
- Postgres, storing
  - Various metadata
  - Links to pelican data source
  - Links to Croissant JSONLD
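The split between object storage and the metadata database can be sketched as a single table that holds descriptive metadata plus the two links. The example below is only an illustration: it uses an in-memory `sqlite3` database as a stand-in for Postgres, and the table name, column names, description text, and the `example.org` Croissant URL are all hypothetical, not the project's actual schema.

```python
import sqlite3


def create_catalog() -> sqlite3.Connection:
    """Create an in-memory stand-in for the metadata database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE datasets (
            id            INTEGER PRIMARY KEY,
            name          TEXT NOT NULL UNIQUE,
            description   TEXT,
            pelican_url   TEXT NOT NULL,  -- link to the pelican data source
            croissant_url TEXT NOT NULL   -- link to the Croissant JSONLD
        )
        """
    )
    return conn


def register(conn, name, description, pelican_url, croissant_url):
    """Insert one dataset record."""
    conn.execute(
        "INSERT INTO datasets (name, description, pelican_url, croissant_url) "
        "VALUES (?, ?, ?, ?)",
        (name, description, pelican_url, croissant_url),
    )


def lookup(conn, name):
    """Return (pelican_url, croissant_url) for a dataset name, or None if absent."""
    return conn.execute(
        "SELECT pelican_url, croissant_url FROM datasets WHERE name = ?", (name,)
    ).fetchone()


conn = create_catalog()
register(
    conn,
    "bird_migration_data",
    "Example record (placeholder description)",
    "pelican://uwdf-director.chtc.wisc.edu/wisc.edu/dsi/pytorch/bird_migration_data.csv",
    "https://example.org/croissant/bird_migration_data.jsonld",  # placeholder URL
)
print(lookup(conn, "bird_migration_data"))
```

Keeping only links in the database means the datasets and their Croissant files live in one place (S3), while the database stays small and searchable.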
Dev notes
- License data: pulled from SPDX with `pelican_data_loader.data.pull_license`.
- Minimal CSV file Croissant generator: `pelican_data_loader.utils.parse_col`.
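As a rough idea of what a minimal CSV-to-Croissant generator has to do, the sketch below infers a type for each CSV column and maps it to a Croissant-style `dataType` name. This is not the implementation of `pelican_data_loader.utils.parse_col`; the function names, the type mapping, and the sample columns are assumptions made for illustration, and the output is a simplified field list rather than a complete Croissant document.

```python
import csv
import io

# Assumed mapping from inferred Python types to Croissant-style dataType names.
DATA_TYPES = {int: "sc:Integer", float: "sc:Float", str: "sc:Text"}


def infer_type(values):
    """Return the narrowest of int, float, str that parses every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return str  # nothing to infer from, fall back to text
    for candidate in (int, float):
        try:
            for v in non_empty:
                candidate(v)
        except ValueError:
            continue  # this candidate failed, try the next wider type
        return candidate
    return str


def describe_columns(csv_text):
    """Build a list of {"name", "dataType"} entries, one per CSV column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return [
        {
            "name": name,
            # Missing cells come back as None from DictReader; treat them as empty.
            "dataType": DATA_TYPES[infer_type([row[name] or "" for row in rows])],
        }
        for name in (reader.fieldnames or [])
    ]


# Hypothetical sample; the real dataset's columns may differ.
sample = "bird_id,latitude,species\n1,43.07,warbler\n2,44.5,heron\n"
for field in describe_columns(sample):
    print(field)
```

A real generator would also need to emit the dataset-level metadata (name, license, distribution) that Croissant requires; this sketch covers only the per-column step.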
Download files
Source Distribution
- uwdf-0.0.1.tar.gz (22.7 kB)
Built Distribution
- uwdf-0.0.1-py3-none-any.whl (24.8 kB)
File details
Details for the file uwdf-0.0.1.tar.gz.
File metadata
- Download URL: uwdf-0.0.1.tar.gz
- Upload date:
- Size: 22.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 758b51c436f01011cb187c5d5867db91f3ec3bea003b5334fbba80f4afc4840a |
| MD5 | d70ddf42dfcb9ae24c6ad0a119c11e30 |
| BLAKE2b-256 | 28babf1b75e3fc7d05de70a03c6fbe81931350b12f08c6a91a06434d6b41490a |
File details
Details for the file uwdf-0.0.1-py3-none-any.whl.
File metadata
- Download URL: uwdf-0.0.1-py3-none-any.whl
- Upload date:
- Size: 24.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 96af4d588c052065702c4a0772861ead69b91d903a9479c9275037846450f59a |
| MD5 | c95b721d2560e5615fb7a886b54d9a74 |
| BLAKE2b-256 | ab37cb9682459b0fc63f536df6c9e7e473efc67543cf523d668bae6423e455b1 |