
Pelican Platform backed data loader


pelican-data-loader

Pelican-backed data loader prototype: demo

Quickstart

  1. Install pelican-data-loader and PyTorch from PyPI

    pip install pelican-data-loader torch
    
  2. Consume data with datasets

    from datasets import load_dataset

    # Load the CSV directly from its pelican:// URL
    dataset = load_dataset("csv", data_files="pelican://uwdf-director.chtc.wisc.edu/wisc.edu/dsi/pytorch/bird_migration_data.csv")
    # Return PyTorch tensors when rows are accessed
    torch_dataset = dataset.with_format("torch")
    

For a more detailed example, see this notebook.
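The pelican:// URL in step 2 carries two pieces of information in one string. As a rough sketch, assuming the usual Pelican convention that the host part names the federation and the path names the object inside it, the pieces can be separated with the standard library. `split_pelican_url` is a hypothetical helper written for illustration; it is not part of pelican-data-loader or pelicanfs.

```python
from urllib.parse import urlsplit


def split_pelican_url(url: str) -> tuple[str, str]:
    """Split a pelican:// URL into (federation host, object path).

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    if parts.scheme != "pelican":
        raise ValueError(f"not a pelican URL: {url!r}")
    if not parts.netloc or not parts.path.strip("/"):
        raise ValueError(f"pelican URL needs a host and an object path: {url!r}")
    return parts.netloc, parts.path


# The URL used in the quickstart above
federation, object_path = split_pelican_url(
    "pelican://uwdf-director.chtc.wisc.edu/wisc.edu/dsi/pytorch/bird_migration_data.csv"
)
print(federation)
print(object_path)
```

A check like this can be handy for validating user-supplied dataset URLs before handing them to `load_dataset`.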

Features

  • Uses Croissant to store and validate metadata
  • Uses pelicanfs to locate and cache datasets
  • Uses datasets to convert to different ML data formats (e.g., pytorch, tensorflow, jax, polars, pyarrow...)
  • Provides dataset storage via UW-Madison's S3

Future features (Pending)

  • DOI minting via DataCite
  • better frontend for dataset discovery and publishing
  • backup
  • data prefetching? (at pelican layer?)
  • private datasets
  • telemetry?

Backend

  • WISC-S3, storing
    • Actual datasets
    • Croissant JSON-LD
  • Postgres, storing
    • Various metadata
    • Links to pelican data source
    • Links to Croissant JSON-LD
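To make the split between the two stores concrete, here is a minimal sketch of what one metadata row might hold. The class name, field names, and URLs are all hypothetical placeholders chosen for illustration; the project's actual Postgres schema is not described here and may look quite different.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical shape of one metadata row; not the project's real schema."""

    name: str
    pelican_url: str    # link to the dataset served through Pelican
    croissant_url: str  # link to the Croissant JSON-LD describing it


# Placeholder values only; these URLs do not point at real objects.
record = DatasetRecord(
    name="example-dataset",
    pelican_url="pelican://federation.example/namespace/data.csv",
    croissant_url="pelican://federation.example/namespace/data.croissant.json",
)
print(asdict(record))
```

The idea is that the database row stays small and only points at the bulky objects (the dataset and its JSON-LD) kept in S3.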

Dev notes

  • License data: pulled from SPDX with pelican_data_loader.data.pull_license.
  • Minimal CSV-file Croissant generator: pelican_data_loader.utils.parse_col.
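For readers unfamiliar with what a minimal CSV-to-Croissant generator has to do, the sketch below infers a data type for each CSV column and emits Croissant-style field descriptions. It is an independent standard-library illustration, not the implementation of `pelican_data_loader.utils.parse_col`; the function names `infer_data_type` and `csv_fields`, the sample rows, and the `birds.csv` file id are all made up for the example.

```python
import csv
import io
import json


def infer_data_type(values: list) -> str:
    """Guess a schema.org data type from a column's string values."""
    if not values:
        return "sc:Text"
    for caster, data_type in ((int, "sc:Integer"), (float, "sc:Float")):
        try:
            for value in values:
                caster(value)
            return data_type
        except (TypeError, ValueError):
            continue  # at least one value did not parse; try the next type
    return "sc:Text"


def csv_fields(csv_text: str, file_id: str) -> list:
    """Build one Croissant-style field description per CSV column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fields = []
    for name in reader.fieldnames or []:
        values = [row[name] for row in rows]
        fields.append(
            {
                "@type": "cr:Field",
                "name": name,
                "dataType": infer_data_type(values),
                "source": {
                    "fileObject": {"@id": file_id},
                    "extract": {"column": name},
                },
            }
        )
    return fields


# Made-up sample rows, used only to exercise the sketch.
sample = "species,count,latitude\nsandhill crane,12,43.07\nmallard,30,43.1\n"
fields = csv_fields(sample, file_id="birds.csv")
print(json.dumps(fields, indent=2))
```

A full Croissant document would also need the surrounding dataset, file object, and record set entries; this sketch covers only the per-column fields.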
