An IPython widget for browsing, benchmarking, and processing S3 datasets.

Data Cockpit

Data Cockpit is an interactive IPython widget built on top of the Dataplug framework. It enables scientists and engineers to:

  • Upload and browse datasets in Amazon S3
  • Explore curated public and Metaspace collections
  • Benchmark performance to discover optimal batch sizes
  • Partition a variety of scientific data types into chunks or batches
  • Integrate seamlessly into Jupyter notebooks for elastic, parallel workloads

Why Data Cockpit?

Built on Dataplug’s Cloud-Aware Partitioning

Dataplug is a client-side Python framework for dynamic, zero-cost data slicing of unstructured scientific data stored in object stores like S3. It:

  • Pre-processes data in a read-only fashion, building lightweight indexes decoupled from the raw objects
  • Exploits S3 byte-range reads to parallelize high-bandwidth access across many workers
  • Supports a plug-in interface for multiple domains:
    • Generic: CSV, raw text
    • Genomics: FASTA, FASTQ, VCF
    • Geospatial: LiDAR, Cloud-Optimized Point Cloud (COPC), Cloud-Optimized GeoTIFF (COG)
    • Metabolomics: imzML
  • Allows re-partitioning with different strategies without rewriting the original data
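
The byte-range idea above can be illustrated with a minimal sketch. It is not Dataplug's implementation: the helper names `byte_ranges` and `range_header` are hypothetical, and the sketch splits an object purely by size, whereas format-aware partitioning would presumably also align slices to record boundaries using the index. Each resulting header string has the form accepted by an HTTP `Range` header, which is also what the `Range` argument of boto3's `get_object` expects.

```python
def byte_ranges(object_size: int, num_parts: int) -> list[tuple[int, int]]:
    """Split an object of object_size bytes into contiguous (start, end) ranges.

    Both bounds are inclusive, matching HTTP Range semantics.
    Hypothetical helper for illustration; not part of Dataplug.
    """
    if object_size <= 0 or num_parts <= 0:
        raise ValueError("object_size and num_parts must be positive")
    # Never create more parts than there are bytes.
    num_parts = min(num_parts, object_size)
    base, extra = divmod(object_size, num_parts)
    ranges = []
    start = 0
    for i in range(num_parts):
        # The first `extra` parts absorb one leftover byte each.
        length = base + (1 if i < extra else 0)
        end = start + length - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def range_header(start: int, end: int) -> str:
    """Format an inclusive byte range as an HTTP Range header value."""
    return f"bytes={start}-{end}"


# Example: three workers sharing a 1000-byte object.
ranges = byte_ranges(1000, 3)
headers = [range_header(s, e) for s, e in ranges]
```

Because every range is independent and the object is only read, each worker can fetch its own slice concurrently, and a different `num_parts` yields a different partitioning without touching the stored data.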

What Data Cockpit Adds

While Dataplug focuses on efficient data slicing, Data Cockpit provides an end-to-end Jupyter UI that:

  1. Uploads your local files directly into any S3 bucket
  2. Browses existing buckets or public datasets from the AWS Open Data Registry
  3. Runs benchmarks across a configurable range of batch sizes to find the one with the highest throughput
  4. Processes & partitions your data with one click, displaying progress and results entirely in-notebook
  5. Retrieves partitions via get_data_slices(), which returns the Dataplug data slices (metadata) for downstream processing
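
The benchmarking step in the list above can be sketched as a simple sweep over batch sizes. This is a local stand-in under stated assumptions, not Data Cockpit's benchmark: the functions `benchmark_batch_sizes` and `best_batch_size` are hypothetical, and the workload is an in-memory list rather than S3 objects processed by remote workers.

```python
import time
from typing import Callable, Dict, Iterable, Sequence


def benchmark_batch_sizes(
    items: Sequence[int],
    batch_sizes: Iterable[int],
    process_batch: Callable[[Sequence[int]], object],
) -> Dict[int, float]:
    """Return throughput (items per second) measured for each batch size.

    Hypothetical helper for illustration; not part of Data Cockpit.
    """
    results: Dict[int, float] = {}
    for size in batch_sizes:
        if size <= 0:
            raise ValueError("batch sizes must be positive")
        start = time.perf_counter()
        # Process the whole dataset in consecutive batches of `size` items.
        for i in range(0, len(items), size):
            process_batch(items[i:i + size])
        elapsed = time.perf_counter() - start
        # Guard against a zero reading on very coarse timers.
        results[size] = len(items) / elapsed if elapsed > 0 else float("inf")
    return results


def best_batch_size(results: Dict[int, float]) -> int:
    """Pick the batch size with the highest measured throughput."""
    return max(results, key=results.get)


# Example: sum each batch of a small in-memory dataset.
items = list(range(10_000))
results = benchmark_batch_sizes(items, [100, 1_000, 5_000], sum)
best = best_batch_size(results)
```

Timings from a single run are noisy, so a real benchmark would typically repeat each measurement and compare averages before settling on a batch size.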

Installation

pip install cloud-data-cockpit

To install Data Cockpit together with the geospatial extras (the quotes keep shells such as zsh from interpreting the brackets):

pip install "cloud-data-cockpit[geospatial]"
