An IPython widget for browsing, benchmarking, and processing S3 datasets.
Data Cockpit
Data Cockpit is an interactive IPython widget built on top of the Dataplug framework. It enables scientists and engineers to:
- Upload and browse datasets in Amazon S3
- Explore curated public and Metaspace collections
- Benchmark performance to discover optimal batch sizes
- Partition a variety of scientific data types into chunks or batches
- Integrate seamlessly into Jupyter notebooks for elastic, parallel workloads
Why Data Cockpit?
Built on Dataplug’s Cloud-Aware Partitioning
Dataplug is a client-side Python framework for dynamic, zero-cost data slicing of unstructured scientific data stored in object stores like S3. It:
- Pre-processes data in a read-only fashion, building lightweight indexes decoupled from the raw objects
- Exploits S3 byte-range reads to parallelize high-bandwidth access across many workers
- Supports a plug-in interface for multiple domains:
  - Generic: CSV, raw text
  - Genomics: FASTA, FASTQ, VCF
  - Geospatial: LiDAR, Cloud-Optimized Point Cloud (COPC), Cloud-Optimized GeoTIFF (COG)
  - Metabolomics: ImzML
- Allows re-partitioning with different strategies without rewriting the original data
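The idea behind index-based slicing can be illustrated with a small, self-contained sketch. It is not Dataplug's implementation or API: the helper names `build_line_index`, `partition`, and `range_header` are hypothetical, and an in-memory byte string stands in for an S3 object. It shows how a lightweight index of line offsets, kept separate from the data, can yield different line-aligned byte ranges on demand without touching the original bytes.

```python
from typing import List, Tuple


def build_line_index(data: bytes) -> List[int]:
    """Return the byte offset at which each line starts (a tiny read-only index)."""
    offsets = [0]
    for pos, byte in enumerate(data):
        # A newline starts a new line unless it is the last byte of the data.
        if byte == 0x0A and pos + 1 < len(data):
            offsets.append(pos + 1)
    return offsets


def partition(index: List[int], total_size: int, num_chunks: int) -> List[Tuple[int, int]]:
    """Split the object into at most num_chunks line-aligned (start, end) byte ranges.

    Both ends are inclusive, matching the HTTP Range header convention.
    Assumes num_chunks >= 1 and total_size >= 1.
    """
    lines_per_chunk = -(-len(index) // num_chunks)  # ceiling division
    starts = index[::lines_per_chunk]
    ends = [s - 1 for s in starts[1:]] + [total_size - 1]
    return list(zip(starts, ends))


def range_header(start: int, end: int) -> str:
    """Format an inclusive byte range as an HTTP Range header value."""
    return f"bytes={start}-{end}"


# Stand-in for an object stored in S3 (hypothetical sample data).
raw = b"a,1\nbb,2\nccc,3\ndddd,4\n"
index = build_line_index(raw)

# The same index supports different partitionings; the raw bytes are never rewritten.
two = partition(index, len(raw), 2)
four = partition(index, len(raw), 4)

for start, end in two:
    print(range_header(start, end), raw[start:end + 1])
```

In a real deployment each worker would pass its own range to a ranged GET request (for example the `Range` argument of boto3's `get_object`) so that chunks are fetched in parallel.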
What Data Cockpit Adds
While Dataplug focuses on efficient data slicing, Data Cockpit provides an end-to-end Jupyter UI that:
- Uploads your local files directly into any S3 bucket
- Browses existing buckets or public datasets from the AWS Open Data Registry
- Runs benchmarks across a configurable range of batch sizes to find the fastest throughput
- Processes & partitions your data with one click, displaying progress and results entirely in-notebook
- Retrieves partitions via get_data_slices(), which returns the Dataplug data slices (metadata) for downstream processing
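The batch-size benchmark described above can be pictured with the following minimal sketch. It is not Data Cockpit's code: `make_batches`, `benchmark`, and `best_batch_size` are hypothetical helpers, and the workload is simulated with a fake clock (a fixed per-batch overhead plus a per-item cost) so that the example runs instantly and deterministically.

```python
import time
from typing import Callable, Dict, Iterable, List, Sequence


def make_batches(items: Sequence, batch_size: int) -> List[Sequence]:
    """Split items into consecutive batches of at most batch_size elements."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def benchmark(
    items: Sequence,
    batch_sizes: Iterable[int],
    process_batch: Callable[[Sequence], None],
    clock: Callable[[], float] = time.perf_counter,
) -> Dict[int, float]:
    """Return the measured throughput (items per second) for each batch size."""
    results: Dict[int, float] = {}
    for size in batch_sizes:
        start = clock()
        for batch in make_batches(items, size):
            process_batch(batch)
        elapsed = clock() - start
        results[size] = len(items) / elapsed if elapsed > 0 else float("inf")
    return results


def best_batch_size(results: Dict[int, float]) -> int:
    """Pick the batch size with the highest throughput."""
    return max(results, key=lambda size: results[size])


# Simulated workload (an assumption for illustration): every batch costs
# 1.0 s of fixed overhead plus 0.1 s per item, tracked by a fake clock.
fake_time = {"now": 0.0}


def fake_clock() -> float:
    return fake_time["now"]


def simulated_work(batch: Sequence) -> None:
    fake_time["now"] += 1.0 + 0.1 * len(batch)


items = list(range(8))
results = benchmark(items, [1, 2, 4, 8], simulated_work, clock=fake_clock)
best = best_batch_size(results)
print(results, "best batch size:", best)
```

With a real workload, `process_batch` would do the actual processing and the default `time.perf_counter` clock would be used; repeating each measurement several times reduces noise.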
Installation
pip install cloud-data-cockpit
To install Data Cockpit together with the geospatial extras (the quotes keep shells such as zsh from interpreting the square brackets):
pip install "cloud-data-cockpit[geospatial]"
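To confirm that the installation succeeded, the installed version can be read from the package metadata. The sketch below uses only the standard library and the distribution name from the pip command; the helper `installed_version` is a hypothetical convenience, not part of Data Cockpit.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version("cloud-data-cockpit")
print(found if found else "cloud-data-cockpit is not installed in this environment")
```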