Juniper Data
Dataset generation and management service for the Juniper ecosystem.
Overview
Juniper Data provides a centralized service for generating, storing, and serving datasets used by the Juniper neural network projects. It supports eight generator types (listed under Project Structure), including the classic two-spiral classification problem.
Ecosystem Compatibility
This service is part of the Juniper ecosystem. Verified compatible versions:
| juniper-data | juniper-cascor | juniper-canopy | data-client | cascor-client | cascor-worker |
|---|---|---|---|---|---|
| 0.4.x | 0.3.x | 0.2.x | >=0.3.1 | >=0.1.0 | >=0.1.0 |
For full-stack Docker deployment and integration tests, see juniper-deploy.
Architecture
JuniperData is the foundational data layer of the Juniper ecosystem. JuniperCascor and juniper-canopy both call JuniperData to generate and retrieve datasets.
┌─────────────────────┐     REST+WS     ┌──────────────────────┐
│   juniper-canopy    │ ◄─────────────► │    JuniperCascor     │
│   Dashboard         │                 │    Training Svc      │
│   Port 8050         │                 │    Port 8200         │
└──────────┬──────────┘                 └──────────┬───────────┘
           │ REST                                  │ REST
           ▼                                       ▼
┌──────────────────────────────────────────────────────────────┐
│                JuniperData ◄── (this service)                │
│                 Dataset Service · Port 8100                  │
└──────────────────────────────────────────────────────────────┘
Data contract: datasets are served as NPZ archives with keys X_train, y_train, X_test, y_test, X_full, y_full (all float32).
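Because an NPZ file is a ZIP archive holding one `.npy` member per array, the key contract above can be checked with the standard library alone. The following is a minimal sketch, not part of juniper-data or its client: `missing_keys` is a hypothetical helper, and it verifies key names only, not dtypes or shapes.

```python
import io
import zipfile

# Keys listed in the data contract.
EXPECTED_KEYS = ("X_train", "y_train", "X_test", "y_test", "X_full", "y_full")


def missing_keys(npz_bytes: bytes) -> list[str]:
    """Return contract keys absent from an NPZ archive (hypothetical helper)."""
    with zipfile.ZipFile(io.BytesIO(npz_bytes)) as archive:
        # Each array is stored as a ZIP member named "<key>.npy".
        present = {name[:-4] for name in archive.namelist() if name.endswith(".npy")}
    return [key for key in EXPECTED_KEYS if key not in present]
```

With NumPy installed, `numpy.load` on the same archive returns a mapping with these keys, and each array's `dtype` can then be compared against `float32`.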
Related Services
| Service | Relationship | Environment Variable |
|---|---|---|
| juniper-cascor | Consumes JuniperData for training datasets | JUNIPER_DATA_URL=http://localhost:8100 |
| juniper-canopy | Consumes JuniperData for visualization data | JUNIPER_DATA_URL=http://localhost:8100 |
| juniper-data-client | PyPI client library for this service | pip install juniper-data-client |
Service Configuration
| Variable | Default | Description |
|---|---|---|
| JUNIPER_DATA_HOST | 0.0.0.0 | Listen address |
| JUNIPER_DATA_PORT | 8100 | Service port |
| JUNIPER_DATA_LOG_LEVEL | INFO | Log verbosity |
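As an illustration of how these variables and defaults fit together, here is a minimal sketch of reading them from the environment. `Settings` and `load_settings` are hypothetical names introduced for this example; the service's own configuration loading may work differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Documented defaults for the three JUNIPER_DATA_* variables."""
    host: str = "0.0.0.0"
    port: int = 8100
    log_level: str = "INFO"


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read JUNIPER_DATA_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        host=env.get("JUNIPER_DATA_HOST", "0.0.0.0"),
        port=int(env.get("JUNIPER_DATA_PORT", "8100")),  # ports arrive as strings
        log_level=env.get("JUNIPER_DATA_LOG_LEVEL", "INFO"),
    )
```

Passing a plain dict instead of `os.environ` keeps the function easy to test.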
Docker Deployment
# Full stack with all three services:
git clone https://github.com/pcalnon/juniper-deploy.git # (private repository)
cd juniper-deploy && docker compose up --build
Dependency Lockfile
The requirements.lock file pins exact dependency versions for reproducible Docker builds. The pyproject.toml retains flexible >= ranges for local development.
Regenerate after changing dependencies in pyproject.toml:
uv pip compile pyproject.toml --extra api --extra observability -o requirements.lock
Installation
Basic Installation
pip install -e .
With API Support
pip install -e ".[api]"
Development Installation
pip install -e ".[dev]"
Full Installation
pip install -e ".[all]"
Quick Start
Generate a Spiral Dataset
from juniper_data.generators.spiral import SpiralGenerator
generator = SpiralGenerator()
dataset = generator.generate(n_points=100, n_spirals=2, noise=0.1)
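For readers unfamiliar with the two-spiral problem, the sketch below shows roughly what such a generator computes. It is not `SpiralGenerator`'s implementation: the parameter names mirror the call above, but treating `n_points` as a per-spiral count, the 1.5 turns per arm, and the `seed` argument are all assumptions made for illustration.

```python
import math
import random


def spiral_points(n_points=100, n_spirals=2, noise=0.1, seed=0):
    """Return (points, labels) for interleaved spirals (illustrative only)."""
    rng = random.Random(seed)
    points, labels = [], []
    for k in range(n_spirals):
        # Rotate each arm evenly around the origin so the arms interleave.
        offset = 2 * math.pi * k / n_spirals
        for i in range(n_points):
            t = (i + 1) / n_points            # radius grows towards 1
            angle = 3 * math.pi * t + offset  # 1.5 turns per arm (assumed)
            x = t * math.cos(angle) + rng.gauss(0, noise)
            y = t * math.sin(angle) + rng.gauss(0, noise)
            points.append((x, y))
            labels.append(k)                  # class label = arm index
    return points, labels
```

The classification task is to predict the arm index from the `(x, y)` coordinates; larger `noise` values blur the boundary between arms.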
Start the API Server
uvicorn juniper_data.api.app:app --reload
API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /v1/health | GET | Health check |
| /v1/health/live | GET | Liveness probe |
| /v1/health/ready | GET | Readiness probe (checks storage) |
| /v1/generators | GET | List all generators with schemas |
| /v1/generators/{name}/schema | GET | Get parameter schema for a generator |
| /v1/datasets | POST | Create dataset (or return cached dataset) |
| /v1/datasets | GET | List dataset IDs |
| /v1/datasets/filter | GET | Filter metadata by generator/tags/date/name/version |
| /v1/datasets/stats | GET | Aggregate dataset statistics |
| /v1/datasets/versions | GET | List all versions for a logical dataset name |
| /v1/datasets/latest | GET | Get latest version for a logical dataset name |
| /v1/datasets/batch-create | POST | Create multiple datasets |
| /v1/datasets/batch-delete | POST | Delete multiple datasets |
| /v1/datasets/batch-tags | PATCH | Update tags on multiple datasets |
| /v1/datasets/batch-export | POST | Export multiple datasets as ZIP |
| /v1/datasets/cleanup-expired | POST | Delete expired datasets |
| /v1/datasets/{id} | GET | Get dataset metadata |
| /v1/datasets/{id} | DELETE | Delete a dataset |
| /v1/datasets/{id}/artifact | GET | Download NPZ artifact |
| /v1/datasets/{id}/preview | GET | Preview first N samples as JSON |
| /v1/datasets/{id}/tags | PATCH | Add/remove tags on one dataset |
See docs/api/JUNIPER_DATA_API.md for full endpoint documentation including filtering, batch operations, and tagging.
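The endpoints above can be called with any HTTP client. The following is a minimal standard-library sketch, assuming the service is reachable at `JUNIPER_DATA_URL` (default `http://localhost:8100`, as in the Related Services table). The helper names are hypothetical, the response bodies are assumed to be JSON where `get_json` is used, and the `juniper-data-client` package may offer a different interface.

```python
import json
import os
import urllib.request
from urllib.parse import quote

# Variable name and default taken from the Related Services table.
BASE_URL = os.environ.get("JUNIPER_DATA_URL", "http://localhost:8100")


def dataset_url(dataset_id: str, resource: str = "", base: str = BASE_URL) -> str:
    """Build /v1/datasets/{id}[/resource] with the ID percent-encoded."""
    url = f"{base.rstrip('/')}/v1/datasets/{quote(dataset_id, safe='')}"
    return f"{url}/{resource}" if resource else url


def get_json(url: str, timeout: float = 5.0):
    """GET a URL and decode its JSON body (response shape is not assumed)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def download_artifact(dataset_id: str, path: str, base: str = BASE_URL) -> None:
    """Save the NPZ artifact for a dataset to a local file."""
    with urllib.request.urlopen(dataset_url(dataset_id, "artifact", base)) as response:
        with open(path, "wb") as out:
            out.write(response.read())
```

With the server running, `get_json(f"{BASE_URL}/v1/health/ready")` would exercise the readiness probe, and `get_json(dataset_url(some_id, "preview"))` would fetch a preview for an existing dataset ID.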
Named Dataset Versioning
POST /v1/datasets supports logical names for versioned datasets:
- Set `name` to group related datasets into a version series.
- Persisted creates with the same `name` auto-increment `meta.dataset_version` (1, 2, 3, ...).
- Repeating an identical request returns the cached dataset and keeps its existing version.
- Use `GET /v1/datasets/versions?name=<dataset_name>` to view history and `GET /v1/datasets/latest?name=<dataset_name>` to resolve the latest.
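The versioning rules above can be pictured with a small in-memory model. This is a toy illustration of the described behavior, not the service's code: `VersionModel` is a hypothetical class, and using the sorted JSON form of the parameters as the cache key is an assumption about what counts as an "identical request".

```python
import json


class VersionModel:
    """Toy in-memory model of named dataset versioning (illustrative only)."""

    def __init__(self):
        self._cache = {}   # (name, canonical params) -> version
        self._latest = {}  # name -> highest version issued so far

    def create(self, name: str, params: dict) -> int:
        key = (name, json.dumps(params, sort_keys=True))
        if key in self._cache:
            # Identical request: return the cached version unchanged.
            return self._cache[key]
        version = self._latest.get(name, 0) + 1  # auto-increment per name
        self._latest[name] = version
        self._cache[key] = version
        return version

    def versions(self, name: str) -> list[int]:
        """All versions issued for a name (this toy model never deletes)."""
        return list(range(1, self._latest.get(name, 0) + 1))

    def latest(self, name: str):
        """Highest version for a name, or None if the name is unknown."""
        return self._latest.get(name)
```

In this model, two creates named `spirals` with different parameters yield versions 1 and 2, and repeating the first request returns 1 again while the latest version stays at 2.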
Project Structure
juniper-data/
├── juniper_data/
│ ├── core/ # Core functionality and base classes
│ ├── generators/ # Dataset generators (8 types)
│ │ ├── spiral/ # Multi-spiral classification
│ │ ├── xor/ # XOR classification
│ │ ├── gaussian/ # Mixture of Gaussians
│ │ ├── circles/ # Concentric circles
│ │ ├── checkerboard/ # 2D checkerboard pattern
│ │ ├── csv_import/ # CSV/JSON file import
│ │ ├── mnist/ # MNIST / Fashion-MNIST
│ │ └── arc_agi/ # ARC-AGI visual reasoning
│ ├── storage/ # Dataset persistence layer
│ ├── api/ # FastAPI application
│ │ └── routes/ # API route handlers
│ └── tests/ # Test suite
│ ├── unit/ # Unit tests
│ └── integration/ # Integration tests
├── pyproject.toml # Project configuration
└── README.md # This file
Development
Running Tests
pytest
Running Tests with Coverage
pytest --cov=juniper_data --cov-report=html
Code Formatting
ruff format juniper_data tests
ruff check --fix juniper_data tests
Type Checking
mypy juniper_data
Juniper Ecosystem
| Repository | Description |
|---|---|
| juniper-data | Dataset generation service (this repo) |
| juniper-cascor | CasCor neural network training service |
| juniper-canopy | Real-time monitoring dashboard |
| juniper-data-client | PyPI: juniper-data-client |
| juniper-cascor-client | PyPI: juniper-cascor-client |
| juniper-cascor-worker | PyPI: juniper-cascor-worker |
License
MIT License - Copyright (c) 2024-2026 Paul Calnon