Juniper Data

Dataset generation and management service for the Juniper ecosystem.

Overview

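Juniper Data provides a centralized service for generating, storing, and serving datasets used by the Juniper neural network projects. It currently ships eight generator types: the classic two-spiral (multi-spiral) classification problem, XOR, Gaussian mixtures, concentric circles, a 2D checkerboard, CSV/JSON file import, MNIST/Fashion-MNIST, and ARC-AGI visual reasoning tasks.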

Ecosystem Compatibility

This service is part of the Juniper ecosystem. Verified compatible versions:

juniper-data	juniper-cascor	juniper-canopy	juniper-data-client	juniper-cascor-client	juniper-cascor-worker
0.4.x	0.3.x	0.2.x	>=0.3.1	>=0.1.0	>=0.1.0

For full-stack Docker deployment and integration tests, see juniper-deploy.

Architecture

JuniperData is the foundational data layer of the Juniper ecosystem. JuniperCascor and juniper-canopy both call JuniperData to generate and retrieve datasets.

┌─────────────────────┐     REST+WS      ┌──────────────────────┐
│   juniper-canopy    │ ◄──────────────► │    JuniperCascor     │
│   Dashboard         │                  │    Training Svc      │
│   Port 8050         │                  │    Port 8200         │
└──────────┬──────────┘                  └──────────┬───────────┘
           │ REST                                    │ REST
           ▼                                         ▼
┌──────────────────────────────────────────────────────────────┐
│                      JuniperData  ◄── (this service)          │
│                   Dataset Service  ·  Port 8100               │
└──────────────────────────────────────────────────────────────┘

Data contract: datasets are served as NPZ archives with keys X_train, y_train, X_test, y_test, X_full, y_full (all float32).
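A consumer can verify an artifact against this contract before training on it. The sketch below assumes NumPy and takes the raw bytes of an NPZ download (for example, from GET /v1/datasets/{id}/artifact). The function name `load_dataset_npz` and the row-count check between matching X/y pairs are illustrative additions, not part of the published contract.

```python
import io

import numpy as np

# Keys required by the data contract above.
REQUIRED_KEYS = ("X_train", "y_train", "X_test", "y_test", "X_full", "y_full")


def load_dataset_npz(payload: bytes) -> dict:
    """Load NPZ bytes and raise ValueError if they break the data contract."""
    # allow_pickle=False: the contract only involves plain numeric arrays.
    with np.load(io.BytesIO(payload), allow_pickle=False) as npz:
        missing = [key for key in REQUIRED_KEYS if key not in npz.files]
        if missing:
            raise ValueError(f"artifact is missing keys: {missing}")
        arrays = {key: npz[key] for key in REQUIRED_KEYS}

    for key, array in arrays.items():
        if array.dtype != np.float32:
            raise ValueError(f"{key} has dtype {array.dtype}, expected float32")

    # Extra sanity check (not part of the contract): X and y rows must pair up.
    for split in ("train", "test", "full"):
        if len(arrays[f"X_{split}"]) != len(arrays[f"y_{split}"]):
            raise ValueError(f"X_{split} and y_{split} have different lengths")
    return arrays
```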

Related Services

Service	Relationship	Environment Variable / Install
juniper-cascor	Consumes JuniperData for training datasets	`JUNIPER_DATA_URL=http://localhost:8100`
juniper-canopy	Consumes JuniperData for visualization data	`JUNIPER_DATA_URL=http://localhost:8100`
juniper-data-client	PyPI client library for this service	`pip install juniper-data-client`
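Because JuniperCascor and juniper-canopy depend on this service, a consumer may want to wait for it before issuing requests. The following is a minimal sketch, assuming that the /v1/health/ready readiness probe (listed under API Endpoints) answers 200 once storage is available and a non-200 status otherwise. `wait_for_juniper_data` and `http_status` are hypothetical helpers; the status fetcher is injectable so the polling logic can be exercised without a running server.

```python
import os
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 2.0) -> int:
    """HTTP status of a GET request, or 0 if the service cannot be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:  # 4xx/5xx responses arrive as HTTPError
        return exc.code
    except OSError:  # URLError, refused connections, and timeouts
        return 0


def wait_for_juniper_data(
    fetch_status: Callable[[str], int] = http_status,
    timeout_s: float = 30.0,
    interval_s: float = 1.0,
) -> bool:
    """Poll the readiness probe until it returns 200 or timeout_s elapses."""
    base_url = os.environ.get("JUNIPER_DATA_URL", "http://localhost:8100").rstrip("/")
    ready_url = f"{base_url}/v1/health/ready"
    deadline = time.monotonic() + timeout_s
    while True:
        if fetch_status(ready_url) == 200:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```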

Service Configuration

Variable	Default	Description
`JUNIPER_DATA_HOST`	`0.0.0.0`	Listen address
`JUNIPER_DATA_PORT`	`8100`	Service port
`JUNIPER_DATA_LOG_LEVEL`	`INFO`	Log verbosity
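These variables map naturally onto a small settings object. The sketch below is hypothetical: the service may load its configuration differently, and both the accepted log-level names (standard Python logging level names) and the port range check are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Assumption: the service accepts standard Python logging level names.
VALID_LOG_LEVELS = {"CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG"}


@dataclass(frozen=True)
class ServiceSettings:
    host: str = "0.0.0.0"
    port: int = 8100
    log_level: str = "INFO"


def load_settings(env: Mapping[str, str] = os.environ) -> ServiceSettings:
    """Read JUNIPER_DATA_* variables, falling back to the documented defaults."""
    defaults = ServiceSettings()
    port = int(env.get("JUNIPER_DATA_PORT", defaults.port))
    if not 0 < port < 65536:
        raise ValueError(f"JUNIPER_DATA_PORT out of range: {port}")
    log_level = env.get("JUNIPER_DATA_LOG_LEVEL", defaults.log_level).upper()
    if log_level not in VALID_LOG_LEVELS:
        raise ValueError(f"unsupported JUNIPER_DATA_LOG_LEVEL: {log_level}")
    return ServiceSettings(
        host=env.get("JUNIPER_DATA_HOST", defaults.host),
        port=port,
        log_level=log_level,
    )
```

The resulting values could then be passed to uvicorn, for example `uvicorn.run(app, host=s.host, port=s.port, log_level=s.log_level.lower())`.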

Docker Deployment

# Full stack with all three services:
git clone https://github.com/pcalnon/juniper-deploy.git  # (private repository)
cd juniper-deploy && docker compose up --build

Dependency Lockfile

The requirements.lock file pins exact dependency versions for reproducible Docker builds. The pyproject.toml retains flexible >= ranges for local development.

Regenerate after changing dependencies in pyproject.toml:

uv pip compile pyproject.toml --extra api --extra observability -o requirements.lock
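Because the lockfile is meant to pin exact versions, a CI step could confirm that regeneration did not leave any ranges behind. The following minimal sketch assumes the lockfile uses name==version lines plus # comments, as uv pip compile writes them. `unpinned_requirements` is a hypothetical helper, and option lines starting with - (such as --hash) are skipped rather than validated.

```python
import re

# A pinned line looks like "name==1.2.3" or "name[extra]==1.2.3", optionally
# followed by an environment marker or a line-continuation backslash.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?==\S+")


def unpinned_requirements(lock_text: str) -> list:
    """Return requirement lines in a lockfile that are not pinned with ==."""
    problems = []
    for raw_line in lock_text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments (uv adds "# via ..." annotations), and options.
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        if not PINNED.match(line):
            problems.append(line)
    return problems
```

In CI this might run as `unpinned_requirements(Path("requirements.lock").read_text())` and fail the build if the list is non-empty.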

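Installation

Run the commands below from the root of a local clone of the repository. The -e flag installs the package in editable mode, so source changes take effect without reinstalling.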

Basic Installation

pip install -e .

With API Support

pip install -e ".[api]"

Development Installation

pip install -e ".[dev]"

Full Installation

pip install -e ".[all]"

Quick Start

Generate a Spiral Dataset

from juniper_data.generators.spiral import SpiralGenerator

generator = SpiralGenerator()
dataset = generator.generate(n_points=100, n_spirals=2, noise=0.1)
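The internals of SpiralGenerator are not documented here, so the following is a standalone NumPy sketch of the classic interleaved-spiral construction, shaped to match the NPZ data contract. The function `make_spirals`, the linearly growing radius, the `turns` and `test_fraction` parameters, the one-hot labels, and treating `n_points` as points per spiral are all assumptions, not the library's behavior.

```python
import numpy as np


def make_spirals(
    n_points: int = 100,
    n_spirals: int = 2,
    noise: float = 0.1,
    turns: float = 1.5,
    test_fraction: float = 0.2,
    seed: int = 0,
) -> dict:
    """Interleaved spirals as float32 arrays keyed like the NPZ data contract."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_points)  # position along each arm; radius grows with t
    features, labels = [], []
    for k in range(n_spirals):
        # Each arm is rotated by an equal share of a full turn.
        angle = 2 * np.pi * (turns * t + k / n_spirals)
        arm = np.column_stack((t * np.cos(angle), t * np.sin(angle)))
        features.append(arm + rng.normal(scale=noise, size=arm.shape))
        labels.append(np.full(n_points, k))

    X_full = np.concatenate(features).astype(np.float32)
    # One-hot labels so the y arrays are float32 as well.
    y_full = np.eye(n_spirals, dtype=np.float32)[np.concatenate(labels)]

    # Shuffle once, then split into test and train partitions.
    order = rng.permutation(len(X_full))
    n_test = round(test_fraction * len(X_full))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return {
        "X_train": X_full[train_idx], "y_train": y_full[train_idx],
        "X_test": X_full[test_idx], "y_test": y_full[test_idx],
        "X_full": X_full, "y_full": y_full,
    }
```

Offsetting each arm by k / n_spirals of a turn makes two spirals start on opposite sides of the origin, which is what makes the problem hard for linear or shallow models.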

Start the API Server

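uvicorn juniper_data.api.app:app --reload

By default uvicorn listens on 127.0.0.1:8000. Add --host 0.0.0.0 --port 8100 to match the service defaults listed under Service Configuration.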

API Endpoints

Endpoint	Method	Description
`/v1/health`	GET	Health check
`/v1/health/live`	GET	Liveness probe
`/v1/health/ready`	GET	Readiness probe (checks storage)
`/v1/generators`	GET	List all generators with schemas
`/v1/generators/{name}/schema`	GET	Get parameter schema for a generator
`/v1/datasets`	POST	Create dataset (or return cached dataset)
`/v1/datasets`	GET	List dataset IDs
`/v1/datasets/filter`	GET	Filter metadata by generator/tags/date/name/version
`/v1/datasets/stats`	GET	Aggregate dataset statistics
`/v1/datasets/versions`	GET	List all versions for a logical dataset name
`/v1/datasets/latest`	GET	Get latest version for a logical dataset name
`/v1/datasets/batch-create`	POST	Create multiple datasets
`/v1/datasets/batch-delete`	POST	Delete multiple datasets
`/v1/datasets/batch-tags`	PATCH	Update tags on multiple datasets
`/v1/datasets/batch-export`	POST	Export multiple datasets as ZIP
`/v1/datasets/cleanup-expired`	POST	Delete expired datasets
`/v1/datasets/{id}`	GET	Get dataset metadata
`/v1/datasets/{id}`	DELETE	Delete a dataset
`/v1/datasets/{id}/artifact`	GET	Download NPZ artifact
`/v1/datasets/{id}/preview`	GET	Preview first N samples as JSON
`/v1/datasets/{id}/tags`	PATCH	Add/remove tags on one dataset
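When calling the API without the client library, it helps to build URLs in one place. The following small standard-library sketch assumes the JUNIPER_DATA_URL convention from Related Services. `dataset_url` and `versions_url` are hypothetical helpers that cover the id-based routes and the name query parameter used by the versioning endpoints; other query parameters (filtering, preview size) are not modeled.

```python
import os
from typing import Optional
from urllib.parse import quote, urlencode


def _base(base: Optional[str]) -> str:
    # Fall back to JUNIPER_DATA_URL, then to the documented local default.
    url = base or os.environ.get("JUNIPER_DATA_URL", "http://localhost:8100")
    return url.rstrip("/")


def dataset_url(dataset_id: str, resource: str = "", base: Optional[str] = None) -> str:
    """URL for /v1/datasets/{id} or a sub-resource such as 'artifact' or 'preview'."""
    url = f"{_base(base)}/v1/datasets/{quote(dataset_id, safe='')}"
    return f"{url}/{resource}" if resource else url


def versions_url(name: str, latest: bool = False, base: Optional[str] = None) -> str:
    """URL for /v1/datasets/versions or /v1/datasets/latest for a logical name."""
    endpoint = "latest" if latest else "versions"
    return f"{_base(base)}/v1/datasets/{endpoint}?{urlencode({'name': name})}"
```

For example, `urllib.request.urlopen(dataset_url(dataset_id, "artifact")).read()` would fetch the NPZ bytes for one dataset.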

See docs/api/JUNIPER_DATA_API.md for full endpoint documentation including filtering, batch operations, and tagging.

Named Dataset Versioning

POST /v1/datasets supports logical names for versioned datasets:

- Set name to group related datasets into a version series.
- Persisted creates that share a name receive auto-incrementing values of meta.dataset_version (1, 2, 3, ...).
- Repeating an identical request returns the cached dataset and keeps its existing version.
- Use GET /v1/datasets/versions?name=<dataset_name> to view the version history and GET /v1/datasets/latest?name=<dataset_name> to resolve the latest version.
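The versioning rules can be modeled in a few lines to make their interaction concrete. This is a toy in-memory sketch, not the service's storage layer: `VersionedStore` is hypothetical, the cache key (name plus canonical JSON of the parameters) is an assumption about what counts as an identical request, and the distinction between persisted and non-persisted creates is not modeled.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VersionedStore:
    """Toy in-memory model of the named-versioning rules above."""

    by_request: dict = field(default_factory=dict)    # cache key -> metadata
    last_version: dict = field(default_factory=dict)  # name -> highest version

    def create(self, name: str, params: dict) -> dict:
        # Assumption: "identical request" means same name and same parameters.
        key = json.dumps({"name": name, "params": params}, sort_keys=True)
        if key in self.by_request:
            return self.by_request[key]  # cached: existing version is kept
        version = self.last_version.get(name, 0) + 1
        self.last_version[name] = version
        meta = {"name": name, "params": params, "dataset_version": version}
        self.by_request[key] = meta
        return meta

    def versions(self, name: str) -> list:
        return sorted(m["dataset_version"] for m in self.by_request.values() if m["name"] == name)

    def latest(self, name: str) -> Optional[dict]:
        matches = [m for m in self.by_request.values() if m["name"] == name]
        return max(matches, key=lambda m: m["dataset_version"], default=None)
```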

Project Structure

juniper-data/
├── juniper_data/
│   ├── core/           # Core functionality and base classes
│   ├── generators/     # Dataset generators (8 types)
│   │   ├── spiral/     # Multi-spiral classification
│   │   ├── xor/        # XOR classification
│   │   ├── gaussian/   # Mixture of Gaussians
│   │   ├── circles/    # Concentric circles
│   │   ├── checkerboard/ # 2D checkerboard pattern
│   │   ├── csv_import/ # CSV/JSON file import
│   │   ├── mnist/      # MNIST / Fashion-MNIST
│   │   └── arc_agi/    # ARC-AGI visual reasoning
│   ├── storage/        # Dataset persistence layer
│   ├── api/            # FastAPI application
│   │   └── routes/     # API route handlers
│   └── tests/          # Test suite
│       ├── unit/       # Unit tests
│       └── integration/ # Integration tests
├── pyproject.toml      # Project configuration
└── README.md           # This file

Development

Running Tests

pytest

Running Tests with Coverage

pytest --cov=juniper_data --cov-report=html

Code Formatting

ruff format juniper_data tests
ruff check --fix juniper_data tests

Type Checking

mypy juniper_data

Juniper Ecosystem

Repository	Description
juniper-data	Dataset generation service (this repo)
juniper-cascor	CasCor neural network training service
juniper-canopy	Real-time monitoring dashboard
juniper-data-client	PyPI: `juniper-data-client`
juniper-cascor-client	PyPI: `juniper-cascor-client`
juniper-cascor-worker	PyPI: `juniper-cascor-worker`

License

Release history

0.6.0 (this version): Apr 9, 2026

0.4.2: Feb 26, 2026

Download files

Source Distribution

juniper_data-0.6.0.tar.gz (135.4 kB)

Uploaded Apr 9, 2026 Source

Built Distribution

juniper_data-0.6.0-py3-none-any.whl (178.4 kB)

Uploaded Apr 9, 2026 Python 3

File details

Details for the file juniper_data-0.6.0.tar.gz.

File metadata

Download URL: juniper_data-0.6.0.tar.gz
Upload date: Apr 9, 2026
Size: 135.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for juniper_data-0.6.0.tar.gz
Algorithm	Hash digest
SHA256	`54afc8ebac9baba0b0d220f56dc82be34fe197ef6bc6acca5622b0e57cf3cf8a`
MD5	`e8ba2c218c436ae8538348446986360d`
BLAKE2b-256	`4814ce313c84dbe0e0ba884e45c9e705d74596c6af6929806a4fca56bc41854e`

Provenance

The following attestation bundles were made for juniper_data-0.6.0.tar.gz:

Publisher: publish.yml on pcalnon/juniper-data

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: juniper_data-0.6.0.tar.gz
- Subject digest: 54afc8ebac9baba0b0d220f56dc82be34fe197ef6bc6acca5622b0e57cf3cf8a
- Sigstore transparency entry: 1263910060
- Sigstore integration time: Apr 9, 2026
Source repository:
- Permalink: pcalnon/juniper-data@0d656301e4791d8267482102a9a782c1967cdf3e
- Branch / Tag: refs/tags/v0.6.0
- Owner: https://github.com/pcalnon
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@0d656301e4791d8267482102a9a782c1967cdf3e
- Trigger Event: release

File details

Details for the file juniper_data-0.6.0-py3-none-any.whl.

File metadata

Download URL: juniper_data-0.6.0-py3-none-any.whl
Upload date: Apr 9, 2026
Size: 178.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for juniper_data-0.6.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ac1ee5d05de9a78f69d0570acb00346ab63d08358dcf42b35f854ffb8acf4252`
MD5	`7131a3f7d1473b8ee1870e1947c4bf5e`
BLAKE2b-256	`dc83ab49f329be1c422eed459338e7432c8493d87ab483a0201b9eef607c04ae`

Provenance

The following attestation bundles were made for juniper_data-0.6.0-py3-none-any.whl:

Publisher: publish.yml on pcalnon/juniper-data

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: juniper_data-0.6.0-py3-none-any.whl
- Subject digest: ac1ee5d05de9a78f69d0570acb00346ab63d08358dcf42b35f854ffb8acf4252
- Sigstore transparency entry: 1263910225
- Sigstore integration time: Apr 9, 2026
Source repository:
- Permalink: pcalnon/juniper-data@0d656301e4791d8267482102a9a782c1967cdf3e
- Branch / Tag: refs/tags/v0.6.0
- Owner: https://github.com/pcalnon
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@0d656301e4791d8267482102a9a782c1967cdf3e
- Trigger Event: release
