Skip to main content

The official Python client library for Nucleus, the Data Platform for AI

Project description

Nucleus

https://dashboard.scale.com/nucleus

Aggregate metrics in ML are not good enough. To improve production ML, you need to understand their qualitative failure modes, fix them by gathering more data, and curate diverse scenarios.

Scale Nucleus helps you:

  • Visualize your data
  • Curate interesting slices within your dataset
  • Review and manage annotations
  • Measure and debug your model performance

Nucleus is a new way—the right way—to develop ML models, helping us move away from the concept of one dataset and towards a paradigm of collections of scenarios.

Installation

$ pip install scale-nucleus

CLI installation

We recommend installing the CLI via pipx (https://pypa.github.io/pipx/installation/). This makes sure that the CLI does not interfere with you system packages and is accessible from your favorite terminal.

For MacOS:

brew install pipx
pipx ensurepath
pipx install scale-nucleus
# Optional installation of shell completion (for bash, zsh or fish)
nu install-completions

Otherwise, install via pip (requires pip 19.0 or later):

python3 -m pip install --user pipx
python3 -m pipx ensurepath
python3 -m pipx install scale-nucleus
# Optional installation of shell completion (for bash, zsh or fish)
nu install-completions

Common issues/FAQ

Outdated Client

Nucleus is iterating rapidly and as a result we do not always perfectly preserve backwards compatibility with older versions of the client. If you run into any unexpected error, it's a good idea to upgrade your version of the client by running

pip install --upgrade scale-nucleus

Usage

For the most up to date documentation, reference: https://dashboard.scale.com/nucleus/docs/api?language=python.

For Developers

Clone from github and install as editable

git clone git@github.com:scaleapi/nucleus-python-client.git
cd nucleus-python-client
pip3 install poetry
poetry install

Please install the pre-commit hooks by running the following command:

poetry run pre-commit install

When releasing a new version please add release notes to the changelog in CHANGELOG.md.

Best practices for testing: (1). Please run pytest from the root directory of the repo, i.e.

poetry run pytest tests/test_dataset.py

(2) To skip slow integration tests that have to wait for an async job to start.

poetry run pytest -n auto -m "not integration"

Note: "-n auto" is used for pytest-xdist parallelization

Pydantic Models

Prefer using Pydantic models rather than creating raw dictionaries or dataclasses to send or receive over the wire as JSONs. Pydantic is created with data validation in mind and provides very clear error messages when it encounters a problem with the payload.

The Pydantic model(s) should mirror the payload to send. To represent a JSON payload that looks like this:

{
  "example_json_with_info": {
    "metadata": {
      "frame": 0
    },
    "reference_id": "frame0",
    "url": "s3://example/scale_nucleus/2021/lidar/0038711321865000.json",
    "type": "pointcloud"
  },
  "example_image_with_info": {
    "metadata": {
      "author": "Picasso"
    },
    "reference_id": "frame0",
    "url": "s3://bucket/0038711321865000.jpg",
    "type": "image"
  }
}

Could be represented as the following structure. Note that the field names map to the JSON keys and the usage of field validators (@validator).

import os.path
from pydantic import BaseModel, validator
from typing import Literal


class JsonWithInfo(BaseModel):
    metadata: dict  # any dict is valid
    reference_id: str
    url: str
    type: Literal["pointcloud", "recipe"]

    @validator("url")
    def has_json_extension(cls, v):
        if not v.endswith(".json"):
            raise ValueError(f"Expected '.json' extension got {v}")
        return v


class ImageWithInfo(BaseModel):
    metadata: dict  # any dict is valid
    reference_id: str
    url: str
    type: Literal["image", "mask"]

    @validator("url")
    def has_valid_extension(cls, v):
        valid_extensions = {".jpg", ".jpeg", ".png", ".tiff"}
        _, extension = os.path.splitext(v)
        if extension not in valid_extensions:
            raise ValueError(f"Expected extension in {valid_extensions} got {v}")
        return v


class ExampleNestedModel(BaseModel):
    example_json_with_info: JsonWithInfo
    example_image_with_info: ImageWithInfo

# Usage:
import requests
payload = requests.get("/example")
parsed_model = ExampleNestedModel.parse_obj(payload.json())
requests.post("example/post_to", json=parsed_model.dict())

Migrating to Pydantic

  • When migrating an interface from a dictionary use nucleus.pydantic_base.DictCompatibleModel. That allows you to get the benefits of Pydantic but maintaints backwards compatibility with a Python dictionary by delegating __getitem__ to fields.
  • When migrating a frozen dataclass use nucleus.pydantic_base.ImmutableModel. That is a base class set up to be immutable after initialization.

Updating documentation: We use Sphinx to autogenerate our API Reference from docstrings.

To test your local docstring changes, run the following commands from the repository's root directory:

poetry shell
cd docs
sphinx-autobuild . ./_build/html --watch ../nucleus

sphinx-autobuild will spin up a server on localhost (port 8000 by default) that will watch for and automatically rebuild a version of the API reference based on your local docstring changes.

Custom Metrics using Shapely in scale-validate

Certain metrics use Shapely and rasterio which is added as optional dependencies.

pip install scale-nucleus[metrics]

Note that you might need to install a local GEOS package since Shapely doesn't provide binaries bundled with GEOS for every platform.

#Mac OS
brew install geos
# Ubuntu/Debian flavors
apt-get install libgeos-dev

To develop it locally use

poetry install --extras metrics

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

scale_nucleus-0.17.14.tar.gz (146.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

scale_nucleus-0.17.14-py3-none-any.whl (178.7 kB view details)

Uploaded Python 3

File details

Details for the file scale_nucleus-0.17.14.tar.gz.

File metadata

  • Download URL: scale_nucleus-0.17.14.tar.gz
  • Upload date:
  • Size: 146.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.7.17 Linux/6.17.0-1007-aws

File hashes

Hashes for scale_nucleus-0.17.14.tar.gz
Algorithm Hash digest
SHA256 fecf2e06faf8ab47b80d4fc2cd801a06f8de3eee01f4cc8d3f4ad97a9c6ef60d
MD5 783e65c319c764d4c73c7273129e93a3
BLAKE2b-256 d3f00f222e58148d8bb593aee4396aac9f08ec4488d84eb73dfe7a3e9c1590de

See more details on using hashes here.

File details

Details for the file scale_nucleus-0.17.14-py3-none-any.whl.

File metadata

  • Download URL: scale_nucleus-0.17.14-py3-none-any.whl
  • Upload date:
  • Size: 178.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.7.17 Linux/6.17.0-1007-aws

File hashes

Hashes for scale_nucleus-0.17.14-py3-none-any.whl
Algorithm Hash digest
SHA256 4d15d8c1a3a587ca15d5f5dae633ff75c8a36b42d8ebee9a986b5f6ba5a6053f
MD5 2276c09536675852e2a2a143c03140f3
BLAKE2b-256 1bfa61ec552f4ed766a7c1ef9ecaf3fdd9b1ad52a5d07006c47add4b8d133b11

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page