
Python bindings for the ATLAS array store

Project description

atlas-python

Python bindings for ATLAS (Aggregated Tensor Large Array Store) — a directory-based store for many similarly-shaped N-dimensional arrays, backed by local files or any object store (S3 / GCS / Azure / HTTP). Built on a Rust core with a synchronous, NumPy-native API and first-class xarray integration.

pip install atlas-python
import atlas
| Extra | Install | Adds |
| --- | --- | --- |
| cloud | pip install "atlas-python[cloud]" | S3 / GCS / Azure / HTTP backends via obstore |

numpy, xarray, and dask are installed automatically.

Quick start

import numpy as np
import atlas

# The `with` block flushes (== close) on exit. Nothing is persisted before that.
with atlas.Atlas.create("/tmp/my_store", codec="zstd") as store:   # "zstd" | "lz4" | "none"
    ds = store.create_dataset("jan_2024")
    ds.define_array(
        "temperature",
        dtype="float32",
        dims=["lat", "lon"],
        shape=[8, 16],
        chunk_shape=[4, 8],
        fill_value=float("nan"),   # unwritten cells read back as NaN; NaN cells count as nulls in stats
    )
    ds.write_array("temperature", start=[0, 0], data=np.full((8, 16), 20.0, dtype=np.float32))
    ds.set_attribute("month", 1)
    ds.set_attribute("station", "KNMI")

# Reopen and read
store = atlas.Atlas.open("/tmp/my_store")
ds = store.open_dataset("jan_2024")
arr = ds.read_array("temperature")                    # full read -> np.ndarray
chunk = ds.read_array("temperature", [0, 0], [4, 8])  # partial read
stats = ds.array_stats("temperature")                 # {"row_count", "null_count", "min", "max"}
month = ds.get_attribute("month")                     # 1
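
The fill_value comment above says that NaN cells count as nulls in the stats. The sketch below shows one way such counting can work in plain Python, including the detail that NaN never compares equal to itself and so needs an explicit isnan check. It is an illustration only: sketch_stats and is_null are hypothetical names, and the assumptions that row_count is the total cell count and that min / max skip nulls are guesses, not documented ATLAS behaviour.

```python
import math


def is_null(cell, fill_value):
    """A cell is null when it equals the fill value (NaN-aware)."""
    if isinstance(fill_value, float) and math.isnan(fill_value):
        # NaN != NaN, so equality cannot be used for a NaN fill value.
        return isinstance(cell, float) and math.isnan(cell)
    return cell == fill_value


def sketch_stats(cells, fill_value):
    """Hypothetical stand-in for array_stats over a flat list of cells."""
    valid = [c for c in cells if not is_null(c, fill_value)]
    return {
        "row_count": len(cells),                # assumption: total cells
        "null_count": len(cells) - len(valid),
        "min": min(valid) if valid else None,   # assumption: nulls skipped
        "max": max(valid) if valid else None,
    }


float_stats = sketch_stats([20.0, float("nan"), 22.5, float("nan"), 18.0], float("nan"))
int_stats = sketch_stats([1, -9999, 3], -9999)
print(float_stats)
print(int_stats)
```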

Durability model

This is the one concept to internalise: writes are buffered in memory and only hit disk on flush().

The store's metadata is loaded once on open/create. Every subsequent mutation — creating datasets, defining arrays, write_array, set_attribute — updates in-memory state only. Nothing reaches disk until store.flush() (equivalently store.close(), or the with store: block exiting). Dropping an Atlas without flushing abandons every pending write.

The payoff: N consecutive writes amortise to a single flush — one delta file per touched array name and one metadata rewrite, no matter how many datasets you touched.

store = atlas.Atlas.create("/tmp/my_store")
# ... many create_dataset / write_array calls ...
store.flush()   # the single durability boundary
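
To make the buffering semantics concrete, here is a toy pure-Python model of the same idea. It is not how ATLAS is implemented: BufferedStore, its single JSON file, and its method names are hypothetical stand-ins. It only illustrates that mutations stay in memory until one explicit flush, and that dropping the object without flushing loses them.

```python
import json
import os
import tempfile


class BufferedStore:
    """Toy stand-in for a buffered store (hypothetical, not the ATLAS API)."""

    def __init__(self, path):
        self.path = path
        self.flush_count = 0
        # State is loaded once, when the store is opened.
        if os.path.exists(path):
            with open(path) as fh:
                self.state = json.load(fh)
        else:
            self.state = {}

    def set_attribute(self, dataset, key, value):
        # Mutations touch in-memory state only.
        self.state.setdefault(dataset, {})[key] = value

    def flush(self):
        # The single durability boundary: one rewrite for all pending changes.
        with open(self.path, "w") as fh:
            json.dump(self.state, fh)
        self.flush_count += 1

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.flush()


path = os.path.join(tempfile.mkdtemp(), "toy_store.json")

# Without a flush, nothing reaches disk.
abandoned = BufferedStore(path)
abandoned.set_attribute("jan_2024", "month", 1)
del abandoned
file_exists_before_flush = os.path.exists(path)

# Many writes, one flush when the with block exits.
with BufferedStore(path) as store:
    for month in range(1, 13):
        store.set_attribute(f"month_{month:02d}", "month", month)

reopened = BufferedStore(path)
print(file_exists_before_flush, store.flush_count, len(reopened.state))
```

The abandoned write to jan_2024 never appears in the reopened state, which mirrors the warning above about dropping an Atlas without flushing.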

xarray integration

Importing atlas registers an accessor at xr.Dataset.atlas, so the integration is always available. The store must exist first; you then append xarray datasets to it.

import numpy as np, xarray as xr, atlas

ds = xr.Dataset(
    data_vars={
        "temperature": (["lat", "lon"], np.arange(8 * 16, dtype=np.float32).reshape(8, 16),
                        {"units": "C", "long_name": "surface temperature"}),
    },
    coords={"lat": np.arange(8, dtype=np.float32), "lon": np.arange(16, dtype=np.float32)},
    attrs={"month": 1, "station": "KNMI"},
)

with atlas.Atlas.create("/tmp/my_store") as store:
    store.add_xr_dataset(ds, "jan_2024")     # store-side method
    ds.atlas.write(store, "jan_2025")        # xarray accessor (same effect, stored under a second name)

# Read back as an xr.Dataset
store = atlas.Atlas.open("/tmp/my_store")
ds_back = store.to_xarray("jan_2024")
xr.testing.assert_identical(ds, ds_back)

Bulk ingestion

add_xr_dataset never flushes by itself — N consecutive calls accumulate in memory and a single flush() (or the with exit) persists everything.

import glob, os, atlas, xarray as xr

with atlas.Atlas.create("/tmp/store") as store:
    for nc_path in sorted(glob.glob("*.nc")):
        name = os.path.splitext(os.path.basename(nc_path))[0]
        store.add_xr_dataset(xr.open_dataset(nc_path), name)
# One delta file per array name across the whole batch (not one per file).
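
The loop above uses each file's stem as the dataset name. Within a single directory the stems are unique, but a recursive glob can produce the same stem twice, and this page does not say how ATLAS reacts to a duplicate dataset name. A small pre-flight check, sketched here with the hypothetical helper dataset_names and made-up paths, catches collisions before anything is ingested.

```python
import os
from collections import defaultdict


def dataset_names(paths):
    """Map each path to its file stem; raise if two paths share a stem."""
    by_name = defaultdict(list)
    for path in paths:
        name = os.path.splitext(os.path.basename(path))[0]
        by_name[name].append(path)
    clashes = {name: group for name, group in by_name.items() if len(group) > 1}
    if clashes:
        raise ValueError(f"duplicate dataset names: {clashes}")
    return {group[0]: name for name, group in by_name.items()}


names = dataset_names(["data/2024/jan.nc", "data/2024/feb.nc"])
print(names)

try:
    dataset_names(["data/2024/jan.nc", "data/2025/jan.nc"])
except ValueError as err:
    clash_message = str(err)
    print(clash_message)
```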

Streaming dask-backed writes

If a variable's .data is a dask.array.Array (e.g. from xr.open_dataset(path, chunks=...) or ds.chunk({...})), add_xr_dataset / ds.atlas.write stream one dask block at a time into the store rather than materialising the whole array. The dask chunk shape becomes the on-disk chunk_shape, so the layout maps 1:1. Peak memory ≈ one chunk per variable.

ds = xr.open_dataset("big.nc", chunks={"time": 100, "lat": -1, "lon": -1})
with atlas.Atlas.create("/tmp/store") as store:
    store.add_xr_dataset(ds, "big")     # streams chunk-by-chunk

Pass chunks={var: [...]} to add_xr_dataset / ds.atlas.write to override the on-disk chunk shape independently of dask's chunking.
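
For intuition about what a chunk shape means for the layout, the sketch below derives the chunk grid from a shape and a chunk shape, in the tuple-of-tuples form that dask uses for its chunks attribute. chunk_grid is a hypothetical helper, not part of atlas. It assumes trailing edge chunks are simply smaller, as in dask; whether ATLAS pads or truncates edge chunks on disk is not stated here.

```python
def chunk_grid(shape, chunk_shape):
    """Return per-dimension chunk sizes in dask's tuple-of-tuples form."""
    if len(shape) != len(chunk_shape):
        raise ValueError("shape and chunk_shape must have the same rank")
    grid = []
    for size, chunk in zip(shape, chunk_shape):
        if chunk <= 0:
            raise ValueError("chunk sizes must be positive")
        full, remainder = divmod(size, chunk)
        sizes = [chunk] * full
        if remainder:
            # Edge chunk: smaller than the nominal chunk size.
            sizes.append(remainder)
        grid.append(tuple(sizes))
    return tuple(grid)


quick_start = chunk_grid([8, 16], [4, 8])   # the Quick start array
ragged = chunk_grid([250, 10], [100, 10])   # shape not divisible by the chunk size
print(quick_start)
print(ragged)
```

The Quick start array (shape [8, 16], chunk_shape [4, 8]) yields a 2 x 2 grid, so it would be split into four chunks.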

Lazy dask-backed reads

store.to_xarray(name) returns each variable dask-backed whenever it was stored with non-trivial chunking (chunk_shape != shape); the dask chunks tuple mirrors the on-disk chunk grid and each on-disk chunk is one dask task. Full-shape arrays (and 0-D scalars) come back eager as numpy. Call .compute() to materialise, or slice / map_blocks to operate lazily.

ds_back = atlas.Atlas.open("/tmp/store").to_xarray("big")
ds_back["temperature"].data              # -> dask.array.Array
ds_back["temperature"][0:100].compute()  # reads exactly one chunk

Reads run under dask's threaded scheduler only — the DatasetView captured in the graph isn't picklable, so call .compute() before handing off to distributed/multiprocessing schedulers.
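
The [0:100] read in the example touches exactly one chunk because the slice lines up with a chunk boundary. The sketch below computes which chunk-grid cells a rectangular read overlaps. chunks_touched is a hypothetical helper, and the shapes are made up for illustration (chunks of 100 x 10).

```python
from itertools import product


def chunks_touched(start, count, chunk_shape):
    """List chunk-grid indices overlapped by the box [start, start + count)."""
    per_dim = []
    for s, n, c in zip(start, count, chunk_shape):
        if n <= 0:
            return []  # empty selection touches nothing
        first = s // c
        last = (s + n - 1) // c
        per_dim.append(range(first, last + 1))
    return list(product(*per_dim))


aligned = chunks_touched([0, 0], [100, 10], [100, 10])
straddling = chunks_touched([50, 0], [100, 10], [100, 10])
print(aligned)
print(straddling)
```

A read of the same size that starts at row 50 straddles two chunks, so it would map to two dask tasks instead of one.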

How xarray maps onto the store

| Item | How it's stored |
| --- | --- |
| Each coord / data variable | A separate array, with dims mapped 1:1. |
| Dataset attrs | Dataset attributes, plain keys. |
| Per-variable attrs | Flattened as {var}.{attr} at the dataset attr level. |
| Per-variable _FillValue | Consumed by define_array as a typed fill value (source Dataset.attrs is not mutated). |
| Coord vs data_var distinction | JSON list in the internal _pyatlas_coords attr. |
| Non-scalar attr values (list, ndarray) | JSON-encoded string with a json: prefix marker. |

Each add_xr_dataset / ds.atlas.write creates a new dataset — there is no append-into-existing mode.
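
The attribute rows of the mapping can be sketched with plain dicts standing in for xarray objects. This is an illustration of the convention described above ({var}.{attr} keys, a json: prefix for non-scalar values), not the library's code: flatten_attrs, unflatten_attrs, and the sample valid_range attribute are hypothetical, and details such as how variable names containing dots or strings that already start with json: are handled are not specified on this page.

```python
import json

JSON_PREFIX = "json:"


def encode_value(value):
    # Non-scalar values become a prefixed JSON string; scalars pass through.
    if isinstance(value, (list, tuple, dict)):
        return JSON_PREFIX + json.dumps(value)
    return value


def decode_value(value):
    if isinstance(value, str) and value.startswith(JSON_PREFIX):
        return json.loads(value[len(JSON_PREFIX):])
    return value


def flatten_attrs(dataset_attrs, variable_attrs):
    """Merge dataset attrs and per-variable attrs into one flat dict."""
    flat = {key: encode_value(val) for key, val in dataset_attrs.items()}
    for var, attrs in variable_attrs.items():
        for key, val in attrs.items():
            flat[f"{var}.{key}"] = encode_value(val)
    return flat


def unflatten_attrs(flat, variable_names):
    """Split a flat dict back into dataset attrs and per-variable attrs."""
    dataset_attrs = {}
    variable_attrs = {name: {} for name in variable_names}
    for key, val in flat.items():
        var, sep, attr = key.partition(".")
        if sep and var in variable_attrs:
            variable_attrs[var][attr] = decode_value(val)
        else:
            dataset_attrs[key] = decode_value(val)
    return dataset_attrs, variable_attrs


ds_attrs = {"month": 1, "station": "KNMI"}
var_attrs = {"temperature": {"units": "C", "valid_range": [-90.0, 60.0]}}
flat = flatten_attrs(ds_attrs, var_attrs)
restored = unflatten_attrs(flat, ["temperature"])
print(flat)
```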

Supported dtypes

| numpy dtype | atlas dtype |
| --- | --- |
| int8/16/32/64, uint8/16/32/64, float32/64 | matching numeric |
| datetime64[ns] | timestamp_nanoseconds (aliases: timestamp_ns, datetime64[ns]) |
| object (str/bytes), \|S<n>, \|U<n> | string (variable-length; reads return Python str) |

  • 0-D scalar arrays (shape=[]) are supported for every dtype above.
  • bool is available as an attribute type but not as an array dtype.
  • binary, list[...], fixed_size_list[...,N] are reserved for a later release.

Cloud / object storage

With the cloud extra, Atlas.open / Atlas.create accept an obstore-constructed S3 / GCS / Azure / HTTP store handle instead of a local path. The path-based local-filesystem API works without it. See the cloud storage guide.

API reference

atlas.Atlas

| Method | Description |
| --- | --- |
| Atlas.create(path, codec="zstd") | Create a new store at path. |
| Atlas.open(path) | Open an existing store. |
| create_dataset(name) -> DatasetView | New dataset (in-memory until flush). |
| open_dataset(name) -> DatasetView | Existing dataset. |
| delete_dataset(name) | Remove a dataset (persisted on next flush). |
| list_datasets() -> list[str] | All dataset names. |
| list_arrays() -> list[str] | Distinct array names across datasets. |
| dataset_exists(name) -> bool | Existence check. |
| add_xr_dataset(ds, name, chunks=None) | Append an xarray.Dataset (does not flush). |
| to_xarray(name) -> xr.Dataset | Read a dataset back (chunked vars come back dask-backed). |
| flush() | The single durability boundary: persists everything. |
| close() | Alias for flush(); also the with-block exit. |
| compact() | Reclaim tombstoned space across cached array files. |
| __enter__ / __exit__ | Context-manager support (__exit__ calls close()). |

atlas.DatasetView

| Method | Description |
| --- | --- |
| name (property) | Dataset name. |
| list_arrays() -> list[str] | Array names in this dataset. |
| define_array(name, dtype, dims, shape, chunk_shape=None, fill_value=None) | Declare a new array. fill_value is a Python scalar matching the dtype: unwritten cells read back as that value, and written cells equal to it count as nulls in array_stats. The dtype is enforced (TypeError on mismatch, OverflowError for out-of-range ints). |
| write_array(name, start, data) | Write a numpy ndarray (matching the stored dtype). |
| read_array(name, start=None, shape=None) -> np.ndarray \| None | Full or partial read; returns None if the array isn't in this dataset. |
| delete_array(name) | Tombstone the array within this dataset. |
| array_meta(name) -> dict \| None | {"dtype", "shape", "chunk_shape", "dimension_names"}. |
| array_stats(name) -> dict \| None | {"row_count", "null_count", "min", "max"}; populated after flush(). |
| set_attribute(key, value, dtype=None) | Type inferred from the Python value; pass dtype to override (e.g. "int8", "float32", "timestamp_nanoseconds"). On disk: bool, int64, float64, string, timestamp_nanoseconds. |
| get_attribute(key) / attributes() | Single attribute or dict of all. |

DatasetView does not expose its own flush / compact — both go through the parent Atlas.
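
As an illustration of what inferring the type from the Python value can look like for set_attribute, here is a minimal sketch. infer_attribute_dtype is a hypothetical function, not the library's implementation; in particular, mapping datetime objects to timestamp_nanoseconds and raising OverflowError for ints outside the int64 range are assumptions. The one firm point is a Python detail: bool must be checked before int, because bool is a subclass of int.

```python
from datetime import datetime, timezone

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def infer_attribute_dtype(value):
    """Guess an on-disk attribute type name from a Python value (sketch)."""
    # bool first: isinstance(True, int) is True in Python.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        if not INT64_MIN <= value <= INT64_MAX:
            raise OverflowError(f"{value} does not fit in int64")
        return "int64"
    if isinstance(value, float):
        return "float64"
    if isinstance(value, str):
        return "string"
    if isinstance(value, datetime):
        return "timestamp_nanoseconds"  # assumption, see lead-in
    raise TypeError(f"unsupported attribute type: {type(value).__name__}")


inferred = {
    "flag": infer_attribute_dtype(True),
    "month": infer_attribute_dtype(1),
    "offset": infer_attribute_dtype(0.5),
    "station": infer_attribute_dtype("KNMI"),
    "created": infer_attribute_dtype(datetime(2024, 1, 1, tzinfo=timezone.utc)),
}
print(inferred)
```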

Examples

Runnable, self-contained scripts (each writes to a temp directory):

  • 01_basics.py — create a store, define arrays, set attributes, reopen, read back.
  • 02_xarray.py — round-trip an xr.Dataset via both store.add_xr_dataset(...) and the ds.atlas.write(...) accessor.
  • 03_dask_streaming.py — stream a dask-chunked xr.Dataset in one chunk at a time.

Performance

ATLAS is tuned for collections of many similarly-shaped datasets. On a "1000 datasets" benchmark against netCDF4 and Zarr v3, the bulk read paths (Atlas.to_xarray_many / Atlas.read_array_across_stacked) beat Zarr by ~2.8× on large chunked slice reads, and on small per-dataset workloads ATLAS leads on both reads and writes. See the benchmarks for the full methodology, numbers, and an API picker for the fastest read path per workload.

Download files

Source Distribution

  • atlas_python-0.9.1.tar.gz (188.7 kB): source

Built Distributions

  • atlas_python-0.9.1-cp310-abi3-win_amd64.whl (6.6 MB): CPython 3.10+, Windows x86-64
  • atlas_python-0.9.1-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.3 MB): CPython 3.10+, manylinux (glibc 2.17+) x86-64
  • atlas_python-0.9.1-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (7.2 MB): CPython 3.10+, manylinux (glibc 2.17+) ARM64
  • atlas_python-0.9.1-cp310-abi3-macosx_11_0_arm64.whl (6.5 MB): CPython 3.10+, macOS 11.0+ ARM64
  • atlas_python-0.9.1-cp310-abi3-macosx_10_12_x86_64.whl (6.9 MB): CPython 3.10+, macOS 10.12+ x86-64
