Python bindings for the ATLAS array store
atlas-python
Python bindings for ATLAS (Aggregated Tensor Large Array Store) — a directory-based store for many similarly-shaped N-dimensional arrays, backed by local files or any object store (S3 / GCS / Azure / HTTP). Built on a Rust core with a synchronous, NumPy-native API and first-class xarray integration.
pip install atlas-python
import atlas
| Extra | Install | Adds |
|---|---|---|
| cloud | pip install "atlas-python[cloud]" | S3 / GCS / Azure / HTTP backends via obstore |
numpy, xarray, and dask are installed automatically.
Quick start
import numpy as np
import atlas
# The `with` block flushes (== close) on exit. Nothing is persisted before that.
with atlas.Atlas.create("/tmp/my_store", codec="zstd") as store:  # "zstd" | "lz4" | "none"
    ds = store.create_dataset("jan_2024")
    ds.define_array(
        "temperature",
        dtype="float32",
        dims=["lat", "lon"],
        shape=[8, 16],
        chunk_shape=[4, 8],
        fill_value=float("nan"),  # unwritten cells read back as NaN; NaN cells count as nulls in stats
    )
    ds.write_array("temperature", start=[0, 0], data=np.full((8, 16), 20.0, dtype=np.float32))
    ds.set_attribute("month", 1)
    ds.set_attribute("station", "KNMI")
# Reopen and read
store = atlas.Atlas.open("/tmp/my_store")
ds = store.open_dataset("jan_2024")
arr = ds.read_array("temperature") # full read -> np.ndarray
chunk = ds.read_array("temperature", [0, 0], [4, 8]) # partial read
stats = ds.array_stats("temperature") # {"row_count", "null_count", "min", "max"}
month = ds.get_attribute("month") # 1
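The fill_value comment in the example above says NaN cells count as nulls. One subtlety is easy to miss: NaN never compares equal to itself, so any null count against a NaN fill value has to use an isnan test rather than ==. The helper below is a plain-Python illustration of that rule only. It is not ATLAS's implementation, and `count_nulls` is a name invented for this sketch.

```python
import math


def count_nulls(values, fill_value):
    """Count cells equal to fill_value, treating NaN as equal to NaN."""
    if isinstance(fill_value, float) and math.isnan(fill_value):
        return sum(1 for v in values if isinstance(v, float) and math.isnan(v))
    return sum(1 for v in values if v == fill_value)


cells = [20.0, float("nan"), 21.5, float("nan")]
nan_nulls = count_nulls(cells, float("nan"))        # NaN fill: needs isnan
sentinel_nulls = count_nulls([1, -999, 3], -999)    # ordinary sentinel: == works
naive = sum(1 for v in cells if v == float("nan"))  # always 0, since NaN != NaN
```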
Durability model
This is the one concept to internalise: writes are buffered in memory and only hit disk on
flush().
The store's metadata is loaded once on open/create. Every subsequent mutation — creating
datasets, defining arrays, write_array, set_attribute — updates in-memory state only.
Nothing reaches disk until store.flush() (equivalently store.close(), or the with store:
block exiting). Dropping an Atlas without flushing abandons every pending write.
The payoff: N consecutive writes amortise to a single flush — one delta file per touched array name and one metadata rewrite, no matter how many datasets you touched.
store = atlas.Atlas.create("/tmp/my_store")
# ... many create_dataset / write_array calls ...
store.flush() # the single durability boundary
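To make the flush boundary concrete, here is a toy model in plain Python. It is a sketch of the behaviour described above, not the Rust core: `ToyBufferedStore`, its file names, and its JSON payloads are all invented for illustration. Writes only touch memory, and a single flush emits one delta file per touched array name plus one metadata file.

```python
import json
import os
import tempfile


class ToyBufferedStore:
    """Hypothetical stand-in: buffers writes in memory, persists on flush()."""

    def __init__(self, root):
        self.root = root
        self.pending = {}      # array name -> list of (dataset, payload)
        self.datasets = set()
        self.flush_count = 0

    def write(self, dataset, array_name, payload):
        # Memory only: nothing is written to disk here.
        self.datasets.add(dataset)
        self.pending.setdefault(array_name, []).append((dataset, payload))

    def flush(self):
        # One delta file per touched array name, then one metadata rewrite.
        self.flush_count += 1
        written = []
        for array_name, entries in self.pending.items():
            path = os.path.join(self.root, f"{array_name}.delta{self.flush_count}.json")
            with open(path, "w") as fh:
                json.dump(entries, fh)
            written.append(path)
        with open(os.path.join(self.root, "metadata.json"), "w") as fh:
            json.dump(sorted(self.datasets), fh)
        self.pending.clear()
        return written


with tempfile.TemporaryDirectory() as root:
    store = ToyBufferedStore(root)
    for month in ("jan", "feb", "mar"):
        store.write(month, "temperature", [20.0])
        store.write(month, "humidity", [0.5])
    files_before_flush = os.listdir(root)        # nothing on disk yet
    delta_files = store.flush()                  # six writes, two delta files
    files_after_flush = sorted(os.listdir(root))
```

Six writes across three datasets produce two delta files and one metadata file, which is the amortisation the section describes.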
xarray integration
Importing atlas registers an accessor at xr.Dataset.atlas, so the integration is always
available. The store must exist first; you then append xarray datasets to it.
import numpy as np, xarray as xr, atlas
ds = xr.Dataset(
    data_vars={
        "temperature": (["lat", "lon"], np.arange(8 * 16, dtype=np.float32).reshape(8, 16),
                        {"units": "C", "long_name": "surface temperature"}),
    },
    coords={"lat": np.arange(8, dtype=np.float32), "lon": np.arange(16, dtype=np.float32)},
    attrs={"month": 1, "station": "KNMI"},
)
with atlas.Atlas.create("/tmp/my_store") as store:
    store.add_xr_dataset(ds, "jan_2024")  # store-side method
    ds.atlas.write(store, "jan_2025")     # xarray accessor (same effect)
# Read back as an xr.Dataset
store = atlas.Atlas.open("/tmp/my_store")
ds_back = store.to_xarray("jan_2024")
xr.testing.assert_identical(ds, ds_back)
Bulk ingestion
add_xr_dataset never flushes by itself — N consecutive calls accumulate in memory and a single
flush() (or the with exit) persists everything.
import glob, os, atlas, xarray as xr
with atlas.Atlas.create("/tmp/store") as store:
    for nc_path in sorted(glob.glob("*.nc")):
        name = os.path.splitext(os.path.basename(nc_path))[0]
        store.add_xr_dataset(xr.open_dataset(nc_path), name)
# One delta file per array name across the whole batch (not one per file).
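The loop above names each dataset after its file stem. If the input files come from several directories, two of them can share a stem and therefore ask for the same dataset name. How the store reacts to that is not covered here, so a defensive pre-flight check may be worthwhile. The sketch below uses only the standard library; `dataset_names` is a hypothetical helper, not part of atlas.

```python
import os
from collections import Counter


def dataset_names(paths):
    """Map each file path to its stem; raise if two paths share a stem."""
    names = [os.path.splitext(os.path.basename(p))[0] for p in paths]
    dupes = sorted(n for n, c in Counter(names).items() if c > 1)
    if dupes:
        raise ValueError(f"duplicate dataset names: {dupes}")
    return dict(zip(paths, names))


mapping = dataset_names(["data/2024/jan.nc", "data/2024/feb.nc"])

try:
    dataset_names(["a/jan.nc", "b/jan.nc"])
    collided = False
except ValueError:
    collided = True
```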
Streaming dask-backed writes
If a variable's .data is a dask.array.Array (e.g. from xr.open_dataset(path, chunks=...)
or ds.chunk({...})), add_xr_dataset / ds.atlas.write stream one dask block at a time
into the store rather than materialising the whole array. The dask chunk shape becomes the
on-disk chunk_shape, so the layout maps 1:1. Peak memory ≈ one chunk per variable.
ds = xr.open_dataset("big.nc", chunks={"time": 100, "lat": -1, "lon": -1})
with atlas.Atlas.create("/tmp/store") as store:
    store.add_xr_dataset(ds, "big")  # streams chunk-by-chunk
Pass chunks={var: [...]} to add_xr_dataset / ds.atlas.write to override the on-disk chunk
shape independently of dask's chunking.
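Because peak memory is roughly one chunk per variable, it can help to estimate the block size before choosing a chunking. The following sketch does that arithmetic in plain Python. The variable shape (1000, 180, 360) is made up for illustration, and `resolve_chunks` / `chunk_plan` are hypothetical helpers. The only convention borrowed from dask is that -1 means "the whole dimension".

```python
import math


def resolve_chunks(shape, chunks):
    """Replace -1 (dask's 'whole dimension') with the dimension length."""
    return [dim if c == -1 else min(c, dim) for dim, c in zip(shape, chunks)]


def chunk_plan(shape, chunks, itemsize):
    """Return (number of blocks, bytes in the largest block)."""
    resolved = resolve_chunks(shape, chunks)
    n_blocks = math.prod(math.ceil(dim / c) for dim, c in zip(shape, resolved))
    return n_blocks, math.prod(resolved) * itemsize


# Hypothetical float32 variable with dims (time, lat, lon) = (1000, 180, 360),
# chunked like the example above: {"time": 100, "lat": -1, "lon": -1}.
n_blocks, block_bytes = chunk_plan([1000, 180, 360], [100, -1, -1], itemsize=4)
```

Under these assumptions the variable streams in 10 blocks of about 26 MB each, so the full 259 MB array never has to be resident at once.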
Lazy dask-backed reads
store.to_xarray(name) returns each variable dask-backed whenever it was stored with non-trivial
chunking (chunk_shape != shape); the dask chunks tuple mirrors the on-disk chunk grid and each
on-disk chunk is one dask task. Full-shape arrays (and 0-D scalars) come back eager as numpy. Call
.compute() to materialise, or slice / map_blocks to operate lazily.
ds_back = atlas.Atlas.open("/tmp/store").to_xarray("big")
ds_back["temperature"].data # -> dask.array.Array
ds_back["temperature"][0:100].compute() # reads exactly one chunk
Reads run under dask's threaded scheduler only — the DatasetView captured in the graph isn't
picklable, so call .compute() before handing off to distributed/multiprocessing schedulers.
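The relationship between the on-disk chunk grid, the dask chunks tuple, and the number of chunks a slice reads is plain integer arithmetic. The sketch below reproduces it without atlas or dask. The three helper names are invented for illustration, and the shapes are hypothetical.

```python
def dask_style_chunks(shape, chunk_shape):
    """Per-dimension chunk lengths in the form dask uses for .chunks."""
    out = []
    for dim, c in zip(shape, chunk_shape):
        full, rest = divmod(dim, c)
        out.append((c,) * full + ((rest,) if rest else ()))
    return tuple(out)


def chunks_touched(start, stop, chunk_len):
    """Chunk indices along one axis overlapped by the half-open range [start, stop)."""
    if stop <= start:
        return []
    return list(range(start // chunk_len, (stop - 1) // chunk_len + 1))


def is_lazy(shape, chunk_shape):
    """The rule stated above: dask-backed only when chunk_shape != shape."""
    return list(chunk_shape) != list(shape)


grid = dask_style_chunks([250, 8, 16], [100, 8, 16])
aligned = chunks_touched(0, 100, 100)       # slice [0:100] on 100-long chunks
straddling = chunks_touched(50, 150, 100)   # slice [50:150] crosses a boundary
```

A slice aligned to the chunk grid touches one chunk, while the same-sized slice offset by half a chunk touches two, which is why aligning reads to the stored chunk_shape keeps task counts low.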
How xarray maps onto the store
| Item | How it's stored |
|---|---|
| Each coord / data variable | A separate array, with dims mapped 1:1. |
| Dataset attrs | Dataset attributes, plain keys. |
| Per-variable attrs | Flattened as {var}.{attr} at the dataset attr level. |
| Per-variable _FillValue | Consumed by define_array as a typed fill value (source Dataset.attrs is not mutated). |
| Coord vs data_var distinction | JSON list in the internal _pyatlas_coords attr. |
| Non-scalar attr values (list, ndarray) | JSON-encoded string with a json: prefix marker. |
Each add_xr_dataset / ds.atlas.write creates a new dataset — there is no append-into-existing
mode.
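The attribute rows of the table can be read as a small encoding scheme. The sketch below is one plausible reading of it in plain Python, not the library's code: the function names are invented, the exact JSON formatting is an assumption, and treating only lists and tuples as non-scalar is a simplification (the table also mentions ndarray).

```python
import json

COORDS_KEY = "_pyatlas_coords"   # internal attr name taken from the table above
JSON_PREFIX = "json:"            # marker for JSON-encoded non-scalar values


def encode_value(value):
    """Scalars pass through; lists and tuples become a 'json:'-prefixed string."""
    if isinstance(value, (list, tuple)):
        return JSON_PREFIX + json.dumps(list(value))
    return value


def decode_value(value):
    """Inverse of encode_value for prefixed strings; other values pass through."""
    if isinstance(value, str) and value.startswith(JSON_PREFIX):
        return json.loads(value[len(JSON_PREFIX):])
    return value


def flatten_attrs(dataset_attrs, var_attrs, coord_names):
    """Build one flat attribute dict: plain keys, '{var}.{attr}' keys, coords list."""
    flat = {k: encode_value(v) for k, v in dataset_attrs.items()}
    for var, attrs in var_attrs.items():
        for key, value in attrs.items():
            if key == "_FillValue":
                continue  # becomes the array's typed fill value instead
            flat[f"{var}.{key}"] = encode_value(value)
    flat[COORDS_KEY] = json.dumps(list(coord_names))  # assumed: plain JSON list
    return flat


flat = flatten_attrs(
    {"month": 1, "station": "KNMI"},
    {"temperature": {"units": "C", "valid_range": [-90.0, 60.0], "_FillValue": -999.0}},
    ["lat", "lon"],
)
```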
Supported dtypes
| numpy dtype | atlas dtype |
|---|---|
| int8/16/32/64, uint8/16/32/64, float32/64 | matching numeric |
| datetime64[ns] | timestamp_nanoseconds (aliases: timestamp_ns, datetime64[ns]) |
| object (str/bytes), \|S<n>, \|U<n> | string (variable-length; reads return Python str) |
- 0-D scalar arrays (shape=[]) are supported for every dtype above.
- bool is available as an attribute type but not as an array dtype.
- binary, list[...], fixed_size_list[...,N] are reserved for a later release.
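For code that validates dtypes before calling define_array, the table and notes above can be turned into a lookup. This is a sketch under assumptions: it takes "matching numeric" to mean the atlas dtype string equals the numpy name (as with dtype="float32" in the quick start), and the error type for unsupported dtypes is a choice made here, not documented behaviour. `atlas_array_dtype` is a hypothetical helper.

```python
import re

NUMERIC = {
    "int8", "int16", "int32", "int64",
    "uint8", "uint16", "uint32", "uint64",
    "float32", "float64",
}
TIMESTAMP_ALIASES = {"timestamp_nanoseconds", "timestamp_ns", "datetime64[ns]"}
# Fixed-width bytes/unicode dtype strings such as "|S10" or "<U5".
FIXED_STRING = re.compile(r"[|<>=]?[SU]\d*")


def atlas_array_dtype(numpy_dtype):
    """Map a numpy dtype string to the array dtype name from the table above."""
    name = numpy_dtype.strip()
    if name in NUMERIC:
        return name  # assumption: same name on both sides
    if name in TIMESTAMP_ALIASES:
        return "timestamp_nanoseconds"
    if name == "object" or FIXED_STRING.fullmatch(name):
        return "string"
    if name == "bool":
        raise TypeError("bool is an attribute type, not an array dtype")
    raise TypeError(f"unsupported array dtype: {numpy_dtype!r}")


resolved = [atlas_array_dtype(d) for d in ("float32", "datetime64[ns]", "<U5", "|S10")]
```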
Cloud / object storage
With the cloud extra, Atlas.open / Atlas.create accept an
obstore-constructed S3 / GCS / Azure / HTTP store
handle instead of a local path. The path-based local-filesystem API works without it. See the
cloud storage guide.
API reference
atlas.Atlas
| Method | Description |
|---|---|
| Atlas.create(path, codec="zstd") | Create a new store at path. |
| Atlas.open(path) | Open an existing store. |
| create_dataset(name) -> DatasetView | New dataset (in-memory until flush). |
| open_dataset(name) -> DatasetView | Existing dataset. |
| delete_dataset(name) | Remove a dataset (persisted on next flush). |
| list_datasets() -> list[str] | All dataset names. |
| list_arrays() -> list[str] | Distinct array names across datasets. |
| dataset_exists(name) -> bool | Existence check. |
| add_xr_dataset(ds, name, chunks=None) | Append an xarray.Dataset (does not flush). |
| to_xarray(name) -> xr.Dataset | Read a dataset back (chunked vars come back dask-backed). |
| flush() | The single durability boundary: persist everything. |
| close() | Alias for flush(); also the with-block exit. |
| compact() | Reclaim tombstoned space across cached array files. |
| __enter__ / __exit__ | Context-manager support (__exit__ calls close()). |
atlas.DatasetView
| Method | Description |
|---|---|
| name (property) | Dataset name. |
| list_arrays() -> list[str] | Array names in this dataset. |
| define_array(name, dtype, dims, shape, chunk_shape=None, fill_value=None) | Declare a new array. fill_value is a Python scalar matching the dtype; unwritten cells read back as it, and written cells equal to it count as nulls in array_stats. Dtype is enforced (TypeError on mismatch, OverflowError for out-of-range ints). |
| write_array(name, start, data) | Write a numpy ndarray (matching the stored dtype). |
| read_array(name, start=None, shape=None) -> np.ndarray \| None | Read full or partial; None if the array isn't in this dataset. |
| delete_array(name) | Tombstone the array within this dataset. |
| array_meta(name) -> dict \| None | {"dtype", "shape", "chunk_shape", "dimension_names"}. |
| array_stats(name) -> dict \| None | {"row_count", "null_count", "min", "max"}; populated after flush(). |
| set_attribute(key, value, dtype=None) | Type inferred from the Python value; pass dtype to override (e.g. "int8", "float32", "timestamp_nanoseconds"). On disk: bool, int64, float64, string, timestamp_nanoseconds. |
| get_attribute(key) / attributes() | Single attribute or dict of all. |
DatasetView does not expose its own flush / compact — both go through the parent Atlas.
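The set_attribute row says the type is inferred from the Python value and lists five on-disk types. One way such inference could work is sketched below. The mapping from Python types to those five names is an assumption (the datetime case especially), and `infer_attr_dtype` is a hypothetical helper, not the library's code. The ordering detail is real Python behaviour and worth noting: bool is a subclass of int, so it must be tested first.

```python
from datetime import datetime


def infer_attr_dtype(value):
    """Guess the on-disk attribute type for a Python scalar."""
    # bool must be tested before int: isinstance(True, int) is True in Python.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int64"
    if isinstance(value, float):
        return "float64"
    if isinstance(value, str):
        return "string"
    if isinstance(value, datetime):
        return "timestamp_nanoseconds"
    raise TypeError(f"no attribute type for {type(value).__name__}")


month_type = infer_attr_dtype(1)          # the "month" attribute from the quick start
station_type = infer_attr_dtype("KNMI")   # the "station" attribute
flag_type = infer_attr_dtype(True)        # not "int64", thanks to the bool-first check
```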
Examples
Runnable, self-contained scripts (each writes to a temp directory):
- 01_basics.py: create a store, define arrays, set attributes, reopen, read back.
- 02_xarray.py: round-trip an xr.Dataset via both store.add_xr_dataset(...) and the ds.atlas.write(...) accessor.
- 03_dask_streaming.py: stream a dask-chunked xr.Dataset in one chunk at a time.
Performance
ATLAS is tuned for collections of many similarly-shaped datasets. On a "1000 datasets" benchmark
against netCDF4 and Zarr v3, the bulk read paths (Atlas.to_xarray_many /
Atlas.read_array_across_stacked) beat Zarr by ~2.8× on large chunked slice reads, and on small
per-dataset workloads ATLAS leads on both reads and writes. See the
benchmarks for the full
methodology, numbers, and an API picker for the fastest read path per workload.
Links
- Source & issues: https://github.com/maris-development/atlas
- License: Apache-2.0