Lightweight labelled multidimensional arrays with NumPy arrays under the hood.
[k]array: labeled multi-dimensional arrays
karray is a simple tool that aims to shield users from the complexity of working with labeled multi-dimensional arrays. NumPy is the tool's core, providing an extensive collection of high-level mathematical functions that operate efficiently on multi-dimensional arrays thanks to its well-optimized C code. karray focuses on generating lightweight objects in order to reduce overhead and avoid the large loops that cause bottlenecks and hurt performance. NumPy is the only core dependency; Polars, pandas, sparse and PyArrow are needed only to import, export and store arrays. karray is developed by the research group Transformation of the Energy Economy at DIW Berlin (German Institute for Economic Research).
Links
- Documentation: https://diw-evu.gitlab.io/karray
- Source code: https://gitlab.com/diw-evu/karray
- PyPI releases: https://pypi.org/project/karray
Getting started
Quick installation
To install karray, you can use pip:
pip install karray
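Since the import, export and storage features rely on optional packages, it can help to check which of them are present before calling the corresponding methods. The following is a minimal sketch using only the standard library; the helper name `available_backends` is hypothetical and not part of karray, and the import names are assumed to match the package names listed in the description.

```python
from importlib.util import find_spec

# Optional packages named in the project description (import names assumed).
OPTIONAL = ("pandas", "polars", "sparse", "pyarrow")


def available_backends(names=OPTIONAL):
    """Map each package name to True if Python can locate it, else False."""
    return {name: find_spec(name) is not None for name in names}


for name, found in available_backends().items():
    print(f"{name}: {'installed' if found else 'missing'}")
```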
Importing karray
To start using karray, import the necessary classes and functions:
import karray as ka
# then you can use ka.Array, ka.Long, and ka.settings
The Array class represents a labeled multidimensional array, while the Long class represents a labeled one-dimensional array. The settings object allows you to configure various options for karray.
Usage Examples
Creating an Array
You can create an Array object in several ways:
- From a Long object and coordinates:
import pandas as pd
index = {'dim1': ['a', 'b'],
'dim2': [1, 2],
'dim3': pd.to_datetime(['2020-01-01', '2020-01-02'], utc=True)}
value = [10., 20.]
long = ka.Long(index=index, value=value)
arr1 = ka.Array(data=long)
arr1
[k]array
| Long object size | 64 bytes |
| Data object type | dense |
| Data object size | 64 bytes |
| Dimensions | ['dim1', 'dim2', 'dim3'] |
| Shape | [2, 2, 2] |
| Capacity | 8 |
| Rows | 2 |
Coords
| Dimension | Length | Type | Items |
|---|---|---|---|
| dim1 | 2 | object | ['a' 'b'] |
| dim2 | 2 | int64 | [1 2] |
| dim3 | 2 | datetime64[ns] | ['2020-01-01T00:00:00.000000000' '2020-01-02T00:00:00.000000000'] |
Data
| | dim1 | dim2 | dim3 | value |
|---|---|---|---|---|
| 0 | a | 1 | 2020-01-01T00:00:00.000000000 | 10.00 |
| 1 | b | 2 | 2020-01-02T00:00:00.000000000 | 20.00 |
- From a tuple of index and value, and coordinates:
index2 = {'dim1': ['a', 'b'], 'dim2': [1, 2]}
value2 = [10, 20]
coords2 = {'dim1': ['a', 'b'], 'dim2': [1, 2]}
arr2 = ka.Array(data=(index2, value2), coords=coords2)
arr2
[k]array
| Long object size | 48 bytes |
| Data object type | dense |
| Data object size | 32 bytes |
| Dimensions | ['dim1', 'dim2'] |
| Shape | [2, 2] |
| Capacity | 4 |
| Rows | 2 |
Coords
| Dimension | Length | Type | Items |
|---|---|---|---|
| dim1 | 2 | object | ['a' 'b'] |
| dim2 | 2 | int64 | [1 2] |
Data
| | dim1 | dim2 | value |
|---|---|---|---|
| 0 | a | 1 | 10 |
| 1 | b | 2 | 20 |
- From a dense NumPy array and coordinates:
import numpy as np
dense = np.array([[10, 20], [30, 40]])
coords3 = {'dim1': ['a', 'b'], 'dim2': [1, 2]}
arr3 = ka.Array(data=dense, coords=coords3)
arr3
[k]array
| Long object size | 96 bytes |
| Data object type | dense |
| Data object size | 32 bytes |
| Dimensions | ['dim1', 'dim2'] |
| Shape | [2, 2] |
| Capacity | 4 |
| Rows | 4 |
Coords
| Dimension | Length | Type | Items |
|---|---|---|---|
| dim1 | 2 | object | ['a' 'b'] |
| dim2 | 2 | int64 | [1 2] |
Data
| | dim1 | dim2 | value |
|---|---|---|---|
| 0 | a | 1 | 10.00 |
| 1 | a | 2 | 20.00 |
| 2 | b | 1 | 30.00 |
| 3 | b | 2 | 40.00 |
- From a sparse array (using the sparse library) and coordinates:
import sparse as sp
sparse_arr = sp.COO(data=[10, 20], coords=[[0, 1], [0, 1]], shape=(2, 2))
coords4 = {'dim1': ['a', 'b'], 'dim2': [1, 2]}
arr4 = ka.Array(data=sparse_arr, coords=coords4)
arr4
[k]array
| Long object size | 48 bytes |
| Data object type | dense |
| Data object size | 32 bytes |
| Dimensions | ['dim1', 'dim2'] |
| Shape | [2, 2] |
| Capacity | 4 |
| Rows | 2 |
Coords
| Dimension | Length | Type | Items |
|---|---|---|---|
| dim1 | 2 | object | ['a' 'b'] |
| dim2 | 2 | int64 | [1 2] |
Data
| | dim1 | dim2 | value |
|---|---|---|---|
| 0 | a | 1 | 10 |
| 1 | b | 2 | 20 |
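The summaries above report Shape, Capacity and Rows. The sketch below shows, in plain Python, one way these numbers relate for the `arr2` inputs: the shape comes from the coordinate lengths, the capacity is the product of the shape, and the rows are the long-format records actually stored. It is an illustration only, not karray's implementation, and it assumes that cells without a record are filled with 0.

```python
from itertools import product
from math import prod

# Same inputs as arr2 above.
coords = {"dim1": ["a", "b"], "dim2": [1, 2]}
index = {"dim1": ["a", "b"], "dim2": [1, 2]}
value = [10, 20]

shape = [len(labels) for labels in coords.values()]  # [2, 2]
capacity = prod(shape)                               # 4 cells in the dense grid
rows = len(value)                                    # 2 long-format records

# Long records keyed by their coordinate tuple: ('a', 1) -> 10, ('b', 2) -> 20.
records = dict(zip(zip(*index.values()), value))

# Dense view: every coordinate combination, with an assumed fill value of 0.
dense = {key: records.get(key, 0) for key in product(*coords.values())}

print(shape, capacity, rows)
print(dense)
```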
Accessing Array Elements
You can access elements of an Array object using various methods:
- Using the items() method to iterate over the array elements:
for item in arr3.items():
print(item)
('dim1', array(['a', 'a', 'b', 'b'], dtype=object))
('dim2', array([1, 2, 1, 2]))
('value', array([10., 20., 30., 40.]))
- Using the to_pandas() method to convert the array to a pandas DataFrame:
df = arr1.to_pandas()
print(df)
dim1 dim2 dim3 value
0 a 1 2020-01-01 10.0
1 b 2 2020-01-02 20.0
- Using the to_polars() method to convert the array to a polars DataFrame:
df = arr1.to_polars()
print(df)
shape: (2, 4)
┌──────┬──────┬─────────────────────┬───────┐
│ dim1 ┆ dim2 ┆ dim3 ┆ value │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ datetime[ns] ┆ f64 │
╞══════╪══════╪═════════════════════╪═══════╡
│ a ┆ 1 ┆ 2020-01-01 00:00:00 ┆ 10.0 │
│ b ┆ 2 ┆ 2020-01-02 00:00:00 ┆ 20.0 │
└──────┴──────┴─────────────────────┴───────┘
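As the `items()` output shows, iteration yields one (name, column) pair per dimension plus the value column, not one element per row. If row records are needed, the columns can be zipped together. The sketch below uses plain lists copied from the printed output of `arr3.items()` in place of the NumPy arrays, so it runs without karray installed.

```python
# Column pairs as printed by arr3.items() above (plain lists stand in for arrays).
columns = [
    ("dim1", ["a", "a", "b", "b"]),
    ("dim2", [1, 2, 1, 2]),
    ("value", [10.0, 20.0, 30.0, 40.0]),
]

names = [name for name, _ in columns]

# Zip the columns so that each tuple of cells becomes one row dictionary.
rows = [dict(zip(names, cells)) for cells in zip(*(col for _, col in columns))]

for row in rows:
    print(row)
```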
Array Operations
karray provides various operations that can be performed on Array objects:
- Arithmetic operations:
result = arr1 + arr2
result
[k]array
| Long object size | 128 bytes |
| Data object type | dense |
| Data object size | 64 bytes |
| Dimensions | ['dim1', 'dim2', 'dim3'] |
| Shape | [2, 2, 2] |
| Capacity | 8 |
| Rows | 4 |
Coords
| Dimension | Length | Type | Items |
|---|---|---|---|
| dim1 | 2 | object | ['a' 'b'] |
| dim2 | 2 | int64 | [1 2] |
| dim3 | 2 | datetime64[ns] | ['2020-01-01T00:00:00.000000000' '2020-01-02T00:00:00.000000000'] |
Data
| | dim1 | dim2 | dim3 | value |
|---|---|---|---|---|
| 0 | a | 1 | 2020-01-01T00:00:00.000000000 | 20.00 |
| 1 | a | 1 | 2020-01-02T00:00:00.000000000 | 10.00 |
| 2 | b | 2 | 2020-01-01T00:00:00.000000000 | 20.00 |
| 3 | b | 2 | 2020-01-02T00:00:00.000000000 | 40.00 |
result = arr3 * 2
result
[k]array
| Long object size | 96 bytes |
| Data object type | dense |
| Data object size | 32 bytes |
| Dimensions | ['dim1', 'dim2'] |
| Shape | [2, 2] |
| Capacity | 4 |
| Rows | 4 |
Coords
| Dimension | Length | Type | Items |
|---|---|---|---|
| dim1 | 2 | object | ['a' 'b'] |
| dim2 | 2 | int64 | [1 2] |
Data
| | dim1 | dim2 | value |
|---|---|---|---|
| 0 | a | 1 | 20.00 |
| 1 | a | 2 | 40.00 |
| 2 | b | 1 | 60.00 |
| 3 | b | 2 | 80.00 |
result = arr4 - 1
result
[k]array
| Long object size | 96 bytes |
| Data object type | dense |
| Data object size | 32 bytes |
| Dimensions | ['dim1', 'dim2'] |
| Shape | [2, 2] |
| Capacity | 4 |
| Rows | 4 |
Coords
| Dimension | Length | Type | Items |
|---|---|---|---|
| dim1 | 2 | object | ['a' 'b'] |
| dim2 | 2 | int64 | [1 2] |
Data
| | dim1 | dim2 | value |
|---|---|---|---|
| 0 | a | 1 | 9.00 |
| 1 | a | 2 | -1.00 |
| 2 | b | 1 | -1.00 |
| 3 | b | 2 | 19.00 |
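Note that `arr4` holds 2 rows while `arr4 - 1` holds 4. One reading consistent with that output is that the scalar operation is applied to every cell of the dense grid, with empty cells treated as 0, and that only non-zero results are kept as rows. The sketch below reproduces the numbers under that assumption; `scalar_op` is a hypothetical helper, not a karray function.

```python
from itertools import product

coords = {"dim1": ["a", "b"], "dim2": [1, 2]}
long_records = {("a", 1): 10, ("b", 2): 20}  # the two rows of arr4


def scalar_op(records, coords, func, fill=0):
    """Apply func to every dense cell; keep only results that differ from fill."""
    out = {}
    for key in product(*coords.values()):
        result = func(records.get(key, fill))  # empty cells count as fill
        if result != fill:
            out[key] = result
    return out


minus_one = scalar_op(long_records, coords, lambda v: v - 1)
print(minus_one)  # four rows: empty cells became -1
```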
- Comparison operations:
mask = arr2 > 10
mask = arr2 == 5
- Logical operations:
result = arr2 & arr4
result = arr2 | arr4
result = ~arr2
- Reduction operations:
result = arr1.reduce('dim1', aggfunc='sum')
result
[k]array
| Long object size | 48 bytes |
| Data object type | dense |
| Data object size | 32 bytes |
| Dimensions | ['dim2', 'dim3'] |
| Shape | [2, 2] |
| Capacity | 4 |
| Rows | 2 |
Coords
| Dimension | Length | Type | Items |
|---|---|---|---|
| dim2 | 2 | int64 | [1 2] |
| dim3 | 2 | datetime64[ns] | ['2020-01-01T00:00:00.000000000' '2020-01-02T00:00:00.000000000'] |
Data
| | dim2 | dim3 | value |
|---|---|---|---|
| 0 | 1 | 2020-01-01T00:00:00.000000000 | 10.00 |
| 1 | 2 | 2020-01-02T00:00:00.000000000 | 20.00 |
result = arr1.reduce('dim2', aggfunc=np.mean)
result
[k]array
| Long object size | 48 bytes |
| Data object type | dense |
| Data object size | 32 bytes |
| Dimensions | ['dim1', 'dim3'] |
| Shape | [2, 2] |
| Capacity | 4 |
| Rows | 2 |
Coords
| Dimension | Length | Type | Items |
|---|---|---|---|
| dim1 | 2 | object | ['a' 'b'] |
| dim3 | 2 | datetime64[ns] | ['2020-01-01T00:00:00.000000000' '2020-01-02T00:00:00.000000000'] |
Data
| | dim1 | dim3 | value |
|---|---|---|---|
| 0 | a | 2020-01-01T00:00:00.000000000 | 5.00 |
| 1 | b | 2020-01-02T00:00:00.000000000 | 10.00 |
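The mean over `dim2` returns 5.00 and 10.00 rather than 10.00 and 20.00, which suggests the aggregation runs over the dense grid, where empty cells count as 0. The sketch below reproduces both reductions in plain Python under that assumption. The helper `reduce_dim` is hypothetical, and the dates are shortened to strings for readability.

```python
from itertools import product
from statistics import mean

coords = {
    "dim1": ["a", "b"],
    "dim2": [1, 2],
    "dim3": ["2020-01-01", "2020-01-02"],
}
records = {("a", 1, "2020-01-01"): 10.0, ("b", 2, "2020-01-02"): 20.0}  # arr1


def reduce_dim(records, coords, dim, aggfunc, fill=0.0):
    """Aggregate over one dimension of the dense grid (empty cells = fill)."""
    axis = list(coords).index(dim)
    groups = {}
    for key in product(*coords.values()):
        rest = key[:axis] + key[axis + 1:]  # key without the reduced dimension
        groups.setdefault(rest, []).append(records.get(key, fill))
    reduced = {rest: aggfunc(vals) for rest, vals in groups.items()}
    # Keep only cells that differ from the fill value, as in long format.
    return {rest: val for rest, val in reduced.items() if val != fill}


summed = reduce_dim(records, coords, "dim1", sum)
averaged = reduce_dim(records, coords, "dim2", mean)
print(summed)
print(averaged)
```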
- Shifting and rolling operations:
shifted = arr3.shift(dim1=1, dim2=-1, fill_value=0.)
shifted
[k]array
| Long object size | 24 bytes |
| Data object type | dense |
| Data object size | 32 bytes |
| Dimensions | ['dim1', 'dim2'] |
| Shape | [2, 2] |
| Capacity | 4 |
| Rows | 1 |
Coords
| Dimension | Length | Type | Items |
|---|---|---|---|
| dim1 | 2 | object | ['a' 'b'] |
| dim2 | 2 | int64 | [1 2] |
Data
| | dim1 | dim2 | value |
|---|---|---|---|
| 0 | b | 1 | 20.00 |
rolled = arr3.roll(dim1=2)
rolled
[k]array
| Long object size | 96 bytes |
| Data object type | dense |
| Data object size | 32 bytes |
| Dimensions | ['dim1', 'dim2'] |
| Shape | [2, 2] |
| Capacity | 4 |
| Rows | 4 |
Coords
| Dimension | Length | Type | Items |
|---|---|---|---|
| dim1 | 2 | object | ['a' 'b'] |
| dim2 | 2 | int64 | [1 2] |
Data
| | dim1 | dim2 | value |
|---|---|---|---|
| 0 | a | 1 | 10.00 |
| 1 | a | 2 | 20.00 |
| 2 | b | 1 | 30.00 |
| 3 | b | 2 | 40.00 |
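The difference between the two operations is that a shift discards values pushed past the edge and fills the vacated positions, whereas a roll wraps them around. The sketch below illustrates this on nested lists that mirror the dense form of `arr3`. The helpers `shift` and `roll` are hypothetical stand-ins, not karray's methods, and the direction convention (positive moves toward later labels) is inferred from the output above.

```python
def shift(seq, n, fill):
    """Move items n positions (positive: toward the end); vacated slots get fill."""
    size = len(seq)
    if n >= 0:
        return ([fill] * min(n, size) + list(seq))[:size]
    return (list(seq) + [fill] * min(-n, size))[-size:]


def roll(seq, n):
    """Move items n positions, wrapping around the ends."""
    size = len(seq)
    if size == 0:
        return []
    n %= size
    return list(seq[-n:]) + list(seq[:-n]) if n else list(seq)


dense = [[10.0, 20.0], [30.0, 40.0]]  # rows: dim1 = a, b; columns: dim2 = 1, 2

# shift(dim1=1, dim2=-1, fill_value=0.): rows move down, then columns move left.
moved_rows = shift(dense, 1, fill=[0.0, 0.0])
shifted = [shift(row, -1, fill=0.0) for row in moved_rows]
print(shifted)  # only cell (b, 1) is non-zero and holds 20.0

# roll(dim1=2): rolling a length-2 axis by 2 returns the original order.
rolled = roll(dense, 2)
print(rolled)
```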
- Inserting new dimensions:
# One dimension with one element
result = arr2.insert(dim3='x')
result
[k]array
| Long object size | 64 bytes |
| Data object type | dense |
| Data object size | 32 bytes |
| Dimensions | ['dim3', 'dim1', 'dim2'] |
| Shape | [1, 2, 2] |
| Capacity | 4 |
| Rows | 2 |
Coords
| Dimension | Length | Type | Items |
|---|---|---|---|
| dim3 | 1 | object | ['x'] |
| dim1 | 2 | object | ['a' 'b'] |
| dim2 | 2 | int64 | [1 2] |
Data
| | dim3 | dim1 | dim2 | value |
|---|---|---|---|---|
| 0 | x | a | 1 | 10 |
| 1 | x | b | 2 | 20 |
# One dimension with several elements related to an existing dimension using a dict
result = arr2.insert(dim3={'dim1': {'a': -1, 'b': -2}})
result
[k]array
| Long object size | 64 bytes |
| Data object type | dense |
| Data object size | 64 bytes |
| Dimensions | ['dim3', 'dim1', 'dim2'] |
| Shape | [2, 2, 2] |
| Capacity | 8 |
| Rows | 2 |
Coords
| Dimension | Length | Type | Items |
|---|---|---|---|
| dim3 | 2 | int64 | [-2 -1] |
| dim1 | 2 | object | ['a' 'b'] |
| dim2 | 2 | int64 | [1 2] |
Data
| | dim3 | dim1 | dim2 | value |
|---|---|---|---|---|
| 0 | -1 | a | 1 | 10 |
| 1 | -2 | b | 2 | 20 |
# One dimension with several elements related to an existing dimension using two lists
result = arr2.insert(dim3={'dim1': [['a', 'b'], [-1, -2]]})
result
[k]array
| Long object size | 64 bytes |
| Data object type | dense |
| Data object size | 64 bytes |
| Dimensions | ['dim3', 'dim1', 'dim2'] |
| Shape | [2, 2, 2] |
| Capacity | 8 |
| Rows | 2 |
Coords
| Dimension | Length | Type | Items |
|---|---|---|---|
| dim3 | 2 | int64 | [-1 -2] |
| dim1 | 2 | object | ['a' 'b'] |
| dim2 | 2 | int64 | [1 2] |
Data
| | dim3 | dim1 | dim2 | value |
|---|---|---|---|---|
| 0 | -1 | a | 1 | 10 |
| 1 | -2 | b | 2 | 20 |
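The three `insert` forms above can be pictured as prepending a new column to the long-format rows: either a constant label, or a label looked up from an existing dimension. The following plain-Python sketch mirrors that idea on the rows of `arr2`. The helper names are hypothetical and the code is not taken from karray.

```python
rows = [("a", 1, 10), ("b", 2, 20)]  # (dim1, dim2, value) rows of arr2


def insert_constant(rows, label):
    """Prepend the same label to every row (one-element dimension)."""
    return [(label, *row) for row in rows]


def insert_mapped(rows, position, mapping):
    """Prepend a label derived from the column at `position`.

    `mapping` is either a dict {old: new} or a pair of lists [olds, news].
    """
    if not isinstance(mapping, dict):
        olds, news = mapping
        mapping = dict(zip(olds, news))
    return [(mapping[row[position]], *row) for row in rows]


print(insert_constant(rows, "x"))
print(insert_mapped(rows, 0, {"a": -1, "b": -2}))
print(insert_mapped(rows, 0, [["a", "b"], [-1, -2]]))
```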
- Drop a dimension:
result = arr1.drop('dim3')
result
[k]array
| Long object size | 48 bytes |
| Data object type | dense |
| Data object size | 32 bytes |
| Dimensions | ['dim1', 'dim2'] |
| Shape | [2, 2] |
| Capacity | 4 |
| Rows | 2 |
Coords
| Dimension | Length | Type | Items |
|---|---|---|---|
| dim1 | 2 | object | ['a' 'b'] |
| dim2 | 2 | int64 | [1 2] |
Data
| | dim1 | dim2 | value |
|---|---|---|---|
| 0 | a | 1 | 10.00 |
| 1 | b | 2 | 20.00 |
Note:
Dropping a dimension works only if the resulting array still has unique coordinates. If removing the dimension leaves duplicate coordinates, karray raises an error.
# Assertion error due to duplicate coords
try:
arr3.drop('dim2')
except AssertionError as e:
print(e)
Index items per row must be unique. By removing ['dim2'] leads the existence of repeated indexes
e.g.:
('dim1',) value
0 ('a',) 10.0
1 ('a',) 20.0
Intead, you can use obj.reduce('dim2')
With an aggfunc: sum() by default
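The uniqueness rule can be illustrated with a short check on long-format rows: remove one index column, then count how often each remaining index tuple occurs. This is a sketch of the idea only. `drop_column` is a hypothetical helper, and it raises `ValueError` where karray raises `AssertionError`.

```python
from collections import Counter


def drop_column(rows, position):
    """Remove one index column from (index_tuple, value) rows; reject duplicates."""
    dropped = [(idx[:position] + idx[position + 1:], val) for idx, val in rows]
    counts = Counter(idx for idx, _ in dropped)
    duplicates = [idx for idx, n in counts.items() if n > 1]
    if duplicates:
        raise ValueError(f"duplicate index after drop: {duplicates}")
    return dropped


# arr1-like rows: dropping the third index column keeps every row unique.
arr1_rows = [(("a", 1, "2020-01-01"), 10.0), (("b", 2, "2020-01-02"), 20.0)]
print(drop_column(arr1_rows, 2))

# arr3-like rows: dropping dim2 leaves ('a',) and ('b',) twice each.
arr3_rows = [(("a", 1), 10.0), (("a", 2), 20.0), (("b", 1), 30.0), (("b", 2), 40.0)]
try:
    drop_column(arr3_rows, 1)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```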
- Expanding a dimension (Broadcasting)
result = arr3.expand(dim3=['x', 'y', 'z'])
result
[k]array
| Long object size | 384 bytes |
| Data object type | dense |
| Data object size | 96 bytes |
| Dimensions | ['dim1', 'dim2', 'dim3'] |
| Shape | [2, 2, 3] |
| Capacity | 12 |
| Rows | 12 |
Coords
| Dimension | Length | Type | Items |
|---|---|---|---|
| dim1 | 2 | object | ['a' 'b'] |
| dim2 | 2 | int64 | [1 2] |
| dim3 | 3 | object | ['x' 'y' 'z'] |
Data
| | dim1 | dim2 | dim3 | value |
|---|---|---|---|---|
| 0 | a | 1 | x | 10.00 |
| 1 | a | 1 | y | 10.00 |
| 2 | a | 1 | z | 10.00 |
| 3 | a | 2 | x | 20.00 |
| 4 | a | 2 | y | 20.00 |
| 5 | a | 2 | z | 20.00 |
| 6 | b | 1 | x | 30.00 |
| 7 | b | 1 | y | 30.00 |
| 8 | b | 1 | z | 30.00 |
| 9 | b | 2 | x | 40.00 |
| 10 | b | 2 | y | 40.00 |
| 11 | b | 2 | z | 40.00 |
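Expanding repeats every existing row once per new label, which is why 4 rows become 12. A minimal plain-Python sketch of that broadcasting step, using a hypothetical `expand` helper on the rows of `arr3`, could look like this:

```python
from itertools import product

rows = [("a", 1, 10.0), ("a", 2, 20.0), ("b", 1, 30.0), ("b", 2, 40.0)]  # arr3


def expand(rows, labels):
    """Repeat each (…index, value) row once per label, appending it as a new index."""
    return [(*row[:-1], label, row[-1]) for row, label in product(rows, labels)]


expanded = expand(rows, ["x", "y", "z"])
print(len(expanded))  # 4 rows x 3 labels
print(expanded[:3])
```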
- ufunc operations
arr3.ufunc(dim='dim2', func=np.prod, keepdims=True)
[k]array
| Long object size | 96 bytes |
| Data object type | dense |
| Data object size | 32 bytes |
| Dimensions | ['dim1', 'dim2'] |
| Shape | [2, 2] |
| Capacity | 4 |
| Rows | 4 |
Coords
| Dimension | Length | Type | Items |
|---|---|---|---|
| dim1 | 2 | object | ['a' 'b'] |
| dim2 | 2 | int64 | [1 2] |
Data
| | dim1 | dim2 | value |
|---|---|---|---|
| 0 | a | 1 | 200.00 |
| 1 | a | 2 | 200.00 |
| 2 | b | 1 | 1200.00 |
| 3 | b | 2 | 1200.00 |
Note:
The dim argument is passed to the NumPy function as its axis argument, and keepdims is passed through under the same name. You can supply further keyword arguments, depending on the function.
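To see why every cell along `dim2` holds the same product in the output above, it helps to separate the two steps: reducing along the axis while keeping it as a length-1 dimension, then repeating the result across the original width. The sketch below emulates this with nested lists and `math.prod` instead of NumPy. The helper names are hypothetical, and whether karray performs the repetition in exactly this way is an assumption.

```python
from math import prod

dense = [[10.0, 20.0], [30.0, 40.0]]  # rows: dim1 = a, b; columns: dim2 = 1, 2


def prod_keepdims_last_axis(matrix):
    """Product along the last axis, kept as a length-1 axis (like keepdims=True)."""
    return [[prod(row)] for row in matrix]


def broadcast_last_axis(reduced, width):
    """Repeat each length-1 row so it spans the original width again."""
    return [row * width for row in reduced]


kept = prod_keepdims_last_axis(dense)
result = broadcast_last_axis(kept, 2)
print(kept)
print(result)
```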
Saving and Loading Arrays
karray supports saving and loading arrays using the Feather format:
- Saving an array to a Feather file:
arr1.to_feather('array.feather')
- Loading an array from a Feather file:
loaded_arr1 = ka.from_feather('array.feather')
loaded_arr1
[k]array
| Long object size | 64 bytes |
| Data object type | dense |
| Data object size | 64 bytes |
| Dimensions | ['dim1', 'dim2', 'dim3'] |
| Shape | [2, 2, 2] |
| Capacity | 8 |
| Rows | 2 |
Coords
| Dimension | Length | Type | Items |
|---|---|---|---|
| dim1 | 2 | object | ['a' 'b'] |
| dim2 | 2 | int64 | [1 2] |
| dim3 | 2 | int64 | [1577836800000000000 1577923200000000000] |
Data
| | dim1 | dim2 | dim3 | value |
|---|---|---|---|---|
| 0 | a | 1 | 2020-01-01T00:00:00.000000000 | 10.00 |
| 1 | b | 2 | 2020-01-02T00:00:00.000000000 | 20.00 |
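In the loaded array, the `dim3` coordinates are listed as int64 values rather than datetime64[ns]. Those integers are nanoseconds since the Unix epoch, so they can be converted back if needed. The sketch below does this with the standard library; `ns_to_utc` is a hypothetical helper, and precision below one microsecond is dropped because `datetime` cannot represent it.

```python
from datetime import datetime, timezone


def ns_to_utc(ns):
    """Convert nanoseconds since the Unix epoch to a timezone-aware UTC datetime."""
    seconds, remainder_ns = divmod(ns, 1_000_000_000)
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.replace(microsecond=remainder_ns // 1000)


for ns in (1577836800000000000, 1577923200000000000):
    print(ns, "->", ns_to_utc(ns).isoformat())
```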
Interoperability with Other Libraries
karray provides interoperability with other popular data manipulation libraries:
- Converting an array to a pandas DataFrame and then back to an array:
df = arr2.to_pandas()
new_arr = ka.from_pandas(df, coords=coords2)
new_arr
[k]array
| Long object size | 48 bytes |
| Data object type | dense |
| Data object size | 32 bytes |
| Dimensions | ['dim1', 'dim2'] |
| Shape | [2, 2] |
| Capacity | 4 |
| Rows | 2 |
Coords
| Dimension | Length | Type | Items |
|---|---|---|---|
| dim1 | 2 | object | ['a' 'b'] |
| dim2 | 2 | int64 | [1 2] |
Data
| | dim1 | dim2 | value |
|---|---|---|---|
| 0 | a | 1 | 10 |
| 1 | b | 2 | 20 |
- Converting an array to a polars DataFrame and then back to an array:
df = arr2.to_polars()
new_arr = ka.from_polars(df, coords=coords2)
new_arr
[k]array
| Long object size | 48 bytes |
| Data object type | dense |
| Data object size | 32 bytes |
| Dimensions | ['dim1', 'dim2'] |
| Shape | [2, 2] |
| Capacity | 4 |
| Rows | 2 |
Coords
| Dimension | Length | Type | Items |
|---|---|---|---|
| dim1 | 2 | object | ['a' 'b'] |
| dim2 | 2 | int64 | [1 2] |
Data
| | dim1 | dim2 | value |
|---|---|---|---|
| 0 | a | 1 | 10 |
| 1 | b | 2 | 20 |
karray offers many more features. Please refer to the documentation and source code linked above for details.
Note:
karray is a work in progress, and the API is subject to change. We welcome feedback, suggestions and contributions.
© 2024 Carlos Gaete-Morales
File details
Details for the file karray-2024.3.7.tar.gz.
File metadata
- Download URL: karray-2024.3.7.tar.gz
- Upload date:
- Size: 40.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c811a17469e6d9109d25e072fef060fdd2eb28cd17cdb6ade3fcaed4d529ae9f |
| MD5 | 516a62184640a6d0475b9cb6c4d6e12d |
| BLAKE2b-256 | d32570252e7678b94dae32e2cf5755309043cc67cd24e8607326991e1025b528 |
File details
Details for the file karray-2024.3.7-py3-none-any.whl.
File metadata
- Download URL: karray-2024.3.7-py3-none-any.whl
- Upload date:
- Size: 35.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.11.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0d915adfd1ef0c2abdfc460432fe2d778f336a1b636a2e17edd5a0687da30b51 |
| MD5 | faa193ba4eca86fdadbe1511bc50b830 |
| BLAKE2b-256 | 098a7a72b5dd2d9d75d0e7e5e991909fbfa73b1de1c1751a6f8674897e55509b |