Python bindings to the nanoarrow C library

nanoarrow for Python

The nanoarrow Python package provides bindings to the nanoarrow C library. Like the nanoarrow C library, it provides tools to facilitate the use of the Arrow C Data and Arrow C Stream interfaces.

Installation

You can install the development version of the Python bindings via URL (requires a C compiler):

python -m pip install "git+https://github.com/apache/arrow-nanoarrow.git#egg=nanoarrow&subdirectory=python"

If you can import the namespace, you're good to go!

import nanoarrow as na

Low-level C library bindings

The Arrow C Data and Arrow C Stream interfaces comprise three structures: the ArrowSchema, which represents the data type of an array; the ArrowArray, which represents the values of an array; and the ArrowArrayStream, which represents zero or more ArrowArrays with a common ArrowSchema.

Schemas

Use nanoarrow.c_schema() to convert an object to an ArrowSchema and wrap it as a Python object. This works for any object implementing the Arrow PyCapsule Interface (e.g., pyarrow.Schema, pyarrow.DataType, and pyarrow.Field).

import pyarrow as pa
schema = na.c_schema(pa.decimal128(10, 3))
schema
<nanoarrow.c_lib.CSchema decimal128(10, 3)>
- format: 'd:10,3'
- name: ''
- flags: 2
- metadata: NULL
- dictionary: NULL
- children[0]:
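
The flags field shown above is a bit field defined by the Arrow C Data interface: 1 marks an ordered dictionary, 2 marks a nullable field, and 4 marks sorted map keys. As a minimal sketch (the SchemaFlags class and describe_flags helper are hypothetical names used for illustration and are not part of nanoarrow), the value can be decoded in plain Python like this:

```python
from enum import IntFlag


class SchemaFlags(IntFlag):
    """Flag bits defined by the Arrow C Data interface (illustrative names)."""

    DICTIONARY_ORDERED = 1
    NULLABLE = 2
    MAP_KEYS_SORTED = 4


def describe_flags(flags: int) -> list:
    """Return the names of the flag bits that are set in `flags`."""
    return [flag.name for flag in SchemaFlags if flags & flag]


print(describe_flags(2))  # ['NULLABLE']
print(describe_flags(0))  # []
```

Under this reading, flags: 2 in the output above means the field is nullable and nothing else.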

You can extract the fields of a CSchema object one at a time or parse it into a view to extract deserialized parameters.

na.c_schema_view(schema)
<nanoarrow.c_lib.CSchemaView>
- type: 'decimal128'
- storage_type: 'decimal128'
- decimal_bitwidth: 128
- decimal_precision: 10
- decimal_scale: 3
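
To make "deserialized parameters" concrete: the Arrow C Data interface encodes a decimal type as the format string 'd:precision,scale', with an optional third bit-width component that defaults to 128. The following is a rough pure-Python sketch of that parsing step; parse_decimal_format is a hypothetical helper written for illustration, not the routine nanoarrow uses internally.

```python
def parse_decimal_format(fmt: str) -> dict:
    """Parse an Arrow decimal format string such as 'd:10,3' or 'd:10,3,256'."""
    if not fmt.startswith("d:"):
        raise ValueError(f"not a decimal format string: {fmt!r}")

    parts = fmt[2:].split(",")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected 'd:precision,scale[,bitwidth]': {fmt!r}")

    precision, scale = int(parts[0]), int(parts[1])
    # The bit width is optional in the format string; 128 is the default.
    bitwidth = int(parts[2]) if len(parts) == 3 else 128

    return {
        "decimal_bitwidth": bitwidth,
        "decimal_precision": precision,
        "decimal_scale": scale,
    }


print(parse_decimal_format("d:10,3"))
```

The keys mirror the field names printed by the schema view above so the two outputs are easy to compare.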

Advanced users can allocate an empty CSchema and populate its contents by passing its ._addr() to a schema-exporting function.

schema = na.allocate_c_schema()
pa.int32()._export_to_c(schema._addr())
schema
<nanoarrow.c_lib.CSchema int32>
- format: 'i'
- name: ''
- flags: 2
- metadata: NULL
- dictionary: NULL
- children[0]:

The CSchema object cleans up after itself: when the object is deleted, the underlying ArrowSchema is released.
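
The clean-up behaviour follows the release-callback convention of the Arrow C Data interface: the owner calls the structure's release callback exactly once, after which the structure is marked as released. Below is a toy model of that pattern in plain Python. ToySchema and its methods are hypothetical and only illustrate the idea; they are not how CSchema is implemented.

```python
import gc


class ToySchema:
    """Toy owner of a resource with a release callback (illustration only)."""

    def __init__(self, release_callback):
        self._release_callback = release_callback

    def is_valid(self) -> bool:
        # A released structure has no release callback left.
        return self._release_callback is not None

    def release(self) -> None:
        # Call the callback at most once, then mark the object as released.
        if self._release_callback is not None:
            callback, self._release_callback = self._release_callback, None
            callback()

    def __del__(self):
        # Deleting the wrapper releases the underlying resource.
        self.release()


released = []
schema = ToySchema(lambda: released.append("released"))
del schema
gc.collect()  # not needed on CPython, but helps on other interpreters
print(released)
```

Calling release() explicitly and then deleting the object is safe in this sketch because the callback is cleared before it runs.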

Arrays

You can use nanoarrow.c_array() to convert an array-like object to an ArrowArray, wrap it as a Python object, and attach a schema that can be used to interpret its contents. This works for any object implementing the Arrow PyCapsule Interface (e.g., pyarrow.Array, pyarrow.RecordBatch).

array = na.c_array(pa.array(["one", "two", "three", None]))
array
<nanoarrow.c_lib.CArray string>
- length: 4
- offset: 0
- null_count: 1
- buffers: (2939032895680, 2939032895616, 2939032895744)
- dictionary: NULL
- children[0]:

You can extract the fields of a CArray one at a time or parse it into a view to extract deserialized content:

na.c_array_view(array)
<nanoarrow.c_lib.CArrayView>
- storage_type: 'string'
- length: 4
- offset: 0
- null_count: 1
- buffers[3]:
  - <bool validity[1 b] 11100000>
  - <int32 data_offset[20 b] 0 3 6 11 11>
  - <string data[11 b] b'onetwothree'>
- dictionary: NULL
- children[0]:
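
The three buffers in the view above follow the Arrow layout for string arrays: a validity bitmap (one bit per element, least-significant bit first), length + 1 int32 offsets, and the concatenated UTF-8 bytes. As a minimal sketch, assuming little-endian buffers and using a hypothetical decode_string_array helper that is not part of nanoarrow, the same values can be recovered with the standard library alone:

```python
import struct


def decode_string_array(validity, offsets, data, length):
    """Decode an Arrow string array from its three raw buffers."""
    # There is one more offset than there are elements.
    offs = struct.unpack(f"<{length + 1}i", offsets[: 4 * (length + 1)])

    values = []
    for i in range(length):
        # Bit i of the bitmap is 1 when element i is non-null.
        is_valid = validity is None or (validity[i // 8] >> (i % 8)) & 1
        if is_valid:
            values.append(data[offs[i] : offs[i + 1]].decode("utf-8"))
        else:
            values.append(None)
    return values


# Buffers equivalent to the ones printed above.
validity = bytes([0b00000111])  # elements 0, 1 and 2 are valid
offsets = struct.pack("<5i", 0, 3, 6, 11, 11)
data = b"onetwothree"

print(decode_string_array(validity, offsets, data, 4))
# ['one', 'two', 'three', None]
```

Note that the view prints the bitmap in element order (11100000), which is the reverse of the usual binary literal for the same byte.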

As with the CSchema, you can allocate an empty CArray and pass its address, obtained with ._addr(), to an array-exporting function.

array = na.allocate_c_array()
pa.array([1, 2, 3])._export_to_c(array._addr(), array.schema._addr())
array.length
3

Array streams

You can use nanoarrow.c_array_stream() to convert an object representing a sequence of CArrays with a common CSchema to an ArrowArrayStream and wrap it as a Python object. This works for any object implementing the Arrow PyCapsule Interface (e.g., pyarrow.RecordBatchReader).

pa_array_child = pa.array([1, 2, 3], pa.int32())
pa_array = pa.record_batch([pa_array_child], names=["some_column"])
reader = pa.RecordBatchReader.from_batches(pa_array.schema, [pa_array])
array_stream = na.c_array_stream(reader)
array_stream
<nanoarrow.c_lib.CArrayStream>
- get_schema(): <nanoarrow.c_lib.CSchema struct>
  - format: '+s'
  - name: ''
  - flags: 0
  - metadata: NULL
  - dictionary: NULL
  - children[1]:
    'some_column': <nanoarrow.c_lib.CSchema int32>
      - format: 'i'
      - name: 'some_column'
      - flags: 2
      - metadata: NULL
      - dictionary: NULL
      - children[0]:

You can pull the next array from the stream using .get_next() or iterate over the stream directly. The .get_next() method raises StopIteration when there are no more arrays in the stream.

for array in array_stream:
    print(array)
<nanoarrow.c_lib.CArray struct>
- length: 3
- offset: 0
- null_count: 0
- buffers: (0,)
- dictionary: NULL
- children[1]:
  'some_column': <nanoarrow.c_lib.CArray int32>
    - length: 3
    - offset: 0
    - null_count: 0
    - buffers: (0, 2939033026688)
    - dictionary: NULL
    - children[0]:
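
The relationship between .get_next() and iteration can be modelled with Python's iterator protocol. The sketch below uses a hypothetical ToyStream class over an in-memory list; it is an illustration of the pattern and makes no claim about how CArrayStream is implemented.

```python
class ToyStream:
    """Toy stand-in for an array stream backed by a Python list."""

    def __init__(self, arrays):
        self._arrays = iter(arrays)

    def get_next(self):
        # next() raises StopIteration once the underlying iterator is exhausted.
        return next(self._arrays)

    def __iter__(self):
        return self

    def __next__(self):
        # Iteration delegates to get_next(), so both share one cursor.
        return self.get_next()


stream = ToyStream(["batch 0", "batch 1"])
print(stream.get_next())  # batch 0
print(list(stream))       # ['batch 1']

try:
    stream.get_next()
except StopIteration:
    print("stream is exhausted")
```

Because both access styles advance the same cursor, the toy stream can be consumed only once.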

You can also get the address of a freshly allocated stream to pass to a suitable exporting function:

array_stream = na.allocate_c_array_stream()
reader._export_to_c(array_stream._addr())
array_stream
<nanoarrow.c_lib.CArrayStream>
- get_schema(): <nanoarrow.c_lib.CSchema struct>
  - format: '+s'
  - name: ''
  - flags: 0
  - metadata: NULL
  - dictionary: NULL
  - children[1]:
    'some_column': <nanoarrow.c_lib.CSchema int32>
      - format: 'i'
      - name: 'some_column'
      - flags: 2
      - metadata: NULL
      - dictionary: NULL
      - children[0]:

Development

Python bindings for nanoarrow are managed with setuptools. This means you can build the project using:

git clone https://github.com/apache/arrow-nanoarrow.git
cd arrow-nanoarrow/python
pip install -e .

Tests use pytest:

# Install dependencies
pip install -e ".[test]"

# Run tests
pytest -vvx
