
Arrow, pydantic style

Welcome to arrowdantic

Arrowdantic is a small Python library backed by a mature Rust implementation of Apache Arrow that can interoperate with

For simple (but data-heavy) data engineering tasks, this package essentially replaces pyarrow: it supports reading from and writing to Parquet and Arrow files at the same or higher performance and with higher safety (e.g. no segfaults).

Furthermore, it supports reading from and writing to ODBC compliant databases at the same or higher performance than turbodbc.

This package is particularly suitable for environments such as AWS Lambda: it takes 8 MB of disk space, compared to 82 MB for pyarrow.

Features

  • declare and access Arrow-backed arrays (integers, floats, booleans, strings, binary)
  • read from and write to Apache Arrow IPC files
  • read from and write to Apache Parquet files
  • read from and write to ODBC-compliant databases (e.g. PostgreSQL, MongoDB)
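
The arrays in the examples of this README contain None for missing values, and their fields are declared nullable. In the Apache Arrow columnar format, a nullable array is generally stored as a values buffer plus a validity bitmap with one bit per slot. The pure-Python sketch below only illustrates that idea; it is not arrowdantic's implementation, and the function name to_values_and_validity is hypothetical.

```python
from typing import List, Optional, Tuple


def to_values_and_validity(items: List[Optional[int]]) -> Tuple[List[int], bytes]:
    """Split a list with missing entries into values and a validity bitmap.

    Hypothetical helper for illustration only. Bit i of the bitmap
    (least-significant bit first within each byte) is 1 when slot i is valid.
    """
    # Null slots still occupy a position in the values buffer; 0 is a placeholder.
    values = [0 if item is None else item for item in items]
    bitmap = bytearray((len(items) + 7) // 8)
    for i, item in enumerate(items):
        if item is not None:
            bitmap[i // 8] |= 1 << (i % 8)
    return values, bytes(bitmap)


values, validity = to_values_and_validity([1, None])
```

Here only the first slot is valid, so the single bitmap byte has its lowest bit set.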

Examples

Use parquet

import io
import arrowdantic as ad

original_arrays = [ad.UInt32Array([1, None])]

schema = ad.Schema(
    [ad.Field(f"c{i}", array.type, True) for i, array in enumerate(original_arrays)]
)

data = io.BytesIO()
with ad.ParquetFileWriter(data, schema) as writer:
    writer.write(ad.Chunk(original_arrays))
data.seek(0)

reader = ad.ParquetFileReader(data)
chunk = next(reader)
assert chunk.arrays() == original_arrays

Use Arrow files

import io

import arrowdantic as ad

original_arrays = [ad.UInt32Array([1, None])]

schema = ad.Schema(
    [ad.Field(f"c{i}", array.type, True) for i, array in enumerate(original_arrays)]
)

data = io.BytesIO()
with ad.ArrowFileWriter(data, schema) as writer:
    writer.write(ad.Chunk(original_arrays))
data.seek(0)

reader = ad.ArrowFileReader(data)
chunk = next(reader)
assert chunk.arrays() == original_arrays

Use ODBC

import arrowdantic as ad


arrays = [ad.Int32Array([1, None]), ad.StringArray(["aa", None])]

with ad.ODBCConnector(r"Driver={SQLite3};Database=sqlite-test.db") as con:
    # create an empty table with a schema
    con.execute("DROP TABLE IF EXISTS example;")
    con.execute("CREATE TABLE example (c1 INT, c2 TEXT);")

    # insert the arrays
    con.write("INSERT INTO example (c1, c2) VALUES (?, ?)", ad.Chunk(arrays))

    # read the arrays
    with con.execute("SELECT c1, c2 FROM example", 1024) as chunks:
        assert chunks.fields() == [
            ad.Field("c1", ad.DataType.int32(), True),
            ad.Field("c2", ad.DataType.string(), True),
        ]
        chunk = next(chunks)
assert chunk.arrays() == arrays
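
The connector above receives an ODBC connection string made of semicolon-separated key=value pairs, with the driver name wrapped in braces. When the driver or database differs between environments, it can help to assemble that string from parts. The helper below is a minimal sketch using only the standard library; build_connection_string is a hypothetical name, not part of arrowdantic, and it does no escaping of special characters in values.

```python
from typing import Dict


def build_connection_string(driver: str, options: Dict[str, str]) -> str:
    """Assemble an ODBC connection string (hypothetical helper, no escaping)."""
    # The driver name is wrapped in braces, as in the example above.
    parts = [f"Driver={{{driver}}}"]
    parts.extend(f"{key}={value}" for key, value in options.items())
    return ";".join(parts)


connection_string = build_connection_string("SQLite3", {"Database": "sqlite-test.db"})
```

The resulting string matches the one passed to ad.ODBCConnector in the example above.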

Use timezones

This package fully supports timezone-aware datetime objects and conversions between them and Arrow timestamps:

import datetime

import arrowdantic as ad


dt = datetime.datetime(
    year=2021,
    month=1,
    day=1,
    hour=1,
    minute=1,
    second=1,
    microsecond=1,
    tzinfo=datetime.timezone.utc,
)
a = ad.TimestampArray([dt, None])
assert (
    str(a)
    == 'Timestamp(Microsecond, Some("+00:00"))[2021-01-01 01:01:01.000001 +00:00, None]'
)
assert list(a) == [dt, None]
assert a.type == ad.DataType.timestamp(datetime.timezone.utc)
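
The printed type above, Timestamp(Microsecond, Some("+00:00")), indicates microsecond resolution with a UTC offset. An Arrow timestamp is generally a 64-bit integer counting units since the Unix epoch, so the round trip list(a) == [dt, None] amounts to a lossless conversion between a datetime and an integer. The standard-library sketch below illustrates that conversion; it assumes the microsecond unit shown above and does not use or describe arrowdantic's internals. The names to_micros and from_micros are hypothetical.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)
ONE_MICROSECOND = datetime.timedelta(microseconds=1)


def to_micros(value: datetime.datetime) -> int:
    """Microseconds since the Unix epoch, using exact integer arithmetic."""
    return (value - EPOCH) // ONE_MICROSECOND


def from_micros(micros: int) -> datetime.datetime:
    """Inverse of to_micros, returning a UTC-aware datetime."""
    return EPOCH + datetime.timedelta(microseconds=micros)


dt = datetime.datetime(2021, 1, 1, 1, 1, 1, 1, tzinfo=datetime.timezone.utc)
micros = to_micros(dt)
restored = from_micros(micros)
```

Integer division of timedeltas avoids the rounding that dt.timestamp() can introduce through floating point.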
