A user-defined function framework for Apache Arrow

Arrow UDF Python Server

Installation

pip install arrow-udf

Usage

Define functions in a Python file:

# udf.py
from arrow_udf import udf, udtf, UdfServer
import struct
import socket

# Define a scalar function
@udf(input_types=['INT', 'INT'], result_type='INT')
def gcd(x, y):
    while y != 0:
        (x, y) = (y, x % y)
    return x

# Define a scalar function that returns multiple values (within a struct)
@udf(input_types=['BINARY'], result_type='STRUCT<src_addr: STRING, dst_addr: STRING, src_port: INT16, dst_port: INT16>')
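def extract_tcp_info(tcp_packet: bytes):
    # Assumes an IPv4 header without options (20 bytes) followed by the TCP header
    # Bytes 12-19: source and destination IPv4 addresses
    src_addr, dst_addr = struct.unpack('!4s4s', tcp_packet[12:20])
    # Bytes 20-23: source and destination ports, unsigned 16-bit (0-65535)
    src_port, dst_port = struct.unpack('!HH', tcp_packet[20:24])
    # Convert the packed 4-byte addresses to dotted-quad strings
    src_addr = socket.inet_ntoa(src_addr)
    dst_addr = socket.inet_ntoa(dst_addr)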
    return {
        'src_addr': src_addr,
        'dst_addr': dst_addr,
        'src_port': src_port,
        'dst_port': dst_port,
    }

# Define a table function
@udtf(input_types='INT', result_types='INT')
def series(n):
    for i in range(n):
        yield i

# Start a UDF server
if __name__ == '__main__':
    server = UdfServer(location="0.0.0.0:8815")
    server.add_function(gcd)
    server.add_function(extract_tcp_info)
    server.add_function(series)
    server.serve()

Start the UDF server:

python3 udf.py
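
Before starting the server, it can help to exercise the function bodies as ordinary Python. The sketch below repeats the logic of gcd, series and extract_tcp_info without the arrow_udf decorators, so it runs even where the package is not installed. The `_plain` names and the hand-built sample packet are assumptions made for illustration and are not part of the library.

```python
import socket
import struct


def gcd_plain(x, y):
    # Euclid's algorithm, same body as the decorated gcd
    while y != 0:
        x, y = y, x % y
    return x


def series_plain(n):
    # Generator body, same as the decorated series
    for i in range(n):
        yield i


def extract_tcp_info_plain(packet: bytes):
    # Assumes a 20-byte IPv4 header (no options) followed by the TCP header
    src_addr, dst_addr = struct.unpack('!4s4s', packet[12:20])
    src_port, dst_port = struct.unpack('!HH', packet[20:24])
    return {
        'src_addr': socket.inet_ntoa(src_addr),
        'dst_addr': socket.inet_ntoa(dst_addr),
        'src_port': src_port,
        'dst_port': dst_port,
    }


# Hypothetical sample packet: 12 filler bytes, two IPv4 addresses, two ports
sample_packet = (
    bytes(12)
    + socket.inet_aton('192.168.0.1')
    + socket.inet_aton('10.0.0.2')
    + struct.pack('!HH', 5000, 80)
)

print(gcd_plain(12, 18))
print(list(series_plain(3)))
print(extract_tcp_info_plain(sample_packet))
```

Keeping the logic testable without the decorators means a mistake in byte offsets or loop conditions shows up before any client connects to the server.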

Data Types

Arrow Type     Python Type
boolean        bool
int8           int
int16          int
int32          int
int64          int
uint8          int
uint16         int
uint32         int
uint64         int
float32        float
float64        float
date32         datetime.date
time64         datetime.time
timestamp      datetime.datetime
interval       MonthDayNano or (int, int, int); MonthDayNano exposes its fields through months(), days() and nanoseconds()
string         str
binary         bytes
large_string   str
large_binary   bytes

Extension types:

Data type   Metadata           Python Type
decimal     arrowudf.decimal   decimal.Decimal
json        arrowudf.json      any
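
The mapping in the tables above can be written down as a small lookup, which is handy for checking that a value returned by a function has the Python type expected for its declared result type. This is only a sketch: `PYTHON_TYPES` and `matches_arrow_type` are hypothetical helpers, not part of arrow_udf, and the lookup covers just the rows whose Python type is a single concrete class (interval and json are left out).

```python
import datetime
import decimal

# Hypothetical lookup built from the type tables (not an arrow_udf API)
PYTHON_TYPES = {
    'boolean': bool,
    'int8': int, 'int16': int, 'int32': int, 'int64': int,
    'uint8': int, 'uint16': int, 'uint32': int, 'uint64': int,
    'float32': float, 'float64': float,
    'date32': datetime.date,
    'time64': datetime.time,
    'timestamp': datetime.datetime,
    'string': str, 'large_string': str,
    'binary': bytes, 'large_binary': bytes,
    'decimal': decimal.Decimal,
}


def matches_arrow_type(value, arrow_type):
    # Exact type comparison: bool is a subclass of int, and datetime.datetime
    # is a subclass of datetime.date, so isinstance() would be too permissive.
    return type(value) is PYTHON_TYPES[arrow_type]


print(matches_arrow_type(7, 'int32'))
print(matches_arrow_type(True, 'int32'))
print(matches_arrow_type(decimal.Decimal('1.50'), 'decimal'))
```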

Download files

Source Distribution

arrow_udf-0.3.1.tar.gz (10.3 kB)

Uploaded Source

Built Distribution

arrow_udf-0.3.1-py3-none-any.whl (11.0 kB)

Uploaded Python 3

File details

Details for the file arrow_udf-0.3.1.tar.gz.

File metadata

  • Download URL: arrow_udf-0.3.1.tar.gz
  • Size: 10.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.3

File hashes

Hashes for arrow_udf-0.3.1.tar.gz
Algorithm     Hash digest
SHA256        f0b808cee479e57eb402e66bdf7b03f2c5edee31d09b8a3a8ba8ff39d4ab8ed9
MD5           90b184e0b08973957490a1b537dce42d
BLAKE2b-256   21667ed7ee4c6808477a1ffc336dd299725cde3233963fdd6e136f483f41853b

File details

Details for the file arrow_udf-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: arrow_udf-0.3.1-py3-none-any.whl
  • Size: 11.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.3

File hashes

Hashes for arrow_udf-0.3.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        b98ccc7e74b0a2235d5145e9ebf87ce105bfcd2b70b3755af804f73a3eec645f
MD5           c25ad60607f8a1c01619a7dbdd44edca
BLAKE2b-256   787b43371fd612fc96a4ca45f579e9b745de616ae5e2e6510267c107045e05e3
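
The SHA256 digests listed for the two files can be checked against a downloaded copy with the standard library alone. The sketch below is one way to do that; `sha256_of_file` and `verify_download` are hypothetical helper names, and the path in the commented example assumes the file was saved to the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    # Stream the file in chunks so large downloads are not read into memory at once
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    # hexdigest() is lowercase, so normalise the published digest before comparing
    return sha256_of_file(path) == expected_sha256.strip().lower()


# Example (assumes the source distribution was saved to the current directory):
# verify_download(
#     'arrow_udf-0.3.1.tar.gz',
#     'f0b808cee479e57eb402e66bdf7b03f2c5edee31d09b8a3a8ba8ff39d4ab8ed9',
# )
```

A mismatch means the file on disk is not byte-for-byte identical to the one the digest was computed from, so it should be downloaded again rather than installed.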
