A user-defined function framework for Apache Arrow

Arrow UDF Python Server

Installation

pip install arrow-udf

Usage

Define functions in a Python file:

# udf.py
from arrow_udf import udf, udtf, UdfServer
import struct
import socket

# Define a scalar function
@udf(input_types=['INT', 'INT'], result_type='INT')
def gcd(x, y):
    while y != 0:
        (x, y) = (y, x % y)
    return x

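# Define a scalar function that returns multiple values (within a struct).
# Ports are unsigned 16-bit values (0-65535), so INT is used here:
# INT16 cannot hold ports above 32767.
@udf(input_types=['BINARY'], result_type='STRUCT<src_addr: STRING, dst_addr: STRING, src_port: INT, dst_port: INT>')
def extract_tcp_info(tcp_packet: bytes):
    # Offsets assume an IPv4 header without options (20 bytes) followed by the TCP header
    src_addr, dst_addr = struct.unpack('!4s4s', tcp_packet[12:20])
    src_port, dst_port = struct.unpack('!HH', tcp_packet[20:24])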
    src_addr = socket.inet_ntoa(src_addr)
    dst_addr = socket.inet_ntoa(dst_addr)
    return {
        'src_addr': src_addr,
        'dst_addr': dst_addr,
        'src_port': src_port,
        'dst_port': dst_port,
    }

# Define a table function
@udtf(input_types='INT', result_types='INT')
def series(n):
    for i in range(n):
        yield i

# Start a UDF server
if __name__ == '__main__':
    server = UdfServer(location="0.0.0.0:8815")
    server.add_function(gcd)
    server.add_function(extract_tcp_info)
    server.add_function(series)
    server.serve()

Start the UDF server:

python3 udf.py
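
Before starting the server, it can help to exercise the function bodies as plain Python. The sketch below copies the three bodies from udf.py without the decorators, so it needs only the standard library. `build_test_packet` is a hypothetical helper introduced here for illustration. It fabricates a minimal packet by assuming a 20-byte IPv4 header without options, which is the layout the byte offsets in `extract_tcp_info` imply.

```python
import socket
import struct


def gcd(x, y):
    # Euclid's algorithm, same body as the UDF
    while y != 0:
        (x, y) = (y, x % y)
    return x


def extract_tcp_info(tcp_packet: bytes):
    # Bytes 12-19: IPv4 source and destination addresses
    src_addr, dst_addr = struct.unpack('!4s4s', tcp_packet[12:20])
    # Bytes 20-23: TCP source and destination ports (unsigned 16-bit)
    src_port, dst_port = struct.unpack('!HH', tcp_packet[20:24])
    return {
        'src_addr': socket.inet_ntoa(src_addr),
        'dst_addr': socket.inet_ntoa(dst_addr),
        'src_port': src_port,
        'dst_port': dst_port,
    }


def series(n):
    # Table function body: yields one row per iteration
    for i in range(n):
        yield i


def build_test_packet(src, dst, sport, dport):
    # Hypothetical helper: 12 zeroed header bytes, then addresses, then ports
    ip_header = bytes(12) + socket.inet_aton(src) + socket.inet_aton(dst)
    tcp_header = struct.pack('!HH', sport, dport)
    return ip_header + tcp_header


packet = build_test_packet('192.168.0.1', '10.0.0.2', 443, 51234)
info = extract_tcp_info(packet)

print(gcd(12, 18))      # 6
print(list(series(3)))  # [0, 1, 2]
print(info['src_addr'], info['dst_port'])  # 192.168.0.1 51234
```

The destination port 51234 does not fit a signed 16-bit integer, so the declared result type has to be wide enough for such values.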

Data Types

Arrow Type      Python Type
boolean         bool
int8            int
int16           int
int32           int
int64           int
uint8           int
uint16          int
uint32          int
uint64          int
float32         float
float64         float
date32          datetime.date
time64          datetime.time
timestamp       datetime.datetime
interval        MonthDayNano / (int, int, int)
string          str
binary          bytes
large_string    str
large_binary    bytes

For interval values, the fields of a MonthDayNano can be obtained with months(), days() and nanoseconds().

Extension types:

Data type    Metadata            Python Type
decimal      arrowudf.decimal    decimal.Decimal
json         arrowudf.json       any
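
As a rough illustration of the Python side of the two extension types, the sketch below shows plain function bodies that work on `decimal.Decimal` values and on an arbitrary JSON-compatible object. The names `decimal_add` and `json_field` are hypothetical. The decorators are left out on purpose, because the type strings to pass to `@udf` for these extension types are not listed above.

```python
import json
from decimal import Decimal


def decimal_add(a: Decimal, b: Decimal) -> Decimal:
    # Decimal arithmetic is exact, unlike binary floating point
    return a + b


def json_field(value, key: str):
    # `value` may be any JSON-compatible object: dict, list, str, number, bool or None
    if isinstance(value, dict):
        return value.get(key)
    return None


print(decimal_add(Decimal('0.1'), Decimal('0.2')))    # 0.3
print(json_field(json.loads('{"a": [1, 2]}'), 'a'))   # [1, 2]
print(json_field(json.loads('[1, 2, 3]'), 'a'))       # None
```

Because a json value can be any Python object, a function body should check the shape of its input before indexing into it, as `json_field` does.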
