A user-defined function framework for Apache Arrow

Arrow UDF Python Server

Installation

pip install arrow-udf

Usage

Define functions in a Python file:

# udf.py
from arrow_udf import udf, udtf, UdfServer
import struct
import socket

# Define a scalar function
@udf(input_types=['INT', 'INT'], result_type='INT')
def gcd(x, y):
    while y != 0:
        (x, y) = (y, x % y)
    return x

# Define a scalar function that returns multiple values (within a struct)
@udf(input_types=['BINARY'], result_type='STRUCT<src_addr: STRING, dst_addr: STRING, src_port: INT16, dst_port: INT16>')
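def extract_tcp_info(tcp_packet: bytes):
    # Bytes 12-19: source and destination IPv4 addresses (4 bytes each).
    # The offsets assume a 20-byte IPv4 header without options.
    src_addr, dst_addr = struct.unpack('!4s4s', tcp_packet[12:20])
    # Bytes 20-23: source and destination ports (big-endian, unsigned 16-bit).
    src_port, dst_port = struct.unpack('!HH', tcp_packet[20:24])
    # Convert the packed addresses to dotted-quad strings.
    src_addr = socket.inet_ntoa(src_addr)
    dst_addr = socket.inet_ntoa(dst_addr)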
    return {
        'src_addr': src_addr,
        'dst_addr': dst_addr,
        'src_port': src_port,
        'dst_port': dst_port,
    }

# Define a table function
@udtf(input_types='INT', result_types='INT')
def series(n):
    for i in range(n):
        yield i

# Start a UDF server
if __name__ == '__main__':
    server = UdfServer(location="0.0.0.0:8815")
    server.add_function(gcd)
    server.add_function(extract_tcp_info)
    server.add_function(series)
    server.serve()
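
Before serving the functions, it can help to check their logic offline. The sketch below uses undecorated copies of the three functions above; the `_plain` names and the `make_sample_packet` helper are hypothetical and not part of arrow-udf. It assumes nothing about how the decorators wrap the originals, so it needs only the standard library.

```python
import socket
import struct


def gcd_plain(x, y):
    # Same Euclidean loop as gcd above, without the @udf decorator.
    while y != 0:
        x, y = y, x % y
    return x


def series_plain(n):
    # Same generator as series above: yields 0, 1, ..., n - 1.
    for i in range(n):
        yield i


def extract_tcp_info_plain(packet: bytes):
    # Same parsing as extract_tcp_info above: addresses at bytes 12-19,
    # ports at bytes 20-23.
    src_addr, dst_addr = struct.unpack('!4s4s', packet[12:20])
    src_port, dst_port = struct.unpack('!HH', packet[20:24])
    return {
        'src_addr': socket.inet_ntoa(src_addr),
        'dst_addr': socket.inet_ntoa(dst_addr),
        'src_port': src_port,
        'dst_port': dst_port,
    }


def make_sample_packet():
    # Hypothetical test input: 12 filler bytes, two IPv4 addresses,
    # then two big-endian 16-bit ports.
    addresses = socket.inet_aton('10.0.0.1') + socket.inet_aton('10.0.0.2')
    ports = struct.pack('!HH', 1234, 80)
    return bytes(12) + addresses + ports


if __name__ == '__main__':
    print(gcd_plain(12, 18))
    print(list(series_plain(3)))
    print(extract_tcp_info_plain(make_sample_packet()))
```

Keeping the function bodies free of server state makes this kind of check cheap to run before registering anything with the server.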

Start the UDF server:

python3 udf.py
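
To confirm that the server process is accepting connections, a plain TCP check is often enough. The following is a minimal sketch using only the standard library; `wait_for_port` is a hypothetical helper, and the port 8815 is taken from the `location` in the example above. It verifies only that something is listening, not that the UDF protocol itself works.

```python
import socket
import time


def wait_for_port(host, port, timeout=5.0, interval=0.1):
    """Return True once a TCP connection succeeds, or False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Open and immediately close a connection to test reachability.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Nothing is listening yet (or the attempt timed out); retry shortly.
            time.sleep(interval)
    return False


if __name__ == '__main__':
    if wait_for_port('127.0.0.1', 8815):
        print('UDF server port is open')
    else:
        print('UDF server port is not reachable')
```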

Data Types

Arrow Type Python Type
boolean bool
int8 int
int16 int
int32 int
int64 int
uint8 int
uint16 int
uint32 int
uint64 int
float32 float
float64 float
date32 datetime.date
time64 datetime.time
timestamp datetime.datetime
interval MonthDayNano / (int, int, int) (use months(), days() and nanoseconds() to read the fields of a MonthDayNano)
string str
binary bytes
large_string str
large_binary bytes
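
All integer Arrow types in the table map to Python `int`, which is unbounded, while the Arrow types have fixed widths. The sketch below shows one way to check that a value fits its declared type before returning it. The helper names `int_range` and `fits` are hypothetical, and how arrow-udf itself handles an out-of-range value is not covered here.

```python
# Bit width and signedness of the integer Arrow types listed above.
INT_TYPES = {
    'int8': (8, True),
    'int16': (16, True),
    'int32': (32, True),
    'int64': (64, True),
    'uint8': (8, False),
    'uint16': (16, False),
    'uint32': (32, False),
    'uint64': (64, False),
}


def int_range(type_name):
    """Return the inclusive (min, max) range of an integer Arrow type."""
    bits, signed = INT_TYPES[type_name]
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def fits(value, type_name):
    """Return True if the Python int `value` is representable in `type_name`."""
    low, high = int_range(type_name)
    return low <= value <= high


if __name__ == '__main__':
    # A port number such as 40000 fits uint16 but not int16.
    print(fits(40000, 'uint16'), fits(40000, 'int16'))
```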

Extension types:

Data type Metadata Python Type
decimal arrowudf.decimal decimal.Decimal
json arrowudf.json any
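
The Python-side values for these two extension types can be illustrated with the standard library alone. This is a sketch under assumptions: `add_prices` and `json_roundtrip` are hypothetical helpers, "any" is taken to mean values the `json` module can represent, and the way arrow-udf serializes them is not shown.

```python
import json
from decimal import Decimal


def add_prices(a: Decimal, b: Decimal) -> Decimal:
    # Decimal arithmetic is exact in base 10, unlike binary floats.
    return a + b


def json_roundtrip(value):
    # Encode to JSON text and decode again; JSON-compatible values
    # (dict, list, str, int, float, bool, None) come back unchanged.
    return json.loads(json.dumps(value))


if __name__ == '__main__':
    print(add_prices(Decimal('0.1'), Decimal('0.2')))
    print(json_roundtrip({'tags': ['a', 'b'], 'count': 2, 'extra': None}))
```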

Download files

Source Distribution

arrow_udf-0.2.1.tar.gz (10.0 kB)

Uploaded Source

Built Distribution

arrow_udf-0.2.1-py3-none-any.whl (10.6 kB)

Uploaded Python 3

File details

Details for the file arrow_udf-0.2.1.tar.gz.

File metadata

  • Download URL: arrow_udf-0.2.1.tar.gz
  • Upload date:
  • Size: 10.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.3

File hashes

Hashes for arrow_udf-0.2.1.tar.gz
Algorithm Hash digest
SHA256 3892aa478b5e81383511d1f70a57ae2eaccfb0dbb2a6a55cc86281c0bef4fcd6
MD5 1ecb851be350c470fe7220840c68e8d2
BLAKE2b-256 05f7a4aa3ac575e229937dec67f28631fea455d1b467b7679ddc783e0aba34f0

File details

Details for the file arrow_udf-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: arrow_udf-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 10.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.3

File hashes

Hashes for arrow_udf-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 a31a975ce97698152012ac2ee073cf77c55bbd513a7a41d5c4325a9565753703
MD5 afed9fe69c42079eefa5eb851b54c8e2
BLAKE2b-256 1ea3e61a63bd3032b55ba80b2cbfdc179ee033ac11bf2cdc947ec491d6729cfc
