A user-defined function framework for Apache Arrow
Arrow UDF Python Server
Installation
pip install arrow-udf
Usage
Define functions in a Python file:
# udf.py
import socket
import struct

from arrow_udf import udf, udtf, UdfServer

# Define a scalar function
@udf(input_types=['INT', 'INT'], result_type='INT')
def gcd(x, y):
    while y != 0:
        (x, y) = (y, x % y)
    return x

# Define a scalar function that returns multiple values (within a struct)
# Note: '!HH' unpacks unsigned 16-bit ports (0-65535), while INT16 is signed,
# so ports above 32767 fall outside its range.
@udf(input_types=['BINARY'], result_type='STRUCT<src_addr: STRING, dst_addr: STRING, src_port: INT16, dst_port: INT16>')
def extract_tcp_info(tcp_packet: bytes):
    # Bytes 12-19 hold the source and destination IPv4 addresses
    src_addr, dst_addr = struct.unpack('!4s4s', tcp_packet[12:20])
    # Bytes 20-23 hold the source and destination ports
    src_port, dst_port = struct.unpack('!HH', tcp_packet[20:24])
    src_addr = socket.inet_ntoa(src_addr)
    dst_addr = socket.inet_ntoa(dst_addr)
    return {
        'src_addr': src_addr,
        'dst_addr': dst_addr,
        'src_port': src_port,
        'dst_port': dst_port,
    }

# Define a table function
@udtf(input_types='INT', result_types='INT')
def series(n):
    for i in range(n):
        yield i

# Start a UDF server
if __name__ == '__main__':
    server = UdfServer(location="0.0.0.0:8815")
    server.add_function(gcd)
    server.add_function(extract_tcp_info)
    server.add_function(series)
    server.serve()
Start the UDF server:
python3 udf.py
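The function bodies in `udf.py` are ordinary Python, so their logic can be checked without starting a server. The sketch below uses only the standard library and does not import `arrow_udf`. The names `gcd_plain`, `series_plain`, `parse_tcp_info`, and `build_packet` are hypothetical: the first three are undecorated copies of the bodies above, and `build_packet` fabricates a test input, assuming a 20-byte IPv4 header without options followed by the start of a TCP header.

```python
import socket
import struct


def gcd_plain(x, y):
    """Hypothetical undecorated copy of gcd."""
    while y != 0:
        x, y = y, x % y
    return x


def series_plain(n):
    """Hypothetical undecorated copy of series: yields one row per value."""
    for i in range(n):
        yield i


def parse_tcp_info(tcp_packet: bytes) -> dict:
    """Hypothetical undecorated copy of extract_tcp_info."""
    src_addr, dst_addr = struct.unpack('!4s4s', tcp_packet[12:20])
    src_port, dst_port = struct.unpack('!HH', tcp_packet[20:24])
    return {
        'src_addr': socket.inet_ntoa(src_addr),
        'dst_addr': socket.inet_ntoa(dst_addr),
        'src_port': src_port,
        'dst_port': dst_port,
    }


def build_packet(src: str, dst: str, src_port: int, dst_port: int) -> bytes:
    """Hypothetical helper: 12 zero bytes stand in for the first IPv4 header fields."""
    ip_header = bytes(12) + socket.inet_aton(src) + socket.inet_aton(dst)
    return ip_header + struct.pack('!HH', src_port, dst_port)


packet = build_packet('192.168.0.1', '10.0.0.2', 443, 8080)
info = parse_tcp_info(packet)

print(gcd_plain(12, 18))        # 6
print(list(series_plain(3)))    # [0, 1, 2]
print(info)
```

A table function differs from a scalar function in that one input value can produce several output rows, which is why `series_plain(3)` is consumed with `list(...)` here.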
Data Types
| Arrow Type | Python Type |
|---|---|
| boolean | bool |
| int8 | int |
| int16 | int |
| int32 | int |
| int64 | int |
| uint8 | int |
| uint16 | int |
| uint32 | int |
| uint64 | int |
| float32 | float |
| float64 | float |
| date32 | datetime.date |
| time64 | datetime.time |
| timestamp | datetime.datetime |
| interval | MonthDayNano / (int, int, int) (fields can be obtained by months(), days() and nanoseconds() from MonthDayNano) |
| string | str |
| binary | bytes |
| large_string | str |
| large_binary | bytes |
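Every Arrow integer type in the table maps to Python `int`, which is unbounded, so a value computed in a function may not fit the declared Arrow width. The following sketch shows one way to check this in plain Python before returning a value. `int_range` and `fits` are hypothetical helpers, not part of `arrow_udf`, and how the framework itself reacts to an out-of-range value is not covered here.

```python
def int_range(arrow_type):
    """Return (min, max) for an Arrow integer type name such as 'int16' or 'uint8'."""
    signed = not arrow_type.startswith('u')
    bits = int(arrow_type.split('int')[1])
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


def fits(arrow_type, value):
    """Check whether a Python int fits the given Arrow integer type."""
    low, high = int_range(arrow_type)
    return low <= value <= high


print(int_range('int16'))     # (-32768, 32767)
print(fits('int16', 8080))    # True
print(fits('int16', 50000))   # False
print(fits('uint16', 50000))  # True
```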
Extension types:
| Data type | Metadata | Python Type |
|---|---|---|
| decimal | arrowudf.decimal | decimal.Decimal |
| json | arrowudf.json | any |
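To illustrate what these Python types look like in practice, the sketch below uses only the standard library. It assumes that a `json` value arrives as JSON text decoded into ordinary Python objects, which is one reading of "any" and is not confirmed by the table. The variable names are illustrative.

```python
import decimal
import json

# decimal.Decimal keeps exact base-10 values, unlike binary floats.
total = decimal.Decimal('0.1') + decimal.Decimal('0.2')
print(total == decimal.Decimal('0.3'))  # True
print(0.1 + 0.2 == 0.3)                 # False

# A decoded JSON value can be any of several Python types.
samples = ['{"a": 1}', '[1, 2]', '"x"', '3.5', 'true', 'null']
decoded_types = [type(json.loads(text)).__name__ for text in samples]
print(decoded_types)
```

A function that accepts a `json` argument would therefore need to handle dictionaries, lists, strings, numbers, booleans, and `None` under this assumption.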