A user-defined function framework for Apache Arrow
Arrow UDF Python Server
Installation
```sh
pip install arrow-udf
```
Usage
Define functions in a Python file:
```python
# udf.py
from arrow_udf import udf, udtf, UdfServer
import struct
import socket

# Define a scalar function
@udf(input_types=['INT', 'INT'], result_type='INT')
def gcd(x, y):
    while y != 0:
        (x, y) = (y, x % y)
    return x

# Define a scalar function that returns multiple values (within a struct)
@udf(input_types=['BINARY'], result_type='STRUCT<src_addr: STRING, dst_addr: STRING, src_port: INT16, dst_port: INT16>')
def extract_tcp_info(tcp_packet: bytes):
    src_addr, dst_addr = struct.unpack('!4s4s', tcp_packet[12:20])
    src_port, dst_port = struct.unpack('!HH', tcp_packet[20:24])
    src_addr = socket.inet_ntoa(src_addr)
    dst_addr = socket.inet_ntoa(dst_addr)
    return {
        'src_addr': src_addr,
        'dst_addr': dst_addr,
        'src_port': src_port,
        'dst_port': dst_port,
    }

# Define a table function
@udtf(input_types='INT', result_types='INT')
def series(n):
    for i in range(n):
        yield i

# Start a UDF server
if __name__ == '__main__':
    server = UdfServer(location="0.0.0.0:8815")
    server.add_function(gcd)
    server.add_function(extract_tcp_info)
    server.add_function(series)
    server.serve()
```
Start the UDF server:
```sh
python3 udf.py
```
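Before registering a function with the server, it can help to exercise its logic as plain Python. The sketch below repeats the body of `extract_tcp_info` without the `@udf` decorator and feeds it a hand-built packet. The names `parse_tcp_info` and `build_packet` and the sample addresses are hypothetical, and the byte offsets assume a 20-byte IPv4 header with no options, as the example above does.

```python
import socket
import struct


def parse_tcp_info(tcp_packet: bytes) -> dict:
    """Same parsing logic as extract_tcp_info, minus the @udf decorator."""
    # Bytes 12..20 of the IPv4 header hold the source and destination addresses.
    src_addr, dst_addr = struct.unpack('!4s4s', tcp_packet[12:20])
    # The TCP header follows; its first four bytes are the two ports.
    src_port, dst_port = struct.unpack('!HH', tcp_packet[20:24])
    return {
        'src_addr': socket.inet_ntoa(src_addr),
        'dst_addr': socket.inet_ntoa(dst_addr),
        'src_port': src_port,
        'dst_port': dst_port,
    }


def build_packet(src: str, dst: str, src_port: int, dst_port: int) -> bytes:
    """Hypothetical helper: a mostly zero-filled IPv4 header plus the TCP ports."""
    ip_header = bytes(12) + socket.inet_aton(src) + socket.inet_aton(dst)
    tcp_ports = struct.pack('!HH', src_port, dst_port)
    return ip_header + tcp_ports


info = parse_tcp_info(build_packet('10.0.0.1', '10.0.0.2', 443, 8815))
print(info)
```

Keeping the parsing logic in an undecorated helper like this is only one possible layout; it makes the function testable without a running server.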
Data Types
| Arrow Type | Python Type |
|---|---|
| boolean | bool |
| int8 | int |
| int16 | int |
| int32 | int |
| int64 | int |
| uint8 | int |
| uint16 | int |
| uint32 | int |
| uint64 | int |
| float32 | float |
| float64 | float |
| date32 | datetime.date |
| time64 | datetime.time |
| timestamp | datetime.datetime |
| interval | MonthDayNano / (int, int, int) (fields can be obtained by months(), days() and nanoseconds() from MonthDayNano) |
| string | str |
| binary | bytes |
| large_string | str |
| large_binary | bytes |
Extension types:
| Data type | Metadata | Python Type |
|---|---|---|
| decimal | arrowudf.decimal | decimal.Decimal |
| json | arrowudf.json | any |
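To illustrate the two extension types, here is a minimal sketch of function bodies that work on the Python values listed in the table. It assumes that decimal arguments arrive as `decimal.Decimal` and that json arguments arrive as already-decoded Python objects, as the "any" entry suggests. The names `add_tax` and `json_depth` are hypothetical, and the `@udf` decorators are left out so the snippet runs without the library.

```python
from decimal import Decimal
from typing import Any


def add_tax(price: Decimal) -> Decimal:
    """Body for a decimal -> decimal function; stays in exact decimal arithmetic."""
    # The 8% rate is an illustrative value; the result is rounded to two places.
    return (price * Decimal('1.08')).quantize(Decimal('0.01'))


def json_depth(value: Any) -> int:
    """Body for a json -> int function; nesting depth of a decoded JSON value."""
    if isinstance(value, dict):
        return 1 + max((json_depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((json_depth(v) for v in value), default=0)
    # Scalars (str, int, float, bool, None) have depth 0.
    return 0


print(add_tax(Decimal('19.99')))
print(json_depth({'a': [1, {'b': None}]}))
```

Using `Decimal` end to end avoids the rounding surprises that would come from converting to `float` inside the function.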