A user-defined function framework for Apache Arrow
Arrow UDF Python Server
Installation
pip install arrow-udf
Usage
Define functions in a Python file:
    # udf.py
    from arrow_udf import udf, udtf, UdfServer
    import struct
    import socket

    # Define a scalar function
    @udf(input_types=['INT', 'INT'], result_type='INT')
    def gcd(x, y):
        while y != 0:
            (x, y) = (y, x % y)
        return x

    # Define a scalar function that returns multiple values (within a struct)
    @udf(input_types=['BINARY'], result_type='STRUCT<src_addr: STRING, dst_addr: STRING, src_port: INT16, dst_port: INT16>')
    def extract_tcp_info(tcp_packet: bytes):
        src_addr, dst_addr = struct.unpack('!4s4s', tcp_packet[12:20])
        src_port, dst_port = struct.unpack('!HH', tcp_packet[20:24])
        src_addr = socket.inet_ntoa(src_addr)
        dst_addr = socket.inet_ntoa(dst_addr)
        return {
            'src_addr': src_addr,
            'dst_addr': dst_addr,
            'src_port': src_port,
            'dst_port': dst_port,
        }

    # Define a table function
    @udtf(input_types='INT', result_types='INT')
    def series(n):
        for i in range(n):
            yield i

    # Start a UDF server
    if __name__ == '__main__':
        server = UdfServer(location="0.0.0.0:8815")
        server.add_function(gcd)
        server.add_function(extract_tcp_info)
        server.add_function(series)
        server.serve()
Start the UDF server:
python3 udf.py
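Before starting the server, it can help to check the function bodies in plain Python. The sketch below uses undecorated copies of `gcd` and `extract_tcp_info` (whether the decorated objects remain directly callable is not stated in this README, so the decorators are left out here). `make_packet` is a hypothetical helper, not part of arrow-udf, that builds a synthetic packet assuming a 20-byte IPv4 header with no options, followed by a TCP header.

```python
import socket
import struct


def gcd(x, y):
    # Undecorated copy of the scalar function body from udf.py.
    while y != 0:
        x, y = y, x % y
    return x


def extract_tcp_info(tcp_packet: bytes):
    # Undecorated copy: bytes 12..20 hold the IPv4 source and destination
    # addresses, bytes 20..24 hold the TCP source and destination ports.
    src_addr, dst_addr = struct.unpack('!4s4s', tcp_packet[12:20])
    src_port, dst_port = struct.unpack('!HH', tcp_packet[20:24])
    return {
        'src_addr': socket.inet_ntoa(src_addr),
        'dst_addr': socket.inet_ntoa(dst_addr),
        'src_port': src_port,
        'dst_port': dst_port,
    }


def make_packet(src: str, dst: str, sport: int, dport: int) -> bytes:
    # Hypothetical test helper: zero-filled headers with only the
    # address and port fields populated.
    ip_header = bytes(12) + socket.inet_aton(src) + socket.inet_aton(dst)
    tcp_header = struct.pack('!HH', sport, dport) + bytes(16)
    return ip_header + tcp_header


packet = make_packet('10.0.0.1', '10.0.0.2', 443, 8080)
info = extract_tcp_info(packet)
print(gcd(12, 18), info)
```

The ports in this example are kept below 32768 on purpose: the struct declares `src_port` and `dst_port` as `INT16`, and a signed 16-bit integer cannot represent larger values. How the server handles ports above that range is not covered by this sketch.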
Data Types
| Arrow Type | Python Type |
|---|---|
| boolean | bool |
| int8 | int |
| int16 | int |
| int32 | int |
| int64 | int |
| uint8 | int |
| uint16 | int |
| uint32 | int |
| uint64 | int |
| float32 | float |
| float64 | float |
| date32 | datetime.date |
| time64 | datetime.time |
| timestamp | datetime.datetime |
| interval | MonthDayNano / (int, int, int) (fields can be obtained by months(), days() and nanoseconds() from MonthDayNano) |
| string | str |
| binary | bytes |
| large_string | str |
| large_binary | bytes |
Extension types:
| Data type | Metadata | Python Type |
|---|---|---|
| decimal | arrowudf.decimal | decimal.Decimal |
| json | arrowudf.json | any |
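To make the mappings above concrete, here is a minimal sketch of the Python values a function body would work with for a few of these types. These are plain functions standing in for UDF bodies. The decorators are omitted because the type strings for `date32`, `decimal` and `json` are not given in this README. The function names are hypothetical, and the `json` example assumes the value arrives as an already-decoded Python object, as the `any` entry suggests.

```python
import datetime
import decimal


def add_days(d: datetime.date, n: int) -> datetime.date:
    # date32 maps to datetime.date on both the input and the result side.
    return d + datetime.timedelta(days=n)


def round_price(p: decimal.Decimal) -> decimal.Decimal:
    # The decimal extension type maps to decimal.Decimal, so exact
    # decimal arithmetic is available inside the function body.
    return p.quantize(decimal.Decimal('0.01'), rounding=decimal.ROUND_HALF_UP)


def json_key_count(value) -> int:
    # json maps to "any": a dict, list, str, number, bool or None.
    # Assumption: the value is already decoded, not a JSON string.
    return len(value) if isinstance(value, dict) else 0


print(add_days(datetime.date(2024, 2, 28), 2))
print(round_price(decimal.Decimal('1.005')))
print(json_key_count({'a': 1, 'b': 2}))
```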