fileproxy
File-based RPC for running Python functions across network-isolated nodes.
Designed for HPC clusters where compute nodes lack internet access but share a filesystem with login nodes that do.
Installation
pip install fileproxy
Or from source:
git clone https://github.com/tboulet/fileproxy.git
cd fileproxy
pip install -e .
Quick Start
1. Define and start the server (login node)
Create a server script (example here with litellm.completion as the function to proxy):
# server_script.py
import fileproxy
import litellm
if __name__ == "__main__":
    fileproxy.run_server({
        "litellm_completion": litellm.completion,
    })
Run it on the login node:
python server_script.py
Tip: On HPC clusters, run the server in a persistent terminal session (e.g., tmux) so it survives SSH disconnections. See guide_TMUX.md for a quick reference.
2. Use the proxy in your code (compute node)
import fileproxy
# Create a proxy that behaves like the original function
completion = fileproxy.proxy("litellm_completion")
# Use it exactly like litellm.completion
response = completion(model="gpt-4", messages=[{"role": "user", "content": "Hello"}])
The proxy serializes the arguments to a file, the server picks it up, runs the real function, and writes the result back. The proxy polls for the result and returns it.
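The round trip described above can be sketched in a single process. This is a minimal illustration under stated assumptions, not fileproxy's actual implementation: the helper names (client_send, server_step, client_receive), the request-id file naming, and the tuple layout of the pickled request are all hypothetical, and polling is left out so the flow stays easy to follow.

```python
import pickle
import tempfile
import uuid
from pathlib import Path


def client_send(input_dir: Path, args: tuple, kwargs: dict) -> str:
    """Serialize one call to a request file and return its (hypothetical) id."""
    request_id = uuid.uuid4().hex
    with open(input_dir / f"{request_id}.pkl", "wb") as fh:
        pickle.dump((args, kwargs), fh)
    return request_id


def server_step(input_dir: Path, output_dir: Path, func) -> int:
    """Process every pending request once and return how many were handled."""
    handled = 0
    for request_path in sorted(input_dir.glob("*.pkl")):
        with open(request_path, "rb") as fh:
            args, kwargs = pickle.load(fh)
        result = func(*args, **kwargs)  # run the real function
        with open(output_dir / request_path.name, "wb") as fh:
            pickle.dump(result, fh)
        request_path.unlink()  # the request has been consumed
        handled += 1
    return handled


def client_receive(output_dir: Path, request_id: str):
    """Load the response for a request and remove the response file."""
    response_path = output_dir / f"{request_id}.pkl"
    with open(response_path, "rb") as fh:
        result = pickle.load(fh)
    response_path.unlink()
    return result


def demo_round_trip():
    """Send pow(2, 3) through the file-based channel and return the result."""
    with tempfile.TemporaryDirectory() as tmp:
        input_dir = Path(tmp) / "input"
        output_dir = Path(tmp) / "output"
        input_dir.mkdir()
        output_dir.mkdir()
        request_id = client_send(input_dir, (2, 3), {})
        server_step(input_dir, output_dir, pow)
        return client_receive(output_dir, request_id)
```

In a real deployment the server would call something like server_step in a loop with a sleep between iterations, and the client would poll for the response file rather than read it immediately.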
Multiple Functions
Register multiple functions on the same server:
# Server
import fileproxy
import litellm
import requests
if __name__ == "__main__":
    fileproxy.run_server({
        "litellm_completion": litellm.completion,
        "http_post": requests.post,
        "http_get": requests.get,
    })
# Client
import fileproxy
completion = fileproxy.proxy("litellm_completion")
http_post = fileproxy.proxy("http_post")
http_get = fileproxy.proxy("http_get")
Configuration
Data directory
By default, fileproxy stores request/response files in ~/.cache/fileproxy/. Override with:
- Constructor argument: fileproxy.proxy("func", base_dir="/path/to/dir")
- Environment variable: export FILEPROXY_DIR=/path/to/dir
The server and client must use the same base directory on a shared filesystem.
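One plausible way to resolve the directory is sketched below. The helper resolve_base_dir is hypothetical, and the precedence it uses (explicit argument first, then FILEPROXY_DIR, then the default) is an assumption; the text above does not state which setting wins when both are given.

```python
import os
from pathlib import Path
from typing import Optional

# Default location named in the section above.
DEFAULT_DIR = Path.home() / ".cache" / "fileproxy"


def resolve_base_dir(base_dir: Optional[str] = None) -> Path:
    """Hypothetical helper: argument, then FILEPROXY_DIR, then the default.

    The precedence order is an assumption made for this sketch.
    """
    if base_dir is not None:
        return Path(base_dir).expanduser()
    env_value = os.environ.get("FILEPROXY_DIR")
    if env_value:
        return Path(env_value).expanduser()
    return DEFAULT_DIR
```

Whatever the resolution order, both sides need to end up with the same path, so setting FILEPROXY_DIR in a shared shell profile is one way to keep the server and client in agreement.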
Workers (parallel execution)
By default, the server processes requests sequentially. To handle multiple requests concurrently (useful when registering multiple functions or serving multiple clients):
# Process up to 4 requests in parallel
fileproxy.run_server(functions, workers=4)
With workers=1 (default), requests are executed one at a time. With workers=2 or more, requests are dispatched to a thread pool. This is particularly useful when mixing slow functions (e.g., LLM calls) with fast ones (e.g., HTTP requests) — a slow call won't block unrelated requests.
Note: Registered functions must be thread-safe when using workers > 1. Most common use cases (HTTP requests, API calls) are thread-safe.
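The effect of the worker count can be illustrated with the standard library's ThreadPoolExecutor. This is a sketch of the general technique, assuming a thread pool similar to the one described above; slow_call, fast_call, and run_with_workers are hypothetical stand-ins and not part of fileproxy.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_call() -> str:
    time.sleep(0.3)  # stands in for a slow LLM request
    return "slow"


def fast_call() -> str:
    return "fast"  # stands in for a quick HTTP request


def run_with_workers(workers: int) -> list:
    """Submit the slow call, then the fast one; return results in completion order."""
    finished = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for call in (slow_call, fast_call):
            future = pool.submit(call)
            # The callback records each result at the moment its call completes.
            future.add_done_callback(lambda f: finished.append(f.result()))
    return finished
```

With one worker the fast call has to wait behind the slow one, so the completion order is slow then fast. With two workers the fast call completes first.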
Timeouts
# Client waits up to 15s for server acknowledgement (default: 10s)
func = fileproxy.proxy("my_func", no_server_timeout=15.0)
The timeout only applies while waiting for the server to acknowledge the request (pick it up). Once the server starts processing, the client waits indefinitely — slow functions will not cause false timeouts.
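The two-phase wait can be sketched as follows. This is an illustration under assumptions, not the library's code: the ServerNotRunning class stands in for fileproxy.ServerNotRunningError, and the sentinel and response file names are hypothetical.

```python
import tempfile
import time
from pathlib import Path


class ServerNotRunning(Exception):
    """Stand-in for fileproxy.ServerNotRunningError in this sketch."""


def wait_for_response(started: Path, response: Path,
                      no_server_timeout: float = 10.0,
                      poll_interval: float = 0.1) -> bytes:
    """Wait for acknowledgement with a deadline, then for the result without one."""
    deadline = time.monotonic() + no_server_timeout
    # Phase 1: bounded wait until the server acknowledges (or has already answered).
    while not started.exists() and not response.exists():
        if time.monotonic() >= deadline:
            raise ServerNotRunning("no acknowledgement within the timeout")
        time.sleep(poll_interval)
    # Phase 2: the server is working, so wait as long as it takes.
    while not response.exists():
        time.sleep(poll_interval)
    return response.read_bytes()


def demo() -> tuple:
    """Show a timeout with no server, then a successful read."""
    with tempfile.TemporaryDirectory() as tmp:
        started = Path(tmp) / "abc_started"
        response = Path(tmp) / "abc.pkl"
        try:
            wait_for_response(started, response,
                              no_server_timeout=0.2, poll_interval=0.05)
            timed_out = False
        except ServerNotRunning:
            timed_out = True
        started.touch()
        response.write_bytes(b"done")
        return timed_out, wait_for_response(started, response)
```

Splitting the wait this way means the timeout value only needs to cover the server's poll interval and filesystem latency, not the runtime of the proxied function.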
Poll interval
# Server checks for new requests every 0.5s (default: 0.2s)
fileproxy.run_server(functions, poll_interval=0.5)
# Client checks for response every 0.2s (default: 0.1s)
func = fileproxy.proxy("my_func", poll_interval=0.2)
How It Works
Compute Node (no internet)         Login Node (has internet)
─────────────────────────          ──────────────────────────
proxy("func")(args, kwargs)        Server polls input dir
│                                  │
├─ Write request.pkl ──────────────┤
│   to input dir                   ├─ Read request.pkl
│                                  ├─ Create _started sentinel
│   (client sees _started,         ├─ Call func(*args, **kwargs)
│    disables timeout)             ├─ Write response.pkl (atomic)
├──────────────────────────────────┤   to output dir
├─ Read response.pkl               │
├─ Return result                   │
Directory structure
~/.cache/fileproxy/
├── func_name_1/
│   ├── input/            # Request files (.pkl)
│   └── output/           # Response files (.pkl) + _started sentinels
├── func_name_2/
│   ├── input/
│   └── output/
├── logs/
│   └── server_20260310_143000.log
└── server_heartbeat.json
Error Handling
fileproxy uses custom exception types to distinguish infrastructure errors from function errors:
import fileproxy
from fileproxy import FileProxyError, ServerNotRunningError
func = fileproxy.proxy("my_func")
try:
    result = func(args)
except ServerNotRunningError:
    # fileproxy infrastructure problem: server is not running
    print("Start the fileproxy server!")
except FileProxyError:
    # Other fileproxy infrastructure problem
    print("Something went wrong with the file proxy")
except ValueError:
    # Exception raised by the actual function on the server side
    # (re-raised with original type)
    print("The function itself failed")
- FileProxyError: Base class for all fileproxy infrastructure errors.
- ServerNotRunningError(FileProxyError): Server did not acknowledge the request within the timeout.
- Server-side function exceptions are re-raised with their original type (not wrapped in FileProxyError).
Exception propagation details
When the proxied function raises an exception on the server, the proxy re-raises it on the client with the original exception type in most cases. For example, a server-side ValueError("bad input") becomes a client-side ValueError("bad input").
However, some exception classes have non-standard __init__ signatures that prevent Python's pickle from reconstructing them (e.g., litellm.RateLimitError requires llm_provider and model arguments). In these cases, the original exception cannot be faithfully reconstructed, so the proxy raises a RuntimeError instead, with a message of the form:
RuntimeError: Server-side RateLimitError: rate limited
In summary:
- Standard exceptions (e.g., ValueError, TypeError, KeyError, most custom exceptions with a simple __init__(self, message) signature): re-raised with original type and message.
- Non-picklable exceptions (non-standard __init__ that fails to round-trip through pickle): raised as RuntimeError("Server-side {OriginalType}: {original_message}").
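The pickle behaviour behind this distinction is easy to reproduce. The sketch below uses a hypothetical ProviderError class, modelled loosely on the litellm example above, and a hypothetical transportable helper; it shows the general fallback technique and is not fileproxy's own code.

```python
import pickle


class ProviderError(Exception):
    """Hypothetical exception whose __init__ needs more than a message."""

    def __init__(self, message, llm_provider, model):
        super().__init__(message)  # only the message ends up in self.args
        self.llm_provider = llm_provider
        self.model = model


def transportable(exc: BaseException) -> BaseException:
    """Return a copy of exc if it survives a pickle round trip, else a RuntimeError."""
    try:
        return pickle.loads(pickle.dumps(exc))
    except Exception:
        # Unpickling calls type(exc)(*exc.args), which fails when __init__
        # requires arguments that are not stored in args.
        return RuntimeError(f"Server-side {type(exc).__name__}: {exc}")
```

A ValueError("bad input") comes back as a ValueError, while ProviderError("rate limited", "openai", "gpt-4") comes back as RuntimeError("Server-side ProviderError: rate limited"). Client code that needs to react to such errors can match on the message prefix rather than the exception type.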
Logs
Server logs are written to {base_dir}/logs/server_YYYYMMDD_HHMMSS.log (for example, server_20260310_143000.log) and are also printed to the server terminal. Each log file corresponds to one server session.
Important Notes
Multiple servers
Do not run multiple fileproxy servers with the same base_dir. On startup, the server checks for an existing heartbeat and raises FileProxyError if another server appears to be running. To override and kill the old server, use force=True:
# force=True signals the old server to stop, waits for it to shut down,
# then starts the new server
fileproxy.run_server(functions, force=True)
If you need truly independent servers running simultaneously, use different base_dir values:
FILEPROXY_DIR=~/.cache/fileproxy-project-a python server_a.py
FILEPROXY_DIR=~/.cache/fileproxy-project-b python server_b.py
Restarting the server
When you restart the server, it clears all pending request/response files. Any client calls that were in-flight will eventually time out with ServerNotRunningError. This is by design — it prevents stale requests from a previous session from being processed.
Checking server status
From any node that shares the filesystem:
import fileproxy
info = fileproxy.status()
print(info["alive"]) # True/False
print(info["functions"]) # ["litellm_completion", "http_post", ...]
print(info["pid"]) # Server process ID
print(info["requests_processed"]) # Total requests handled
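A status check of this kind could be derived from a heartbeat file, as in the sketch below. The layout of server_heartbeat.json is not documented here, so the timestamp field, the max_age threshold, and both helper functions are assumptions made for illustration; only the keys alive, functions, pid, and requests_processed come from the example above.

```python
import json
import os
import tempfile
import time
from pathlib import Path


def write_heartbeat(path: Path, functions: list, requests_processed: int) -> None:
    """Hypothetical server side: record liveness data, replacing the file atomically."""
    payload = {
        "pid": os.getpid(),
        "timestamp": time.time(),  # assumed field, used to judge freshness
        "functions": functions,
        "requests_processed": requests_processed,
    }
    tmp_path = path.with_suffix(".tmp")
    tmp_path.write_text(json.dumps(payload))
    os.replace(tmp_path, path)


def read_status(path: Path, max_age: float = 5.0) -> dict:
    """Hypothetical client side: alive means the heartbeat exists and is recent."""
    try:
        info = json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return {"alive": False}
    info["alive"] = (time.time() - info["timestamp"]) <= max_age
    return info


def demo() -> tuple:
    """Check status before a heartbeat exists, right after one, and once stale."""
    with tempfile.TemporaryDirectory() as tmp:
        heartbeat = Path(tmp) / "server_heartbeat.json"
        before = read_status(heartbeat)["alive"]
        write_heartbeat(heartbeat, ["litellm_completion"], 0)
        fresh = read_status(heartbeat)
        stale = read_status(heartbeat, max_age=-1.0)["alive"]
        return before, fresh["alive"], fresh["functions"], stale
```

A freshness check like this works from any node that shares the filesystem, because it relies only on file contents and not on inspecting the server process.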
Safety Mechanisms
- Atomic writes: Responses are written to a .tmp file then renamed, preventing clients from reading partial data.
- Started sentinel: When the server begins processing a request, it creates a _started marker file. The client uses this to distinguish "server is processing (wait)" from "server is not running (fail fast)."
- Exception propagation: If the function raises an exception on the server, the exception object is pickled and re-raised on the client side with its original type.
- Unpicklable response handling: If the server cannot pickle the response (e.g., it contains open file handles), the client receives a FileProxyError instead of hanging.
- Cleanup: Request, response, and sentinel files are removed after processing.
- Startup cleanup: The server clears stale files from previous runs on startup.
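The write-then-rename pattern from the first item can be sketched with the standard library. The helper name atomic_pickle_dump and the exact .tmp naming are assumptions for this example; os.replace is atomic on a local POSIX filesystem, while behaviour on network filesystems depends on the implementation.

```python
import os
import pickle
import tempfile
from pathlib import Path


def atomic_pickle_dump(obj, final_path: Path) -> None:
    """Write obj to a sibling .tmp file, then rename it into place."""
    tmp_path = final_path.with_name(final_path.name + ".tmp")
    with open(tmp_path, "wb") as fh:
        pickle.dump(obj, fh)
        fh.flush()
        os.fsync(fh.fileno())  # push the bytes to disk before the rename
    # Readers see either no file or the complete file, never a partial one.
    os.replace(tmp_path, final_path)


def demo() -> tuple:
    """Write a response atomically, then report its content and leftover .tmp files."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "response.pkl"
        atomic_pickle_dump({"status": "ok"}, target)
        with open(target, "rb") as fh:
            loaded = pickle.load(fh)
        leftovers = [p.name for p in Path(tmp).glob("*.tmp")]
        return loaded, leftovers
```

Keeping the temporary file in the same directory as the final file matters, because a rename across filesystems is not atomic.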
Limitations
- Arguments and return values must be picklable. Most Python objects are (strings, dicts, lists, numbers, dataclasses, etc.); lambdas, open file handles, and generators are not.
- Latency overhead of ~100-200ms per call due to filesystem polling.
- Server and client must share a filesystem (e.g., NFS home directory on HPC clusters). Local-only filesystems like /tmp won't work across nodes.
- If the server crashes (e.g., killed by OOM) while processing a request, the client will wait indefinitely for that request. Restart the server to recover.
- If the server and client use different Python environments, server-side exceptions from libraries not installed on the client will be raised as RuntimeError instead of their original type.
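To check the first limitation before sending a call through the proxy, a small pre-flight test along these lines may help. The helper is_picklable is hypothetical and simply tries the serialization that the file-based transport relies on.

```python
import pickle


def is_picklable(obj) -> bool:
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except Exception:
        # pickle raises different exception types depending on the object,
        # so any failure is treated as "not picklable" here.
        return False
    return True
```

Plain data such as {"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}]} passes the check, while a lambda or a generator does not. Converting a generator to a list before the call is usually enough to work around the restriction.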
License
MIT