fileproxy

File-based RPC for running Python functions across network-isolated nodes.

Designed for HPC clusters where compute nodes lack internet access but share a filesystem with login nodes that do.

Installation

pip install fileproxy

Or from source:

git clone https://github.com/tboulet/fileproxy.git
cd fileproxy
pip install -e .

Quick Start

1. Define and start the server (login node)

Create a server script (example here with litellm.completion as the function to proxy):

# server_script.py
import fileproxy
import litellm

if __name__ == "__main__":
    fileproxy.run_server({
        "litellm_completion": litellm.completion,
    })

Run it on the login node:

python server_script.py

Tip: On HPC clusters, run the server in a persistent terminal session (e.g., tmux) so it survives SSH disconnections. See guide_TMUX.md for a quick reference.

2. Use the proxy in your code (compute node)

import fileproxy

# Create a proxy that behaves like the original function
completion = fileproxy.proxy("litellm_completion")

# Use it exactly like litellm.completion
response = completion(model="gpt-4", messages=[{"role": "user", "content": "Hello"}])

The proxy serializes the arguments to a file, the server picks it up, runs the real function, and writes the result back. The proxy polls for the result and returns it.

Multiple Functions

Register multiple functions on the same server:

# Server
import fileproxy
import litellm
import requests

if __name__ == "__main__":
    fileproxy.run_server({
        "litellm_completion": litellm.completion,
        "http_post": requests.post,
        "http_get": requests.get,
    })

# Client
import fileproxy

completion = fileproxy.proxy("litellm_completion")
http_post = fileproxy.proxy("http_post")
http_get = fileproxy.proxy("http_get")

Configuration

Data directory

By default, fileproxy stores request/response files in ~/.cache/fileproxy/. Override with:

  1. Constructor argument: fileproxy.proxy("func", base_dir="/path/to/dir")
  2. Environment variable: export FILEPROXY_DIR=/path/to/dir

The server and client must use the same base directory on a shared filesystem.
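The override options above imply some resolution order for the data directory. The following is a minimal sketch of one plausible order, assuming an explicit argument wins over FILEPROXY_DIR, which in turn wins over the default. The precedence and the helper name resolve_base_dir are assumptions made for illustration and are not taken from fileproxy's source.

```python
import os
from pathlib import Path

DEFAULT_DIR = "~/.cache/fileproxy"  # default location stated in this README


def resolve_base_dir(base_dir=None, env=None):
    """Hypothetical helper: choose the data directory.

    Assumed precedence: explicit argument, then FILEPROXY_DIR, then default.
    """
    env = os.environ if env is None else env
    chosen = base_dir or env.get("FILEPROXY_DIR") or DEFAULT_DIR
    return Path(chosen).expanduser()
```

Whatever the real order is, both sides have to resolve to the same path, so setting FILEPROXY_DIR identically in the server and job environments is the least error-prone option.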

Workers (parallel execution)

By default, the server processes requests sequentially. To handle multiple requests concurrently (useful when registering multiple functions or serving multiple clients):

# Process up to 4 requests in parallel
fileproxy.run_server(functions, workers=4)

With workers=1 (default), requests are executed one at a time. With workers=2 or more, requests are dispatched to a thread pool. This is particularly useful when mixing slow functions (e.g., LLM calls) with fast ones (e.g., HTTP requests) — a slow call won't block unrelated requests.

Note: Registered functions must be thread-safe when using workers > 1. Most common use cases (HTTP requests, API calls) are thread-safe.
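To illustrate why a thread pool keeps a slow call from blocking a fast one, here is a small self-contained sketch using Python's concurrent.futures. It is a stand-in for the idea only: slow_call, fast_call, and run_with_workers are hypothetical names, and fileproxy's actual dispatch code may differ.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_call():
    time.sleep(0.3)  # stands in for a slow LLM request
    return "slow done"


def fast_call():
    return "fast done"  # stands in for a quick HTTP request


def run_with_workers(workers):
    """Submit the slow call first, then the fast one; return finish order."""
    finished = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for fn in (slow_call, fast_call):
            future = pool.submit(fn)
            # Record each result at the moment its call completes.
            future.add_done_callback(lambda f: finished.append(f.result()))
    return finished
```

With run_with_workers(1) the fast call has to wait behind the slow one; with run_with_workers(2) it finishes first.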

Timeouts

# Client waits up to 15s for server acknowledgement (default: 10s)
func = fileproxy.proxy("my_func", no_server_timeout=15.0)

The timeout only applies while waiting for the server to acknowledge the request (pick it up). Once the server starts processing, the client waits indefinitely — slow functions will not cause false timeouts.
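The timeout rule can be sketched as a client-side wait loop. This is an illustration under assumptions, not fileproxy's code: the file paths are passed in explicitly, the local ServerNotRunningError class is a stand-in for the real one, and wait_for_response is a hypothetical name.

```python
import time
from pathlib import Path


class ServerNotRunningError(Exception):
    """Stand-in for fileproxy.ServerNotRunningError in this sketch."""


def wait_for_response(response_path, started_path,
                      no_server_timeout=10.0, poll_interval=0.1):
    """Poll until the response file exists.

    The deadline applies only until the _started sentinel shows up.
    """
    response_path, started_path = Path(response_path), Path(started_path)
    deadline = time.monotonic() + no_server_timeout
    acknowledged = False
    while not response_path.exists():
        if not acknowledged and started_path.exists():
            acknowledged = True  # server picked the request up: stop the clock
        if not acknowledged and time.monotonic() > deadline:
            raise ServerNotRunningError("no acknowledgement from server")
        time.sleep(poll_interval)
    return response_path.read_bytes()
```

Once the sentinel has been seen, the loop has no exit other than the response file appearing, which matches the "waits indefinitely" behavior described above.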

Poll interval

# Server checks for new requests every 0.5s (default: 0.2s)
fileproxy.run_server(functions, poll_interval=0.5)

# Client checks for response every 0.2s (default: 0.1s)
func = fileproxy.proxy("my_func", poll_interval=0.2)
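As a rough guide to how these two intervals affect latency, the sketch below estimates the delay added by polling alone. It assumes each request and response lands at a uniformly random point within the relevant polling period and ignores filesystem propagation delays (which can be significant on NFS); polling_latency is a hypothetical helper, not part of fileproxy.

```python
def polling_latency(server_poll, client_poll):
    """Return (average, worst-case) added latency in seconds.

    Assumes arrivals are uniformly distributed within each polling
    period and ignores filesystem delays.
    """
    average = server_poll / 2 + client_poll / 2
    worst = server_poll + client_poll
    return average, worst


avg, worst = polling_latency(0.2, 0.1)  # the documented defaults
print(f"average ~{avg:.2f}s, worst case ~{worst:.2f}s")
```

Lowering either interval reduces latency at the cost of more filesystem metadata operations per second.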

How It Works

Compute Node (no internet)          Login Node (has internet)
─────────────────────────          ──────────────────────────

proxy("func")(*args, **kwargs)     Server polls input dir
  │                                  │
  ├─ Write request.pkl ──────────────┤
  │  to input dir                    ├─ Read request.pkl
  │                                  ├─ Create _started sentinel
  │  (client sees _started,          ├─ Call func(*args, **kwargs)
  │   disables timeout)              ├─ Write response.pkl (atomic)
  ├──────────────────────────────────┤  to output dir
  ├─ Read response.pkl              │
  ├─ Return result                  │
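The server side of the diagram can be sketched for a single request as follows. The file naming, the (args, kwargs) request format, and the ("ok", value) / ("error", exception) response envelope are all assumptions made for this illustration; handle_request is a hypothetical function, not fileproxy's implementation.

```python
import os
import pickle
from pathlib import Path


def handle_request(request_path, output_dir, func):
    """Process one request file the way the diagram describes."""
    request_path, output_dir = Path(request_path), Path(output_dir)
    request_id = request_path.stem

    # Read the pickled call arguments written by the client.
    args, kwargs = pickle.loads(request_path.read_bytes())

    # Create the sentinel so the client knows the request was picked up.
    (output_dir / f"{request_id}_started").touch()

    # Run the real function; capture exceptions so they can be sent back.
    try:
        payload = ("ok", func(*args, **kwargs))
    except Exception as exc:
        payload = ("error", exc)

    # Atomic publish: write to a temp file, then rename into place,
    # so a polling client never reads a half-written response.
    tmp_path = output_dir / f"{request_id}.tmp"
    tmp_path.write_bytes(pickle.dumps(payload))
    os.replace(tmp_path, output_dir / f"{request_id}.pkl")

    request_path.unlink()  # the request has been consumed
```

The rename step relies on os.replace being atomic within a single filesystem, which is why the temporary file is created in the same directory as the final response.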

Directory structure

~/.cache/fileproxy/
├── func_name_1/
│   ├── input/       # Request files (.pkl)
│   └── output/      # Response files (.pkl) + _started sentinels
├── func_name_2/
│   ├── input/
│   └── output/
├── logs/
│   └── server_20260310_143000.log
└── server_heartbeat.json

Error Handling

fileproxy uses custom exception types to distinguish infrastructure errors from function errors:

import fileproxy
from fileproxy import FileProxyError, ServerNotRunningError

func = fileproxy.proxy("my_func")

try:
    result = func("some input")  # any picklable arguments
except ServerNotRunningError:
    # fileproxy infrastructure problem: server is not running
    print("Start the fileproxy server!")
except FileProxyError:
    # Other fileproxy infrastructure problem
    print("Something went wrong with the file proxy")
except ValueError:
    # Exception raised by the actual function on the server side
    # (re-raised with original type)
    print("The function itself failed")

  • FileProxyError: Base class for all fileproxy infrastructure errors.
  • ServerNotRunningError(FileProxyError): Server did not acknowledge the request within the timeout.
  • Server-side function exceptions are re-raised with their original type (not wrapped in FileProxyError).

Exception propagation details

When the proxied function raises an exception on the server, the proxy re-raises it on the client with the original exception type in most cases. For example, a server-side ValueError("bad input") becomes a client-side ValueError("bad input").

However, some exception classes have non-standard __init__ signatures that prevent Python's pickle from reconstructing them (e.g., litellm.RateLimitError requires llm_provider and model arguments). In these cases, the original exception cannot be faithfully reconstructed, so the proxy raises a RuntimeError instead, with a message of the form:

RuntimeError: Server-side RateLimitError: rate limited

In summary:

  • Standard exceptions (e.g., ValueError, TypeError, KeyError, most custom exceptions with a simple __init__(self, message) signature): re-raised with original type and message.
  • Non-picklable exceptions (non-standard __init__ that fails to round-trip through pickle): raised as RuntimeError("Server-side {OriginalType}: {original_message}").
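The round-trip failure is standard Python behavior and can be reproduced without litellm. In the sketch below, RateLimitError is a hypothetical class that mimics the described constructor, and make_transportable is an illustrative helper showing one way the fallback could work; neither is taken from fileproxy's source.

```python
import pickle


class RateLimitError(Exception):
    """Hypothetical exception with extra required constructor arguments."""

    def __init__(self, message, llm_provider, model):
        super().__init__(message)
        self.llm_provider = llm_provider
        self.model = model


def make_transportable(exc):
    """Return exc if it survives a pickle round trip, else a RuntimeError."""
    try:
        pickle.loads(pickle.dumps(exc))
        return exc
    except Exception:
        # Unpickling calls the class with exc.args only, so a constructor
        # that needs more arguments raises TypeError here.
        return RuntimeError(f"Server-side {type(exc).__name__}: {exc}")
```

Calling make_transportable(ValueError("bad input")) returns the ValueError unchanged, while make_transportable(RateLimitError("rate limited", "openai", "gpt-4")) returns a RuntimeError whose message is "Server-side RateLimitError: rate limited".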

Logs

Server logs are written to {base_dir}/logs/server_YYYYMMDD_HHMMSS.log and also printed to the server terminal. Each log file corresponds to one server session.

Important Notes

Multiple servers

Do not run multiple fileproxy servers with the same base_dir. On startup, the server checks for an existing heartbeat and raises FileProxyError if another server appears to be running. To override and kill the old server, use force=True:

# force=True signals the old server to stop, waits for it to shut down,
# then starts the new server
fileproxy.run_server(functions, force=True)

If you need truly independent servers running simultaneously, use different base_dir values:

FILEPROXY_DIR=~/.cache/fileproxy-project-a python server_a.py
FILEPROXY_DIR=~/.cache/fileproxy-project-b python server_b.py

Restarting the server

When you restart the server, it clears all pending request/response files. Any client calls that were in-flight will eventually time out with ServerNotRunningError. This is by design — it prevents stale requests from a previous session from being processed.

Checking server status

From any node that shares the filesystem:

import fileproxy

info = fileproxy.status()
print(info["alive"])       # True/False
print(info["functions"])   # ["litellm_completion", "http_post", ...]
print(info["pid"])         # Server process ID
print(info["requests_processed"])  # Total requests handled
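A job script may want to wait for the server before issuing calls. The sketch below polls a status function until it reports a live server. It relies only on the "alive" key shown above; wait_until_alive and its parameters are hypothetical, and the status function is passed in so the helper can be tried without a running server.

```python
import time


def wait_until_alive(status_fn, timeout=30.0, interval=1.0):
    """Poll status_fn() until it reports a live server or timeout expires.

    status_fn is expected to return a mapping with an "alive" key,
    as fileproxy.status() does in the example above.
    """
    deadline = time.monotonic() + timeout
    while True:
        if status_fn()["alive"]:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In a real job this would be called as wait_until_alive(fileproxy.status), exiting early with a clear message if it returns False.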

Safety Mechanisms

  • Atomic writes: Responses are written to a .tmp file then renamed, preventing clients from reading partial data.
  • Started sentinel: When the server begins processing a request, it creates a _started marker file. The client uses this to distinguish "server is processing (wait)" from "server is not running (fail fast)."
  • Exception propagation: If the function raises an exception on the server, the exception object is pickled and re-raised on the client side with its original type whenever it can be unpickled there (otherwise as RuntimeError).
  • Unpicklable response handling: If the server cannot pickle the response (e.g., it contains open file handles), the client receives a FileProxyError instead of hanging.
  • Cleanup: Request, response, and sentinel files are removed after processing.
  • Startup cleanup: The server clears stale files from previous runs on startup.

Limitations

  • Arguments and return values must be picklable. Most Python objects are (strings, dicts, lists, numbers, dataclasses, etc.); lambdas, open file handles, and generators are not.
  • Latency overhead of ~100-200ms per call due to filesystem polling.
  • Server and client must share a filesystem (e.g., NFS home directory on HPC clusters). Local-only filesystems like /tmp won't work across nodes.
  • If the server crashes (e.g., killed by OOM) while processing a request, the client will wait indefinitely for that request. Restart the server to recover.
  • If the server and client use different Python environments, server-side exceptions from libraries not installed on the client will be raised as RuntimeError instead of their original type.
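To check the picklability requirement before sending a value through a proxy, a quick test with the standard pickle module is enough. is_picklable is a hypothetical helper written for this illustration, not part of fileproxy.

```python
import pickle


def is_picklable(obj):
    """Return True if obj can be serialized with pickle."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


print(is_picklable({"model": "gpt-4", "messages": ["Hello"]}))  # True
print(is_picklable(lambda x: x + 1))                            # False
print(is_picklable(n for n in range(3)))                        # False
```

Running such a check on the compute node gives an immediate local error instead of a failure that only surfaces after the request has been written.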

License

MIT
