
fileproxy

File-based RPC for running Python functions across network-isolated nodes.

Designed for HPC clusters where compute nodes lack internet access but share a filesystem with login nodes that do.

Installation

pip install fileproxy

Or from source:

git clone https://github.com/tboulet/fileproxy.git
cd fileproxy
pip install -e .

Quick Start

1. Define and start the server (login node)

Create a server script (example here with litellm.completion as the function to proxy):

# server_script.py
import fileproxy
import litellm

if __name__ == "__main__":
    fileproxy.run_server({
        "litellm_completion": litellm.completion,
    })

Run it on the login node:

python server_script.py

Tip: On HPC clusters, run the server in a persistent terminal session (e.g., tmux) so it survives SSH disconnections. See guide_TMUX.md for a quick reference.

2. Use the proxy in your code (compute node)

import fileproxy

# Create a proxy that behaves like the original function
completion = fileproxy.proxy("litellm_completion")

# Use it exactly like litellm.completion
response = completion(model="gpt-4", messages=[{"role": "user", "content": "Hello"}])

The proxy serializes the arguments to a file, the server picks it up, runs the real function, and writes the result back. The proxy polls for the result and returns it.
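That round trip can be sketched in a few lines (hypothetical helper names; this illustrates the pattern, not fileproxy's actual internals):

```python
# Minimal sketch of the file-based RPC round trip: the client serializes a
# call to a request file, the "server" picks it up, runs the function, and
# writes the result back atomically; the client then reads the response.
import os
import pickle
import tempfile
import uuid

def write_request(base_dir, args, kwargs):
    req_id = uuid.uuid4().hex
    path = os.path.join(base_dir, f"{req_id}.request.pkl")
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump({"args": args, "kwargs": kwargs}, f)
    os.replace(tmp, path)   # atomic rename: never seen half-written
    return req_id

def serve_one(base_dir, func):
    # Server side: answer every pending request in the directory.
    for name in os.listdir(base_dir):
        if not name.endswith(".request.pkl"):
            continue
        req_id = name.split(".")[0]
        with open(os.path.join(base_dir, name), "rb") as f:
            req = pickle.load(f)
        result = func(*req["args"], **req["kwargs"])
        resp = os.path.join(base_dir, f"{req_id}.response.pkl")
        with open(resp + ".tmp", "wb") as f:
            pickle.dump(result, f)
        os.replace(resp + ".tmp", resp)
        os.remove(os.path.join(base_dir, name))

def read_response(base_dir, req_id):
    with open(os.path.join(base_dir, f"{req_id}.response.pkl"), "rb") as f:
        return pickle.load(f)

base_dir = tempfile.mkdtemp()
req_id = write_request(base_dir, (2, 3), {})
serve_one(base_dir, lambda a, b: a + b)   # stands in for the server loop
print(read_response(base_dir, req_id))    # 5
```

A real implementation adds the _started sentinel, per-function subdirectories, polling, and cleanup, as described under How It Works below.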

Multiple Functions

Register multiple functions on the same server:

# Server
import fileproxy
import litellm
import requests

if __name__ == "__main__":
    fileproxy.run_server({
        "litellm_completion": litellm.completion,
        "http_post": requests.post,
        "http_get": requests.get,
    })

# Client
import fileproxy

completion = fileproxy.proxy("litellm_completion")
http_post = fileproxy.proxy("http_post")
http_get = fileproxy.proxy("http_get")

Configuration

Data directory

By default, fileproxy stores request/response files in ~/.cache/fileproxy/. Override with:

  1. Constructor argument: fileproxy.proxy("func", base_dir="/path/to/dir")
  2. Environment variable: export FILEPROXY_DIR=/path/to/dir

The server and client must use the same base directory on a shared filesystem.
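The lookup order can be sketched as a small resolver (illustrative; the function below is hypothetical, but it mirrors the documented precedence of explicit argument over environment variable over default):

```python
# Hypothetical sketch of base-directory resolution: explicit argument wins,
# then the FILEPROXY_DIR environment variable, then the default cache path.
import os

def resolve_base_dir(base_dir=None):
    if base_dir is not None:                 # 1. constructor argument
        return base_dir
    if "FILEPROXY_DIR" in os.environ:        # 2. environment variable
        return os.environ["FILEPROXY_DIR"]
    return os.path.expanduser("~/.cache/fileproxy")  # 3. default

os.environ.pop("FILEPROXY_DIR", None)
print(resolve_base_dir("/shared/rpc"))   # /shared/rpc
os.environ["FILEPROXY_DIR"] = "/shared/env"
print(resolve_base_dir())                # /shared/env
```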

Workers (parallel execution)

By default, the server processes requests sequentially. To handle multiple requests concurrently (useful when registering multiple functions or serving multiple clients):

# Process up to 4 requests in parallel
fileproxy.run_server(functions, workers=4)

With workers=1 (default), requests are executed one at a time. With workers=2 or more, requests are dispatched to a thread pool. This is particularly useful when mixing slow functions (e.g., LLM calls) with fast ones (e.g., HTTP requests) — a slow call won't block unrelated requests.

Note: Registered functions must be thread-safe when using workers > 1. Most common use cases (HTTP requests, API calls) are thread-safe.
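The benefit is easy to see with two stand-in functions (timings illustrative; this uses Python's ThreadPoolExecutor directly, not fileproxy):

```python
# With one worker, a fast request waits behind a slow one; with a thread
# pool, both run at once and total wall time is roughly the slow call alone.
import time
from concurrent.futures import ThreadPoolExecutor

def slow_call():
    time.sleep(0.3)    # stands in for an LLM call
    return "slow"

def fast_call():
    time.sleep(0.05)   # stands in for a quick HTTP request
    return "fast"

# workers=1: back-to-back execution (~0.35s total)
start = time.monotonic()
sequential = [slow_call(), fast_call()]
sequential_time = time.monotonic() - start

# workers=2: concurrent execution (~0.3s total)
start = time.monotonic()
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(slow_call), pool.submit(fast_call)]
    parallel = [f.result() for f in futures]
parallel_time = time.monotonic() - start

print(parallel_time < sequential_time)  # True
```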

Timeouts

# Client waits 10s for server acknowledgement (default: 10s)
func = fileproxy.proxy("my_func", no_server_timeout=15.0)

The timeout only applies while waiting for the server to acknowledge the request (pick it up). Once the server starts processing, the client waits indefinitely — slow functions will not cause false timeouts.
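The two-phase wait can be sketched like this (file names and function are hypothetical, not fileproxy's internals):

```python
# Phase 1: a deadline applies only until the _started sentinel appears.
# Phase 2: once the server has acknowledged the request, the client polls
# without a deadline, so slow functions cannot cause false timeouts.
import os
import tempfile
import time

def wait_for_result(result_path, started_path, no_server_timeout,
                    poll_interval=0.02):
    deadline = time.monotonic() + no_server_timeout
    acknowledged = False
    while True:
        if os.path.exists(result_path):
            with open(result_path) as f:
                return f.read()
        if not acknowledged and os.path.exists(started_path):
            acknowledged = True   # server picked it up: disable the timeout
        if not acknowledged and time.monotonic() > deadline:
            raise TimeoutError("server did not acknowledge the request")
        time.sleep(poll_interval)

d = tempfile.mkdtemp()
started = os.path.join(d, "_started")
result = os.path.join(d, "response.txt")

# Simulate an acknowledged, completed request.
open(started, "w").close()
with open(result, "w") as f:
    f.write("ok")
print(wait_for_result(result, started, no_server_timeout=0.2))  # ok
```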

Poll interval

# Server checks for new requests every 0.5s (default: 0.2s)
fileproxy.run_server(functions, poll_interval=0.5)

# Client checks for response every 0.2s (default: 0.1s)
func = fileproxy.proxy("my_func", poll_interval=0.2)

How It Works

Compute Node (no internet)             Login Node (has internet)
──────────────────────────             ─────────────────────────

proxy("func")(args, kwargs)            Server polls input dir
  │                                    │
  ├─ Write request.pkl ────────────────┤
  │  to input dir                      ├─ Read request.pkl
  │                                    ├─ Create _started sentinel
  │  (client sees _started,            ├─ Call func(*args, **kwargs)
  │   disables timeout)                ├─ Write response.pkl (atomic)
  ├────────────────────────────────────┤  to output dir
  ├─ Read response.pkl                 │
  └─ Return result                     │

Directory structure

~/.cache/fileproxy/
├── func_name_1/
│   ├── input/       # Request files (.pkl)
│   └── output/      # Response files (.pkl) + _started sentinels
├── func_name_2/
│   ├── input/
│   └── output/
├── logs/
│   └── server_20260310_143000.log
└── server_heartbeat.json

Error Handling

fileproxy uses custom exception types to distinguish infrastructure errors from function errors:

import fileproxy
from fileproxy import FileProxyError, ServerNotRunningError

func = fileproxy.proxy("my_func")

try:
    result = func(args)
except ServerNotRunningError:
    # fileproxy infrastructure problem: server is not running
    print("Start the fileproxy server!")
except FileProxyError:
    # Other fileproxy infrastructure problem
    print("Something went wrong with the file proxy")
except ValueError:
    # Exception raised by the actual function on the server side
    # (re-raised with original type)
    print("The function itself failed")

  • FileProxyError: Base class for all fileproxy infrastructure errors.
  • ServerNotRunningError(FileProxyError): Server did not acknowledge the request within the timeout.
  • Server-side function exceptions are re-raised with their original type (not wrapped in FileProxyError).

Exception propagation details

When the proxied function raises an exception on the server, the proxy re-raises it on the client with the original exception type in most cases. For example, a server-side ValueError("bad input") becomes a client-side ValueError("bad input").

However, some exception classes have non-standard __init__ signatures that prevent Python's pickle from reconstructing them (e.g., litellm.RateLimitError requires llm_provider and model arguments). In these cases, the original exception cannot be faithfully reconstructed, so the proxy raises a RuntimeError instead, with a message of the form:

RuntimeError: Server-side RateLimitError: rate limited

In summary:

  • Standard exceptions (e.g., ValueError, TypeError, KeyError, most custom exceptions with a simple __init__(self, message) signature): re-raised with original type and message.
  • Non-picklable exceptions (non-standard __init__ that fails to round-trip through pickle): raised as RuntimeError("Server-side {OriginalType}: {original_message}").
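This fallback rule amounts to a pickle round trip with a safety net, roughly as follows (illustrative sketch; NeedsExtraArgs is a made-up stand-in for something like litellm.RateLimitError):

```python
# Try to rebuild the server-side exception via pickle; if reconstruction
# fails, fall back to a RuntimeError naming the original type and message.
import pickle

class NeedsExtraArgs(Exception):
    # Non-standard __init__: pickle rebuilds exceptions via cls(*self.args),
    # which fails here because `code` is missing from args.
    def __init__(self, message, code):
        super().__init__(message)
        self.code = code

def reraise(server_exc):
    try:
        rebuilt = pickle.loads(pickle.dumps(server_exc))
    except Exception:
        raise RuntimeError(
            f"Server-side {type(server_exc).__name__}: {server_exc}"
        ) from None
    raise rebuilt

try:
    reraise(ValueError("bad input"))
except ValueError as e:
    print(type(e).__name__, e)   # ValueError bad input

try:
    reraise(NeedsExtraArgs("rate limited", code=429))
except RuntimeError as e:
    print(e)                     # Server-side NeedsExtraArgs: rate limited
```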

Logs

Server logs are written to {base_dir}/logs/server_YYYYMMDDHHMMSS.log and also printed to the server terminal. Each log file corresponds to one server session.

Important Notes

Multiple servers

Do not run multiple fileproxy servers with the same base_dir. On startup, the server checks for an existing heartbeat and raises FileProxyError if another server appears to be running. To override and kill the old server, use force=True:

# force=True signals the old server to stop, waits for it to shut down,
# then starts the new server
fileproxy.run_server(functions, force=True)

If you need truly independent servers running simultaneously, use different base_dir values:

FILEPROXY_DIR=~/.cache/fileproxy-project-a python server_a.py
FILEPROXY_DIR=~/.cache/fileproxy-project-b python server_b.py

Restarting the server

When you restart the server, it clears all pending request/response files. Any client calls that were in-flight will eventually time out with ServerNotRunningError. This is by design — it prevents stale requests from a previous session from being processed.

Checking server status

From any node that shares the filesystem:

import fileproxy

info = fileproxy.status()
print(info["alive"])       # True/False
print(info["functions"])   # ["litellm_completion", "http_post", ...]
print(info["pid"])         # Server process ID
print(info["requests_processed"])  # Total requests handled

Safety Mechanisms

  • Atomic writes: Responses are written to a .tmp file then renamed, preventing clients from reading partial data.
  • Started sentinel: When the server begins processing a request, it creates a _started marker file. The client uses this to distinguish "server is processing (wait)" from "server is not running (fail fast)."
  • Exception propagation: If the function raises an exception on the server, the exception object is pickled and re-raised on the client side with its original type.
  • Unpicklable response handling: If the server cannot pickle the response (e.g., it contains open file handles), the client receives a FileProxyError instead of hanging.
  • Cleanup: Request, response, and sentinel files are removed after processing.
  • Startup cleanup: The server clears stale files from previous runs on startup.
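The atomic-write bullet boils down to the classic write-to-temp-then-rename pattern, sketched here (illustrative, not fileproxy's code):

```python
# Write to a temporary file, flush, then rename. os.replace is atomic on
# POSIX filesystems, so a polling reader sees either no file or the
# complete file, never a partial write.
import os
import pickle
import tempfile

def atomic_write_pickle(path, obj):
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(obj, f)
        f.flush()
        os.fsync(f.fileno())   # ensure bytes are on disk before the rename
    os.replace(tmp, path)      # atomic swap into place

d = tempfile.mkdtemp()
target = os.path.join(d, "response.pkl")
atomic_write_pickle(target, {"result": 42})
with open(target, "rb") as f:
    print(pickle.load(f))                # {'result': 42}
print(os.path.exists(target + ".tmp"))   # False
```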

Limitations

  • Arguments and return values must be picklable. Most Python objects are (strings, dicts, lists, numbers, dataclasses, etc.); lambdas, open file handles, and generators are not.
  • Latency overhead of ~100-200ms per call due to filesystem polling.
  • Server and client must share a filesystem (e.g., NFS home directory on HPC clusters). Local-only filesystems like /tmp won't work across nodes.
  • If the server crashes (e.g., killed by OOM) while processing a request, the client will wait indefinitely for that request. Restart the server to recover.
  • If the server and client use different Python environments, server-side exceptions from libraries not installed on the client will be raised as RuntimeError instead of their original type.
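Given the first limitation, a quick pre-flight check for picklability can save a confusing failure later (illustrative helper, not part of fileproxy):

```python
# True only if the object survives a full pickle round trip, i.e. it can
# be both serialized by the client and deserialized by the server.
import pickle

def is_picklable(obj):
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except Exception:
        return False

print(is_picklable({"model": "gpt-4", "n": 3}))   # True
print(is_picklable(lambda x: x))                  # False (lambda)
print(is_picklable(i for i in range(3)))          # False (generator)
```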

License

MIT

Download files

Download the file for your platform.

Source Distribution

fileproxy-0.1.1.tar.gz (20.3 kB)


Built Distribution


fileproxy-0.1.1-py3-none-any.whl (16.3 kB)


File details

Details for the file fileproxy-0.1.1.tar.gz.

File metadata

  • Download URL: fileproxy-0.1.1.tar.gz
  • Upload date:
  • Size: 20.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for fileproxy-0.1.1.tar.gz
Algorithm Hash digest
SHA256 18f07100f3efe451e2f6f370b23557d45fe310e901c4677a716ee7c1229e486b
MD5 87f974fe68559019d486259f1aecdcb3
BLAKE2b-256 cabbf7dd77acd2a6752fe822f8157b200b9a0e1c3a991dd2e48bc9e8d9107af2


File details

Details for the file fileproxy-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: fileproxy-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 16.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for fileproxy-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 ce81e70ce486680d3de45ccdd6a00947c1325d18498470f7aefe617e39785787
MD5 0d87ee21649ac52d539ee78a5b3858f4
BLAKE2b-256 aabf74451144bf81c196d4805ffc8eb5c6e84e08885d0d2224cafef0f552fb4e

