
Deploy and access image and data processing models across processes.

Project description

Nahual: Communication layer to send and transform data across environments and/or processes.

The problem: When trying to train, compare, and deploy many different models (deep learning or otherwise), the number of dependencies in one Python environment can get out of control very quickly (e.g., one model requires PyTorch 2.1 and another one requires 2.7).

Potential solution: I figured that if we can move parameters and numpy arrays between environments, we can isolate each model and have them process our data on demand.

Thus, the goal of this tool is to provide a way to deploy model(s) in one (or many) environments and access them from another one, usually an orchestrator.

Available models and tools

By default, the models and tools are deployable using Nix.

  • BABY: Segmentation, tracking and lineage assignment for budding yeast.
  • Cellpose: Generalist segmentation model.
  • DINOv2: Generalist self-supervised model to obtain visual features.
  • Trackastra: Transformer-based tracking trained on a multitude of datasets.

WIP

  • DINOv3: Generalist self-supervised model, latest iteration.

Usage

Step 1: Deploy server

cd into the directory of the model you want to deploy. In this case we will test the image embedding model DINOv2.

git clone https://github.com/afermg/dinov2.git
cd dinov2
nix develop --command bash -c "python server.py ipc:///tmp/dinov2.ipc"

Step 2: Run client

Once the server is running, you can call it from a different Python script.

import numpy

from nahual.process import dispatch_setup_process

setup, process = dispatch_setup_process("dinov2")
address = "ipc:///tmp/dinov2.ipc"

# %% Load models server-side
parameters = {"repo_or_dir": "facebookresearch/dinov2", "model": "dinov2_vits14_lc"}
response = setup(parameters, address=address)

# %% Define custom data
data = numpy.random.random_sample((1, 3, 420, 420))
result = process(data + 1000, address=address)

You can press C-c (Ctrl-C) in the terminal where the server lives to kill it. We will also add a way to kill the server from within the client.

Design decisions and details

I strive to keep this tool as lean as possible (both in dependency count and architectural complexity). It is designed around three layers:

  • Server deployment: A collection of functions/tools (we could even call it a "model zoo" if we are trying to sound cool) that we may want to use (e.g., Cellpose for object segmentation or Trackastra for tracking).
  • Transport layer: We need to move the data between environments. I also wrote my own (trivially simple) numpy serializer; since we have Python at both ends of the connection, we can reuse these functions server-side.
  • Orchestration: This can be a script or my own pipelining framework aliby; it massages the data into the desired shape/type and then hands it over to nahual.
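The trivially simple numpy serializer mentioned above can be sketched as a small header (dtype and shape) followed by the raw array buffer. This is an illustrative sketch, not nahual's actual wire format; the `pack`/`unpack` names are hypothetical.

```python
import numpy as np


def pack(arr: np.ndarray) -> bytes:
    # Header: dtype string and comma-separated shape, terminated by a newline,
    # followed by the raw contiguous buffer.
    header = f"{arr.dtype.str}|{','.join(map(str, arr.shape))}\n".encode()
    return header + np.ascontiguousarray(arr).tobytes()


def unpack(blob: bytes) -> np.ndarray:
    # Split at the first newline: everything before it is the header.
    header, _, body = blob.partition(b"\n")
    dtype_str, shape_str = header.decode().split("|")
    shape = tuple(int(s) for s in shape_str.split(",")) if shape_str else ()
    return np.frombuffer(body, dtype=np.dtype(dtype_str)).reshape(shape)
```

Because both ends of the connection run Python, the same pair of functions can be imported by the server and the client.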

This tool is my personal one-stop-shop source for multiple models to process imaging data or their derivatives. Please note that this is work in progress, and very likely to undergo major changes as I understand the core challenges.

To reduce maintenance burden, we support only the necessary data types:

  • Dictionaries: To send parameters to deploy and evaluate models/functions.
  • Numpy arrays (and numpy-able lists/tuples): The main type of data we deal with.

Tech stack

  • Model/tool deployment: I use Nix, and at the moment do not plan to support containers. Nix gives me unique reproducibility guarantees whilst allowing me to use bleeding-edge models and libraries.
  • Transport layer: I use pynng. I like that it is very minimalistic and provides easy-to-reproduce examples (https://github.com/codypiersall/pynng/tree/7fd3d76573c3cb40c1e5f7e10d4a6091e411b9c2/examples). An alternative would have been gRPC + protobuf, but since I am still trying to understand the constraints and tradeoffs, I do not want to commit to a big framework unless I have a compelling reason to do so.

Adding support for new models

Any model requires a thin layer that communicates using nng. You can see an example in trackastra's server and client.

Roadmap

  • Support multiple instances of a model loaded in memory server-side.
  • Formalize supported packet formats (e.g., numpy arrays, dictionaries).
  • Increase number of supported models/methods.
  • Document server-side API.
  • Integrate into the aliby pipelining framework, in a way that is agnostic to which model is being used.
  • Support containers that wrap the Nix derivations.

Why nahual?

In Mesoamerican folklore, a Nahual is a shaman able to transform into different animals.
