torch-nos

Nitrous oxide system (NOS) for PyTorch.


Nitrous Oxide for your AI Infrastructure

Optimizing and serving models for production AI inference is still difficult, often leading to notoriously expensive cloud bills and underutilized GPUs. That’s why we’re building NOS: a fast and flexible inference server for modern AI workloads. With a few lines of code, developers can optimize, serve, and auto-scale PyTorch model inference without having to deal with the complexities of ML compilers, HW accelerators, or distributed inference. Simply put, NOS allows AI teams to cut inference costs by up to 10x while shortening development time and time-to-market.

⚡️ What is NOS?

NOS (torch-nos) is a fast and flexible PyTorch inference server, specifically designed for optimizing and running lightning-fast inference of popular foundational AI models.

👩‍💻 Easy-to-use: Built for PyTorch and designed to optimize, serve, and auto-scale PyTorch models in production without compromising on developer experience.
🥷 Flexible: Run and serve several foundational AI models (Stable Diffusion, CLIP, Whisper) in a single place.
🔌 Pluggable: Plug your front-end into NOS with out-of-the-box high-performance gRPC/REST APIs, avoiding the usual hassles of ML model deployment.
🚀 Scalable: Optimize and scale models easily for maximum HW performance without a PhD in ML, distributed systems or infrastructure.
📦 Extensible: Easily hack on NOS and add custom models, optimizations, and HW support in a Python-first environment.
⚙️ HW-accelerated: Take full advantage of your underlying HW (GPUs, ASICs) without compromise.
☁️ Cloud-agnostic: Run on any cloud HW (AWS, GCP, Azure, Lambda Labs, On-Prem) with our ready-to-use inference server containers.

NOS takes its name from the Nitrous Oxide System, the performance-enhancing system typically used in racing cars. NOS is designed to be modular and easy to extend.

🚀 Getting Started

Get started with the full NOS server by creating a conda environment, installing PyTorch with CUDA 11.8 support, and then installing NOS with its server extras via pip:

$ conda create -n nos-py38 python=3.8
$ conda activate nos-py38
$ conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
$ pip install "torch-nos[server]"

If you only want the lightweight NOS client to run inference from your local machine, install the client-only package instead:

$ conda create -n nos-py38 python=3.8
$ conda activate nos-py38
$ pip install torch-nos
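To confirm which release pip actually installed into the environment, a quick check like the one below can help. It is a minimal sketch that uses only the standard library's `importlib.metadata` (available from Python 3.8, matching the environment created above); `installed_version` is a hypothetical helper, not part of NOS.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "torch-nos") -> Optional[str]:
    """Return the installed version of `dist_name`, or None if it is not installed."""
    try:
        # Reads the installed distribution's metadata (stdlib since Python 3.8).
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(f"torch-nos {version}" if version else "torch-nos is not installed")
```

This only reports the package version; it does not tell you whether the `[server]` extras were installed.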

🔥 Quickstart / Show me the code

Image Generation as-a-Service

REST API

curl \
-X POST http://localhost:8000/infer \
-H 'Content-Type: application/json' \
-d '{
      "model_id": "stabilityai/stable-diffusion-xl-base-1-0",
      "inputs": {
          "prompts": ["fox jumped over the moon"],
          "width": 1024,
          "height": 1024,
          "num_images": 1
      }
    }'

gRPC API ⚡

from nos.client import Client

# Connect to the NOS gRPC server on port 50051
client = Client("[::]:50051")

# Get a handle to the SDXL model and generate one 1024x1024 image;
# the trailing comma unpacks the single image from the returned sequence
sdxl = client.Module("stabilityai/stable-diffusion-xl-base-1-0")
image, = sdxl(prompts=["fox jumped over the moon"],
              width=1024, height=1024, num_images=1)
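If you want to call the REST endpoint from Python without installing the NOS client, the curl request can be reproduced with the standard library alone. The sketch below assumes a NOS server reachable at `http://localhost:8000/infer` (the address used in the curl examples) and that it replies with JSON; the response schema is not described here, so the result is returned undecoded beyond `json.loads`. The helper names `build_infer_request` and `send` are hypothetical. The optional `method` field covers requests such as the CLIP `encode_text` call in this README.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Endpoint taken from the curl examples; assumes a NOS server running locally.
NOS_INFER_URL = "http://localhost:8000/infer"


def build_infer_request(
    model_id: str,
    inputs: Dict[str, Any],
    method: Optional[str] = None,
    url: str = NOS_INFER_URL,
) -> urllib.request.Request:
    """Build a POST request whose JSON body mirrors the curl examples."""
    payload: Dict[str, Any] = {"model_id": model_id, "inputs": inputs}
    if method is not None:
        # Optional model method to call, e.g. "encode_text" for CLIP.
        payload["method"] = method
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> Any:
    """Send the request and decode the body, assuming the server replies with JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    req = build_infer_request(
        "stabilityai/stable-diffusion-xl-base-1-0",
        {"prompts": ["fox jumped over the moon"],
         "width": 1024, "height": 1024, "num_images": 1},
    )
    print(req.get_method(), req.full_url)
    # result = send(req)  # uncomment once a NOS server is running
```

Building the request separately from sending it keeps the payload easy to inspect and test without a running server.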

Text & Image Embedding-as-a-Service (CLIP-as-a-Service)

REST API

curl \
-X POST http://localhost:8000/infer \
-H 'Content-Type: application/json' \
-d '{
      "model_id": "openai/clip",
      "method": "encode_text",
      "inputs": {
          "texts": ["fox jumped over the moon"]
      }
    }'

gRPC API ⚡

from nos.client import Client

# Connect to the NOS gRPC server on port 50051
client = Client("[::]:50051")

# Embed a text prompt with CLIP (same "texts" input as the REST request)
clip = client.Module("openai/clip")
txt_vec = clip.encode_text(texts=["fox jumped over the moon"])
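One common way to use such embeddings, for instance in text-to-video or image search, is to rank stored items by cosine similarity to a query embedding. The sketch below is pure Python and assumes you have converted the returned embeddings (e.g. `txt_vec`) into plain sequences of floats, one vector per input (for a NumPy array, `.tolist()` does this). The helper names are hypothetical, and whether NOS's CLIP embeddings come pre-normalized is not stated here, so the code normalizes explicitly.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], candidates: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (candidate index, score) pairs, most similar first."""
    scores = [(i, cosine_similarity(query, c)) for i, c in enumerate(candidates)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

For a real index with many items, a vectorized NumPy version or a dedicated vector store would be faster; the pure-Python form just makes the arithmetic explicit.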

📂 Repository Structure

├── docker         # Dockerfile for CPU/GPU servers
├── docs           # mkdocs documentation
├── examples       # example guides, jupyter notebooks, demos
├── makefiles      # makefiles for building/testing
├── nos
│   ├── cli        # CLI (hub, system)
│   ├── client     # gRPC / REST client
│   ├── common     # common utilities
│   ├── executors  # runtime executor (i.e. Ray)
│   ├── hub        # hub utilities
│   ├── managers   # model manager / multiplexer
│   ├── models     # model zoo
│   ├── proto      # protobuf defs for NOS gRPC service
│   ├── server     # server backend (gRPC)
│   └── test       # pytest utilities
├── requirements   # requirement extras (server, docs, tests)
├── scripts        # basic scripts
└── tests          # pytests (client, server, benchmark)
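The `managers` package is described above only as a "model manager / multiplexer". As a generic illustration of what multiplexing several models under a fixed memory budget can look like, here is a least-recently-used (LRU) sketch. This is not NOS's actual implementation, whose eviction policy is not described here; the class name and the `loader` callable are hypothetical.

```python
from collections import OrderedDict
from typing import Any, Callable, List


class LRUModelMultiplexer:
    """Hypothetical sketch: keep at most `capacity` models loaded at once,
    evicting the least recently used model when a new one must be loaded."""

    def __init__(self, loader: Callable[[str], Any], capacity: int = 2) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._loader = loader  # e.g. a function that loads weights onto the GPU
        self._capacity = capacity
        self._models: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, model_id: str) -> Any:
        """Return a loaded model, loading (and possibly evicting) as needed."""
        if model_id in self._models:
            # Cache hit: mark this model as most recently used.
            self._models.move_to_end(model_id)
            return self._models[model_id]
        if len(self._models) >= self._capacity:
            # Evict the least recently used model (front of the OrderedDict).
            # A real server would also release its GPU memory here.
            self._models.popitem(last=False)
        model = self._loader(model_id)
        self._models[model_id] = model
        return model

    def loaded(self) -> List[str]:
        """Model IDs currently loaded, least recently used first."""
        return list(self._models)
```

LRU is only one possible policy; a production multiplexer might also weigh model size, load time, or per-model request rates.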

📚 Documentation

Quickstart
Models
Concepts: NOS Architecture
Demos: Building a Discord Image Generation Bot, Video Search Demo

🛣 Roadmap

HW / Cloud Support

Commodity GPUs
- NVIDIA GPUs (20XX, 30XX, 40XX)
- AMD GPUs (RX 7000)
Cloud GPUs
- NVIDIA (H100, A100, A10G, A30, T4, L4)
- AMD (MI200, MI250)
Cloud Service Providers (via SkyPilot)
- Big 3: AWS, GCP, Azure
- Opinionated Cloud: Lambda Labs, RunPod, etc.
Cloud ASICs
- AWS Inferentia (Inf1/Inf2)
- Google TPU
- Coming soon! (Habana Gaudi, Tenstorrent)

📄 License

This project is licensed under the Apache-2.0 License.

🤝 Contributing

We welcome contributions! Please see our contributing guide for more information.

🔗 Quick Links

💬 Send us an email at support@autonomi.ai or join our Discord for help.
📣 Follow us on Twitter and LinkedIn to keep up to date on our products.


Release history

- 0.3.0: May 4, 2024
- 0.2.0: Feb 1, 2024
- 0.2.0b0 (pre-release): Apr 11, 2024
- 0.2.0a0 (pre-release): Feb 21, 2024
- 0.1.5: Jan 18, 2024
- 0.1.4: Jan 7, 2024
- 0.1.3: Jan 1, 2024
- 0.1.2: Dec 14, 2023
- 0.1.1: Dec 13, 2023
- 0.1.0: Nov 8, 2023
- 0.1.0rc3 (pre-release): Nov 8, 2023
- 0.1.0rc2 (pre-release): Oct 30, 2023
- 0.1.0rc1 (pre-release, this version): Oct 23, 2023
- 0.0.10: Sep 14, 2023

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

torch_nos-0.1.0rc1-py3-none-any.whl (1.7 MB), uploaded Oct 23, 2023, Python 3

Hashes for torch_nos-0.1.0rc1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`862e785090bfc19c28bffbd8455515df895b1f08633414d4ce7278acfd507267`
MD5	`6e3d6cb918f64601648391c6d3c41861`
BLAKE2b-256	`ba6d0a7d0a4ff92d89788658d0d142e370e09626756eb9a9f34c55ce74f96751`
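To check a downloaded wheel against the SHA256 digest listed above, a short standard-library script is enough. This is a minimal sketch; the helper names and the assumed download location of the wheel are illustrative, not part of NOS or PyPI tooling.

```python
import hashlib
from pathlib import Path
from typing import Union

# SHA256 digest published above for torch_nos-0.1.0rc1-py3-none-any.whl.
EXPECTED_SHA256 = "862e785090bfc19c28bffbd8455515df895b1f08633414d4ce7278acfd507267"


def file_sha256(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large wheels need not fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path: Union[str, Path], expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 matches the expected hex digest."""
    return file_sha256(path) == expected.strip().lower()


if __name__ == "__main__":
    wheel = Path("torch_nos-0.1.0rc1-py3-none-any.whl")  # assumed download location
    if wheel.exists():
        print("hash OK" if verify_wheel(wheel) else "hash MISMATCH")
```

pip can also enforce this itself in hash-checking mode, via a requirements line such as `torch-nos==0.1.0rc1 --hash=sha256:<digest>` and `pip install --require-hashes -r requirements.txt`; note that this mode then requires pinned hashes for every dependency as well.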