Nitrous Oxide for your AI Infrastructure.
⚡️ What is NOS?
NOS (torch-nos) is a fast and flexible PyTorch inference server, specifically designed for optimizing and running inference of popular foundational AI models.
- 👩‍💻 Easy-to-use: Built for PyTorch and designed to optimize, serve and auto-scale PyTorch models in production without compromising on developer experience.
- 🥷 Flexible: Run and serve several foundational AI models (Stable Diffusion, CLIP, Whisper) in a single place.
- 🔌 Pluggable: Plug your front-end into NOS with out-of-the-box high-performance gRPC/REST APIs, avoiding all kinds of ML model deployment hassles.
- 🚀 Scalable: Optimize and scale models easily for maximum HW performance without a PhD in ML, distributed systems or infrastructure.
- 📦 Extensible: Easily hack and add custom models, optimizations, and HW-support in a Python-first environment.
- ⚙️ HW-accelerated: Take full advantage of your underlying HW (GPUs, ASICs) without compromise.
- ☁️ Cloud-agnostic: Run on any cloud HW (AWS, GCP, Azure, Lambda Labs, On-Prem) with our ready-to-use inference server containers.
NOS inherits its name from Nitrous Oxide System, the performance-enhancing system typically used in racing cars. NOS is designed to be modular and easy to extend.
🚀 Getting Started
Get started with the full NOS server by installing via pip:
```shell
$ conda create -n nos-py38 python=3.8
$ conda activate nos-py38
$ conda install "pytorch>=2.0.1" torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
$ pip install "torch-nos[server]"
```
If you simply want to use a lightweight NOS client and run inference on your local machine (via Docker), you can install the client-only package:
```shell
$ conda create -n nos-py38 python=3.8
$ conda activate nos-py38
$ pip install torch-nos
```
🔥 Quickstart / Show me the code
Image Generation as-a-Service
gRPC API ⚡

```python
from nos.client import Client

client = Client("[::]:50051")

sdxl = client.Module("stabilityai/stable-diffusion-xl-base-1-0")
image, = sdxl(prompts=["fox jumped over the moon"],
              width=1024, height=1024, num_images=1)
```

REST API

```shell
curl \
  -X POST http://localhost:8000/infer \
  -H 'Content-Type: application/json' \
  -d '{
    "model_id": "stabilityai/stable-diffusion-xl-base-1-0",
    "inputs": {
        "prompts": ["fox jumped over the moon"],
        "width": 1024,
        "height": 1024,
        "num_images": 1
    }
}'
```
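The same REST call can be made from Python without any extra dependencies; a minimal sketch using only the standard library (assuming a NOS server is running locally on port 8000 — the `generate_image` helper name is ours, not part of the NOS API):

```python
import json
import urllib.request

# JSON payload mirroring the curl example above
payload = {
    "model_id": "stabilityai/stable-diffusion-xl-base-1-0",
    "inputs": {
        "prompts": ["fox jumped over the moon"],
        "width": 1024,
        "height": 1024,
        "num_images": 1,
    },
}

def generate_image(url: str = "http://localhost:8000/infer") -> dict:
    """POST the payload to a running NOS REST server and return the parsed JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```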
Text & Image Embedding-as-a-Service (CLIP-as-a-Service)
gRPC API ⚡

```python
from nos.client import Client

client = Client("[::]:50051")

clip = client.Module("openai/clip")
txt_vec = clip.encode_text(text=["fox jumped over the moon"])
```

REST API

```shell
curl \
  -X POST http://localhost:8000/infer \
  -H 'Content-Type: application/json' \
  -d '{
    "model_id": "openai/clip",
    "method": "encode_text",
    "inputs": {
        "texts": ["fox jumped over the moon"]
    }
}'
```
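A common use of CLIP embeddings is similarity scoring between texts or between a text and an image. A minimal cosine-similarity sketch with NumPy — the vectors below are dummies standing in for real CLIP embeddings returned by `encode_text`:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Dummy vectors standing in for CLIP text/image embeddings
txt_vec = np.array([0.1, 0.3, 0.5])
img_vec = np.array([0.2, 0.2, 0.6])
score = cosine_similarity(txt_vec, img_vec)
```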
📂 Directory Structure
```
├── docker        # Dockerfile for CPU/GPU servers
├── docs          # mkdocs documentation
├── examples      # example guides, jupyter notebooks, demos
├── makefiles     # makefiles for building/testing
├── nos
│   ├── cli       # CLI (hub, system)
│   ├── client    # gRPC / REST client
│   ├── common    # common utilities
│   ├── executors # runtime executor (i.e. Ray)
│   ├── hub       # hub utilities
│   ├── managers  # model manager / multiplexer
│   ├── models    # model zoo
│   ├── proto     # protobuf defs for NOS gRPC service
│   ├── server    # server backend (gRPC)
│   └── test      # pytest utilities
├── requirements  # requirement extras (server, docs, tests)
├── scripts       # basic scripts
└── tests         # pytests (client, server, benchmark)
```
📚 Documentation
- Quickstart
- Models
- Concepts: NOS Architecture
- Demos: Building a Discord Image Generation Bot, Video Search Demo
🛣 Roadmap
HW / Cloud Support
- Commodity GPUs
  - NVIDIA GPUs (20XX, 30XX, 40XX)
  - AMD GPUs (RX 7000)
- Cloud GPUs
  - NVIDIA (H100, A100, A10G, A30G, T4, L4)
  - AMD (MI200, MI250)
- Cloud Service Providers (via SkyPilot)
  - AWS, GCP, Azure
  - Opinionated Cloud: Lambda Labs, RunPod, etc.
- Cloud ASICs
  - AWS Inferentia (Inf1/Inf2)
  - Google TPU
  - Coming soon! (Habana Gaudi, Tenstorrent)
📄 License
This project is licensed under the Apache-2.0 License.
📡 Telemetry
NOS collects anonymous usage data using Sentry. This helps us understand how the community is using NOS and prioritize features. You can opt out of telemetry by setting NOS_TELEMETRY_ENABLED=0.
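For example, to disable telemetry for the current shell session before starting the server:

```shell
export NOS_TELEMETRY_ENABLED=0
```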
🤝 Contributing
We welcome contributions! Please see our contributing guide for more information.
🔗 Quick Links
- 💬 Send us an email at support@autonomi.ai or join our Discord for help.
- 📣 Follow us on Twitter and LinkedIn to keep up-to-date on our products.