Nitrous Oxide for your AI Infrastructure.
What is NOS?
NOS (torch-nos) is a fast and flexible PyTorch inference server, specifically designed for optimizing and running inference of popular foundational AI models.
- Easy-to-use: Built for PyTorch and designed to optimize, serve and auto-scale PyTorch models in production without compromising on developer experience.
- Flexible: Run and serve several foundational AI models (Stable Diffusion, CLIP, Whisper) in a single place.
- Pluggable: Plug your front-end to NOS with out-of-the-box high-performance gRPC/REST APIs, avoiding all kinds of ML model deployment hassles.
- Scalable: Optimize and scale models easily for maximum HW performance without a PhD in ML, distributed systems or infrastructure.
- Extensible: Easily hack and add custom models, optimizations, and HW-support in a Python-first environment.
- HW-accelerated: Take full advantage of your underlying HW (GPUs, ASICs) without compromise.
- Cloud-agnostic: Run on any cloud HW (AWS, GCP, Azure, Lambda Labs, On-Prem) with our ready-to-use inference server containers.
NOS inherits its name from Nitrous Oxide System, the performance-enhancing system typically used in racing cars. NOS is designed to be modular and easy to extend.
Getting Started
Get started with the full NOS server by installing via pip:
$ conda create -n nos-py38 python=3.8
$ conda activate nos-py38
$ conda install "pytorch>=2.0.1" torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
$ pip install "torch-nos[server]"
If you only need the lightweight NOS client and want to run inference on your local machine (via Docker), you can install the client-only package:
$ conda create -n nos-py38 python=3.8
$ conda activate nos-py38
$ pip install torch-nos
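After either install, a quick sanity check can confirm that the packages are importable from the active environment. The following is a minimal sketch using only the standard library; `check_environment` is a hypothetical helper (not part of NOS), and it assumes the installed top-level module is named `nos`, as in the `from nos.client import Client` import used in the Quickstart. With the client-only install, `torch` may legitimately be reported as missing.

```python
import importlib.util
import sys


def check_environment(packages=("nos", "torch")):
    """Report the Python version and whether each package is importable.

    Hypothetical helper for illustration; it does not import the packages,
    it only checks that they can be found in the current environment.
    """
    report = {"python": ".".join(str(part) for part in sys.version_info[:3])}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


for key, value in check_environment().items():
    print(f"{key}: {value}")
```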
Quickstart / Show me the code
Image Generation as-a-Service
gRPC API:

from nos.client import Client

client = Client("[::]:50051")
sdxl = client.Module("stabilityai/stable-diffusion-xl-base-1-0")
image, = sdxl(prompts=["fox jumped over the moon"],
              width=1024, height=1024, num_images=1)

REST API:

curl \
  -X POST http://localhost:8000/infer \
  -H 'Content-Type: application/json' \
  -d '{
    "model_id": "stabilityai/stable-diffusion-xl-base-1-0",
    "inputs": {
      "prompts": ["fox jumped over the moon"],
      "width": 1024,
      "height": 1024,
      "num_images": 1
    }
  }'
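The same REST request can also be issued from Python without the NOS client. The sketch below rebuilds the curl call with the standard library only; the URL, headers, and payload keys are copied from the curl example, while `build_infer_request` and `INFER_URL` are hypothetical names introduced for illustration. The optional `method` key mirrors the one used in the CLIP request further down. The response schema is not described here, so the sketch stops at building the request.

```python
import json
import urllib.request

# Endpoint taken from the curl example; adjust host/port for your deployment.
INFER_URL = "http://localhost:8000/infer"


def build_infer_request(model_id, inputs, method=None, url=INFER_URL):
    """Build (without sending) the POST request used by the curl example.

    `build_infer_request` is a hypothetical helper, not part of NOS.
    """
    payload = {"model_id": model_id, "inputs": inputs}
    if method is not None:
        # Optional key naming a specific model method to call.
        payload["method"] = method
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_infer_request(
    "stabilityai/stable-diffusion-xl-base-1-0",
    {
        "prompts": ["fox jumped over the moon"],
        "width": 1024,
        "height": 1024,
        "num_images": 1,
    },
)

# To send it against a running server (response handling is left open):
# with urllib.request.urlopen(request) as response:
#     raw_body = response.read()
```

Building the request separately from sending it makes the payload easy to inspect or log before any network call is made.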
Text & Image Embedding-as-a-Service (CLIP-as-a-Service)
gRPC API:

from nos.client import Client

client = Client("[::]:50051")
clip = client.Module("openai/clip")
txt_vec = clip.encode_text(texts=["fox jumped over the moon"])

REST API:

curl \
  -X POST http://localhost:8000/infer \
  -H 'Content-Type: application/json' \
  -d '{
    "model_id": "openai/clip",
    "method": "encode_text",
    "inputs": {
      "texts": ["fox jumped over the moon"]
    }
  }'
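Embeddings such as `txt_vec` are typically compared with cosine similarity, for example to rank candidates against a query. The following is a minimal sketch, assuming each embedding can be converted to a flat sequence of floats (the exact return type of `encode_text` is not specified here). `cosine_similarity` is a hypothetical helper, and the toy vectors stand in for real embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors of floats."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (assumption for illustration).
query = [1.0, 0.0, 1.0]
candidates = {"a": [1.0, 0.0, 1.0], "b": [0.0, 1.0, 0.0]}

# Rank candidate keys from most to least similar to the query.
ranked = sorted(
    candidates,
    key=lambda name: cosine_similarity(query, candidates[name]),
    reverse=True,
)
```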
Directory Structure
├── docker          # Dockerfile for CPU/GPU servers
├── docs            # mkdocs documentation
├── examples        # example guides, jupyter notebooks, demos
├── makefiles       # makefiles for building/testing
├── nos
│   ├── cli         # CLI (hub, system)
│   ├── client      # gRPC / REST client
│   ├── common      # common utilities
│   ├── executors   # runtime executor (i.e. Ray)
│   ├── hub         # hub utilities
│   ├── managers    # model manager / multiplexer
│   ├── models      # model zoo
│   ├── proto       # protobuf defs for NOS gRPC service
│   ├── server      # server backend (gRPC)
│   └── test        # pytest utilities
├── requirements    # requirement extras (server, docs, tests)
├── scripts         # basic scripts
└── tests           # pytests (client, server, benchmark)
Documentation
- Quickstart
- Models
- Concepts: NOS Architecture
- Demos: Building a Discord Image Generation Bot, Video Search Demo
Roadmap
HW / Cloud Support
- Commodity GPUs
  - NVIDIA GPUs (20XX, 30XX, 40XX)
  - AMD GPUs (RX 7000)
- Cloud GPUs
  - NVIDIA (H100, A100, A10G, A30G, T4, L4)
  - AMD (MI200, MI250)
- Cloud Service Providers (via SkyPilot)
  - AWS, GCP, Azure
  - Opinionated Cloud: Lambda Labs, RunPod, etc.
- Cloud ASICs
  - AWS Inferentia (Inf1/Inf2)
  - Google TPU
  - Coming soon! (Habana Gaudi, Tenstorrent)
License
This project is licensed under the Apache-2.0 License.
Telemetry
NOS collects anonymous usage data using Sentry. This data helps us understand how the community uses NOS and prioritize features. You can opt out of telemetry by setting NOS_TELEMETRY_ENABLED=0.
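When NOS is launched from a Python script or notebook, the opt-out can be applied programmatically. This is a minimal sketch; it assumes the variable is read when NOS starts, so it is set before any `nos` import or server launch in the same process.

```python
import os

# Opt out of telemetry. Assumption: NOS reads this variable at startup,
# so set it before importing `nos` or starting the server from this process.
os.environ["NOS_TELEMETRY_ENABLED"] = "0"
```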
Contributing
We welcome contributions! Please see our contributing guide for more information.
Quick Links
- Send us an email at support@autonomi.ai or join our Discord for help.
- Follow us on Twitter and LinkedIn to keep up-to-date on our products.