Nitrous Oxide for your AI Infrastructure.
Website | Docs | Tutorials | Playground | Blog | Discord
NOS is a fast and flexible PyTorch inference server that runs on any cloud or AI hardware.
🛠️ Key Features
- 👩‍💻 Easy-to-use: Built for PyTorch and designed to optimize, serve, and auto-scale PyTorch models in production without compromising on developer experience.
- 🥷 Multi-modal & Multi-model: Serve multiple foundational AI models (LLMs, Diffusion, Embeddings, Speech-to-Text and Object Detection) simultaneously, in a single server.
- ⚙️ HW-aware Runtime: Deploy PyTorch models effortlessly on modern AI accelerators (NVIDIA GPUs, AWS Inferentia2, and even CPUs), with AMD support coming soon.
- ☁️ Cloud-agnostic Containers: Run on any cloud (AWS, GCP, Azure, Lambda Labs, On-Prem) with our ready-to-use inference server containers.
🔥 What's New
- [Feb 2024] ✍️ [blog] Introducing the NOS Inferentia2 (inf2) runtime.
- [Jan 2024] ✍️ [blog] Serving LLMs on a budget with SkyServe.
- [Jan 2024] 📚 [docs] NOS x SkyPilot Integration page!
- [Jan 2024] ✍️ [blog] Getting-started tutorials for NOS are available here!
- [Dec 2023] 🛝 [repo] We open-sourced the NOS playground to help you get started with more examples built on NOS!
🚀 Quickstart
We highly recommend that you go to our quickstart guide to get started. To install the NOS client, you can run the following command:
conda create -n nos python=3.8 -y
conda activate nos
pip install torch-nos
Once the client is installed, you can start the NOS server via the nos serve CLI. This will automatically detect your local environment, download the Docker runtime image, and spin up the NOS server:
nos serve up --http --logging-level INFO
You are now ready to run your first inference request with NOS! You can run any of the following commands to try things out. You can set the logging level to DEBUG if you want more detailed information from the server.
👩💻 What can NOS do?
💬 Chat / LLM Agents (ChatGPT-as-a-Service)
NOS provides an OpenAI-compatible server with streaming support so that you can connect your favorite OpenAI-compatible LLM client to talk to NOS.
API / Usage
gRPC API ⚡
from nos.client import Client
client = Client()
model = client.Module("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
response = model.chat(message="Tell me a story of 1000 words with emojis", _stream=True)
REST API
curl \
-X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
"messages": [{
"role": "user",
"content": "Tell me a story of 1000 words with emojis"
}],
"temperature": 0.7,
"stream": true
}'
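Because the REST endpoint is OpenAI-compatible, a request with "stream": true should come back as server-sent events: lines of the form data: {json} ending with data: [DONE]. The sketch below assumes NOS emits the standard OpenAI chunk shape (choices[0].delta.content) and collects the streamed text with the standard library only; iter_stream_deltas is a hypothetical helper that parses lines you already have and does not open a connection itself.

```python
import json
from typing import Iterable, Iterator


def iter_stream_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style server-sent-event lines.

    Assumption: each event line looks like 'data: {...}' with the chunk shape
    {"choices": [{"delta": {"content": "..."}}]}, and the stream ends with
    'data: [DONE]'.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel; ignore anything after it
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = (choice.get("delta") or {}).get("content")
            if content:  # the first chunk often carries only the role
                yield content


sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Once upon "}}]}',
    'data: {"choices": [{"delta": {"content": "a time"}}]}',
    "data: [DONE]",
]
story = "".join(iter_stream_deltas(sample))  # "Once upon a time"
```

In practice you would feed it the decoded lines of the HTTP response, for example (line.decode("utf-8") for line in resp) where resp comes from urllib.request.urlopen.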
🏞️ Image Generation (Stable-Diffusion-as-a-Service)
Build Midjourney-style Discord bots in seconds.
API / Usage
gRPC API ⚡
from nos.client import Client
client = Client()
sdxl = client.Module("stabilityai/stable-diffusion-xl-base-1-0")
image, = sdxl(prompts=["hippo with glasses in a library, cartoon styling"],
width=1024, height=1024, num_images=1)
REST API
curl \
-X POST http://localhost:8000/v1/infer \
-H 'Content-Type: application/json' \
-d '{
"model_id": "stabilityai/stable-diffusion-xl-base-1-0",
"inputs": {
"prompts": ["hippo with glasses in a library, cartoon styling"],
"width": 1024, "height": 1024,
"num_images": 1
}
}'
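The same JSON body can also be posted from Python without curl. A minimal sketch, assuming the HTTP server from the quickstart is listening on localhost:8000; build_infer_request and BASE_URL are hypothetical names, not part of the NOS client. The optional method argument mirrors the "method" field that some models, such as CLIP, take.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Assumption: `nos serve up --http` is listening here, as in the curl examples.
BASE_URL = "http://localhost:8000"


def build_infer_request(
    model_id: str,
    inputs: Dict[str, Any],
    method: Optional[str] = None,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the /v1/infer endpoint."""
    body: Dict[str, Any] = {"model_id": model_id, "inputs": inputs}
    if method is not None:
        body["method"] = method  # e.g. "encode_text" for the CLIP model
    return urllib.request.Request(
        url=f"{base_url}/v1/infer",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Mirrors the SDXL curl call. Sending it needs a running server and assumes
# a JSON response:
#   with urllib.request.urlopen(request) as resp:
#       result = json.loads(resp.read())
request = build_infer_request(
    "stabilityai/stable-diffusion-xl-base-1-0",
    {
        "prompts": ["hippo with glasses in a library, cartoon styling"],
        "width": 1024,
        "height": 1024,
        "num_images": 1,
    },
)
```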
🧠 Text & Image Embedding (CLIP-as-a-Service)
Build scalable semantic search of images/videos in minutes.
API / Usage
gRPC API ⚡
from nos.client import Client
client = Client()
clip = client.Module("openai/clip-vit-base-patch32")
txt_vec = clip.encode_text(texts=["fox jumped over the moon"])
REST API
curl \
-X POST http://localhost:8000/v1/infer \
-H 'Content-Type: application/json' \
-d '{
"model_id": "openai/clip-vit-base-patch32",
"method": "encode_text",
"inputs": {
"texts": ["fox jumped over the moon"]
}
}'
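Semantic search over these embeddings usually comes down to ranking stored vectors by cosine similarity to a query vector. The sketch below shows only that ranking step in plain Python. It assumes the embeddings have already been converted to lists of floats (from whatever array type the encoder returns), and cosine_similarity and top_k are hypothetical helpers; a production index would use a vector library rather than this linear scan.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: Sequence[float], index: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k (key, score) pairs most similar to the query, best first."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 3-d vectors stand in for real text/image embeddings.
toy_index = {
    "fox.jpg": [0.9, 0.1, 0.0],
    "moon.jpg": [0.1, 0.9, 0.1],
    "car.jpg": [0.0, 0.1, 0.9],
}
ranked = top_k([0.8, 0.3, 0.0], toy_index, k=2)  # fox.jpg first, then moon.jpg
```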
🎙️ Audio Transcription (Whisper-as-a-Service)
Perform real-time audio transcription using Whisper.
API / Usage
gRPC API ⚡
from pathlib import Path
from nos.client import Client
client = Client()
model = client.Module("openai/whisper-small.en")
with client.UploadFile(Path("audio.wav")) as remote_path:
    response = model(path=remote_path)
# {"chunks": ...}
REST API
curl \
-X POST http://localhost:8000/v1/infer/file \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'model_id=openai/whisper-small.en' \
-F 'file=@audio.wav'
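The -F flags send a multipart/form-data body to /v1/infer/file. To make the same upload from Python without extra dependencies, the body can be assembled by hand. A minimal sketch, assuming the server only needs the two form fields the curl example uses (model_id and file); encode_multipart is a hypothetical helper, and labelling the file part application/octet-stream is an assumed generic default.

```python
import uuid
from typing import Dict, Tuple


def encode_multipart(
    fields: Dict[str, str], files: Dict[str, Tuple[str, bytes]]
) -> Tuple[bytes, str]:
    """Encode form fields and files as multipart/form-data.

    `files` maps a form field name to (filename, raw bytes). Names are not
    escaped, so keep them free of quotes. Returns the request body and the
    matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex  # random, so it will not collide with the payload
    parts = []
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(header.encode() + value.encode("utf-8") + b"\r\n")
    for name, (filename, data) in files.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        parts.append(header.encode() + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing delimiter
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

Pass the returned body as data= and the returned string as the Content-Type header of a urllib.request.Request with method="POST" aimed at http://localhost:8000/v1/infer/file, e.g. with fields {"model_id": "openai/whisper-small.en"} and files {"file": ("audio.wav", Path("audio.wav").read_bytes())}.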
🧐 Object Detection (YOLOX-as-a-Service)
Run classical computer-vision tasks in 2 lines of code.
API / Usage
gRPC API ⚡
from PIL import Image
from nos.client import Client
client = Client()
model = client.Module("yolox/medium")
response = model(images=[Image.open("image.jpg")])
REST API
curl \
-X POST http://localhost:8000/v1/infer/file \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'model_id=yolox/medium' \
-F 'file=@image.jpg'
⚒️ Custom models
Want to run models not supported by NOS? You can easily add your own models following the examples in the NOS Playground.
📄 License
This project is licensed under the Apache-2.0 License.
📡 Telemetry
NOS collects anonymous usage data using Sentry. This data helps us understand how the community is using NOS and prioritize features. You can opt out of telemetry by setting NOS_TELEMETRY_ENABLED=0.
🤝 Contributing
We welcome contributions! Please see our contributing guide for more information.
🔗 Quick Links
- 💬 Send us an email at support@autonomi.ai or join our Discord for help.
- 📣 Follow us on Twitter and LinkedIn to keep up-to-date on our products.
File details
Details for the file torch_nos-0.3.0-py3-none-any.whl.
File metadata
- Download URL: torch_nos-0.3.0-py3-none-any.whl
- Upload date:
- Size: 1.7 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 56dc69e768c3e2b2281c10922df73d28e2c06ad6e94bef91387ee4bf8c7953ff
MD5 | 6d8e21bc445c46868a887cbf338c3a2d
BLAKE2b-256 | a9f91678bf550d32def0e200c534bd45f04e4b1e933a0efb1cbeb025a168bd54
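To check a downloaded wheel against these digests, Python's hashlib can compute all three in a single pass. A minimal sketch; digests_of_chunks and file_digests are illustrative names, and BLAKE2b-256 is computed as BLAKE2b with a 32-byte digest.

```python
import hashlib
from typing import Dict, Iterable


def digests_of_chunks(chunks: Iterable[bytes]) -> Dict[str, str]:
    """Hex SHA256, MD5 and BLAKE2b-256 digests of a byte stream given in chunks."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),  # 32 bytes = 256 bits
    }
    for chunk in chunks:
        for h in hashers.values():
            h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def file_digests(path: str, chunk_size: int = 1 << 20) -> Dict[str, str]:
    """Digest a file in 1 MiB chunks so large wheels are never fully loaded."""
    with open(path, "rb") as f:
        return digests_of_chunks(iter(lambda: f.read(chunk_size), b""))
```

For example, file_digests("torch_nos-0.3.0-py3-none-any.whl")["SHA256"] should equal the SHA256 value in the table. pip's hash-checking mode can also enforce this at install time, using a requirements line such as torch-nos==0.3.0 --hash=sha256:56dc69e768c3e2b2281c10922df73d28e2c06ad6e94bef91387ee4bf8c7953ff together with --require-hashes.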