Skip to main content

Nexa AI SDK

Project description

Nexa SDK is a comprehensive toolkit for supporting ONNX and GGML models. It supports text generation, image generation, vision-language models (VLM), and speech-to-text (ASR), and text-to-speech (TTS) capabilities. Additionally, it offers an OpenAI-compatible API server with JSON schema mode for function calling and streaming support, and a user-friendly Streamlit UI. Users can run Nexa SDK in any device with Python environment, and GPU acceleration is supported, including CUDA, Metal, and ROCm. An executable version is also available.

Latest News 🔥

  • [2024/10] Support embedding model: nexa embed <model_path> <prompt>
  • [2024/10] Support pull and run supported Computer Vision models in GGUF format from HuggingFace: nexa run -hf <model_id> -mt COMPUTER_VISION
  • [2024/10] Support VLM in local server.
  • [2024/10] Added option to customize maximum context window for NLP and VLM models.
  • [2024/10] Support running model from user's local path
  • [2024/10] Added LoRA support for NLP models.
  • [2024/10] Added support for whisper-large-v3-turbo: nexa run faster-whisper-large-turbo
  • [2024/10] Added support for AMD-Llama-135m: nexa run AMD-Llama-135m:fp16
  • [2024/09] Nexa now has executables for easy installation: Install Nexa SDK
  • [2024/09] Added support for Llama 3.2 models: nexa run llama3.2
  • [2024/09] Added support for Qwen2.5, Qwen2.5-coder and Qwen2.5-Math models: nexa run qwen2.5
  • [2024/09] Support pull and run NLP models in GGUF format from HuggingFace: nexa run -hf <model_id> -mt NLP
  • [2024/09] Added support for ROCm
  • [2024/09] Added support for Phi-3.5 models: nexa run phi3.5
  • [2024/09] Added support for OpenELM models: nexa run openelm
  • [2024/09] Introduced logits API support for more advanced model interactions
  • [2024/09] Added support for Flux models: nexa run flux
  • [2024/09] Added support for Stable Diffusion 3 model: nexa run sd3
  • [2024/09] Added support for Stable Diffusion 2.1 model: nexa run sd2-1

Welcome to submit your requests through issues, we ship weekly.

Installation - Executable

macOS

Download

Linux

curl -fsSL https://public-storage.nexa4ai.com/install.sh | sh 

Windows

Coming soon. Install with Python package below 👇

Installation - Python Package

We have released pre-built wheels for various Python versions, platforms, and backends for convenient installation on our index page.

[!NOTE]

  1. If you want to use ONNX model, just replace pip install nexaai with pip install "nexaai[onnx]" in provided commands.
  2. For Chinese developers, we recommend you to use Tsinghua Open Source Mirror as extra index url, just replace --extra-index-url https://pypi.org/simple with --extra-index-url https://pypi.tuna.tsinghua.edu.cn/simple in provided commands.

CPU

pip install nexaai --prefer-binary --index-url https://nexaai.github.io/nexa-sdk/whl/cpu --extra-index-url https://pypi.org/simple --no-cache-dir

GPU (Metal)

For the GPU version supporting Metal (macOS):

CMAKE_ARGS="-DGGML_METAL=ON -DSD_METAL=ON" pip install nexaai --prefer-binary --index-url https://nexaai.github.io/nexa-sdk/whl/metal --extra-index-url https://pypi.org/simple --no-cache-dir
FAQ: cannot use Metal/GPU on M1

Try the following command:

wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
bash Miniforge3-MacOSX-arm64.sh
conda create -n nexasdk python=3.10
conda activate nexasdk
CMAKE_ARGS="-DGGML_METAL=ON -DSD_METAL=ON" pip install nexaai --prefer-binary --index-url https://nexaai.github.io/nexa-sdk/whl/metal --extra-index-url https://pypi.org/simple --no-cache-dir

GPU (CUDA)

For Linux:

CMAKE_ARGS="-DGGML_CUDA=ON -DSD_CUBLAS=ON" pip install nexaai --prefer-binary --index-url https://nexaai.github.io/nexa-sdk/whl/cu124 --extra-index-url https://pypi.org/simple --no-cache-dir

For Windows PowerShell:

$env:CMAKE_ARGS="-DGGML_CUDA=ON -DSD_CUBLAS=ON"; pip install nexaai --prefer-binary --index-url https://nexaai.github.io/nexa-sdk/whl/cu124 --extra-index-url https://pypi.org/simple --no-cache-dir

For Windows Command Prompt:

set CMAKE_ARGS="-DGGML_CUDA=ON -DSD_CUBLAS=ON" & pip install nexaai --prefer-binary --index-url https://nexaai.github.io/nexa-sdk/whl/cu124 --extra-index-url https://pypi.org/simple --no-cache-dir

For Windows Git Bash:

CMAKE_ARGS="-DGGML_CUDA=ON -DSD_CUBLAS=ON" pip install nexaai --prefer-binary --index-url https://nexaai.github.io/nexa-sdk/whl/cu124 --extra-index-url https://pypi.org/simple --no-cache-dir
FAQ: Building Issues for llava

If you encounter the following issue while building:

try the following command:

CMAKE_ARGS="-DCMAKE_CXX_FLAGS=-fopenmp" pip install nexaai

GPU (ROCm)

For Linux:

CMAKE_ARGS="-DGGML_HIPBLAS=on" pip install nexaai --prefer-binary --index-url https://nexaai.github.io/nexa-sdk/whl/rocm621 --extra-index-url https://pypi.org/simple --no-cache-dir

Local Build

How to clone this repo

git clone --recursive https://github.com/NexaAI/nexa-sdk

If you forget to use --recursive, you can use below command to add submodule

git submodule update --init --recursive

Then you can build and install the package

pip install -e .

Features

  • Model Support:

    • ONNX & GGML models
    • Conversion Engine
    • Inference Engine:
      • Text Generation
      • Image Generation
      • Vision-Language Models (VLM)
      • Speech-to-Text (ASR)

Detailed API documentation is available here.

  • Server:
    • OpenAI-compatible API
    • JSON schema mode for function calling
    • Streaming support
  • Streamlit UI for interactive model deployment and testing

Below is our differentiation from other similar tools:

Feature Nexa SDK ollama Optimum LM Studio
GGML Support
ONNX Support
Text Generation
Image Generation
Vision-Language Models
Text-to-Speech
Server Capability
User Interface

Supported Models & Model Hub

Our on-device model hub offers all types of quantized models (text, image, audio, multimodal) with filters for RAM, file size, Tasks, etc. to help you easily explore models with UI. Explore on-device models at On-device Model Hub

Supported models (full list at Model Hub):

Model Type Format Command
octopus-v2 NLP GGUF nexa run octopus-v2
octopus-v4 NLP GGUF nexa run octopus-v4
gpt2 NLP GGUF nexa run gpt2
tinyllama NLP GGUF nexa run tinyllama
llama2 NLP GGUF/ONNX nexa run llama2
llama2-uncensored NLP GGUF nexa run llama2-uncensored
llama2-function-calling NLP GGUF nexa run llama2-function-calling
llama3 NLP GGUF/ONNX nexa run llama3
llama3.1 NLP GGUF/ONNX nexa run llama3.1
llama3.2 NLP GGUF nexa run llama3.2
llama3-uncensored NLP GGUF nexa run llama3-uncensored
gemma NLP GGUF/ONNX nexa run gemma
gemma2 NLP GGUF nexa run gemma2
qwen1.5 NLP GGUF nexa run qwen1.5
qwen2 NLP GGUF/ONNX nexa run qwen2
qwen2.5 NLP GGUF nexa run qwen2.5
mathqwen NLP GGUF nexa run mathqwen
codeqwen NLP GGUF nexa run codeqwen
mistral NLP GGUF/ONNX nexa run mistral
dolphin-mistral NLP GGUF nexa run dolphin-mistral
codegemma NLP GGUF nexa run codegemma
codellama NLP GGUF nexa run codellama
deepseek-coder NLP GGUF nexa run deepseek-coder
phi2 NLP GGUF nexa run phi2
phi3 NLP GGUF/ONNX nexa run phi3
phi3.5 NLP GGUF nexa run phi3.5
openelm NLP GGUF nexa run openelm
AMD-Llama-135m NLP GGUF nexa run AMD-Llama-135m:fp16
nanollava Multimodal GGUF nexa run nanollava
llava-phi3 Multimodal GGUF nexa run llava-phi3
llava-llama3 Multimodal GGUF nexa run llava-llama3
llava1.6-mistral Multimodal GGUF nexa run llava1.6-mistral
llava1.6-vicuna Multimodal GGUF nexa run llava1.6-vicuna
stable-diffusion-v1-4 Computer Vision GGUF nexa run sd1-4
stable-diffusion-v1-5 Computer Vision GGUF/ONNX nexa run sd1-5
stable-diffusion-v2-1 Computer Vision GGUF nexa run sd2-1
stable-diffusion-3-medium Computer Vision GGUF nexa run sd3
FLUX.1-schnell Computer Vision GGUF nexa run flux
lcm-dreamshaper Computer Vision GGUF/ONNX nexa run lcm-dreamshaper
hassaku-lcm Computer Vision GGUF nexa run hassaku-lcm
anything-lcm Computer Vision GGUF nexa run anything-lcm
faster-whisper-tiny Audio BIN nexa run faster-whisper-tiny
faster-whisper-small Audio BIN nexa run faster-whisper-small
faster-whisper-medium Audio BIN nexa run faster-whisper-medium
faster-whisper-base Audio BIN nexa run faster-whisper-base
faster-whisper-large Audio BIN nexa run faster-whisper-large
whisper-large-v3-turbo Audio BIN nexa run faster-whisper-large-turbo
whisper-tiny.en Audio ONNX nexa run whisper-tiny.en
whisper-tiny Audio ONNX nexa run whisper-tiny
whisper-small.en Audio ONNX nexa run whisper-small.en
whisper-small Audio ONNX nexa run whisper-small
whisper-base.en Audio ONNX nexa run whisper-base.en
whisper-base Audio ONNX nexa run whisper-base
mxbai-embed-large-v1 Embedding GGUF nexa embed mxbai
nomic-embed-text-v1.5 Embedding GGUF nexa embed nomic
all-MiniLM-L6-v2 Embedding GGUF nexa embed all-MiniLM-L6-v2:fp16
all-MiniLM-L12-v2 Embedding GGUF nexa embed all-MiniLM-L12-v2:fp16

CLI Reference

Here's a brief overview of the main CLI commands:

  • nexa run: Run inference for various tasks using GGUF models.
  • nexa onnx: Run inference for various tasks using ONNX models.
  • nexa server: Run the Nexa AI Text Generation Service.
  • nexa eval: Run the Nexa AI Evaluation Tasks.
  • nexa pull: Pull a model from official or hub.
  • nexa remove: Remove a model from local machine.
  • nexa clean: Clean up all model files.
  • nexa list: List all models in the local machine.
  • nexa login: Login to Nexa API.
  • nexa whoami: Show current user information.
  • nexa logout: Logout from Nexa API.

For detailed information on CLI commands and usage, please refer to the CLI Reference document.

Start Local Server

To start a local server using models on your local computer, you can use the nexa server command. For detailed information on server setup, API endpoints, and usage examples, please refer to the Server Reference document.

Acknowledgements

We would like to thank the following projects:

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

nexaai-0.0.8.8.tar.gz (47.1 MB view details)

Uploaded Source

File details

Details for the file nexaai-0.0.8.8.tar.gz.

File metadata

  • Download URL: nexaai-0.0.8.8.tar.gz
  • Upload date:
  • Size: 47.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.15

File hashes

Hashes for nexaai-0.0.8.8.tar.gz
Algorithm Hash digest
SHA256 0977b107a218a163290c2ad0ea6b4d8f6cad0f4b071d96660c43d50ee4eaa69c
MD5 bc41ee4660534c318a7a84916af7a1af
BLAKE2b-256 7ab190c7b2d88d7bd85269b3f0f2586689ad574c50bf74c78f93ab21dcbd5477

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page