SDK for NVIDIA services
NVIDIA SERVICES
NVIDIA recently announced NVIDIA NIM, a set of optimized inference microservices for deploying AI models at scale. Together with the NeMo services, NIM makes it possible to develop RAG-based applications and deploy them to production quickly.
This package is a Python SDK for those services. The idea is that you write only a few lines of code to build applications on NVIDIA services; the boilerplate code lives in the SDK.
How to use the SDK
The SDK is published on PyPI. To install it, run the command below:
pip install nvidia-services
Two services are currently part of the SDK:
- EMBEDDING
- RERANKING
Example code for embedding
import os
from dotenv import load_dotenv
from nvidia_services.embeddings.nvidia_embeddings import NVIDIAEmbeddings
load_dotenv()
NVIDIA_API_KEY = os.getenv("NVIDIA_API_KEY")
nv = NVIDIAEmbeddings(api_key=NVIDIA_API_KEY)
embeddings = nv.create_embed(["hello, how are you","I am fine"])
for embedding in embeddings:
    print(embedding)
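A common next step after computing embeddings is comparing them. The following is a minimal sketch of cosine similarity in plain Python. It assumes `create_embed` returns one list of floats per input string, which this page does not confirm, so stand-in vectors are used instead of real SDK output. The helper name `cosine_similarity` is illustrative and not part of the SDK.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors of floats."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as unrelated.
        return 0.0
    return dot / (norm_a * norm_b)


# Stand-in vectors. In practice these would be two entries of the list
# returned by nv.create_embed(...), assuming each entry is a list of floats.
vec_a = [1.0, 0.0, 1.0]
vec_b = [1.0, 0.0, 1.0]
vec_c = [0.0, 1.0, 0.0]

print(cosine_similarity(vec_a, vec_b))  # identical direction, close to 1
print(cosine_similarity(vec_a, vec_c))  # orthogonal vectors
```

If the SDK returns a different structure (for example, objects that wrap the vector), extract the raw float list first and pass that to the helper.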
Example code for reranker
import os
from dotenv import load_dotenv
from nvidia_services.retrievals.nvidia_reranker import NVDIARerankerMistral
load_dotenv()
NVIDIA_API_KEY = os.getenv("NVIDIA_API_KEY")
nv = NVDIARerankerMistral(api_key=NVIDIA_API_KEY)
passages = [
    {
        "text": "The Hopper GPU is paired with the Grace CPU using NVIDIA's ultra-fast chip-to-chip interconnect, delivering 900GB/s of bandwidth, 7X faster than PCIe Gen5. This innovative design will deliver up to 30X higher aggregate system memory bandwidth to the GPU compared to today's fastest servers and up to 10X higher performance for applications running terabytes of data."
    },
    {
        "text": "A100 provides up to 20X higher performance over the prior generation and can be partitioned into seven GPU instances to dynamically adjust to shifting demands. The A100 80GB debuts the world's fastest memory bandwidth at over 2 terabytes per second (TB/s) to run the largest models and datasets."
    },
    {
        "text": "Accelerated servers with H100 deliver the compute power—along with 3 terabytes per second (TB/s) of memory bandwidth per GPU and scalability with NVLink and NVSwitch™."
    }
]
# Named "query" so the variable does not shadow Python's built-in input()
query = "What is the GPU memory bandwidth of H100 SXM?"
result = nv.return_context(input=query, passages=passages)
print(result)
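Reranker output is usually consumed by ordering the original passages by relevance. The sketch below shows one way to do that in plain Python. It assumes a hypothetical result shape, a list of dicts with an `index` into `passages` and a numeric `score`. The actual return value of `return_context` is not documented on this page, so adapt the keys to whatever the SDK returns. `order_passages` is an illustrative helper, not an SDK function.

```python
def order_passages(passages, rankings, top_k=None):
    """Return passage texts sorted by descending score.

    `rankings` is assumed (hypothetically) to be a list of dicts such as
    {"index": 2, "score": 7.1}, where "index" points into `passages`.
    """
    ranked = sorted(rankings, key=lambda r: r["score"], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]
    return [passages[r["index"]]["text"] for r in ranked]


# Made-up data for illustration only; not real reranker output.
sample_passages = [
    {"text": "passage about Grace Hopper"},
    {"text": "passage about A100"},
    {"text": "passage about H100"},
]
sample_rankings = [
    {"index": 0, "score": -1.5},
    {"index": 1, "score": 0.3},
    {"index": 2, "score": 4.2},
]

best_first = order_passages(sample_passages, sample_rankings, top_k=2)
print(best_first)
```

Keeping only the top few passages is a typical way to limit how much context is passed to a generation model in a RAG pipeline.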
Example code to call the mistral model
import os
from dotenv import load_dotenv
from nvidia_services.models.mistralai_models import MistralAIModels
load_dotenv()
NVIDIA_API_KEY = os.getenv("NVIDIA_API_KEY")
mistral = MistralAIModels(api_key=NVIDIA_API_KEY)
prompt = "Where is TajMahal?"
result = mistral.generate_response(prompt=prompt)
for chunk in result:
    print(chunk.choices[0].delta.content)
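The loop above prints each streamed chunk on its own line. When the full answer is needed as a single string, the chunks can be joined instead. The sketch below assumes each chunk exposes `choices[0].delta.content`, as the example above suggests, and that the content may be `None` for some chunks (an assumption based on how streaming chat APIs commonly behave). Fake chunk objects stand in for a real response so the code runs without an API key. `collect_stream` is an illustrative helper, not an SDK function.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Join the text pieces of a streamed response into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            # Some streams may emit chunks with no choices; skip them.
            continue
        piece = chunk.choices[0].delta.content
        if piece:  # skip None or empty pieces
            parts.append(piece)
    return "".join(parts)


def _fake_chunk(text):
    """Build a stand-in object with the same attribute path as a chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


# Stand-in for the iterable returned by mistral.generate_response(...).
fake_stream = [
    _fake_chunk("The Taj Mahal "),
    _fake_chunk("is in Agra."),
    _fake_chunk(None),
]

answer = collect_stream(fake_stream)
print(answer)
```

With a real response, pass the object returned by `generate_response` directly to `collect_stream`.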
Hashes for nvidia_services-0.0.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 0571c448f3e8cb2337d850869dc8059b6702819c13804a7409483953e2d81fd8
MD5 | b74ca1bcced40785f7f0b40f889fc022
BLAKE2b-256 | b6c34930957e34feea269f48a44ad08c8867ab721cb7241d926b26cb814f78e0
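The published digests can be used to check a downloaded wheel before installing it. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `sha256_of_file` and `matches_published_hash` are illustrative, and the file path in the usage note is an assumption about where the wheel was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 digest equals the expected value."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, assuming the wheel was saved to the current directory, `matches_published_hash("nvidia_services-0.0.2-py3-none-any.whl", "0571c448f3e8cb2337d850869dc8059b6702819c13804a7409483953e2d81fd8")` should return `True` for an unmodified download.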