Client of Friendli Suite.
Supercharge Generative AI Serving with Friendli 🚀
Friendli engine is the fastest engine for serving generative AI models such as GPT-3. With Friendli Suite, a company can significantly reduce the cost and environmental impact of running its generative AI models. Users can run Friendli engine in a container on infrastructure they manage. They can also use our Friendli dedicated endpoint service to reduce the overhead of running generative AI models themselves.
Friendli Suite
High performance
Users can use Friendli to significantly reduce serving costs and environmental impact. They can serve much higher traffic with the same number of GPUs, or serve the same amount of traffic with notably fewer GPUs. Friendli can deliver 10x higher throughput at the same level of latency.
Diverse model and options support
Friendli supports various language model architectures, embedding choices, and decoding options such as greedy decoding, top-k, top-p, and beam search. Friendli will support diffusion models as well in the near future, so stay tuned! Users can use Friendli in a container and run it by themselves, or they can use our cloud service. The cloud service supports the following features.
Effortless deployment
Friendli dedicated endpoints provide an easy serving experience with a Command Line Interface (CLI) and a web interface. With just a few clicks, users can deploy their models to the infrastructure of their choice. Users can move their serving between different clouds such as Azure, AWS, and GCP, and still have the same seamless experience.
Automatic load and fault management
Friendli dedicated endpoints monitor the resources in use, along with the requests sent to and the responses returned from the currently deployed model, giving users a more stable model serving experience. When the number of requests sent to the deployed model increases, the service automatically assigns more resources (GPU VMs) to the model, and it reduces resource usage when requests drop off. Furthermore, if a resource malfunctions, the service recovers it based on the monitoring results.
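To make the scaling behavior described above more concrete, here is a minimal sketch of a load-based scaling decision. It is not Friendli's actual algorithm: the function name `desired_replicas`, the per-replica capacity figure, and the minimum and maximum bounds are all assumptions chosen for illustration.

```python
import math


def desired_replicas(
    requests_per_sec: float,
    capacity_per_replica: float,
    min_replicas: int = 1,
    max_replicas: int = 8,
) -> int:
    """Return how many GPU replicas the observed load calls for.

    Hypothetical policy: provision enough replicas to cover the load,
    clamped to the range [min_replicas, max_replicas].
    """
    if capacity_per_replica <= 0:
        raise ValueError("capacity_per_replica must be positive")
    needed = math.ceil(requests_per_sec / capacity_per_replica)
    return max(min_replicas, min(max_replicas, needed))


print(desired_replicas(35.0, 10.0))   # 4: scale up under load
print(desired_replicas(2.0, 10.0))    # 1: scale down to the minimum
print(desired_replicas(500.0, 10.0))  # 8: capped at the maximum
```

A real autoscaler would usually smooth the load signal over a time window to avoid reacting to short spikes.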
🕹️ Friendli Client
Check out Friendli Client Docs to learn more.
Installation
pip install friendli-client
[!NOTE] If you have a Hugging Face checkpoint and want to convert it to a Friendli-compatible format while applying quantization, you need to install the package with the necessary machine learning library (mllib) dependencies. In this case, install the package with the following command:
pip install "friendli-client[mllib]"
Examples
This example shows how to create a deployment and send a completion API request to the created deployment with the Python SDK.
import os
from friendli import FriendliResource
client = FriendliResource(
api_key=os.environ["FRIENDLI_API_KEY"],
project=os.environ["FRIENDLI_PROJECT"],
)
# Create a deployment in the GCP asia-northeast3 region with one A100 GPU.
deployment = client.deployment.create(
checkpoint_id=os.environ["CHECKPOINT_ID"],
name="my-deployment",
cloud="gcp",
region="asia-northeast3",
gpu_type="a100",
num_gpus=1,
)
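A new deployment needs some time before it can serve requests. The sketch below shows one way to wait for the "Healthy" status by polling. The `get_status` callable is hypothetical: it stands in for whatever call returns the deployment's current status string, which this example does not assume. The intermediate status names in the demo are placeholders as well.

```python
import time
from typing import Callable


def wait_until_healthy(
    get_status: Callable[[], str],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_status() until it returns "Healthy" or the timeout expires.

    Returns True if the deployment became healthy, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_status() == "Healthy":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


# Demo with a fake status source (placeholder status names).
fake_statuses = iter(["Initializing", "Initializing", "Healthy"])
ready = wait_until_healthy(lambda: next(fake_statuses), interval_s=0.0)
print(ready)  # True
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.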
When the deployment reaches the "Healthy" status and is ready to process inference requests, you can generate a completion with:
import os
from friendli import Friendli
client = Friendli(
api_key=os.environ["FRIENDLI_API_KEY"],
project=os.environ["FRIENDLI_PROJECT"],
deployment_id=os.environ["DEPLOYMENT_ID"],
)
# Generate a completion by sending an inference request to the deployment created above.
completion = client.completions.create(
prompt="Python is a popular language for",
max_tokens=100,
top_p=0.8,
temperature=0.5,
no_repeat_ngram=3,
)
print(completion.choices[0].text)
"""
>>> Example Output:
web development. It is also used for a variety of other applications.
Python can be used to create desktop applications, web applications and mobile applications as well.
Python is one of the most popular languages for data science.
Data scientists use Python to analyze data.
The Python ecosystem is very diverse.
There are many libraries that can help you with your Python projects.
You can also find many Python tutorials online.
"""
You can also do the same with the CLI.
# Switch CLI context to target project
friendli project switch my-project
# Create a deployment
friendli deployment create \
--checkpoint-id $YOUR_CHECKPOINT_ID \
--name my-deployment \
--cloud gcp \
--region asia-northeast3 \
--gpu-type a100 \
--num-gpus 1 \
--config-file config.yaml
When the deployment is ready, you can send a request with curl.
# Send an inference request to the deployment.
curl -X POST https://gcp-asia-northeast3.friendli.ai/$DEPLOYMENT_ID/v1/completions \
-d '{"prompt": "Python is a popular language for", "max_tokens": 100, "top_p": 0.8, "temperature": 0.5, "no_repeat_ngram": 3}'
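For readers who prefer Python's standard library over curl, the same request can be sketched as follows. The URL pattern and JSON fields are taken from the curl command above; the `Content-Type` header and the helper name `build_completion_request` are assumptions, and any authentication the endpoint may require is not shown because the curl example does not include it.

```python
import json
import os
import urllib.request


def build_completion_request(deployment_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the curl example."""
    url = f"https://gcp-asia-northeast3.friendli.ai/{deployment_id}/v1/completions"
    payload = {
        "prompt": "Python is a popular language for",
        "max_tokens": 100,
        "top_p": 0.8,
        "temperature": 0.5,
        "no_repeat_ngram": 3,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


# Falls back to a placeholder ID when DEPLOYMENT_ID is not set.
request = build_completion_request(os.environ.get("DEPLOYMENT_ID", "my-deployment-id"))
print(request.get_method(), request.full_url)
```

To actually send it, pass the request to `urllib.request.urlopen(request)` and read the response body.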
The response will be like:
{
"choices": [
{
"index": 0,
"seed": 18337142367832222086,
"text": " web development. It is also used for a variety of other applications.\nPython can be used to create desktop applications, web applications and mobile applications as well.\nPython is one of the most popular languages for data science.\nData scientists use Python to analyze data.\nThe Python ecosystem is very diverse.\nThere are many libraries that can help you with your Python projects.\nYou can also find many Python tutorials online.",
"tokens": [3644,8300,290,3992,2478,13,198,37906,318,6768,973,284,...]
}
]
}
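A response of this shape can be parsed with the standard `json` module. The sketch below extracts the generated text from the first choice. The sample body is an abbreviated copy of the response above (shortened text, `tokens` omitted), and the helper name `first_completion_text` is hypothetical.

```python
import json

# Abbreviated copy of the example response shown above.
sample_body = """
{
  "choices": [
    {
      "index": 0,
      "seed": 18337142367832222086,
      "text": " web development. It is also used for a variety of other applications."
    }
  ]
}
"""


def first_completion_text(body: str) -> str:
    """Return the text of the first choice in a completions response body."""
    data = json.loads(body)
    choices = data.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["text"]


# The generated text starts with a leading space, so strip it for display.
print(first_completion_text(sample_body).strip())
```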
Hashes for friendli_client-1.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | d92f7ef39974047146a43effe82c73c25016f0a93c759734835385fcefb94d85
MD5 | 868322521ed0b76cd2baef41a596a0db
BLAKE2b-256 | c882994ce4bea1bb2b3f3117a850e994cd46a32007d6e6b1bbae52d888c25d24