Client of Friendli Suite.

Supercharge Generative AI Serving with Friendli 🚀

The Friendli Client offers a convenient interface for interacting with the endpoint services provided by Friendli Suite, the ultimate solution for serving generative AI models. Designed for flexibility and performance, it supports both synchronous and asynchronous operations, making it easy to integrate powerful AI capabilities into your applications.

Installation

To get started with Friendli, install the client package using pip:

pip install friendli-client

[!IMPORTANT] You must set the FRIENDLI_TOKEN environment variable before initializing the client instance with client = Friendli(). Alternatively, you can pass your personal access token as the token argument when creating the client, like so:
from friendli import Friendli

client = Friendli(token="YOUR PERSONAL ACCESS TOKEN")
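Before constructing the client, it can help to fail early with a clear message when no token is available. The sketch below is a hypothetical helper, resolve_token, that is not part of friendli-client. It assumes an explicitly passed token should take precedence over FRIENDLI_TOKEN; whether the library applies the same precedence is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_token(
    explicit: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Return the explicit token if given, else FRIENDLI_TOKEN from env.

    Hypothetical helper for illustration; not part of friendli-client.
    """
    token = explicit or env.get("FRIENDLI_TOKEN")
    if not token:
        raise RuntimeError(
            "No token found: set FRIENDLI_TOKEN or pass token=... to Friendli()."
        )
    return token


# Demonstration with an in-memory mapping instead of the real environment.
print(resolve_token(env={"FRIENDLI_TOKEN": "from-env"}))
print(resolve_token("from-arg", env={"FRIENDLI_TOKEN": "from-env"}))
```

The result could then be passed on as Friendli(token=resolve_token()).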

Friendli Serverless Endpoints

Friendli Serverless Endpoints offer a simple, click-and-play interface for accessing popular open-source models like Llama 3.1. With pay-per-token billing, they are ideal for exploration and experimentation.

To interact with models hosted by serverless endpoints, provide the model code you want to use in the model argument. Refer to the pricing table for a list of available model codes and their pricing.

from friendli import Friendli

client = Friendli()

chat_completion = client.chat.completions.create(
    model="meta-llama-3.1-8b-instruct",
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
)
print(chat_completion.choices[0].message.content)

Friendli Dedicated Endpoints

Friendli Dedicated Endpoints enable you to run your custom generative AI models on dedicated GPU resources.

To interact with dedicated endpoints, provide the endpoint ID in the model argument.

import os
from friendli import Friendli

client = Friendli(
    team_id=os.environ["TEAM_ID"],  # If not provided, default team is used.
    use_dedicated_endpoint=True,
)

chat_completion = client.chat.completions.create(
    model=os.environ["ENDPOINT_ID"],
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
)
print(chat_completion.choices[0].message.content)

Friendli Container

Friendli Container is perfect for users who prefer to serve LLMs within their own infrastructure. By deploying the Friendli Engine in containers on your on-premise or cloud GPUs, you can maintain complete control over your data and operations, ensuring security and compliance with internal policies.

from friendli import Friendli

client = Friendli(base_url="http://0.0.0.0:8000")

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
)
print(chat_completion.choices[0].message.content)

Async Usage

import asyncio
from friendli import AsyncFriendli

client = AsyncFriendli()

async def main() -> None:
    chat_completion = await client.chat.completions.create(
        model="meta-llama-3.1-8b-instruct",
        messages=[
            {
                "role": "user",
                "content": "Tell me how to make a delicious pancake",
            }
        ],
    )
    print(chat_completion.choices[0].message.content)


asyncio.run(main())
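One common reason to choose the async client is to issue several requests concurrently. The sketch below shows that pattern with asyncio.gather. It sends no request: ask is a hypothetical stand-in for an awaited chat completion call, and ask_all is likewise an illustrative name.

```python
import asyncio
from typing import List


async def ask(prompt: str) -> str:
    """Stand-in for an awaited chat completion call; sends no request."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"answer to: {prompt}"


async def ask_all(prompts: List[str]) -> List[str]:
    """Run all requests concurrently; results keep the order of prompts."""
    return await asyncio.gather(*(ask(p) for p in prompts))


answers = asyncio.run(ask_all(["pancakes", "waffles"]))
print(answers)
```

In real use, the body of ask would await client.chat.completions.create(...) on a shared AsyncFriendli instance and return the message content.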

Streaming Usage

from friendli import Friendli

client = Friendli()

stream = client.chat.completions.create(
    model="meta-llama-3.1-8b-instruct",
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)

The async client (AsyncFriendli) uses the same interface to stream the response.

import asyncio
from friendli import AsyncFriendli

client = AsyncFriendli()

async def main() -> None:
    stream = await client.chat.completions.create(
        model="meta-llama-3.1-8b-instruct",
        messages=[
            {
                "role": "user",
                "content": "Tell me how to make a delicious pancake",
            }
        ],
        stream=True,
    )
    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)


asyncio.run(main())
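The loops above print each piece as it arrives. If the full text is also needed afterwards, the pieces can be accumulated. This is a minimal sketch using stand-in chunk objects that mimic only the chunk.choices[0].delta.content shape used above; fake_chunk and collect_text are hypothetical names, and no request is sent.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def fake_chunk(content: Optional[str]) -> SimpleNamespace:
    """Build a stand-in object shaped like chunk.choices[0].delta.content."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


def collect_text(stream: Iterable[SimpleNamespace]) -> str:
    """Join the text pieces of a stream, treating a missing piece as empty."""
    parts = []
    for chunk in stream:
        parts.append(chunk.choices[0].delta.content or "")
    return "".join(parts)


# Stand-in stream; a real one would come from create(..., stream=True).
demo_stream = [fake_chunk("Mix "), fake_chunk(None), fake_chunk("the batter.")]
print(collect_text(demo_stream))
```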

Advanced Usage

Sending Requests to LoRA Adapters

If your endpoint is serving a Multi-LoRA model, you can send a request to one of the adapters by providing the adapter route in the model argument.

For Friendli Dedicated Endpoints, provide the endpoint ID and the adapter route separated by a colon (:).

import os
from friendli import Friendli

client = Friendli(
    team_id=os.environ["TEAM_ID"],  # If not provided, default team is used.
    use_dedicated_endpoint=True,
)

chat_completion = client.lora.completions.create(
    model=f"{os.environ['ENDPOINT_ID']}:{os.environ['ADAPTER_ROUTE']}",
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
)

For Friendli Container, just provide the adapter name.

import os
from friendli import Friendli

client = Friendli(base_url="http://0.0.0.0:8000")

chat_completion = client.lora.completions.create(
    model=os.environ["ADAPTER_NAME"],
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
)
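The colon-separated model value used for Dedicated Endpoints can be built in one place so that an empty environment variable is caught before a request is sent. This is a small sketch; lora_model_id is a hypothetical helper, and the two values in the demonstration are placeholders rather than real identifiers.

```python
def lora_model_id(endpoint_id: str, adapter_route: str) -> str:
    """Join endpoint ID and adapter route as '<endpoint_id>:<adapter_route>'."""
    if not endpoint_id or not adapter_route:
        raise ValueError("endpoint_id and adapter_route must be non-empty")
    return f"{endpoint_id}:{adapter_route}"


# Placeholder values for illustration only.
print(lora_model_id("my-endpoint-id", "my-adapter-route"))
```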

Using the gRPC Interface

[!IMPORTANT] gRPC is only supported by Friendli Container, and only the streaming API of v1/completions is available.

When Friendli Container is running in gRPC mode, the client can interact with the gRPC server by initializing it with the use_grpc=True argument.

from friendli import Friendli

client = Friendli(base_url="0.0.0.0:8000", use_grpc=True)

stream = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
    stream=True,  # Only streaming mode is available
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="", flush=True)

Configuring the HTTP Client

The client uses httpx to send HTTP requests. You can provide a customized httpx.Client when initializing Friendli.

import httpx
from friendli import Friendli

with httpx.Client() as http_client:
    client = Friendli(http_client=http_client)

For the async client, you can provide httpx.AsyncClient.

import asyncio
import httpx
from friendli import AsyncFriendli

async def main() -> None:
    async with httpx.AsyncClient() as http_client:
        client = AsyncFriendli(http_client=http_client)

asyncio.run(main())

Configuring the gRPC Channel

import grpc
from friendli import Friendli

with grpc.insecure_channel("0.0.0.0:8000") as channel:
    client = Friendli(use_grpc=True, grpc_channel=channel)

You can use the same interface for the async client.

import asyncio
import grpc.aio
from friendli import AsyncFriendli

async def main() -> None:
    async with grpc.aio.insecure_channel("0.0.0.0:8000") as channel:
        client = AsyncFriendli(use_grpc=True, grpc_channel=channel)

asyncio.run(main())

Managing Resources

The Friendli client provides several methods to manage and release resources.

Closing the Client

Both the Friendli and AsyncFriendli clients can hold network connections or other resources during their lifetime. To ensure these resources are properly released, you should either call the close() method or use the client within a context manager.

from friendli import Friendli

client = Friendli()

# Use the client for various operations...

# When done, close the client to release resources
client.close()

For the asynchronous client, the pattern is similar:

import asyncio
from friendli import AsyncFriendli

client = AsyncFriendli()

async def main() -> None:
    # Use the client for various async operations...

    # When done, close the client to release resources
    await client.close()

asyncio.run(main())

You can also use a context manager, which automatically closes the client and releases its resources when the block is exited. This is a safer and more convenient way to manage resources.

from friendli import Friendli

with Friendli() as client:
    ...

For asynchronous usage:

import asyncio
from friendli import AsyncFriendli

async def main():
    async with AsyncFriendli() as client:
        ...


asyncio.run(main())

Managing Streaming Responses

When using streaming responses, it’s crucial to properly close the HTTP connection after the interaction is complete. By default, the connection is automatically closed once all data from the stream has been consumed (i.e., when the for-loop reaches the end). However, if streaming is interrupted by exceptions or other issues, the connection may remain open and won’t be released until it is garbage-collected. To ensure that all underlying connections and resources are properly released, it’s important to explicitly close the connection, particularly when streaming is prematurely terminated.

from friendli import Friendli

client = Friendli()

stream = client.chat.completions.create(
    model="meta-llama-3.1-8b-instruct",
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
    stream=True,
)

try:
    for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)
finally:
    stream.close()  # Ensure the stream is closed after use

For asynchronous streaming:

import asyncio
from friendli import AsyncFriendli

client = AsyncFriendli()

async def main():
    stream = await client.chat.completions.create(
        model="meta-llama-3.1-8b-instruct",
        messages=[
            {
                "role": "user",
                "content": "Tell me how to make a delicious pancake",
            }
        ],
        stream=True,
    )

    try:
        async for chunk in stream:
            print(chunk.choices[0].delta.content or "", end="", flush=True)
    finally:
        await stream.close()  # Ensure the stream is closed after use

asyncio.run(main())
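To see why the explicit close matters, the sketch below interrupts iteration with an exception and checks that close() still runs. FakeStream is a hypothetical stand-in that only records whether close() was called; it is not the library's stream type.

```python
from typing import Iterator, List


class FakeStream:
    """Stand-in for a streaming response; tracks whether close() ran."""

    def __init__(self, pieces: List[str]) -> None:
        self._pieces = pieces
        self.closed = False

    def __iter__(self) -> Iterator[str]:
        return iter(self._pieces)

    def close(self) -> None:
        self.closed = True


stream = FakeStream(["a", "b", "c"])
seen = []
try:
    try:
        for piece in stream:
            if piece == "b":
                raise RuntimeError("interrupted mid-stream")
            seen.append(piece)
    finally:
        stream.close()  # runs on normal completion and on error alike
except RuntimeError:
    pass  # swallowed only to keep the demonstration running

print(seen, stream.closed)
```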

You can also use a context manager to automatically close the stream and release its resources when the block is exited, which is a safer and more convenient way to manage a streaming response.

from friendli import Friendli

client = Friendli()

with client.chat.completions.create(
    model="meta-llama-3.1-8b-instruct",
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
    stream=True,
) as stream:
    for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)

For asynchronous streaming:

import asyncio
from friendli import AsyncFriendli

client = AsyncFriendli()

async def main():
    async with await client.chat.completions.create(
        model="meta-llama-3.1-8b-instruct",
        messages=[
            {
                "role": "user",
                "content": "Tell me how to make a delicious pancake",
            }
        ],
        stream=True,
    ) as stream:
        async for chunk in stream:
            print(chunk.choices[0].delta.content or "", end="", flush=True)

asyncio.run(main())

Canceling a gRPC Stream

When using the gRPC interface with streaming, you might want to cancel an ongoing stream operation before it completes. This is particularly useful if you need to stop the stream due to a timeout or some other condition.

For synchronous gRPC streaming:

from friendli import Friendli

client = Friendli(base_url="0.0.0.0:8000", use_grpc=True)

stream = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Tell me how to make a delicious pancake",
        }
    ],
    stream=True,
)

try:
    for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)
except SomeException:  # Placeholder: replace with the exception you want to handle
    stream.cancel()  # Cancel the stream in case of an error or interruption

For asynchronous gRPC streaming:

import asyncio
from friendli import AsyncFriendli

client = AsyncFriendli(base_url="0.0.0.0:8000", use_grpc=True)

async def main():
    stream = await client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": "Tell me how to make a delicious pancake",
            }
        ],
        stream=True,
    )

    try:
        async for chunk in stream:
            print(chunk.choices[0].delta.content or "", end="", flush=True)
    except SomeException:  # Placeholder: replace with the exception you want to handle
        stream.cancel()  # Cancel the stream in case of an error or interruption

asyncio.run(main())
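The timeout case mentioned above can be sketched with a deadline check inside the loop. The example assumes only that the stream is iterable and exposes cancel(), as in the examples above. FakeGrpcStream and read_with_deadline are hypothetical names, and no gRPC connection is made.

```python
import time
from typing import Iterator, List


class FakeGrpcStream:
    """Stand-in for a gRPC stream; yields pieces slowly and records cancel()."""

    def __init__(self, pieces: List[str], delay: float) -> None:
        self._pieces = pieces
        self._delay = delay
        self.cancelled = False

    def __iter__(self) -> Iterator[str]:
        for piece in self._pieces:
            time.sleep(self._delay)  # simulate waiting for the server
            yield piece

    def cancel(self) -> None:
        self.cancelled = True


def read_with_deadline(stream: FakeGrpcStream, timeout: float) -> List[str]:
    """Collect pieces until the deadline passes, then cancel the stream."""
    deadline = time.monotonic() + timeout
    received = []
    for piece in stream:
        if time.monotonic() > deadline:
            stream.cancel()  # stop the ongoing stream instead of draining it
            break
        received.append(piece)
    return received


stream = FakeGrpcStream(["a", "b", "c", "d"], delay=0.05)
pieces = read_with_deadline(stream, timeout=0.12)
print(pieces, stream.cancelled)
```

The deadline is checked only when a piece arrives, so a stream that stalls completely would need a different mechanism.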

CLI Examples

You can also call the generation APIs directly from the CLI.

friendli api chat-completions create \
  -g "user Tell me how to make a delicious pancake" \
  -m meta-llama-3.1-8b-instruct

For further information about the friendli command, run friendli --help in your terminal shell. This will provide you with a detailed list of available options and usage instructions.

[!TIP] Check out our official documentation to learn more!

Release history

2.0.0a16 pre-release

Jul 4, 2025

2.0.0a15 pre-release

Jul 4, 2025

2.0.0a14 pre-release

Jul 4, 2025

2.0.0a13 pre-release

Jul 4, 2025

2.0.0a12 pre-release

Jul 4, 2025

2.0.0a11 pre-release

Aug 7, 2024

2.0.0a10 pre-release

Aug 7, 2024

2.0.0a9 pre-release

Aug 7, 2024

2.0.0a8 pre-release

Jul 23, 2024

2.0.0a7 pre-release

Jul 10, 2024

2.0.0a6 pre-release

Jul 10, 2024

2.0.0a5 pre-release

Jul 10, 2024

2.0.0a4 pre-release

Jul 10, 2024

2.0.0a3 pre-release

Jul 10, 2024

2.0.0a2 pre-release

Jul 10, 2024

2.0.0a1 pre-release

Jul 10, 2024

2.0.0a0 pre-release

Jul 9, 2024

This version

1.5.8

Jan 24, 2025

1.5.7

Jan 24, 2025

1.5.6

Oct 17, 2024

1.5.5

Oct 15, 2024

1.5.4

Aug 30, 2024

1.5.3

Aug 23, 2024

1.5.2

Aug 14, 2024

1.5.1

Aug 14, 2024

1.5.0

Aug 6, 2024

1.4.2

Jul 21, 2024

1.4.1

Jun 19, 2024

1.4.0

Jun 18, 2024

1.3.7

Jun 12, 2024

1.3.6

Jun 10, 2024

1.3.5

May 28, 2024

1.3.4

Apr 2, 2024

1.3.3

Apr 1, 2024

1.3.2

Mar 26, 2024

1.3.1

Mar 25, 2024

1.3.0

Mar 23, 2024

1.2.4

Feb 19, 2024

1.2.3

Feb 16, 2024

1.2.2

Feb 16, 2024

1.2.1

Feb 14, 2024

1.2.0

Jan 30, 2024

1.1.0

Jan 4, 2024

1.0.1

Dec 28, 2023

1.0.0

Dec 27, 2023

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

friendli_client-1.5.8.tar.gz (68.2 kB view details)

Uploaded Jan 24, 2025 Source

Built Distribution

friendli_client-1.5.8-py3-none-any.whl (98.4 kB view details)

Uploaded Jan 24, 2025 Python 3

File details

Details for the file friendli_client-1.5.8.tar.gz.

File metadata

Download URL: friendli_client-1.5.8.tar.gz
Upload date: Jan 24, 2025
Size: 68.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.8.4 CPython/3.12.4 Darwin/24.1.0

File hashes

Hashes for friendli_client-1.5.8.tar.gz
Algorithm	Hash digest
SHA256	`9181506e81e7336bff66ffb6710201bc3acf88f741c970045b632bdb2853b5fa`
MD5	`b16078350205e76be9e16fcb048a69ed`
BLAKE2b-256	`b2946939422c23dc7f4b7710200e30d2c30a4e558594e84e8269f8ba1ebdfa16`

See more details on using hashes here.

File details

Details for the file friendli_client-1.5.8-py3-none-any.whl.

File metadata

Download URL: friendli_client-1.5.8-py3-none-any.whl
Upload date: Jan 24, 2025
Size: 98.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.8.4 CPython/3.12.4 Darwin/24.1.0

File hashes

Hashes for friendli_client-1.5.8-py3-none-any.whl
Algorithm	Hash digest
SHA256	`84addaf4967b6417fcf51ebea285ea009b79f36a1fd91576d621f579e4ce535b`
MD5	`d6a4851ad5e5aea332a4473c582320da`
BLAKE2b-256	`d8695555db7506ebbb435700a2762de3d110a772f4cf305a19b19123e7b0b82e`

See more details on using hashes here.
