Seamlessly integrate with top LLM APIs for speedy, robust, and scalable querying. Ideal for developers needing quick, reliable AI-powered responses.

Project description

⚡FastInference - The Ultra-Fast LLM Querying Manager (OpenAI, HuggingFace, Ollama, ...)

Query any LLM API and get responses fast with a robust, distributed library.
All the major LLM providers can be used with FastInference (OpenAI, Hugging Face, Vertex AI, Together AI, Azure, etc.).

Features

  • High Performance: high inference speed thanks to intelligent asynchronous and distributed querying.
  • Robust Error Handling: advanced mechanisms to handle exceptions, ensuring reliable querying.
  • Ease of Use: a simplified API designed to work the same way with every LLM provider.
  • Scalability: optimized for large datasets and high concurrency.

The workflow

[Diagram of the workflow]

Usage

pip install fastinference-llm

from fastinference import FastInference

# The prompt contains, between curly brackets, the dataset columns to inject.
prompt = """
            You will be provided with a tweet, and your task is to classify its sentiment as positive, neutral, or negative.

            Tweet: {tweet_content}
        """

api_key = "your-api-key"
model_name = "modelprovider/model_name"

# Load the dataset, query the model for every row, and collect the responses.
results = FastInference(file_path="your-dataset-file-path",
                        main_column="your-main-feature",
                        prompt=prompt,
                        api_key=api_key,
                        model_name=model_name,
                        only_response=True).run()
print(results)

The Parameters

Here are the main parameters for initializing the FastInference object.

  • file_path (string): path to your dataset (CSV, XLSX, JSON, or Parquet)
  • main_column (string): name of the main column (explained in detail below)
  • prompt (string): the prompt with the variables in it (explained in detail below)
  • api_key (string): your API key
  • model_name (string): has the format provider/model_name (for example "huggingface/meta-llama/Meta-Llama-3-70B")
  • only_response (bool): if True, you get a list containing only the LLM responses; otherwise you get the full objects, normalized to follow the OpenAI API

The Prompt

One of the parameters of the FastInference library is the prompt. It must be a string, and it contains, between curly brackets, the names of the columns from your dataset whose values should be inserted into the prompt.

Example Usage

To understand how to use the prompt parameter in the FastInference library, we'll provide an example based on a tweet sentiment classification task. Consider a dataset with the following structure:

tweet_content                                             related_entities
"Just had the best day ever at the NeurIPS Conference!"   "NeurIPS"
"Traffic was terrible this morning in Paris."             "Paris"
"Looking forward to the new Star Wars movie!"             "Star Wars"


Here's how you could set up your prompt for classifying the sentiment of tweets based on their content and related entities:

prompt = """
          You will be provided with a tweet, and your task is to classify its sentiment as positive, neutral, or negative.
          You must consider the related identified entities in order to make a good decision.
          
          Tweet: {tweet_content}
          Related Entities: {related_entities}
          """

The main_column Parameter

The main_column parameter is a string containing the name of the most important column in your data, the one your task revolves around. Note that it has no influence on the LLM during inference, since the prompt does not create hierarchical relationships between the columns.
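
For the tweet dataset above, a natural choice is the column holding the text being classified. A sketch, reusing the prompt, api_key, and model_name from the earlier snippets together with the illustrative tweets.csv file:

results = FastInference(file_path="tweets.csv",       # illustrative path
                        main_column="tweet_content",  # the column carrying the main content
                        prompt=prompt,
                        api_key=api_key,
                        model_name=model_name,
                        only_response=True).run()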

Output format

If only_response is True, the library returns a list of strings, one response per row.

Here is the structure of the return data if only_response=True:

["response 1", "response 2", ..., "response n"]

But if only_response is False, it returns a list of Datablock items. Each Datablock has four fields: content (str), metadata (dict), content_with_prompt (PromptTemplate), and response (a ModelResponse that follows the OpenAI API). You can retrieve the text generated by the language model from the response's choices attribute.

Here is the structure of the return data if only_response=False:

[
        Datablock(content: str, content_with_prompt: PromptTemplate, metadata: dict, response: ModelResponse),
        ...
        Datablock(content: str, content_with_prompt: PromptTemplate, metadata: dict, response: ModelResponse)
]

only_response=False is the default and the advised setting. The Datablock item keeps track of the data correctly through the distribution steps, ensuring that it stays reliable and consistent throughout the process.
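
Since response is normalized to the OpenAI API, the generated text can be read from its choices attribute. A minimal sketch, assuming the usual OpenAI-style layout (choices[0].message.content):

# Given `results` returned by a FastInference(...).run() call with only_response=False:
for block in results:
    print(block.content)                              # the row's original content (str)
    print(block.metadata)                             # dict of metadata tracked by the library
    print(block.response.choices[0].message.content)  # the generated text (OpenAI-style layout)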

Supported Providers

FastInference is built on the open-source LiteLLM library, so every LLM supported by LiteLLM is also supported by FastInference.

Supported providers:

  • openai
  • azure
  • aws - sagemaker
  • aws - bedrock
  • google - vertex_ai [Gemini]
  • google - palm
  • google AI Studio - gemini
  • mistral ai api
  • cloudflare AI Workers
  • cohere
  • anthropic
  • huggingface
  • replicate
  • together_ai
  • openrouter
  • ai21
  • baseten
  • vllm
  • nlp_cloud
  • aleph alpha
  • petals
  • ollama
  • deepinfra
  • perplexity-ai
  • Groq AI
  • anyscale
  • IBM - watsonx.ai
  • voyage ai
  • xinference [Xorbits Inference]
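
Since routing goes through LiteLLM, model_name follows the provider/model_name convention described above. A few illustrative strings (only the Hugging Face one appears earlier on this page; check the LiteLLM docs for the exact identifiers your provider expects):

model_name = "huggingface/meta-llama/Meta-Llama-3-70B"        # Hugging Face (from the example above)
model_name = "openai/gpt-3.5-turbo"                           # OpenAI
model_name = "ollama/llama2"                                  # a local Ollama model
model_name = "together_ai/togethercomputer/llama-2-70b-chat"  # Together AI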

Contributing

To contribute: clone the repo locally, make a change, and submit a PR.

Clone the repo:

git clone https://github.com/blefo/FastInference.git

Make your changes, then push your fork to your GitHub repo and submit a PR from there! 🚀

Planned improvements:

  • Add a new method for data loading
  • Load the API key and model information directly from environment variables
  • Optimize the DataBlock structure
  • Leverage LiteLLM's API and key rotation feature to avoid exceptions

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fastinference_llm-0.0.5.tar.gz (11.7 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

fastinference_llm-0.0.5-py3-none-any.whl (11.3 kB)

Uploaded Python 3

File details

Details for the file fastinference_llm-0.0.5.tar.gz.

File metadata

  • Download URL: fastinference_llm-0.0.5.tar.gz
  • Size: 11.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.9.19

File hashes

Hashes for fastinference_llm-0.0.5.tar.gz:

Algorithm    Hash digest
SHA256       baa67c2d74904d31576f702d029b567897ed50936c9419127d640562b989187d
MD5          97cb8df6b97d3bb80128dcd5d0472b3a
BLAKE2b-256  fe0a108004bf466e73884c5ab1f589407e3553e00b61935d72ad066c9cd3f808

See more details on using hashes here.
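
If you download the sdist manually, you can check it against the SHA256 digest above with a short Python snippet (hashlib is in the standard library; the file name assumes the sdist sits in the current directory):

import hashlib

# Expected SHA256 digest for fastinference_llm-0.0.5.tar.gz (from the table above).
expected = "baa67c2d74904d31576f702d029b567897ed50936c9419127d640562b989187d"
with open("fastinference_llm-0.0.5.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()
print("OK" if digest == expected else "hash mismatch")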

File details

Details for the file fastinference_llm-0.0.5-py3-none-any.whl.

File hashes

Hashes for fastinference_llm-0.0.5-py3-none-any.whl:

Algorithm    Hash digest
SHA256       602527bfb8bc9636a98c5d0bbaf868b29653c65afc51a19b3f38b139e7addc82
MD5          f5f8a92d1edcd00a7ddcd09f9b0947f1
BLAKE2b-256  72829643630ad369992968bf933c9c617150d2aac6086be4bc1b7a61356b3889

See more details on using hashes here.
