Seamlessly integrate with top LLM APIs for speedy, robust, and scalable querying. Ideal for developers needing quick, reliable AI-powered responses.

Project description

⚡FastInference - The Ultra-Fast LLM Querying Manager (OpenAI, HuggingFace, Ollama, ...)

Query any LLM API and get responses fast with a robust, distributed library.
All the major LLM providers can be used with FastInference (OpenAI, Hugging Face, Vertex AI, Together AI, Azure, etc.).

Features

  • High Performance: high inference speed thanks to intelligent asynchronous and distributed querying.
  • Robust Error Handling: advanced mechanisms to handle exceptions, ensuring reliable querying.
  • Ease of Use: a simplified API designed to work the same way with every LLM provider.
  • Scalability: optimized for large datasets and high concurrency.

The workflow

[Diagram of the workflow]

Usage

pip install fastinference-llm

from fastinference import FastInference

# The prompt contains, between curly brackets, the dataset columns to inject.
prompt = """
            You will be provided with a tweet, and your task is to classify its sentiment as positive, neutral, or negative.

            Tweet: {tweet_content}
        """

api_key = "your-api-key"
model_name = "modelprovider/model_name"

# Load the dataset, query the model for every row, and collect the responses.
results = FastInference(file_path="your-dataset-file-path",
                        main_column="your-main-feature",
                        prompt=prompt,
                        api_key=api_key,
                        model_name=model_name,
                        only_response=True).run()
print(results)

The Parameters

Here are the main parameters for initializing the FastInference object.

  • file_path (string): path to your dataset (CSV, XLSX, JSON, or Parquet)
  • main_column (string): name of the main column (explained in detail below)
  • prompt (string): the prompt with the variables in it (explained in detail below)
  • api_key (string): your API key
  • model_name (string): has the format provider/model_name (for example "huggingface/meta-llama/Meta-Llama-3-70B")
  • only_response (bool): if True, you get a list containing only the LLM responses; otherwise you get the full objects, normalized to follow the OpenAI API

The Prompt

One of the parameters of the FastInference library is the prompt. It must be a string, and it contains, between curly brackets, the names of the columns from your dataset whose values should be inserted into the prompt.

Example Usage

To understand how to use the prompt parameter in the FastInference library, we'll provide an example based on a tweet sentiment classification task. Consider a dataset with the following structure:

tweet_content                                             related_entities
"Just had the best day ever at the NeurIPS Conference!"   "NeurIPS"
"Traffic was terrible this morning in Paris."             "Paris"
"Looking forward to the new Star Wars movie!"             "Star Wars"


Here's how you could set up your prompt for classifying the sentiment of tweets based on their content and related entities:

prompt = """
          You will be provided with a tweet, and your task is to classify its sentiment as positive, neutral, or negative.
          You must consider the related identified entities in order to make a good decision.
          
          Tweet: {tweet_content}
          Related Entities: {related_entities}
          """

The main_column Parameter

The main_column parameter is a string containing the name of the most important column in your data, the one your task revolves around. Note that it has no influence on the LLM during inference, since the prompt does not create hierarchical relationships between the columns.
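
For the tweet dataset above, a natural choice is the column holding the text being classified. A sketch, reusing the prompt, api_key, and model_name from the earlier snippets together with the illustrative tweets.csv file:

results = FastInference(file_path="tweets.csv",       # illustrative path
                        main_column="tweet_content",  # the column carrying the main content
                        prompt=prompt,
                        api_key=api_key,
                        model_name=model_name,
                        only_response=True).run()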

Output format

If only_response is True, the library returns a list of strings, one response per row.

Here is the structure of the return data if only_response=True:

["response 1", "response 2", ..., "response n"]

But if only_response is False, it returns a list of Datablock items. Each Datablock has four fields: content (str), metadata (dict), content_with_prompt (PromptTemplate), and response (a ModelResponse that follows the OpenAI API). You can retrieve the text generated by the language model from the response's choices attribute.

Here is the structure of the return data if only_response=False:

[
        Datablock(content: str, content_with_prompt: PromptTemplate, metadata: dict, response: ModelResponse),
        ...
        Datablock(content: str, content_with_prompt: PromptTemplate, metadata: dict, response: ModelResponse)
]

only_response=False is the default and the advised setting. The Datablock item keeps track of the data correctly through the distribution steps, ensuring that it stays reliable and consistent throughout the process.
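
Since response is normalized to the OpenAI API, the generated text can be read from its choices attribute. A minimal sketch, assuming the usual OpenAI-style layout (choices[0].message.content):

# Given `results` returned by a FastInference(...).run() call with only_response=False:
for block in results:
    print(block.content)                              # the row's original content (str)
    print(block.metadata)                             # dict of metadata tracked by the library
    print(block.response.choices[0].message.content)  # the generated text (OpenAI-style layout)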

Supported Providers

FastInference is built on the open-source LiteLLM library, so every LLM supported by LiteLLM is also supported by FastInference.

Supported providers:

  • openai
  • azure
  • aws - sagemaker
  • aws - bedrock
  • google - vertex_ai [Gemini]
  • google - palm
  • google AI Studio - gemini
  • mistral ai api
  • cloudflare AI Workers
  • cohere
  • anthropic
  • huggingface
  • replicate
  • together_ai
  • openrouter
  • ai21
  • baseten
  • vllm
  • nlp_cloud
  • aleph alpha
  • petals
  • ollama
  • deepinfra
  • perplexity-ai
  • Groq AI
  • anyscale
  • IBM - watsonx.ai
  • voyage ai
  • xinference [Xorbits Inference]
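
Since routing goes through LiteLLM, model_name follows the provider/model_name convention described above. A few illustrative strings (only the Hugging Face one appears earlier on this page; check the LiteLLM docs for the exact identifiers your provider expects):

model_name = "huggingface/meta-llama/Meta-Llama-3-70B"        # Hugging Face (from the example above)
model_name = "openai/gpt-3.5-turbo"                           # OpenAI
model_name = "ollama/llama2"                                  # a local Ollama model
model_name = "together_ai/togethercomputer/llama-2-70b-chat"  # Together AI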

Contributing

To contribute: clone the repo locally, make a change, and submit a PR.

Clone the repo:

git clone https://github.com/blefo/FastInference.git

Make your changes, then push your fork to your GitHub repo and submit a PR from there! 🚀

Planned improvements:

  • Add a new method for data loading
  • Load the API key and model information directly from environment variables
  • Optimize the DataBlock structure
  • Leverage LiteLLM's API and key rotation feature to avoid exceptions

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fastinference_llm-0.0.5.tar.gz (11.7 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

fastinference_llm-0.0.5-py3-none-any.whl (11.3 kB)

Uploaded Python 3

File details

Details for the file fastinference_llm-0.0.5.tar.gz.

File metadata

  • Download URL: fastinference_llm-0.0.5.tar.gz
  • Size: 11.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.9.19

File hashes

Hashes for fastinference_llm-0.0.5.tar.gz:

Algorithm    Hash digest
SHA256       baa67c2d74904d31576f702d029b567897ed50936c9419127d640562b989187d
MD5          97cb8df6b97d3bb80128dcd5d0472b3a
BLAKE2b-256  fe0a108004bf466e73884c5ab1f589407e3553e00b61935d72ad066c9cd3f808

See more details on using hashes here.
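
If you download the sdist manually, you can check it against the SHA256 digest above with a short Python snippet (hashlib is in the standard library; the file name assumes the sdist sits in the current directory):

import hashlib

# Expected SHA256 digest for fastinference_llm-0.0.5.tar.gz (from the table above).
expected = "baa67c2d74904d31576f702d029b567897ed50936c9419127d640562b989187d"
with open("fastinference_llm-0.0.5.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()
print("OK" if digest == expected else "hash mismatch")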

File details

Details for the file fastinference_llm-0.0.5-py3-none-any.whl.

File hashes

Hashes for fastinference_llm-0.0.5-py3-none-any.whl:

Algorithm    Hash digest
SHA256       602527bfb8bc9636a98c5d0bbaf868b29653c65afc51a19b3f38b139e7addc82
MD5          f5f8a92d1edcd00a7ddcd09f9b0947f1
BLAKE2b-256  72829643630ad369992968bf933c9c617150d2aac6086be4bc1b7a61356b3889

See more details on using hashes here.
