
A collection of commercial LLM inference providers and their available models


Inference providers

Disclaimer: I made this for my own use, so there are lots of gaps and omissions, which I hope can be filled over time.

Popular models have been given short canonical names, like codellama-70b or gpt-4.

inference_providers.json contains a list of known commercial inference providers, the models they offer, and their magic internal names.

Environment variables contain API keys for one or more of those providers.

These things go together to allow auto-selection of model sources at runtime. The user is connected to whichever service(s) they subscribe to that offer the requested model.

Provider Key
Anyscale ANYSCALE_API_KEY
deepinfra DEEPINFRA_API_KEY
fireworks.ai FIREWORKS_API_KEY
Groq GROQ_API_KEY
Lepton AI LEPTON_API_KEY
Mistral AI MISTRAL_API_KEY
Octo AI OCTOAI_TOKEN
OpenAI OPENAI_API_KEY
OpenRouter AI OPENROUTER_API_KEY
Perplexity PERPLEXITYAI_API_KEY
together.ai TOGETHERAI_API_KEY
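
A provider is only considered if its key is present in the environment. Checking which keys are visible is plain Python, nothing specific to this package:

import os

# See which provider API keys are set for the current process
for variable in ["OPENAI_API_KEY", "GROQ_API_KEY", "MISTRAL_API_KEY",
                 "TOGETHERAI_API_KEY", "OPENROUTER_API_KEY"]:
    print(variable, "set" if os.environ.get(variable) else "not set")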

Supported using OpenAI compatibility proxy

  • Gemini via endpoint http://localhost:6006/v1

Google doesn't directly support the OpenAI API, but shims like this one are available. If you run a proxy server on port 6006, it will be automatically recognized as Gemini. The easiest way to do this is with Docker:

docker run --restart=always -it -d -p 6006:8080 --name gemini zhu327/gemini-openai-proxy:latest

Almost but not quite supported

Replicate servers require the non-standard header Authorization: Token <key> instead of Authorization: Bearer <key>, which breaks OpenAI compatibility. This is also something a local proxy could fix by rewriting the headers on outgoing requests.
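
As a sketch of that idea (not part of this package; the upstream host and single-endpoint handling are assumptions), a tiny local proxy could forward requests with the header rewritten:

from http.server import BaseHTTPRequestHandler, HTTPServer
import urllib.error
import urllib.request

UPSTREAM = "https://api.replicate.com"  # assumed upstream host

class HeaderRewriteProxy(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        auth = self.headers.get("Authorization", "")
        if auth.startswith("Bearer "):
            auth = "Token " + auth[len("Bearer "):]  # the rewrite Replicate expects
        request = urllib.request.Request(
            UPSTREAM + self.path, data=body, method="POST",
            headers={"Authorization": auth,
                     "Content-Type": self.headers.get("Content-Type", "application/json")})
        try:
            with urllib.request.urlopen(request) as upstream:
                status, payload = upstream.status, upstream.read()
        except urllib.error.HTTPError as error:
            status, payload = error.code, error.read()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

HTTPServer(("localhost", 6007), HeaderRewriteProxy).serve_forever()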

I don't have an Anthropic API key to test a direct connection yet, but claude-2 is available via OpenRouter if you have an OPENROUTER_API_KEY. The first-party connection will be used by default once it is available.

Limitations

The scope of this library is to find a working connection without requiring configuration; that's all.

No attempt is made to find the cheapest or fastest provider of a given model.

First-party providers take priority over resellers (like OpenRouter), and connections are internally sorted very roughly by the speeds I saw while testing, but of course that changes constantly. Your results could be quite different.

Usage

from inference_providers import ProviderList
providers = ProviderList(auto_update=True)

client, name = providers.connect_to_model("mixtral-8x7b")
print(providers.get_response(client, name, "What's your sign?"))

If you are not too picky about exactly which model you use, you can submit a list of models that are known to be "good enough". If there is no source for the first model, it looks for the next, and so on.

client, name = providers.connect_to_first_available_model([
    "mistral-7b", "llama-2-7b", "gpt-3.5", "gemini-nano"])

It's fine to include models from paid subscription services in the list. They will be skipped if the user does not have access, but used opportunistically if keys are found.

If you are even less picky, connect_to_ai() will connect you to the "best" model available, which is determined by a list of popular models in auto_model_priority, at the bottom of the JSON file.

client, name = providers.connect_to_ai()

Connections are tried in the order they appear in inference_providers.json, and the first one that works is returned. There is also a flag choose_randomly=True that does what it sounds like. If you'd rather sort through the options yourself, find_model_providers() will give you a list of all services that are compatible with a given model.
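
For example (treat the shape of the returned entries as an assumption; they are only printed here):

# Inspect the candidate providers for a model yourself
for option in providers.find_model_providers("mixtral-8x7b"):
    print(option)

# Or let the library pick one of the working connections at random
client, name = providers.connect_to_model("mixtral-8x7b", choose_randomly=True)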

Handling updates

The JSON file bundled with the package will slowly get out of date, as providers and models come and go over time. There are a couple of ways to deal with that:

  • setting auto_update=True on the ProviderList constructor (like the example code) will download and use the latest version of the JSON checked in to GitHub
  • you can also maintain your own up-to-date or edited copy, and use it by specifying json_override at startup
  • you can fetch the latest JSON yourself using ProviderList.get_updated_provider_list() and do as you will (see the sketch below)
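
A rough sketch of the last two options, assuming json_override takes the JSON text itself (as json_merge does below) and that get_updated_provider_list() returns it; the file name here is made up:

from inference_providers import ProviderList

# Use a local, hand-edited copy of the provider list instead of the bundled one
providers = ProviderList(json_override=open("my_inference_providers.json").read())

# Or fetch the latest checked-in copy yourself and do as you will with it
latest_json = ProviderList.get_updated_provider_list()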

You can also use json_merge in the constructor to patch user settings on top of the embedded file. Use the same schema, but keep it sparse: include only the items you want to merge or override. For example, to support a new model without updating the package:

{
    "providers": {
        "openai": {
            "model_names": {        
                "gpt-5": "gpt-5-instruct-preview"
            }
        }
    }
}

This is the most robust solution. Look inside inference_providers.json for details.

from inference_providers import ProviderList
providers = ProviderList(json_merge=open('my_custom_config.json').read())

client, name = providers.connect_to_model("gpt-5")
print(providers.get_response(client, name, "How long do we have?"))

Notes

API cheat sheet

A model is normally considered "working" if we can connect to the server. The methods also have a test option that runs a sample inference to verify that the API key is authorized and a valid response comes back. It is disabled by default because the extra check slows initialization, but enable it if you can for more reliable connections.
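
For example:

# Pay for the extra round trip to confirm the key is actually authorized
client, name = providers.connect_to_model("gpt-4", test=True)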

class ProviderList(verbose=False, json_override=None, json_merge=None, auto_update=False)

    # Get a list of unique model names available
    get_canonical_model_names()

    # Scan ports on the local machine for servers/proxies
    detect_local_connections(refresh_cache=False)

    # Get a list of provider connection options for a model
    find_model_providers(canonical_name)

    # Send a query to an existing client, blocking until the response arrives
    get_response(client, model_name, prompt)

    # Connect to a specific model using a provider with known support
    connect_to_model(canonical_name, choose_randomly=False, test=False)

    # Connect to a model from a (prioritized) list of acceptable ones
    connect_to_first_available_model(model_names, test=False)

    # Connect to the "best" model available given the user's API keys
    connect_to_ai(test=True)

Canonical model names recognized

gpt-3.5 llama-2-7b mistral-7b pplx-7b falcon-40b
gpt-4 llama-2-13b mistral-tiny pplx-70b dolphin-8x7b
gpt-4-32k llama-2-70b mixtral-8x7b deepseek-34b wizard-70b
gpt-4-turbo codellama-7b mistral-small phind-34b openchat-7b
gemini-pro codellama-13b mistral-medium yi-34b openhermes-7b
claude-2 codellama-34b mistral-next qwen-72b
codellama-70b

This list is arbitrary and basically represents the models I happen to have tested.

The "instruct" versions of models will be used if they exist. Failing that, any "chat" versions. The -instruct and -chat suffixes on model names are left off the canonical names, but the presence of one of them is always implied. No base/completion models are specified, only ones you can talk to.

