A function-based LLM protocol and wrapper.

fnllm

A generic LLM wrapper that provides a function protocol for LLM implementations. An OpenAI wrapper is provided.

Getting Started

pip install fnllm

Overview

fnllm is an LLM wrapper that provides function-based protocols for accessing LLM functionality (e.g. fnllm.types.ChatLLM, fnllm.types.EmbeddingsLLM). It's designed to be provider-agnostic, but it currently uses OpenAI as the default provider.
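To make the "function-based protocol" idea concrete, here is a minimal sketch of what such a protocol can look like in plain Python. The names below (`ChatRequest`, `ChatResponse`, the synchronous `__call__` signature, and `echo_llm`) are illustrative stand-ins, not fnllm's actual API, which is async and richer than this.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class ChatRequest:
    """Hypothetical request type for illustration only."""
    prompt: str


@dataclass
class ChatResponse:
    """Hypothetical response type for illustration only."""
    text: str


class ChatLLM(Protocol):
    """A function-based protocol: an LLM is just a callable, request in, response out."""

    def __call__(self, request: ChatRequest) -> ChatResponse: ...


def echo_llm(request: ChatRequest) -> ChatResponse:
    # A trivial stand-in "provider"; any callable with this shape satisfies the protocol.
    return ChatResponse(text=f"echo: {request.prompt}")


response = echo_llm(ChatRequest(prompt="hello"))
print(response.text)  # echo: hello
```

Because the protocol is just a callable shape, decorators can wrap any conforming implementation without knowing which provider sits underneath.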

⚠️ fnllm is a research-grade library used by Microsoft Research. It changes rapidly, and although we try to adhere to Semantic Versioning, there may be occasional unintended breaking changes. If you use fnllm, we recommend pinning your client version and validating new versions.

Chain of Responsibility

A key feature of fnllm is that it hides several cross-cutting concerns behind a chain-of-responsibility abstraction to ensure fast and durable data-processing jobs. These concerns include retrying, throttling, caching, and JSON recovery.

The chain of responsibility uses Python decorators to decorate the raw LLM invocation. At a high level, the decorator stack looks like this:

flowchart TB
    client
    llm
    client --> json(Json Recovery)
    json --> cache(Caching)
    cache --> retry(Retrying)
    retry --> throttle(Throttling)
    throttle --> llm(((LLM)))
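The layered ordering above can be sketched with ordinary Python closures. The layer bodies here are placeholders that only record call order; they are not fnllm's internals, just an illustration of how an outside-in decorator stack routes a call json → cache → retry → throttle → LLM.

```python
calls = []


def layer(name, inner):
    """Wrap `inner`, recording this layer's name before delegating down the stack."""
    def wrapped(prompt):
        calls.append(name)
        return inner(prompt)
    return wrapped


def llm(prompt):
    # The innermost link: the raw LLM invocation.
    calls.append("llm")
    return f"response to {prompt!r}"


# Build the stack outside-in, matching the flowchart.
stack = layer("json", layer("cache", layer("retry", layer("throttle", llm))))
stack("hello")
print(calls)  # ['json', 'cache', 'retry', 'throttle', 'llm']
```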

Request Lifecycle

To understand the lifecycle of an fnllm request in more detail, we'll break this down into the inbound and outbound sides of a request.

flowchart TB
    client
    llm((("LLM (5)")))
    jsonin("Json Inbound (noop) (1)")
    jsonout("Json Outbound (9)")
    cachein("Cache Inbound (2)")
    cacheout("Cache Outbound (8)")
    retryin("Retry Inbound (3)")
    retryout("Retry Outbound (7)")
    throttlein("Throttle Inbound (4)")
    throttleout("Throttle Outbound (noop) (6)")
    client --> jsonin
    jsonin --> cachein
    cachein --> retryin
    retryin --> throttlein
    throttlein --> llm
    llm --> throttleout
    throttleout --> retryout
    retryout --> cacheout
    cacheout --> jsonout
    jsonout --> client

As a client fires off a request, the request makes its way through the decorator stack. Each decorator is responsible for a specific concern, and they all work together to ensure that the request is processed correctly.

Initial Entry

The first decorator a request encounters is the Json Recovery decorator (1), which has no inbound behavior. The first active decorator is the Cache decorator (2), which will check if the request is already cached. If it is, the cached response will be returned immediately, bypassing the rest of the decorator stack. It is important that the Cache decorator is the first active inbound decorator, as this ensures we have speedy cache reads when performing fully-cached data runs.
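The cache-first short-circuit can be sketched as a simple decorator. This is illustrative only: fnllm's cache is pluggable and its keys are computed more carefully than the bare prompt used here.

```python
def with_cache(cache: dict, inner):
    """Wrap `inner` so cache hits bypass the rest of the stack."""
    def wrapped(prompt: str):
        if prompt in cache:        # inbound (2): hit returns immediately
            return cache[prompt]
        response = inner(prompt)   # miss: fall through to the live stack
        cache[prompt] = response   # outbound (8): write-back on success
        return response
    return wrapped


cache = {}
live_calls = []


def llm(prompt):
    live_calls.append(prompt)
    return prompt.upper()


cached_llm = with_cache(cache, llm)
cached_llm("hi")  # live call, populates the cache
cached_llm("hi")  # served from cache; llm is not invoked again
print(live_calls)  # ['hi']
```

On a fully-cached data run, every call takes the short-circuit branch, which is why this decorator sits first in the active inbound chain.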

Live Request Execution

If a request has not been handled by the Cache decorator, it is processed as a live request. Two key concerns arise here: we must not exceed the rate limits of the LLM provider, and we must handle any errors that occur during the request. Because we want our retry logic to adhere to the model's rate-limit capacity, rate limiting is applied closest to the LLM. The Retry decorator (3) wraps the rest of the chain with a retry strategy (e.g. exponential backoff, linear incremental, randomized). Closest to the LLM, the Throttle decorator (4) ensures the request is sent at a rate acceptable to the LLM provider. Finally, the request is sent to the LLM (5).
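The retry-wraps-throttle ordering can be sketched as follows. Exponential backoff is shown as one example strategy; the delays, error type, and helper names are all illustrative, not fnllm's implementation. The point is structural: because retry sits outside throttle, every retry attempt re-enters the rate limiter before reaching the LLM.

```python
import time


def with_retry(inner, max_attempts=3, base_delay=0.01):
    """Retry `inner` on failure with exponential backoff (illustrative strategy)."""
    def wrapped(prompt):
        for attempt in range(max_attempts):
            try:
                return inner(prompt)
            except RuntimeError:
                if attempt == max_attempts - 1:
                    raise
                time.sleep(base_delay * (2 ** attempt))  # 0.01s, 0.02s, ...
    return wrapped


def with_throttle(inner, min_interval=0.0):
    """Pace calls to `inner` so they are at least `min_interval` seconds apart."""
    last = [0.0]

    def wrapped(prompt):
        wait = last[0] + min_interval - time.monotonic()
        if wait > 0:
            time.sleep(wait)  # stay within the provider's rate limit
        last[0] = time.monotonic()
        return inner(prompt)
    return wrapped


attempts = []


def flaky_llm(prompt):
    # Simulate a provider that fails twice (e.g. transient rate-limit errors).
    attempts.append(prompt)
    if len(attempts) < 3:
        raise RuntimeError("rate limited")
    return "ok"


client = with_retry(with_throttle(flaky_llm))
print(client("hi"))  # ok (succeeds on the third, throttled attempt)
```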

Live Response Handling

Once we receive an LLM response, it will be returned through the stack in reverse order. The Throttle decorator (6) has no outbound behavior, as it only applies to inbound requests. In case of errors, the Retry decorator (7) will attempt to re-drive the request according to the retry policy. Upon a successful request, the Cache decorator (8) will write the response into the cache.

Final Orchestration & Redriving

Finally, the Json Recovery decorator (9) will attempt to parse the LLM response as JSON and interpret it as the given Pydantic model (if one was provided). If the response is malformed, or if it does not adhere to the Pydantic model, we will attempt a recovery. Depending on the Json Recovery strategy, it will either attempt to clean up the malformed JSON text or re-drive the LLM call.
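The text-cleanup branch of JSON recovery can be sketched with only the standard library. fnllm validates against a Pydantic model; this sketch approximates that step with a plain parse, and the cleanup heuristic (stripping a markdown code fence, a common LLM failure mode) is illustrative rather than fnllm's actual recovery logic.

```python
import json


def clean_json_text(text: str) -> str:
    """Heuristic cleanup for a common failure mode: JSON wrapped in a markdown fence."""
    text = text.strip()
    if text.startswith("```"):
        text = text.strip("`")
        if text.startswith("json"):
            text = text[len("json"):]
    return text.strip()


def parse_with_recovery(raw: str) -> dict:
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # First recovery strategy: clean up the text and re-parse.
        # (An alternative strategy is to re-drive the LLM call instead.)
        return json.loads(clean_json_text(raw))


result = parse_with_recovery('```json\n{"name": "fnllm"}\n```')
print(result)  # {'name': 'fnllm'}
```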

