fnllm

A function-based LLM protocol and wrapper: a generic LLM wrapper that provides a function protocol for LLM implementations. An OpenAI wrapper is provided.
Getting Started
```sh
pip install fnllm
```
Overview
fnllm is an LLM wrapper that provides function-based protocols for accessing LLM functionality (e.g. fnllm.types.ChatLLM, fnllm.types.EmbeddingsLLM). It's designed to be provider-agnostic, but it currently uses OpenAI as the default provider.
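To make the "function protocol" idea concrete, here is a minimal sketch of what such a protocol can look like. This is illustrative only; the real protocols in fnllm.types carry richer input and output types than the bare strings shown here.

```python
# Illustrative sketch of a function-based LLM protocol. The real
# fnllm.types protocols use richer input/output objects; this toy
# version shows only the core idea: an LLM is an async callable.
from typing import Any, Protocol


class ChatLLM(Protocol):
    async def __call__(self, prompt: str, **kwargs: Any) -> str: ...


async def classify(llm: ChatLLM, text: str) -> str:
    # Callers depend only on the protocol, so any provider's
    # implementation (OpenAI or otherwise) can be swapped in.
    return await llm(f"Classify the sentiment of this text: {text}")
```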
⚠️ fnllm is a research-grade library used by Microsoft Research. It changes rapidly, and although we try to adhere to Semantic Versioning, there may be occasional unintended breaking changes. If you use fnllm, we recommend pinning your client version and validating new versions.
Chain of Responsibility
A key feature of fnllm is that it hides several cross-cutting concerns behind a chain-of-responsibility abstraction in order to ensure fast and durable data-processing jobs. These concerns include retrying, throttling, caching, and JSON recovery.
The chain of responsibility uses Python decorators to decorate the raw LLM invocation. At a high level, the decorator stack looks like this:
```mermaid
flowchart TB
    client
    llm
    client --> json(Json Recovery)
    json --> cache(Caching)
    cache --> retry(Retrying)
    retry --> throttle(Throttling)
    throttle --> llm(((LLM)))
```
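The following toy sketch (not fnllm's actual implementation, whose decorators carry far more behavior) shows the same ordering in code: each wrapper takes the next callable and returns a new one, so a request passes through the stack on the way in and the response unwinds in reverse on the way out.

```python
# Toy illustration of the decorator stack; fnllm's real decorators are
# richer. Each wrapper sees the request inbound and the response outbound.
import asyncio
from collections.abc import Awaitable, Callable

LLMFn = Callable[[str], Awaitable[str]]


def wrap(name: str, inner: LLMFn) -> LLMFn:
    async def call(prompt: str) -> str:
        print(f"-> {name} (inbound)")
        result = await inner(prompt)
        print(f"<- {name} (outbound)")
        return result
    return call


async def raw_llm(prompt: str) -> str:
    return f"echo: {prompt}"


# The outermost wrapper sees the request first, mirroring the diagram above.
llm = wrap("json", wrap("cache", wrap("retry", wrap("throttle", raw_llm))))
print(asyncio.run(llm("hello")))
```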
Request Lifecycle
To understand the lifecycle of an fnllm request in more detail, we'll break this down into the inbound and outbound sides of a request.
```mermaid
flowchart TB
    client
    llm((("LLM (5)")))
    jsonin("Json Inbound (noop) (1)")
    jsonout("Json Outbound (9)")
    cachein("Cache Inbound (2)")
    cacheout("Cache Outbound (8)")
    retryin("Retry Inbound (3)")
    retryout("Retry Outbound (7)")
    throttlein("Throttle Inbound (4)")
    throttleout("Throttle Outbound (noop) (6)")
    client --> jsonin
    jsonin --> cachein
    cachein --> retryin
    retryin --> throttlein
    throttlein --> llm
    llm --> throttleout
    throttleout --> retryout
    retryout --> cacheout
    cacheout --> jsonout
    jsonout --> client
```
As a client fires off a request, the request makes its way through the decorator stack. Each decorator is responsible for a specific concern, and together they ensure that the request is processed correctly.
Initial Entry
The first decorator a request encounters is the Json Recovery decorator (1), which has no inbound behavior. The first active decorator is the Cache decorator (2), which will check if the request is already cached. If it is, the cached response will be returned immediately, bypassing the rest of the decorator stack. It is important that the Cache decorator is the first active inbound decorator, as this ensures we have speedy cache reads when performing fully-cached data runs.
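A toy cache decorator makes the short-circuit obvious. This is an assumption for illustration only; fnllm's real cache keys requests more carefully than a raw prompt string.

```python
# Toy cache decorator, for illustration only. On a hit, the cached
# response is returned immediately and the retry/throttle/LLM layers
# are never touched -- which is what makes fully-cached runs fast.
from collections.abc import Awaitable, Callable

LLMFn = Callable[[str], Awaitable[str]]


def with_cache(inner: LLMFn) -> LLMFn:
    cache: dict[str, str] = {}

    async def call(prompt: str) -> str:
        if prompt in cache:
            return cache[prompt]  # inbound short-circuit
        result = await inner(prompt)
        cache[prompt] = result    # populated on the outbound pass
        return result

    return call
```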
Live Request Execution
If a request has not been handled by the Cache decorator, it proceeds as a live request. Two key concerns now come into play: we must not exceed the LLM provider's rate limits, and we must handle any errors that occur during the request. Because we want the retry logic to respect the model's rate-limit capacity, rate limiting is applied closest to the LLM. The Retry decorator (3) wraps the rest of the chain with a retry strategy (e.g. exponential backoff, linear incremental, randomized). Closest to the LLM, the Throttle decorator (4) ensures that requests are sent at a rate acceptable to the provider. Finally, the request is sent to the LLM (5).
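A toy sketch of this ordering follows (not fnllm's actual strategy classes): because the retry wrapper sits outside the throttle wrapper, every re-driven attempt is itself rate-limited.

```python
# Toy retry and throttle wrappers, for illustration only. Composing
# with_retries(with_throttle(llm)) means each retry attempt still goes
# through the rate limiter before reaching the LLM.
import asyncio
import random
from collections.abc import Awaitable, Callable

LLMFn = Callable[[str], Awaitable[str]]


def with_retries(inner: LLMFn, max_attempts: int = 5) -> LLMFn:
    async def call(prompt: str) -> str:
        for attempt in range(max_attempts):
            try:
                return await inner(prompt)
            except Exception:
                if attempt == max_attempts - 1:
                    raise
                # exponential backoff with jitter
                await asyncio.sleep(2 ** attempt + random.random())
        raise AssertionError("unreachable")
    return call


def with_throttle(inner: LLMFn, min_interval: float = 0.5) -> LLMFn:
    lock = asyncio.Lock()

    async def call(prompt: str) -> str:
        # Serialize calls and pause before each one: a crude rate limit.
        async with lock:
            await asyncio.sleep(min_interval)
            return await inner(prompt)
    return call
```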
Live Response Handling
Once we receive an LLM response, it is returned through the stack in reverse order. The Throttle decorator (6) has no outbound behavior, since throttling applies only to inbound requests. If an error occurred, the Retry decorator (7) will attempt to re-drive the request according to the retry policy. Upon a successful response, the Cache decorator (8) writes it into the cache.
Final Orchestration & Redriving
Finally, the Json Recovery decorator (9) attempts to parse the LLM response as JSON and interpret it as the given Pydantic model (if one was provided). If the response is malformed, or does not adhere to the Pydantic model, we attempt a recovery. Depending on the Json Recovery strategy, this will either clean up the malformed JSON text or re-drive the LLM call.
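Here is a sketch of the re-drive flavor of recovery. It is illustrative only: the model class and retry prompt shape are assumptions, not fnllm's actual strategy.

```python
# Toy JSON-recovery sketch, for illustration only. Parse the response
# into a Pydantic model; if validation fails, re-drive the call once
# with the validation error included so the model can correct itself.
from collections.abc import Awaitable, Callable

from pydantic import BaseModel, ValidationError

LLMFn = Callable[[str], Awaitable[str]]


class Person(BaseModel):  # hypothetical target model
    name: str
    age: int


async def chat_to_model(llm: LLMFn, prompt: str) -> Person:
    raw = await llm(prompt)
    try:
        return Person.model_validate_json(raw)
    except ValidationError as err:
        # Re-drive: feed the error back and ask for corrected JSON.
        retry = (
            f"{prompt}\n\nYour previous reply was invalid:\n{err}\n"
            "Return only valid JSON."
        )
        return Person.model_validate_json(await llm(retry))
```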
File details
Details for the file fnllm-0.4.1.tar.gz.
File metadata
- Download URL: fnllm-0.4.1.tar.gz
- Size: 93.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 80a7450693691bf0832e12a2d70420647bfea35a43cb91c4a9cb5e2f39172b50 |
| MD5 | ebd03924aff806bff88978601763f1ba |
| BLAKE2b-256 | 3384bc3d02134a46dd267afbed66a47dc281b252bd8171c94ad22bcc8f924f8b |
Provenance
The following attestation bundles were made for fnllm-0.4.1.tar.gz:
Publisher: python-publish.yml on microsoft/essex-toolkit

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: fnllm-0.4.1.tar.gz
- Subject digest: 80a7450693691bf0832e12a2d70420647bfea35a43cb91c4a9cb5e2f39172b50
- Sigstore transparency entry: 417530025
- Permalink: microsoft/essex-toolkit@5c701b06075175ef2c74b7c1866a739c211c899b
- Branch / Tag: refs/tags/fnllm-v0.4.1
- Owner: https://github.com/microsoft
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@5c701b06075175ef2c74b7c1866a739c211c899b
- Trigger Event: release
File details
Details for the file fnllm-0.4.1-py3-none-any.whl.
File metadata
- Download URL: fnllm-0.4.1-py3-none-any.whl
- Size: 79.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 22f1b3316a90f29fde94bfe651e0e4963ff68cddb438035ef7c2161e39789ccf |
| MD5 | ed1f78c0202a34ac03246e79ebb98ffa |
| BLAKE2b-256 | ac6a04db92a7e8d9cf9b73d3c29c38e16d5728069ec1be06a4723f74579499fa |
Provenance
The following attestation bundles were made for fnllm-0.4.1-py3-none-any.whl:
Publisher: python-publish.yml on microsoft/essex-toolkit

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: fnllm-0.4.1-py3-none-any.whl
- Subject digest: 22f1b3316a90f29fde94bfe651e0e4963ff68cddb438035ef7c2161e39789ccf
- Sigstore transparency entry: 417530036
- Permalink: microsoft/essex-toolkit@5c701b06075175ef2c74b7c1866a739c211c899b
- Branch / Tag: refs/tags/fnllm-v0.4.1
- Owner: https://github.com/microsoft
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@5c701b06075175ef2c74b7c1866a739c211c899b
- Trigger Event: release