LLMTrack
LLMTrack is a Python package designed to streamline the usage of language models, especially during batch generation. It offers easy model loading, generation caching to avoid redundant API calls, detailed logging, and continuous per-model token usage recording. Whether you're doing research or deploying models in production, LLMTrack helps you manage and monitor your models efficiently.
Installation
pip install llmtrack
LLM Loading
from llmtrack import get_llm
llm = get_llm(model_name="openai/gpt-4o-mini")
print(llm.generate("Generate ONLY a random word"))
A public LLM API is selected simply by setting model_name to "{API provider}/{model name}". The supported APIs, and the environment variables each one requires, are listed below (a credential-setup sketch follows the list):
- OpenAI, e.g., "openai/xxxx" (replace xxxx with a specific model name)
  - The environment variable OPENAI_API_KEY has to be set.
  - Specific model names: see the OpenAI documentation.
- Azure OpenAI, e.g., "azure_openai/chatgpt-4k"
  - Three environment variables have to be set: AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, AZURE_OPENAI_API_VERSION.
  - Ask your provider for specific model names.
- MoonShot, e.g., "moonshot/moonshot-v1-8k"
  - The environment variable MOONSHOT_API_KEY has to be set.
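The required variables can be exported in your shell or set from Python before loading a model. A minimal sketch for the OpenAI case, where the variable name OPENAI_API_KEY comes from the list above and the key value is a placeholder:

import os
from llmtrack import get_llm

# Set the credential before loading the model. In practice you would
# export OPENAI_API_KEY in your shell rather than hard-coding it.
os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder, not a real key

llm = get_llm(model_name="openai/gpt-4o-mini")
print(llm.generate("Generate ONLY a random word"))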
Caching
from llmtrack import get_llm
llm = get_llm(model_name="openai/gpt-4o-mini", cache=True)
print(llm.generate("Generate ONLY a random word "))
After running the code above, the generation cache will be stored in cache_llmtrack/openai/gpt-4o-mini, following the naming rule cache_llmtrack/{API provider}/{model name}.
If you invoke the same model with the same prompt, the cache will be used.
Note: You can verify this by checking whether token usage increases, using the next feature: Token Usage Tracking.
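Because the cache lives on disk, you can also confirm a cache hit by checking that the cache directory was created. A minimal sketch, assuming only the cache_llmtrack/{API provider}/{model name} layout described above (the file format inside the directory is an implementation detail, not specified here):

from pathlib import Path
from llmtrack import get_llm

llm = get_llm(model_name="openai/gpt-4o-mini", cache=True)
llm.generate("Generate ONLY a random word ")

# The naming rule is cache_llmtrack/{API provider}/{model name}.
cache_dir = Path("cache_llmtrack") / "openai" / "gpt-4o-mini"
print(cache_dir.exists())          # True once the first generation has been cached
print(list(cache_dir.iterdir()))   # whatever cache files the package writes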
Token Usage Tracking
from llmtrack import get_llm
llm = get_llm("openai/gpt-4o-mini", cache=True, token_usage=True)
print(llm.generate("Generate ONLY a random word "))
print(llm.generate("Generate ONLY a random word "))
Token usage is tracked in ./gpt-4o-mini_token_usage.json. Only one record exists even though we invoked the LLM twice, because the second call was served from the cache.
{"prompt": 17, "completion": 4, "total": 21, "time": "2024-08-25 14:02:36"}
Logging (TBA)