Skip to main content

Estimate costs and running times of complex LLM workflows/experiments/pipelines in advance before spending money, via simulations.

Project description

costly

Estimate costs and running times of complex LLM workflows/experiments/pipelines in advance before spending money, via simulations. Just put @costly() on the load-bearing function; make sure all functions that call it pass **kwargs to it and call your complex function with simulate=True and some cost_log: Costlog object. See examples.ipynb for more details.

(Actually you don't have to pass cost_log and simulate throughout; you can just define a global cost_log and simulate at the top of the file that contains your @costly functions and pass them as defaults to your @costly functions.)

(Pass @costly(disable_costly=True) to disable costly for a function. E.g. you may set disable_costly to be a global variable and pass it to your @costly decorators.)

https://github.com/abhimanyupallavisudhir/costly

Installation

pip install costly

Usage

See examples.ipynb for a full walkthrough; some examples below.

from costly import Costlog, costly, CostlyResponse
from costly.estimators.llm_api_estimation import LLM_API_Estimation as estimator


@costly()
def chatgpt(input_string: str, model: str) -> str:
    from openai import OpenAI

    client = OpenAI()
    response = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": input_string}]
    )
    output_string = response.choices[0].message.content
    return output_string


@costly(
    input_tokens=lambda kwargs: LLM_API_Estimation.messages_to_input_tokens(
        kwargs["messages"], kwargs["model"]
    ),
)
def chatgpt_messages(messages: list[dict[str, str]], model: str) -> str:
    from openai import OpenAI

    client = OpenAI()
    response = client.chat.completions.create(model=model, messages=messages)
    output_string = response.choices[0].message.content
    return output_string


@costly()
def chatgpt(input_string: str, model: str) -> str:
    from openai import OpenAI

    client = OpenAI()
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "user", "content": input_string},
        ],
    )

    return CostlyResponse(
        output=response.choices[0].message.content,
        cost_info={
            "input_tokens": response.usage.prompt_tokens,
            "output_tokens": response.usage.completion_tokens,
        },
    ) # in usage, this will still just return the output, not the whole CostlyResponse object

Testing

poetry run pytest -s -m "not slow"
poetry run pytest -s -m "slow"

Tests for instructor currently fail.

TODO

  • Make it work with async
  • Decide and document what the best way to "propagate" description (for breakdown purposes) through function calls is. Have the user manually write def f(...): ... g(description = kwargs.get("description") + ["f"]? Add a @description("blabla") decorator? Add a @description decorator that automatically appends the function name and arguments into description?
  • Better solution for token counting for Chat messages (search HACK in the repo)
  • make instructor tests pass https://community.openai.com/t/how-to-calculate-the-tokens-when-using-function-call/266573/11
  • Support for locally run LLMs -- ideally need a cost & time estimator that takes into account your machine details, GPU pricing etc.
  • support more models

Instructor tests don't really pass but I can kinda live with this. Lmk if anyone has a good way to count tokens from messages that includes tool calling (I'm using this).

FAILED tests/test_estimators/test_llm_api_estimation.py::test_estimate_contains_exact_instructor[PERSONINFO_gpt-4o] - AssertionError: ['Input tokens estimate 65 not within 20pc of truth 83']
FAILED tests/test_estimators/test_llm_api_estimation.py::test_estimate_contains_exact_instructor[PERSONINFO_gpt-4o-mini] - AssertionError: ['Input tokens estimate 65 not within 20pc of truth 83']
FAILED tests/test_estimators/test_llm_api_estimation.py::test_estimate_contains_exact_instructor[PERSONINFO_gpt-4-turbo] - AssertionError: ['Input tokens estimate 65 not within 20pc of truth 85']
FAILED tests/test_estimators/test_llm_api_estimation.py::test_estimate_contains_exact_instructor[PERSONINFO_gpt-3.5-turbo] - AssertionError: ['Input tokens estimate 65 not within 20pc of truth 85']
FAILED tests/test_estimators/test_llm_api_estimation.py::test_estimate_contains_exact_instructor[FOOMODEL_gpt-4o] - AssertionError: ['Input tokens estimate 77 not within 20pc of truth 108']
FAILED tests/test_estimators/test_llm_api_estimation.py::test_estimate_contains_exact_instructor[FOOMODEL_gpt-4o-mini] - AssertionError: ['Input tokens estimate 77 not within 20pc of truth 108']
FAILED tests/test_estimators/test_llm_api_estimation.py::test_estimate_contains_exact_instructor[FOOMODEL_gpt-4-turbo] - AssertionError: ['Input tokens estimate 77 not within 20pc of truth 113']
FAILED tests/test_estimators/test_llm_api_estimation.py::test_estimate_contains_exact_instructor[FOOMODEL_gpt-3.5-turbo] - AssertionError: ['Input tokens estimate 77 not within 20pc of truth 113']
FAILED tests/test_estimators/test_llm_api_estimation.py::test_estimate_contains_exact_instructor[BARMODEL_gpt-4o] - AssertionError: ['Input tokens estimate 70 not within 20pc of truth 168']
FAILED tests/test_estimators/test_llm_api_estimation.py::test_estimate_contains_exact_instructor[BARMODEL_gpt-4o-mini] - AssertionError: ['Input tokens estimate 70 not within 20pc of truth 168']
FAILED tests/test_estimators/test_llm_api_estimation.py::test_estimate_contains_exact_instructor[BARMODEL_gpt-4-turbo] - AssertionError: ['Input tokens estimate 70 not within 20pc of truth 178']
FAILED tests/test_estimators/test_llm_api_estimation.py::test_estimate_contains_exact_instructor[BARMODEL_gpt-4] - AssertionError: ['Input tokens estimate 70 not within 20pc of truth 126']
FAILED tests/test_estimators/test_llm_api_estimation.py::test_estimate_contains_exact_instructor[BARMODEL_gpt-3.5-turbo] - AssertionError: ['Input tokens estimate 70 not within 20pc of truth 178']

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

costly-2.2.0.tar.gz (15.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

costly-2.2.0-py3-none-any.whl (17.0 kB view details)

Uploaded Python 3

File details

Details for the file costly-2.2.0.tar.gz.

File metadata

  • Download URL: costly-2.2.0.tar.gz
  • Upload date:
  • Size: 15.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.11.2 Linux/6.1.0-26-cloud-amd64

File hashes

Hashes for costly-2.2.0.tar.gz
Algorithm Hash digest
SHA256 649a7ae84e341555abb4c602e6b4dfaf6d088753a7dad7a598e1adb67f6eaabf
MD5 78b2e1daece632a3004c491b039b5e54
BLAKE2b-256 55a15bd722ba12379674a8a8227339ed13a4a0911577a120c5458aa6b7e589ef

See more details on using hashes here.

File details

Details for the file costly-2.2.0-py3-none-any.whl.

File metadata

  • Download URL: costly-2.2.0-py3-none-any.whl
  • Upload date:
  • Size: 17.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.11.2 Linux/6.1.0-26-cloud-amd64

File hashes

Hashes for costly-2.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 1eac3fdfe573308fcadfd0012b5f3b4808670ef48c69b1ad75ad89faa86bb5ee
MD5 3505b88f55345e8023b51788358c08b0
BLAKE2b-256 a2825b5c5f907d54c5f7d30d7350013d134f5e4a71887a6d14a1c9b87f19ea25

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page