Cache your API calls with a single line of code. No mocks, no fixtures. Just faster, cleaner code.

cachy

We often call APIs while prototyping and testing our code. A single API call (e.g. an Anthropic chat completion) can take hundreds of milliseconds to run. This can really slow down development, especially if our notebook contains many API calls 😞.

cachy caches API requests. It does this by saving the result of each call to a local cachy.jsonl file. Before calling an API (e.g. OpenAI), it checks whether the request already exists in cachy.jsonl. If it does, cachy returns the cached result instead of making the call.

How does it work?

Under the hood, popular SDKs like OpenAI, Anthropic and LiteLLM use httpx.Client and httpx.AsyncClient.

cachy patches the send method of both clients and injects a simple caching mechanism:

  • create a cache key from the request
  • if the key exists in cachy.jsonl return the cached response
  • if not, call the API and save the response to cachy.jsonl
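The three steps above can be sketched in plain Python. This is a minimal illustration, not cachy's actual implementation: the names `cache_key`, `lookup` and `cached_call`, the record layout (`key` / `response`), and the choice of SHA-256 over method, URL and body are all assumptions made for the example. The real library patches the `send` method of the httpx clients; here a stand-in `fake_send` function plays that role so the sketch runs without network access.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cache_key(method: str, url: str, body: bytes) -> str:
    """Hash the request parts that determine the response (assumed: method, URL, body)."""
    h = hashlib.sha256()
    for part in (method.upper().encode(), url.encode(), body):
        h.update(part)
        h.update(b"\0")  # separator so adjacent parts cannot run together
    return h.hexdigest()


def lookup(path: Path, key: str):
    """Scan the JSONL file and return the stored response for `key`, or None."""
    if not path.exists():
        return None
    with path.open(encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            if record["key"] == key:
                return record["response"]
    return None


def cached_call(path: Path, method: str, url: str, body: bytes, send):
    """Return the cached response if present; otherwise call `send` and store the result."""
    key = cache_key(method, url, body)
    hit = lookup(path, key)
    if hit is not None:
        return hit
    response = send(method, url, body)
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"key": key, "response": response}) + "\n")
    return response


# Stand-in for the real network call; records how often it is invoked.
calls = []

def fake_send(method, url, body):
    calls.append(url)
    return '{"text": "Hey! How can I help you today?"}'


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "cachy.jsonl"
    request = ("POST", "https://api.example.com/v1/responses", b'{"input": "Hey!"}')
    first = cached_call(cache_file, *request, fake_send)
    second = cached_call(cache_file, *request, fake_send)  # served from the file
```

After both calls, `calls` holds a single entry: the second request never reaches `fake_send`, which is the behaviour the list describes.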

Usage

To use cachy:

  • install the package: pip install pycachy
  • add the snippet below to the top of your notebook
from cachy import enable_cachy

enable_cachy()

By default cachy will cache requests made to OpenAI, Anthropic, Gemini and DeepSeek.

Note: Gemini caching only works via the LiteLLM SDK.

Custom APIs

If you’re using the OpenAI or LiteLLM SDK with other LLM providers, such as Grok or Mistral, you can cache those requests as shown below.

from cachy import enable_cachy, doms
enable_cachy(doms=doms+('api.x.ai', 'api.mistral.ai'))
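To make the role of `doms` concrete, here is a hedged sketch of how a tuple of domains could decide whether a request is cached. The function `should_cache`, the `EXAMPLE_DOMS` tuple, and the matching rule (exact host or subdomain) are assumptions for illustration; cachy's own default `doms` values and matching logic may differ.

```python
from urllib.parse import urlsplit

# Hypothetical defaults for this sketch only; not cachy's actual `doms` tuple.
EXAMPLE_DOMS = ("api.openai.com", "api.anthropic.com")


def should_cache(url: str, doms: tuple) -> bool:
    """Return True if the URL's host equals, or is a subdomain of, an entry in `doms`."""
    host = urlsplit(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in doms)


# Extending a tuple with `+` yields a new tuple, mirroring the `doms+(...)` call above.
extended = EXAMPLE_DOMS + ("api.x.ai",)
```

With this rule, `should_cache("https://api.x.ai/v1/chat/completions", extended)` is true, while the same URL checked against `EXAMPLE_DOMS` alone is not cached.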

Docs

Docs can be found hosted on this GitHub repository’s pages.

How to use

First, import and enable cachy:

from cachy import enable_cachy
enable_cachy()

Now run your API calls as normal.

from openai import OpenAI
cli = OpenAI()
r = cli.responses.create(model="gpt-4.1", input="Hey!")
r

Hey! How can I help you today? 😊

  • id: resp_68b9978ecec48196aa3e77b09ed41c6403f00c61bc19c097
  • created_at: 1756993423.0
  • error: None
  • incomplete_details: None
  • instructions: None
  • metadata: {}
  • model: gpt-4.1-2025-04-14
  • object: response
  • output: [ResponseOutputMessage(id='msg_68b9978f9f70819684b17b0f21072a9003f00c61bc19c097', content=[ResponseOutputText(annotations=[], text='Hey! How can I help you today? 😊', type='output_text', logprobs=[])], role='assistant', status='completed', type='message')]
  • parallel_tool_calls: True
  • temperature: 1.0
  • tool_choice: auto
  • tools: []
  • top_p: 1.0
  • background: False
  • conversation: None
  • max_output_tokens: None
  • max_tool_calls: None
  • previous_response_id: None
  • prompt: None
  • prompt_cache_key: None
  • reasoning: Reasoning(effort=None, generate_summary=None, summary=None)
  • safety_identifier: None
  • service_tier: default
  • status: completed
  • text: ResponseTextConfig(format=ResponseFormatText(type='text'), verbosity='medium')
  • top_logprobs: 0
  • truncation: disabled
  • usage: ResponseUsage(input_tokens=9, input_tokens_details=InputTokensDetails(cached_tokens=0), output_tokens=11, output_tokens_details=OutputTokensDetails(reasoning_tokens=0), total_tokens=20)
  • user: None
  • store: True

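If you run the same request again, cachy reads the response from the cache instead of calling the API. The id and created_at values below are identical to those of the first response.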

r = cli.responses.create(model="gpt-4.1", input="Hey!")
r

Hey! How can I help you today? 😊

  • id: resp_68b9978ecec48196aa3e77b09ed41c6403f00c61bc19c097
  • created_at: 1756993423.0
  • error: None
  • incomplete_details: None
  • instructions: None
  • metadata: {}
  • model: gpt-4.1-2025-04-14
  • object: response
  • output: [ResponseOutputMessage(id='msg_68b9978f9f70819684b17b0f21072a9003f00c61bc19c097', content=[ResponseOutputText(annotations=[], text='Hey! How can I help you today? 😊', type='output_text', logprobs=[])], role='assistant', status='completed', type='message')]
  • parallel_tool_calls: True
  • temperature: 1.0
  • tool_choice: auto
  • tools: []
  • top_p: 1.0
  • background: False
  • conversation: None
  • max_output_tokens: None
  • max_tool_calls: None
  • previous_response_id: None
  • prompt: None
  • prompt_cache_key: None
  • reasoning: Reasoning(effort=None, generate_summary=None, summary=None)
  • safety_identifier: None
  • service_tier: default
  • status: completed
  • text: ResponseTextConfig(format=ResponseFormatText(type='text'), verbosity='medium')
  • top_logprobs: 0
  • truncation: disabled
  • usage: ResponseUsage(input_tokens=9, input_tokens_details=InputTokensDetails(cached_tokens=0), output_tokens=11, output_tokens_details=OutputTokensDetails(reasoning_tokens=0), total_tokens=20)
  • user: None
  • store: True
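One way to confirm that a repeated request was served from the cache is to count the records in cachy.jsonl before and after the call. The helper below is a small sketch: `read_jsonl` is a hypothetical name, and the record fields in the demo are invented for the example, since the actual layout of cachy's records is not described here. It relies only on the file being JSON Lines, i.e. one JSON document per line.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path) -> list:
    """Load every non-empty line of a JSON Lines file; return [] if the file is missing."""
    path = Path(path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Demo on a temporary file with two made-up records (the schema is an assumption).
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "cachy.jsonl"
    demo_file.write_text(
        '{"key": "abc", "response": "first"}\n'
        '{"key": "def", "response": "second"}\n',
        encoding="utf-8",
    )
    records = read_jsonl(demo_file)
    missing = read_jsonl(Path(tmp) / "does_not_exist.jsonl")
```

In a notebook, comparing `len(read_jsonl("cachy.jsonl"))` before and after re-running a request should show no change on a cache hit, assuming hits do not append new records.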
