Cache your API calls with a single line of code. No mocks, no fixtures. Just faster, cleaner code.
cachy
We often call APIs while prototyping and testing our code. A single API call (e.g. an Anthropic chat completion) can take hundreds of milliseconds to run. This can really slow down development, especially if your notebook contains many API calls 😞.
cachy caches API requests by saving the result of each call to a local cachy.jsonl file. Before calling an API (e.g. OpenAI), it checks whether the request already exists in cachy.jsonl. If it does, the cached result is returned.
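Because the cache is a plain JSONL file, you can inspect it directly. Here is a minimal sketch; the field names cachy actually writes are not documented here, so treat the entry schema as an assumption — only the one-JSON-object-per-line layout follows from the format:

```python
import json
from pathlib import Path

# Each line of cachy.jsonl is one JSON object describing a cached call.
# The printed keys (cache key, stored response, etc.) are whatever cachy
# chose to store -- the exact schema is an implementation detail.
for line in Path("cachy.jsonl").read_text().splitlines():
    entry = json.loads(line)
    print(list(entry.keys()))
```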
How does it work?
Under the hood, popular SDKs like OpenAI, Anthropic and LiteLLM use
httpx.Client and httpx.AsyncClient.
cachy patches the send method of both clients and injects a simple
caching mechanism (sketched after the list below):
- create a cache key from the request
- if the key exists in `cachy.jsonl`, return the cached response
- if not, call the API and save the response to `cachy.jsonl`
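A minimal sketch of that flow for the synchronous client, assuming a hash-the-request cache key and a linear scan of the file. This is not cachy's actual implementation — the real library also patches `httpx.AsyncClient` and has to handle details such as response headers and streaming:

```python
import hashlib, json
from pathlib import Path
import httpx

CACHE = Path("cachy.jsonl")

def _key(request: httpx.Request) -> str:
    # Identify a request by hashing its method, URL and body.
    return hashlib.sha256(
        request.method.encode() + str(request.url).encode() + request.read()
    ).hexdigest()

_orig_send = httpx.Client.send

def _cached_send(self, request, **kwargs):
    key = _key(request)
    if CACHE.exists():
        for line in CACHE.read_text().splitlines():
            entry = json.loads(line)
            if entry["key"] == key:  # cache hit: replay the stored response
                return httpx.Response(
                    entry["status"],
                    content=entry["body"].encode(),
                    # Assumes a JSON API; a real implementation would
                    # store and restore the original headers.
                    headers={"content-type": "application/json"},
                    request=request,
                )
    resp = _orig_send(self, request, **kwargs)  # cache miss: call the API
    resp.read()  # make sure the body is loaded before serialising it
    with CACHE.open("a") as f:
        f.write(json.dumps({"key": key, "status": resp.status_code,
                            "body": resp.text}) + "\n")
    return resp

httpx.Client.send = _cached_send
```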
Usage
To use cachy:

- install the package: `pip install pycachy`
- add the snippet below to the top of your notebook:

```python
from cachy import enable_cachy

enable_cachy()
```
By default, cachy will cache requests made to OpenAI, Anthropic, Gemini
and DeepSeek.
Note: Gemini caching only works via the LiteLLM SDK.
> [!NOTE]
> **Custom APIs**
>
> If you’re using the OpenAI or LiteLLM SDK with other LLM providers, such as Grok or Mistral, you can cache those requests as shown below.

```python
from cachy import enable_cachy, doms

enable_cachy(doms=doms + ('api.x.ai', 'api.mistral.com'))
```
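As the snippet suggests, `doms` is cachy's default tuple of cached API domains; concatenating extra hostnames onto it extends caching to those providers.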
Docs
Docs are hosted on this repository’s GitHub Pages site.
How to use
First, import and enable cachy:

```python
from cachy import enable_cachy

enable_cachy()
```
Now run your API calls as normal:
```python
from openai import OpenAI

cli = OpenAI()
r = cli.responses.create(model="gpt-4.1", input="Hey!")
r
```
Hey! How can I help you today? 😊
- id: resp_68b9978ecec48196aa3e77b09ed41c6403f00c61bc19c097
- created_at: 1756993423.0
- error: None
- incomplete_details: None
- instructions: None
- metadata: {}
- model: gpt-4.1-2025-04-14
- object: response
- output: [ResponseOutputMessage(id='msg_68b9978f9f70819684b17b0f21072a9003f00c61bc19c097', content=[ResponseOutputText(annotations=[], text='Hey! How can I help you today? 😊', type='output_text', logprobs=[])], role='assistant', status='completed', type='message')]
- parallel_tool_calls: True
- temperature: 1.0
- tool_choice: auto
- tools: []
- top_p: 1.0
- background: False
- conversation: None
- max_output_tokens: None
- max_tool_calls: None
- previous_response_id: None
- prompt: None
- prompt_cache_key: None
- reasoning: Reasoning(effort=None, generate_summary=None, summary=None)
- safety_identifier: None
- service_tier: default
- status: completed
- text: ResponseTextConfig(format=ResponseFormatText(type='text'), verbosity='medium')
- top_logprobs: 0
- truncation: disabled
- usage: ResponseUsage(input_tokens=9, input_tokens_details=InputTokensDetails(cached_tokens=0), output_tokens=11, output_tokens_details=OutputTokensDetails(reasoning_tokens=0), total_tokens=20)
- user: None
- store: True
If you run the same request again, it is read from the cache; the response below is identical to the first one, down to its `id`.
```python
r = cli.responses.create(model="gpt-4.1", input="Hey!")
r
```
Hey! How can I help you today? 😊
- id: resp_68b9978ecec48196aa3e77b09ed41c6403f00c61bc19c097
- created_at: 1756993423.0
- error: None
- incomplete_details: None
- instructions: None
- metadata: {}
- model: gpt-4.1-2025-04-14
- object: response
- output: [ResponseOutputMessage(id='msg_68b9978f9f70819684b17b0f21072a9003f00c61bc19c097', content=[ResponseOutputText(annotations=[], text='Hey! How can I help you today? 😊', type='output_text', logprobs=[])], role='assistant', status='completed', type='message')]
- parallel_tool_calls: True
- temperature: 1.0
- tool_choice: auto
- tools: []
- top_p: 1.0
- background: False
- conversation: None
- max_output_tokens: None
- max_tool_calls: None
- previous_response_id: None
- prompt: None
- prompt_cache_key: None
- reasoning: Reasoning(effort=None, generate_summary=None, summary=None)
- safety_identifier: None
- service_tier: default
- status: completed
- text: ResponseTextConfig(format=ResponseFormatText(type='text'), verbosity='medium')
- top_logprobs: 0
- truncation: disabled
- usage: ResponseUsage(input_tokens=9, input_tokens_details=InputTokensDetails(cached_tokens=0), output_tokens=11, output_tokens_details=OutputTokensDetails(reasoning_tokens=0), total_tokens=20)
- user: None
- store: True
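An easy way to see the cache working is to time a repeated call. The snippet below is a sketch using only the standard library's `time.perf_counter` and the `cli` client from above; the exact speedup you observe will depend on the API and your connection:

```python
import time

def timed(fn):
    # Run fn once and report the wall-clock time.
    t0 = time.perf_counter()
    out = fn()
    print(f"took {time.perf_counter() - t0:.3f}s")
    return out

# The first call goes over the network; the repeat should return almost
# instantly because it is served from cachy.jsonl.
timed(lambda: cli.responses.create(model="gpt-4.1", input="What's 2+2?"))
timed(lambda: cli.responses.create(model="gpt-4.1", input="What's 2+2?"))
```

Given the storage mechanism described above, deleting the local cachy.jsonl file should clear the cache and force fresh API calls — an assumption on our part; check the docs for an official way to invalidate entries.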
Download files
File details
Details for the file pycachy-0.0.4.tar.gz.
File metadata
- Download URL: pycachy-0.0.4.tar.gz
- Upload date:
- Size: 10.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `1dae01df82f58b90bd1973048b737272ca142fb64dffdfeaf7675354e9e4235e` |
| MD5 | `88e7773170204dd232f2375e43e5b10c` |
| BLAKE2b-256 | `aa90a92c00156167c30825ead8e7c21fad5d87daa5c23ff78d49ae9c424318be` |
File details
Details for the file pycachy-0.0.4-py3-none-any.whl.
File metadata
- Download URL: pycachy-0.0.4-py3-none-any.whl
- Upload date:
- Size: 9.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `044d5e98f37d5967c51f98e0573db33ccf98833c14e722e3aaa8b8cc33e48799` |
| MD5 | `ed0e7cef8265c00917efb71d46c59c86` |
| BLAKE2b-256 | `4f732c45987161c9680882d703d888c2739e0afa1130a9b9a476289d001a6d96` |