Lightweight function decorators to cache your `pandas.DataFrame` as Feather files.
federleicht
Federleicht is a Python package providing a cache decorator for `pandas.DataFrame`, built on the lightweight and efficient Feather file format from pyarrow.
`federleicht.cache_dataframe` is designed to decorate functions that return `pandas.DataFrame` objects. On the first call, the decorator saves the returned DataFrame to a Feather file; on subsequent calls, it loads the DataFrame from that file if the file exists.
Key Features
- Feather Integration: Save and load `pandas.DataFrame` objects effortlessly using the Feather format, known for its speed and simplicity.
- Decorator Simplicity: Add caching to your functions with a single decorator line.
- Efficient Caching: Avoid redundant computations by reusing cached results.
Cache Expiry
To implement cache expiry, federleicht requires all arguments of the decorated function to be serializable. The cache expires under the following conditions:
- Argument Sensitivity: The cache expires if the arguments (`args`/`kwargs`) of the decorated function change.
- When an `os.PathLike` object is passed as an argument, the cache expires if the file's size and/or modification time changes.
- Code Change Detection: The cache expires if the implementation (code) of the decorated function changes during development.
- Time-based Expiry: The cache expires when it is older than a given `timedelta`.

In addition to the immutable built-in data types, the following argument types are supported:
- `os.PathLike`
- `pandas.DataFrame`
- `pandas.Series`
- `numpy.ndarray`
- `datetime.datetime`
- `types.FunctionType`
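How federleicht derives its cache keys is not documented here. The following stdlib-only sketch, assuming a SHA-256 digest over the function's bytecode and its arguments, shows one way the four expiry rules could be implemented; `cache_key` and `is_expired` are hypothetical names. It covers immutable built-ins (via `pickle`), `os.PathLike` (size and modification time) and plain functions (bytecode). Types such as `pandas.DataFrame` or `numpy.ndarray` would additionally need a content hash, for example `pandas.util.hash_pandas_object`, which this sketch leaves out.

```python
import hashlib
import os
import pickle
import time
import types
from datetime import timedelta


def _feed(digest, data: bytes) -> None:
    # Length-prefix each chunk so that different splits cannot produce the same stream.
    digest.update(len(data).to_bytes(8, "big"))
    digest.update(data)


def _code_bytes(code: types.CodeType) -> bytes:
    # Bytecode, referenced global names and constants; nested code objects
    # (lambdas, inner functions) are expanded recursively.
    parts = [code.co_code, repr(code.co_names).encode()]
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            parts.append(_code_bytes(const))
        else:
            parts.append(repr(const).encode())
    return b"\x00".join(parts)


def _arg_bytes(value) -> bytes:
    if isinstance(value, os.PathLike):
        # Files: fingerprint by path, size and modification time, not by content.
        st = os.stat(value)
        return pickle.dumps((os.fspath(value), st.st_size, st.st_mtime_ns))
    if isinstance(value, types.FunctionType):
        return _code_bytes(value.__code__)
    # Immutable built-ins (int, str, tuple, ...): assumes pickle output is stable.
    return pickle.dumps(value)


def cache_key(func, args, kwargs) -> str:
    """Hypothetical key that changes when the code, arguments or referenced files change."""
    digest = hashlib.sha256()
    _feed(digest, _code_bytes(func.__code__))
    for value in args:
        _feed(digest, _arg_bytes(value))
    for name in sorted(kwargs):  # keyword order must not affect the key
        _feed(digest, name.encode())
        _feed(digest, _arg_bytes(kwargs[name]))
    return digest.hexdigest()


def is_expired(cache_file, max_age: timedelta) -> bool:
    """Time-based expiry: True if the cache file is older than max_age."""
    age_seconds = time.time() - os.path.getmtime(cache_file)
    return age_seconds > max_age.total_seconds()
```

Hashing bytecode instead of source text is a trade-off of this sketch: it works even where `inspect.getsource` cannot locate a source file, but bytecode differs between Python versions, so such keys would not survive an interpreter upgrade.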
Installation
Install federleicht from PyPI:
```shell
pip install federleicht
```
Usage
Here's a quick example:
```python
import pandas as pd
from federleicht import cache_dataframe

@cache_dataframe
def generate_large_dataframe():
    # Simulate a heavy computation
    return pd.DataFrame({"col1": range(10000), "col2": range(10000)})

df = generate_large_dataframe()  # first call computes and caches; later calls load the cache
```
Benchmark
- file: Eartquakes-1990-2023.csv
- size: 494.8 MB
- lines: 3,445,752
The following functions are used to benchmark the performance of the `cache_dataframe` decorator.
```python
import os

import pandas as pd


def read_data(file: str | os.PathLike, **kwargs) -> pd.DataFrame:
    """
    Read the earthquake dataset from a CSV file to benchmark the cache.
    Perform some data type conversions and return the DataFrame.
    """
    df = pd.read_csv(
        file,
        header=0,
        dtype={
            "status": "category",
            "tsunami": "boolean",
            "data_type": "category",
            "state": "category",
        },
        **kwargs,  # e.g. nrows to limit the number of rows read
    )
    df["time"] = pd.to_datetime(df["time"], unit="ms")
    df["date"] = pd.to_datetime(df["date"], format="mixed")  # format="mixed" needs pandas >= 2.0
    return df
```
The returned `pandas.DataFrame`, without its `attrs` dictionary, is cached in the `.pandas_cache` directory and only expires if the file changes. For more details, see the Cache Expiry section.
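```python
import pathlib
import pandas as pd
from federleicht import cache_dataframe

@cache_dataframe
def read_cache(file: pathlib.Path, **kwargs) -> pd.DataFrame:
    return read_data(file, **kwargs)  # read_data as defined above
```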
Benchmark Results
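Results strongly depend on the system configuration and the file system. In the table, `nrows` is the number of rows read from the CSV file, `build_cache` is the first call of `read_cache` (which reads the CSV and writes the cache), and `read_cache` is a subsequent call that loads the cached DataFrame. The following results were obtained on: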
- OS: Windows
- OS Version: 10.0.19044
- Python: 3.11.9
- CPU: AMD64 Family 23 Model 104 Stepping 1, AuthenticAMD
| nrows | read_data [s] | build_cache [s] | read_cache [s] |
|---|---|---|---|
| 10000 | 0.060 | 0.076 | 0.037 |
| 32170 | 0.172 | 0.193 | 0.033 |
| 103493 | 0.536 | 0.569 | 0.067 |
| 332943 | 1.658 | 1.791 | 0.143 |
| 1071093 | 5.383 | 5.465 | 0.366 |
| 3445752 | 16.750 | 17.720 | 1.141 |
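To read the table as ratios, the short snippet below derives the speedup of a cache hit (`read_data / read_cache`) and the one-off cost of writing the cache (`build_cache / read_data - 1`) from the numbers above. It adds no new measurements; `TIMINGS` and `summarize` are illustrative names.

```python
# Timings in seconds, copied from the benchmark table above:
# nrows -> (read_data, build_cache, read_cache)
TIMINGS = {
    10000: (0.060, 0.076, 0.037),
    32170: (0.172, 0.193, 0.033),
    103493: (0.536, 0.569, 0.067),
    332943: (1.658, 1.791, 0.143),
    1071093: (5.383, 5.465, 0.366),
    3445752: (16.750, 17.720, 1.141),
}


def summarize(timings):
    """Return nrows -> (speedup of a cached read, relative cost of building the cache)."""
    summary = {}
    for nrows, (read_data, build_cache, read_cache) in timings.items():
        speedup = read_data / read_cache          # how much faster a cache hit is
        overhead = build_cache / read_data - 1.0  # extra time spent writing the cache
        summary[nrows] = (speedup, overhead)
    return summary


for nrows, (speedup, overhead) in summarize(TIMINGS).items():
    print(f"{nrows:>9,} rows: cache hit {speedup:5.1f}x faster, first call +{overhead:.0%}")
```

For the full file this works out to a cache hit about 14.7x faster than `read_data` and a first call about 6% slower; for 10,000 rows the speedup is only about 1.6x.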
Download files
File details
Details for the file federleicht-0.0.0.tar.gz.
File metadata
- Download URL: federleicht-0.0.0.tar.gz
- Upload date:
- Size: 9.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.4 CPython/3.11.9 Windows/10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b4941b28500387b3880aa617815089321b270ac6776a4496866bef5f1985fcdb |
| MD5 | bb64385ab0ad7b98363f5f16b0fbfa2c |
| BLAKE2b-256 | 56d3100ee510e3ce46e2f15fc89b3c9f08f404dfbc9954275804a78fce9179f0 |
File details
Details for the file federleicht-0.0.0-py3-none-any.whl.
File metadata
- Download URL: federleicht-0.0.0-py3-none-any.whl
- Upload date:
- Size: 11.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.4 CPython/3.11.9 Windows/10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 11b64e5924112f3cf86e0b6f701a1278b2b982a55fb0657eb90e53253880b068 |
| MD5 | 514a8d688e36eddbfd01e4894091021f |
| BLAKE2b-256 | fb608698c14595b931dc3a51548c2bc014c3eb50ba8d6186785c8f281c454c45 |
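Before installing a manually downloaded file, its digest can be compared with the SHA256 values listed above. A minimal sketch using only `hashlib` from the standard library follows; `sha256_of` and `verify_download` are illustrative names, and the expected digests are copied from the tables above. pip can also enforce hashes itself through its hash-checking mode (`--require-hashes`).

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables above.
EXPECTED_SHA256 = {
    "federleicht-0.0.0.tar.gz": "b4941b28500387b3880aa617815089321b270ac6776a4496866bef5f1985fcdb",
    "federleicht-0.0.0-py3-none-any.whl": "11b64e5924112f3cf86e0b6f701a1278b2b982a55fb0657eb90e53253880b068",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """True if the file's SHA256 matches the published digest for its file name."""
    path = Path(path)
    expected = EXPECTED_SHA256[path.name]  # raises KeyError for unknown file names
    return sha256_of(path) == expected
```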