
Lightweight function decorators to cache your `pandas.DataFrame` as feather.


federleicht



Federleicht is a Python package providing a cache decorator for pandas.DataFrame, utilizing the lightweight and efficient pyarrow feather file format.

federleicht.cache_dataframe is designed to decorate functions that return pandas.DataFrame objects. The decorator saves the DataFrame to a feather file on the first call and loads it automatically on subsequent calls if the file exists.

Key Features

  • Feather Integration: Save and load pandas.DataFrame effortlessly using the Feather format, known for its speed and simplicity.
  • Decorator Simplicity: Add caching functionality to your functions with a single decorator line.
  • Efficient Caching: Avoid redundant computations by reusing cached results.

Cache Expiry

To implement cache expiry, federleicht requires all arguments of the decorated function to be serializable. The cache will expire under the following conditions:

  • Argument Sensitivity: Cache will expire if the arguments (args / kwargs) of the decorated function change.
  • File Change Detection: When an os.PathLike object is passed as an argument, the cache will expire if the file size and / or modification time changes.
  • Code Change Detection: Cache will expire if the implementation / code of the decorated function changes during development.
  • Time-based Expiry: Cache will expire when it is older than a given timedelta.
  • Supported Argument Types: In addition to the immutable built-in data types, the following argument types are supported:
    • os.PathLike
    • pandas.DataFrame
    • pandas.Series
    • numpy.ndarray
    • datetime.datetime
    • types.FunctionType

Installation

Install federleicht from PyPI:

pip install federleicht

Usage

Here's a quick example:

import pandas as pd
from federleicht import cache_dataframe


@cache_dataframe
def generate_large_dataframe():
    # Simulate a heavy computation
    return pd.DataFrame({"col1": range(10000), "col2": range(10000)})


df = generate_large_dataframe()

Benchmark


  • file: Eartquakes-1990-2023.csv
  • size: 494.8 MB
  • lines: 3,445,752

The following functions are used to benchmark the performance of the cache_dataframe decorator.

def read_data(file: str, **kwargs) -> pd.DataFrame:
    """
    Read the earthquake dataset from a CSV file to benchmark the cache.

    Perform some data type conversions and return the DataFrame.
    """
    df = pd.read_csv(
        file,
        header=0,
        dtype={
            "status": "category",
            "tsunami": "boolean",
            "data_type": "category",
            "state": "category",
        },
        **kwargs,
    )

    df["time"] = pd.to_datetime(df["time"], unit="ms")
    df["date"] = pd.to_datetime(df["date"], format="mixed")

    return df

The returned pandas.DataFrame is cached, without its attrs dictionary, in the .pandas_cache directory. As long as the arguments and the code stay the same, the cache will only expire if the file changes. For more details, see the Cache Expiry section.

import pathlib


@cache_dataframe
def read_cache(file: pathlib.Path, **kwargs) -> pd.DataFrame:
    return read_data(file, **kwargs)
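The timing harness behind the benchmark is not shown here. One simple way to measure a single call is sketched below; the helper `timed` and the stand-in workload `slow_sum` are hypothetical names used only for illustration.

```python
import time


def timed(func, *args, **kwargs):
    """Call func once and return (elapsed_seconds, result)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return time.perf_counter() - start, result


def slow_sum(n):
    # Stand-in workload; a benchmark would call read_data or read_cache instead.
    return sum(range(n))


elapsed, total = timed(slow_sum, 100_000)
print(f"slow_sum took {elapsed:.4f} s")
```

Applied to the functions above, timing `read_data`, then a first call to `read_cache` and then a second call to `read_cache` would presumably correspond to the three columns reported in the results.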

Benchmark Results

Results strongly depend on the system configuration and the file system. The following results were obtained on:

  • OS: Windows
  • OS Version: 10.0.19044
  • Python: 3.11.9
  • CPU: AMD64 Family 23 Model 104 Stepping 1, AuthenticAMD
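
| nrows   | read_data [s] | build_cache [s] | read_cache [s] |
|--------:|--------------:|----------------:|---------------:|
| 10000   | 0.060         | 0.076           | 0.037          |
| 32170   | 0.172         | 0.193           | 0.033          |
| 103493  | 0.536         | 0.569           | 0.067          |
| 332943  | 1.658         | 1.791           | 0.143          |
| 1071093 | 5.383         | 5.465           | 0.366          |
| 3445752 | 16.750        | 17.720          | 1.141          |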
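To make the timings easier to compare, the speed-up of a cached read over a plain CSV read, and the extra cost of building the cache, can be derived directly from the numbers reported above. The snippet only re-uses those reported values; the variable names are illustrative.

```python
# (nrows, read_data [s], build_cache [s], read_cache [s]) as reported above
results = [
    (10000, 0.060, 0.076, 0.037),
    (32170, 0.172, 0.193, 0.033),
    (103493, 0.536, 0.569, 0.067),
    (332943, 1.658, 1.791, 0.143),
    (1071093, 5.383, 5.465, 0.366),
    (3445752, 16.750, 17.720, 1.141),
]

speedups = {}
overheads = {}
for nrows, read_data, build_cache, read_cache in results:
    # How many times faster the cached read is than parsing the CSV.
    speedups[nrows] = read_data / read_cache
    # Extra seconds spent writing the feather file on the first call.
    overheads[nrows] = build_cache - read_data
    print(f"{nrows:>8}  speed-up x{speedups[nrows]:5.1f}  "
          f"build overhead {overheads[nrows]:.3f} s")
```

On this system the benefit grows with file size: for the full file, reading from the cache is roughly 15 times faster than reading the CSV, while building the cache adds about one second to the first call.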

