A lightweight caching library for polars

polars_cache

A lightweight, lazy, disc-based cache for Polars LazyFrames.

Usage

import polars as pl
import polars_cache as pc

lf = pl.LazyFrame({"x": range(100)})

def very_expensive(col: str) -> pl.Expr:
    # return the expression so that with_columns() receives it
    return pl.col(col).pow(2).exp().sqrt()

query = (
    lf
    .with_columns(very_expensive("x"))
    .pipe(pc.cache_to_disc, max_age=120) # set up cache
)

df1 = query.collect()  # populate the cache
df2 = query.collect()  # second invocation will be much faster!

# do some downstream computation
another_query = query.with_columns(y = pl.col("x") + 7)

df3 = another_query.collect() # this will use the cache!
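
The max_age argument above suggests an age-based expiry. The following is a minimal standard-library sketch of that idea, not the library's implementation: the helper name is_fresh is hypothetical, and treating max_age as a number of seconds compared against the cache file's modification time is an assumption the source does not confirm.

```python
import os
import tempfile
import time
from typing import Optional


def is_fresh(cache_path: str, max_age: float, now: Optional[float] = None) -> bool:
    """True if the cache file exists and was written at most max_age seconds ago.

    Hypothetical helper: assumes max_age is in seconds and that age is
    measured from the file's modification time.
    """
    if not os.path.exists(cache_path):
        return False  # nothing cached yet
    now = time.time() if now is None else now
    age = now - os.path.getmtime(cache_path)
    return age <= max_age


# Demo with a throwaway file whose modification time is pinned to t=1000
with tempfile.TemporaryDirectory() as tmp:
    cache_file = os.path.join(tmp, "cache.parquet")
    open(cache_file, "wb").close()
    os.utime(cache_file, (1000, 1000))

    fresh_at_1100 = is_fresh(cache_file, max_age=120, now=1100)  # age is 100 s
    fresh_at_1200 = is_fresh(cache_file, max_age=120, now=1200)  # age is 200 s
    missing = is_fresh(os.path.join(tmp, "absent.parquet"), max_age=120)

print(fresh_at_1100, fresh_at_1200, missing)
```

Under these assumptions, a cache entry older than max_age is treated the same as a missing one, so the next collect would recompute it.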

With check_sources=True, updating a source file invalidates the cache, and the next collect refreshes it:

import os

query_from_a_file = (
    pl.scan_parquet("data.parquet")
    .group_by("age", "sex")
    .agg(pl.len())
    .pipe(pc.cache_to_disc, check_sources=True)
)

_ = query_from_a_file.collect() # populate cache
result = query_from_a_file.collect() # load from cache

os.utime("data.parquet")  # update source timestamp
new_result = query_from_a_file.collect() # cache is invalid -- will refresh
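
Because the example above invalidates the cache by touching the file with os.utime, one plausible reading is that source checking compares modification times. The sketch below illustrates that comparison with the standard library only. The function cache_is_stale and the file layout are hypothetical, and the source does not state how polars_cache actually detects changes.

```python
import os
import tempfile
from typing import Iterable


def cache_is_stale(cache_path: str, source_paths: Iterable[str]) -> bool:
    """True if the cache is missing or any source was modified after it was written.

    Hypothetical helper: assumes staleness is decided by comparing mtimes.
    """
    if not os.path.exists(cache_path):
        return True
    cache_mtime = os.path.getmtime(cache_path)
    return any(os.path.getmtime(src) > cache_mtime for src in source_paths)


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "data.parquet")
    cache = os.path.join(tmp, "cache.parquet")
    for path in (source, cache):
        open(path, "wb").close()

    # The cache was written after the source: still valid
    os.utime(source, (1000, 1000))
    os.utime(cache, (2000, 2000))
    stale_before = cache_is_stale(cache, [source])

    # Touching the source moves its mtime past the cache's: now stale
    os.utime(source, (3000, 3000))
    stale_after = cache_is_stale(cache, [source])

print(stale_before, stale_after)
```

A timestamp comparison is cheap but coarse: touching a file without changing its contents still triggers a refresh, which matches the behaviour shown in the example above.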

⚠️ Warning ⚠️

This function is opaque to the Polars optimizer and splits your query into two chunks: one before the cache statement and one after. Polars optimizes each chunk independently, but optimizations (e.g. projection and predicate pushdown) will NOT be able to cross the cache barrier. Use with caution.
