Project description

Cache ML is a layer on top of joblib that caches parsed datasets, dramatically reducing the load time of large data files. It also supports encryption at rest. Currently supported backends are the local filesystem and S3.

Example Usage

Here is an example from a Jupyter notebook:

import pandas as pd
from cacheml.cache import LocalFile, Cache

cache = Cache()

@cache.cache  # this function's result will be cached
def read_and_filter_commits(commits_file_obj):
    # Parse the CSV at the wrapped file's path into a dataframe
    return pd.read_csv(commits_file_obj.path)

# The first call parses the file; later calls load the cached result
ts_all = read_and_filter_commits(LocalFile("commits.csv.gz"))
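
To make the idea behind the decorator concrete, here is a minimal standard-library sketch of on-disk result caching keyed by an input file. It is not Cache ML's or joblib's implementation: the `disk_cache` decorator, the cache key layout, and the `demo` helper are all hypothetical names chosen for illustration, and pickle stands in for whatever serialization the library actually uses.

```python
import functools
import hashlib
import os
import pickle
import tempfile


def disk_cache(cache_dir):
    """Hypothetical decorator: cache func(path) on disk, keyed by the input file."""
    os.makedirs(cache_dir, exist_ok=True)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(path):
            # Key on function name, absolute path, size, and modification time,
            # so that changing the source file invalidates the cached entry.
            stat = os.stat(path)
            key_src = (
                f"{func.__name__}:{os.path.abspath(path)}:"
                f"{stat.st_size}:{stat.st_mtime_ns}"
            )
            key = hashlib.sha256(key_src.encode()).hexdigest()
            cache_path = os.path.join(cache_dir, key + ".pkl")

            if os.path.exists(cache_path):
                # Cache hit: skip the expensive parse entirely.
                with open(cache_path, "rb") as f:
                    return pickle.load(f)

            # Cache miss: run the real function, then store its result.
            result = func(path)
            with open(cache_path, "wb") as f:
                pickle.dump(result, f)
            return result

        return wrapper

    return decorator


def demo():
    """Parse a small CSV twice and report how often the parser really ran."""
    calls = []
    with tempfile.TemporaryDirectory() as tmp:
        data_path = os.path.join(tmp, "numbers.csv")
        with open(data_path, "w") as f:
            f.write("1,2,3\n4,5,6\n")

        @disk_cache(os.path.join(tmp, "cache"))
        def parse(path):
            calls.append(path)  # record each real (uncached) parse
            with open(path) as f:
                return [[int(x) for x in line.split(",")] for line in f]

        first = parse(data_path)   # parses the file and writes the cache
        second = parse(data_path)  # served from the cache
        return first, second, len(calls)
```

Calling `demo()` returns the same parsed rows twice while the underlying parser runs only once, which is the behavior the timings below rely on.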

Performance Test Results

These results come from running the unit tests, which simulate loading the time series data from datahut.ai, stored in a 216MB compressed CSV file. The first case just loads the file into a dataframe, while the second case does some additional processing (sorting and removing entries outside a time range).

Caching results from unit test, raw dataframes

| File location | Time for raw df read | Time for initial read and caching of file | Time for cached read |
|---|---|---|---|
| Local File | 134.0 | 130.9 | 0.41 |
| S3 | 153.6 | 144.6 | 0.38 |

Caching results from unit test, processed dataframes

| File location | Time for original function | Time for initial read and caching of file | Time for cached read |
|---|---|---|---|
| Local File | 139.6 | 142.49 | 1.04 |
| S3 | 153.4 | 155.8 | 0.99 |
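
As a quick reading aid, the speedup from caching can be worked out from the figures above by dividing each uncached time by the corresponding cached read time. The snippet below is only arithmetic on the reported numbers; the `timings` dictionary and `speedup` helper are illustrative names, and the time units cancel in the ratio.

```python
# Timings copied from the results above: (uncached time, cached read time).
timings = {
    ("raw", "Local File"): (134.0, 0.41),
    ("raw", "S3"): (153.6, 0.38),
    ("processed", "Local File"): (139.6, 1.04),
    ("processed", "S3"): (153.4, 0.99),
}


def speedup(uncached, cached):
    """Return how many times faster the cached read is than the uncached one."""
    return uncached / cached


# Round to whole numbers, since the inputs carry only a few significant digits.
speedups = {key: round(speedup(*value)) for key, value in timings.items()}

for (case, location), factor in speedups.items():
    print(f"{case:9s} {location:10s} ~{factor}x faster")
```

This gives roughly 327x (local) and 404x (S3) for raw dataframes, and roughly 134x (local) and 155x (S3) for processed dataframes.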
