Cache ML is a layer on top of joblib that caches parsed datasets, dramatically reducing the load time of large data files. It also supports encryption at rest. Currently supported backends are the local filesystem and S3.

Example Usage

Here is an example from a Jupyter notebook:

import pandas as pd
from cacheml.cache import LocalFile, Cache

cache = Cache()

@cache.cache  # this function's result will be cached
def read_and_filter_commits(commits_file_obj):
    # commits_file_obj.path is the path pandas reads the CSV from
    return pd.read_csv(commits_file_obj.path)

ts_all = read_and_filter_commits(LocalFile("commits.csv.gz"))
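
To illustrate the general idea behind caching a parsed dataset, here is a minimal standard-library sketch. It is not CacheML's implementation and does not use joblib: the `disk_cache` decorator, the `parse_rows` function, the `calls` list, and the choice of keying on path, size, and modification time are all hypothetical names and assumptions made for this example.

```python
import csv
import functools
import hashlib
import os
import pickle
import tempfile


def disk_cache(cache_dir):
    """Hypothetical decorator: store a function's result on disk, keyed by its input file."""
    os.makedirs(cache_dir, exist_ok=True)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(path):
            # Key on function name, absolute path, size, and mtime (an assumption
            # of this sketch) so that editing the file invalidates the entry.
            stat = os.stat(path)
            raw_key = (
                f"{func.__name__}:{os.path.abspath(path)}:"
                f"{stat.st_size}:{stat.st_mtime_ns}"
            )
            key = hashlib.sha256(raw_key.encode("utf-8")).hexdigest()
            cache_path = os.path.join(cache_dir, key + ".pkl")

            # Cache hit: skip parsing and load the pickled result.
            if os.path.exists(cache_path):
                with open(cache_path, "rb") as fh:
                    return pickle.load(fh)

            # Cache miss: parse once, then persist the result for next time.
            result = func(path)
            with open(cache_path, "wb") as fh:
                pickle.dump(result, fh)
            return result

        return wrapper

    return decorator


calls = []  # records every real (uncached) parse, for demonstration only


@disk_cache(tempfile.mkdtemp())
def parse_rows(path):
    calls.append(path)
    with open(path, newline="") as fh:
        return list(csv.reader(fh))
```

Calling `parse_rows` twice on the same unchanged file parses it only once; the second call reads the pickled result from the cache directory instead.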

Performance Test Results

These results come from running the unit tests, which simulate loading the time series data from datahut.ai, stored in a 216MB compressed CSV file. The first case just loads the file into a dataframe, while the second case does some additional processing (sorting and removing entries outside a time range).

Caching results from unit test, raw dataframes

File location  Time for raw df read  Time for initial read and caching of file  Time for cached read
Local File     134.0                 130.9                                      0.41
S3             153.6                 144.6                                      0.38

Caching results from unit test, processed dataframes

File location  Time for original function  Time for initial read and caching of file  Time for cached read
Local File     139.6                       142.49                                     1.04
S3             153.4                       155.8                                      0.99
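
As a rough way to read the tables above, the short script below divides each uncached baseline time by the corresponding cached read time. The numbers are copied from the two tables; the dictionary layout and variable names are only for illustration.

```python
# (case, file location) -> (uncached baseline time, cached read time),
# copied from the two results tables.
timings = {
    ("raw", "Local File"): (134.0, 0.41),
    ("raw", "S3"): (153.6, 0.38),
    ("processed", "Local File"): (139.6, 1.04),
    ("processed", "S3"): (153.4, 0.99),
}

# Speedup is the ratio of the uncached time to the cached time.
speedups = {
    key: baseline / cached for key, (baseline, cached) in timings.items()
}

for (case, location), factor in speedups.items():
    print(f"{case:<10} {location:<11} {factor:.0f}x faster when cached")
```

In every case the cached read is more than a hundred times faster than the uncached baseline, while the first cached run costs about the same as the uncached read.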
