Sqlite based cache for python projects

dinkycache for python projects

A very small and flexible name/value cache for python.

Intended for quick set up, in development and small scale projects.

Uses sqlite for storage and lzstring for compression.

Stores any data that can be serialized to a string with json.dumps() and restored with json.loads().
Returns int, dict and str unchanged, but returns a list if supplied a tuple.
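
The tuple-to-list behavior follows from JSON itself, which has no tuple type. A minimal sketch using only the standard library (the `roundtrip` helper is hypothetical and not part of dinkycache):

```python
import json

def roundtrip(value):
    """Serialize to a JSON string and parse it back, as a JSON-based cache would."""
    return json.loads(json.dumps(value))

print(roundtrip(42))          # int stays an int
print(roundtrip({"a": 1}))    # dict stays a dict
print(roundtrip((1, 2, 3)))   # tuple comes back as a list
print(roundtrip({1: "a"}))    # non-string dict keys come back as strings
```

If the tuple type matters to the caller, it has to be restored after reading, for example with `tuple(result)`.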

Install

From pip

python -m pip install dinkycache

From github

# Copy the 'dinkycache' directory and 
# requirements.txt into your cwd
# Install dependencies:
python -m pip install -r requirements.txt

How to use

Import

from dinkycache import Dinky

The three main methods are called like so:

Dinky().read(id: str)
Dinky().write(id: str, data: str, ttl: float)
Dinky().delete(id: str) -> int

Examples with default settings

from dinkycache import Dinky

#gets data from some slow source
def fetch_data(id):
    return "some data"

id = "001"

Then where you would normally write:

results = fetch_data(id)

Write these two lines instead:

# Parenthesize the walrus so result holds the cached data, not the comparison
if (result := Dinky().read(id)) == False:
    Dinky().write(id, result := fetch_data(id))

If you are running Python < 3.8 or just don't like walruses:

results = Dinky().read(id)
if results == False:
    results = fetch_data(id)
    Dinky().write(id, results)

This object-style usage is also an option. It is fully supported, but not documented further:

    #Write:
    d = Dinky()
    d.id = "test"
    d.data = {"whatever": "floats"}
    d.setTTL(24) #hr
    d.write()
    print(d.data)

    #Read:
    d = Dinky()
    d.id = "test"
    results = d.read()
    print(results)

In either case, results will contain the data from the cache if it is there and within the specified TTL. Otherwise your fetch_data() is called to fetch the data instead.
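
The read-then-fetch pattern above can be wrapped in a small helper. This is only a sketch: `cached_fetch` and `DictCache` are hypothetical names, and `DictCache` is an in-memory stand-in that mimics the documented `read`/`write` contract so the example runs without dinkycache installed.

```python
from typing import Any, Callable

class DictCache:
    """Hypothetical in-memory stand-in exposing the same read/write shape as Dinky."""

    def __init__(self) -> None:
        self._store: dict = {}

    def read(self, id: str) -> Any:
        # Mirror the documented contract: False when nothing is cached.
        return self._store.get(id, False)

    def write(self, id: str, data: Any) -> str:
        self._store[id] = data
        return id

def cached_fetch(cache: Any, id: str, fetch: Callable[[str], Any]) -> Any:
    """Return the cached value for id, fetching and storing it on a miss."""
    result = cache.read(id)
    if result is False:   # identity check, so a cached 0 still counts as a hit
        result = fetch(id)
        cache.write(id, result)
    return result

calls = []

def fetch_data(id: str) -> str:
    calls.append(id)      # record each trip to the slow source
    return "some data"

cache = DictCache()
first = cached_fetch(cache, "001", fetch_data)    # miss: calls fetch_data
second = cached_fetch(cache, "001", fetch_data)   # hit: served from the cache
print(first, second, len(calls))
```

Passing a `Dinky()` object in place of `DictCache()` should behave the same way, assuming `read` returns exactly `False` on a miss as described under Public methods.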

Settings

Available settings and default values

    dbfile: str = "dinkycache.db",  # name of sqlite file
    ttl: float = 2160,              # time to live in hours, default 2160 = 90 days, 0 = no expiry
    purge_rows: bool = True,        # will enforce row_limit if true
    row_limit: int = 10000,         # maximum number of rows in db
    row_overflow: int = 1000,       # buffer zone above row_limit before anything is deleted
    clean_expired: bool = True,     # will delete outdated entries if true
    clean_hrs: int = 24,            # time between cleanups of expired entries
    clean_iterations: int = 100,    # iterations (reads/writes) between cleanups

Set them in one of the following ways

# Positional arguments:
Dinky('preferred.db', 24).read(id)

OR

# Keyword arguments:
Dinky(dbfile='preferred.db').read(id)

OR

# Unpack list as positional arguments:
settings = ['preferred.db', 24]
Dinky(*settings).read(id)

OR

# Unpack dict as keyword arguments:
settings = {
    'dbfile': 'preferred.db',
    'ttl': 24,
}
Dinky(**settings).read(id)

Examples of use with user-defined settings

You can unpack a dict and pass it as settings each time you invoke Dinky(**settings), or assign the new Dinky object to a variable and re-use it that way:

Invoke on every use:

settings = {
    'dbfile': 'preferred.db',
    'purge_rows': True,
    'clean_expired': False,
    'row_limit': 100,
    'ttl': 0,
}

if (result := Dinky(**settings).read(id)) == False:
    Dinky(**settings).write(id, result := fetch_data(id))

Retain Dinky object:

d = Dinky(
    dbfile = 'preferred.db',
    purge_rows = True,
    clean_expired = False,
    row_limit = 100,
    ttl = 0,
)

if (result := d.read(id)) == False:
    d.write(id, result := fetch_data(id))

clean_expired, clean_hrs and clean_iterations

If clean_expired = True, the script will try to clean out expired entries every time data is written, provided one of the following conditions is met:
At least clean_hrs (default 24) hours have passed since the last cleanup
OR
There have been more than clean_iterations (default 100) calls since the last cleanup

The cleanup function comes at a 75% performance cost, so if it runs on every 100th write, that amounts to a 0.75% average performance cost (75% / 100).

clean_expired might therefore be a much better alternative than using purge_rows for larger amounts of data.
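
The cleanup trigger described in this section can be sketched as a single predicate. This is an illustration of the documented rule, not dinkycache's actual code: `should_clean` and its `last_clean_ts`, `calls_since_clean` and `now` parameters are hypothetical names.

```python
import time

def should_clean(last_clean_ts, calls_since_clean,
                 clean_hrs=24, clean_iterations=100, now=None):
    """Return True when a cleanup of expired entries is due.

    last_clean_ts is the Unix timestamp of the previous cleanup and
    calls_since_clean counts reads/writes since then (both assumed names).
    """
    now = time.time() if now is None else now
    hours_since = (now - last_clean_ts) / 3600
    # Due if enough time has passed OR enough calls have been made.
    return hours_since >= clean_hrs or calls_since_clean > clean_iterations

print(should_clean(0, 0, now=24 * 3600))    # 24 hours elapsed
print(should_clean(0, 100, now=3600))       # 1 hour, exactly 100 calls
print(should_clean(0, 101, now=3600))       # 1 hour, more than 100 calls
```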

purge_rows, row_limit and row_overflow

If purge_rows = True, the script will try to clean out overflowing rows every time data is written.
row_limit = int sets the maximum number of rows in the database.
row_overflow = int sets how many rows over row_limit are allowed before row_limit is enforced.

This comes at a great performance cost for larger databases: 462 ms to sort 100k rows on a 1.8 GHz Intel Core i5. For this reason row_overflow is added as a buffer threshold, so that deletion doesn't happen on every call to .write.

It is probably best used for small databases and/or databases with small entries.
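
To make the interplay of row_limit and row_overflow concrete, here is a minimal sketch using the standard sqlite3 module. The table layout, the `created` column, the oldest-first deletion order and the `purge_overflow` name are all assumptions made for illustration; dinkycache's real schema and query may differ.

```python
import sqlite3

def purge_overflow(conn, row_limit, row_overflow):
    """Trim the table back to row_limit once it exceeds row_limit + row_overflow."""
    (count,) = conn.execute("SELECT COUNT(*) FROM cache").fetchone()
    if count <= row_limit + row_overflow:
        return 0                      # still inside the buffer zone: nothing to do
    excess = count - row_limit
    cur = conn.execute(
        "DELETE FROM cache WHERE rowid IN "
        "(SELECT rowid FROM cache ORDER BY created ASC LIMIT ?)",
        (excess,),
    )
    conn.commit()
    return cur.rowcount               # number of rows removed

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cache (id TEXT PRIMARY KEY, data TEXT, created REAL)")
conn.executemany(
    "INSERT INTO cache VALUES (?, ?, ?)",
    [(str(i), "x", float(i)) for i in range(15)],
)
deleted_within_buffer = purge_overflow(conn, row_limit=10, row_overflow=5)
conn.execute("INSERT INTO cache VALUES ('15', 'x', 15.0)")
deleted_past_buffer = purge_overflow(conn, row_limit=10, row_overflow=5)
print(deleted_within_buffer, deleted_past_buffer)
```

With 15 rows the table is still within row_limit + row_overflow, so nothing is deleted; the 16th row crosses the threshold and the table is trimmed back to row_limit in one pass, which is what keeps the expensive sort from running on every write.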

Public methods

.read()

Arguments: id (string, required)
Returns the data corresponding to id, or False if there is no data.
Can be called without arguments on an existing object if id has already been set.

.write()

Arguments: id (string, required), data (required; any JSON-serializable value), ttl (float, optional)
Stores the supplied data under that id; ttl can be set here if not already passed on invocation.
Returns the hashed id or False.
Will run clean_expired and purge_rows if they are set to True.
Can be called without arguments on an existing object if id and data have already been set.

.delete()

Arguments: id (string, required)
Deletes the entry corresponding to that id.
Returns the number of rows deleted, 1 or 0.
Can be called without arguments on an existing object if id has already been set.

Setting TTL to seconds, months etc:

There has been some discussion about the most sensible default time unit for TTL; we landed on hours as a float.

The idea is that hours will be sufficient and sensible enough in most use cases. However, it also allows for a workaround if you need to set lower or higher values:

    10 seconds:
    Dinky().write(id = "1", data = "some data", ttl = 10 / 3600)
    10 minutes:
    Dinky().write(id = "1", data = "some data", ttl = 10 / 60)
    10 months:
    Dinky().write(id = "1", data = "some data", ttl = 10 * 720)
    10 years:
    Dinky().write(id = "1", data = "some data", ttl = 10 * 8760)

Alternatively, you can set the expiry in seconds directly on the Dinky object like so:

    d = Dinky()
    d.expires = 10 + d.now
    d.write(id = "1", data = "some data")
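
The unit conversions above can also be left to the standard library. A small sketch, where `ttl_hours` is a hypothetical helper and not part of dinkycache:

```python
from datetime import timedelta

def ttl_hours(**kwargs) -> float:
    """Convert timedelta keyword arguments (seconds, minutes, days, weeks, ...) to hours."""
    return timedelta(**kwargs).total_seconds() / 3600

print(ttl_hours(seconds=10))   # same value as 10 / 3600
print(ttl_hours(minutes=10))   # same value as 10 / 60
print(ttl_hours(days=300))     # 10 months of 30 days each, as in 10 * 720
print(ttl_hours(days=3650))    # 10 years of 365 days each, as in 10 * 8760
```

The result can then be passed as the ttl argument, for example `Dinky().write(id = "1", data = "some data", ttl = ttl_hours(minutes = 10))`.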

Performance

This won't ever compete with Redis, MongoDB or anything like that. It is meant to be a small, easy solution for small-scale use cases where you don't want or need any big dependencies. Hence performance will be lower, but it might still be orders of magnitude faster than repeatedly parsing the data from some website.

Tests:

Reads from DB 1

10k entries of 40 to 1500 characters:

1 read = 0.003 to 0.018 sec
100 reads = 0.6 sec (0.006 avg)

Reads from DB 2

10k entries of 40 to 150000 characters:
1 read = 0.003 to 0.022 sec
100 reads = 1.1 to 2.4 sec (0.015 avg)

Test DB 3:

38.1 MB, 100k writes with str_len 40~1500: avg 11.3 ms (incl. generation)

10k reads: 6.57 ms avg
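
Per-call averages like these can be measured with a small timing loop. The sketch below is not the script used for the figures above; `average_ms` is a hypothetical helper, and a plain dict stands in for the cache so the example runs on its own.

```python
import time

def average_ms(func, args_list):
    """Call func once per argument tuple and return the mean time per call in ms."""
    start = time.perf_counter()
    for args in args_list:
        func(*args)
    elapsed = time.perf_counter() - start
    return elapsed * 1000 / len(args_list)

# Stand-in data source: 10k entries of 100 characters each.
store = {str(i): "x" * 100 for i in range(10_000)}
avg = average_ms(store.get, [(str(i),) for i in range(10_000)])
print(f"{avg:.4f} ms per read")
```

To time a real cache, a bound method such as `Dinky().read` would take the place of `store.get`.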

Security

Ids are hashed, so you may put anything in there. Data is compressed to a string of base 64 characters, so you may put anything in there as well.

Lzstring seems to have very high integrity; we have not been able to produce a test result where the input and output were not equal.

That said, what you put in is what you'll get out. There is no checking for HTML tags and such. This is something to beware of if for some reason you use it to store and later display user-provided data.
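
If cached user-provided text does end up in an HTML page, escaping it at display time is one way to handle this. A minimal sketch using the standard library's `html.escape` (the `render_cached` helper is hypothetical):

```python
import html

def render_cached(value: str) -> str:
    """Wrap a cached string in a paragraph tag, escaping any markup it contains."""
    return "<p>" + html.escape(value) + "</p>"

print(render_cached('<script>alert("x")</script>'))
# <p>&lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;</p>
```

Escaping on output rather than before writing keeps the cached value identical to what was supplied.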

Compression

Lzstring is not great for shorter strings, and sometimes even increases the string length. In testing we found that short strings (80 to 1500 chars) compress to about 98% of their original length on average, while strings longer than 60000 characters compress to about 48%. Testing was done with random as well as real-world data.

So there is most likely some performance loss, but it is outweighed by smaller database files and the fact that base 64 strings make life very easy.
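
Length-dependent results like these are typical of general-purpose compressors. Since lzstring is not in the standard library, the sketch below swaps in zlib plus Base64 purely as a stand-in to show the effect; the numbers it prints will not match lzstring's, and `encoded_ratio` is a hypothetical helper.

```python
import base64
import zlib

def encoded_ratio(text: str) -> float:
    """Length of the compressed, Base64-encoded text relative to the original."""
    packed = base64.b64encode(zlib.compress(text.encode("utf-8")))
    return len(packed) / len(text)

short_ratio = encoded_ratio("hello")                        # very short input
long_ratio = encoded_ratio("some cached payload " * 3000)   # 60000 repetitive characters
print(f"short: {short_ratio:.2f}, long: {long_ratio:.4f}")
```

A ratio above 1 means the stored string is longer than the input, which is what happens for the very short example here; the long, repetitive input shrinks considerably.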
