
Pythonic parameterized cache paths.


A small package for pythonic parameterized cache paths.

Getting Started

Install: pip install cachepath

Import: from cachepath import CachePath, TempPath, Path

Docs: ReadTheDocs | API doc is here

Why?
  1. Integrates pathlib with tempfile.gettempdir and shutil.rmtree by providing TempPath and Path.rm():

    path = TempPath()
    path.rm()
    # or would you rather..
    path = None
    with tempfile.NamedTemporaryFile(delete=False) as f:
        path = Path(f.name)
    # Only now can we use Path. If we tried using it within the With
    # (for example for path.read_text()), we'd break on Windows
    path.unlink()  # only if file, doesn't work on folders
  2. Wraps pathlib import for Py2/3 compat. (not in six!):

    from cachepath import Path
    # or
    try:
        from pathlib import Path
    except ImportError:
        from pathlib2 import Path
  3. Provides CachePath, which lets you quickly get a parameterized temp filename, with all folders automatically created:

    r = CachePath(date, userid, 'expensive_results.txt')
    assert (r == Path('/tmp/', date, userid, 'expensive_results.txt')
            and r.parent.exists())
    r.rm()  # File remove
    r.parent.rm()  # Symmetric with folder remove!
    
    # Without cachepath
    p = Path(tempfile.gettempdir(), date, userid, 'expensive_results.txt')
    # Don't update timestamp if it already exists so that we don't cause
    # Make-like tools to always think something's changed
    if not p.parent.exists():
        p.parent.mkdir(parents=True, exist_ok=True)
    
    p.unlink()  # Why is it .unlink() instead of .remove()?
    # Folders need a different method again: .rmdir()
    try:
        p.parent.rmdir()
    except OSError:
        # .rmdir() throws if there was another file in the folder,
        # but we didn't care, they're tempfiles!
        import shutil
        shutil.rmtree(str(p.parent))
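
For comparison, here is a minimal standard-library sketch of a single "remove whatever this is" helper, in the spirit of the Path.rm() described above. The function name rm and its behavior on missing paths are assumptions made for illustration; this is not cachepath's actual implementation.

```python
import shutil
import tempfile
from pathlib import Path


def rm(path):
    """Remove a file, a symlink, or a whole directory tree."""
    path = Path(path)
    if path.is_symlink() or path.is_file():
        path.unlink()  # files and symlinks (the link itself, not its target)
    elif path.is_dir():
        shutil.rmtree(str(path))  # directories, even non-empty ones
    # Missing paths fall through and are ignored; other special files are left alone.


# Usage: the same call works for a file and for its parent folder.
base = Path(tempfile.mkdtemp())
target = base / "2018-12-23" / "expensive_results.txt"
target.parent.mkdir(parents=True)
target.write_text("results")
rm(target)
rm(target.parent)
```

The point of the sketch is only that one entry point can dispatch between unlink and rmtree, so the caller does not need to know whether the path is a file or a folder.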

Why, but longer:

Do you need a temp path to pass to some random tool for its logfile? Behold, a gaping hole in pathlib:

import tempfile
import os
try:
    from pathlib import Path
except ImportError:
    from pathlib2 import Path

def get_tempfile():
    fd, loc = tempfile.mkstemp()
    os.close(fd)  # If we forgot to do this, it would stay open until process exit
    return Path(loc)

# Easier way
from cachepath import TempPath
def get_tempfile():
    return TempPath()  # Path('/tmp/213kjdsrandom')

But this module is called cachepath, not temppath, what gives?

Suppose I’m running that same imaginary tool pretty often, but I’d like to skip running it if I already have results for a certain day. Just sticking some identifying info into a filename should be good enough. Something like Path('/tmp/20181204_toolresults.txt')

# try: from pathlib import Path; except ImportError: from pathlib2 import Path
# We'll cheat a little to get py2/3 compat without so much ugliness
from cachepath import Path
import tempfile
def get_tempfile(date):
    filename = '{}_toolresults.txt'.format(date)
    return Path(tempfile.gettempdir(), filename)

# Easier to do this...
from cachepath import CachePath
def get_tempfile(date):
    return CachePath(date, suffix='.txt')

Not bad, but not great. Now suppose our requirements change; let’s go a step further.

Now I’m running this tool a lot, over a tree of data that looks like this:

2018-12-23
    person1
    person2
2018-12-24
    person1
2018-12-25
    person1

I want my logs to be structured the same way. How hard can it be?

2018-12-23/
    person1_output.txt
    person2_output.txt
2018-12-24/
    person1_output.txt
2018-12-25/
    person1_output.txt

Let’s find out:

# Let's get the easy way out of the way first :)
def get_path(date, person):
    return CachePath(date, person, suffix='_output.txt')
    # Automatically ensures /tmp/date/ exists when we create the CachePath!

# Now the hard way
def get_path(date, person):
    personfilename = '{p}_output.txt'.format(p=person)
    returning = Path(tempfile.gettempdir())/date/personfilename
    # Does this mkdir update the modified timestamp of the folders we're in?
    # Might matter if we're part of a larger toolset...
    returning.parent.mkdir(exist_ok=True, parents=True)
    return returning

Suppose we hadn’t remembered to make the $date/ folders. When we passed the Path out to another tool, or tried to .open it, we may have gotten a Permission Denied error on Unix systems rather than the “File/Folder not found” you might expect. With CachePath, this can’t happen. Creating a CachePath implicitly creates all of the preceding directories necessary for your file to exist.
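
To make the "directories are created for you" behavior concrete, here is a rough Python 3 sketch using only the standard library. The helper name cache_path, its base parameter, and the rule that suffix is appended to the last path component are all assumptions chosen to match the layout above; cachepath itself may work differently.

```python
import tempfile
from pathlib import Path


def cache_path(first, *rest, base=None, suffix=""):
    """Hypothetical helper: build base/first/.../last+suffix and create the parent folders."""
    root = Path(base) if base is not None else Path(tempfile.gettempdir())
    parts = [str(first)] + [str(part) for part in rest]
    parts[-1] += suffix  # the suffix extends the final component, e.g. person1_output.txt
    path = root.joinpath(*parts)
    # Only call mkdir when the folder is missing, so existing folders are left untouched.
    if not path.parent.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
    return path


# Usage: the date folder exists as soon as the path is built, so writing just works.
root = tempfile.mkdtemp()
p = cache_path("2018-12-23", "person1", base=root, suffix="_output.txt")
p.write_text("log line\n")
```

Passing base explicitly keeps the example away from the real temp directory; leaving it out falls back to tempfile.gettempdir().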

Now, suppose we found a bug in this external tool we were using and we’re going to re-run it for a day. How do we clear out that day’s results so that we can be sure we’re looking at fresh output from the tool? Well, with CachePath, it’s just:

def easy_clear_date(date):
    CachePath(date).clear()  # rm -r /tmp/date/*

But if you don’t have cachepath, you’ll find that most Python libs play it pretty safe when it comes to files. Path.rmdir() requires the folder to be empty, and pathlib doesn’t provide a way to empty the folder. Not to mention, what if our results folder had special permissions, or was actually a symlink, and we had write access but not delete? Oh well, let’s see what we can do:

def hard_clear_date(date):
    # We happen to know that date is a folder and not a file (at least in our
    # current design), so we know we need some form of .rmdir() rather than
    # .unlink(). Unfortunately, pathlib doesn't offer one for folders with
    # files still in them. If you google how to do it, you will find plenty of
    # answers, one of which is a pure pathlib recursive solution! But we're lazy,
    # so let's bring in yet another module:
    p = Path(tempfile.gettempdir(), date)
    import shutil
    if p.exists():
        shutil.rmtree(p)
    p.mkdir(exist_ok=True, parents=True)
    # This still isn't exactly equivalent to CachePath.clear(), because we've
    # lost whatever permissions were set on the date folder, and if it were
    # actually a symlink to somewhere else, that's gone now.
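
One way to get closer to the rm -r /tmp/date/* behavior with the standard library is to delete the folder's children and never touch the folder itself. The sketch below assumes that approach; clear_dir is a hypothetical name, and this is not taken from cachepath's source.

```python
import shutil
import tempfile
from pathlib import Path


def clear_dir(folder):
    """Delete everything inside folder but keep the folder entry itself."""
    folder = Path(folder)
    for child in folder.iterdir():
        if child.is_dir() and not child.is_symlink():
            shutil.rmtree(str(child))  # real sub-folders, with their contents
        else:
            child.unlink()  # files, and symlinks (the link only, not its target)


# Usage: stale results disappear, the date folder stays.
day = Path(tempfile.mkdtemp()) / "2018-12-24"
(day / "nested").mkdir(parents=True)
(day / "person1_output.txt").write_text("stale")
(day / "nested" / "extra.txt").write_text("stale")
clear_dir(day)
```

Because the folder entry is never removed and recreated, its permissions survive, and if it is a symlink to a directory the link stays in place while the target's contents are emptied.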

Convinced yet? pip install cachepath or copy the source into your local utils.py (you know you have one.)

API doc is here.

By the way, as a side effect of importing cachepath, all Paths get the ability to do rm() and clear().
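
How such a side effect can arise is easy to show: assigning a function to the Path class makes it callable on every Path instance. The sketch below uses a deliberately different, hypothetical method name (rm_tree) so it cannot be confused with cachepath's own rm(); it illustrates the general technique only, not how cachepath does it.

```python
import shutil
import tempfile
from pathlib import Path


def _rm_tree(self):
    """Remove this path whether it is a file, a symlink, or a folder."""
    if self.is_dir() and not self.is_symlink():
        shutil.rmtree(str(self))
    elif self.exists() or self.is_symlink():
        self.unlink()


# Attaching the function to the class makes it available on every Path object.
Path.rm_tree = _rm_tree

scratch = Path(tempfile.mkdtemp()) / "scratch.txt"
scratch.write_text("temporary")
scratch.rm_tree()
```

The trade-off is the usual one for monkeypatching: it is convenient, but the extra methods appear for all code in the process, not just the module that did the import.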

Shameless Promo

Find yourself working with paths a lot in cmd-line tools? You might like invoke and/or magicinvoke!

History

1.0.0 (2018-12-08)

  • Big doc updates. 1.0.0 to symbolize SemVer adherence.

0.1.0 (2018-12-08)

  • First release on PyPI. Adds CachePath, TempPath, Path.
