Skip to main content

Copies files to filenames based on their contents

Project description

A command line tool that copies files to filenames based on their contents. It also writes a map of what was renamed to what, so you can find your files.

Main purpose of this is that you can add a far future Expires header to your components. Using hash based filenames is a lot better than using your $VCS revision number, because users only need to download files that didn’t change.

Creating some source files

For this demo, we’ll create a few files that will be used throughout the whole process:

>>> system("mkdir maps/")
>>> system("mkdir input/")
>>> with open("input/foo.txt", "w") as file:
...     file.write("foo")

We also create files that live in a sub- and subsubdirectories:

>>> system("mkdir input/subdir/")
>>> with open("input/subdir/bar.txt", "w") as writeme:
...     writeme.write("bar")
>>> system("mkdir input/subdir/2nd/")
>>> with open("input/subdir/2nd/baz.txt", "w") as writeme:
...     writeme.write("foofoofoo")

Simple usage

>>> system("hashedassets maps/map.txt input/*.txt input/*/*.txt output/")
mkdir 'output'
cp 'input/foo.txt' 'output/C-7Hteo_D9vJXQ3UfzxbwnXaijM.txt'
cp 'input/subdir/bar.txt' 'output/Ys23Ag_5IOWqZCw9QGaVDdHwH00.txt'
>>> system("ls maps/")
map.txt
>>> print open("maps/map.txt").read()
subdir/bar.txt: Ys23Ag_5IOWqZCw9QGaVDdHwH00.txt
foo.txt: C-7Hteo_D9vJXQ3UfzxbwnXaijM.txt
<BLANKLINE>
>>> system("ls output/")
C-7Hteo_D9vJXQ3UfzxbwnXaijM.txt
Ys23Ag_5IOWqZCw9QGaVDdHwH00.txt

Modification time is also preserved:

>>> old_stat = os.stat("input/foo.txt")
>>> new_stat = os.stat("output/C-7Hteo_D9vJXQ3UfzxbwnXaijM.txt")
>>> [(getattr(old_stat, prop) == getattr(new_stat, prop))
...   for prop in ('st_mtime', 'st_atime', 'st_ino',)]
[True, True, False]

We can easily do this with multiple formats:

JavaScript

>>> system("hashedassets -n my_callback maps/map.js input/*.txt input/*/*.txt output/")
cp 'input/foo.txt' 'output/C-7Hteo_D9vJXQ3UfzxbwnXaijM.txt'
cp 'input/subdir/bar.txt' 'output/Ys23Ag_5IOWqZCw9QGaVDdHwH00.txt'
>>> print open("maps/map.js").read()
var my_callback = {
  "foo.txt": "C-7Hteo_D9vJXQ3UfzxbwnXaijM.txt",
  "subdir/bar.txt": "Ys23Ag_5IOWqZCw9QGaVDdHwH00.txt"
};

JSON

>>> system("hashedassets -n my_callback maps/map.json input/*.txt input/*/*.txt output/")
cp 'input/foo.txt' 'output/C-7Hteo_D9vJXQ3UfzxbwnXaijM.txt'
cp 'input/subdir/bar.txt' 'output/Ys23Ag_5IOWqZCw9QGaVDdHwH00.txt'
>>> print open("maps/map.json").read()
{
  "foo.txt": "C-7Hteo_D9vJXQ3UfzxbwnXaijM.txt",
  "subdir/bar.txt": "Ys23Ag_5IOWqZCw9QGaVDdHwH00.txt"
}

JSONP

>>> system("hashedassets -n my_callback maps/map.jsonp input/*.txt input/*/*.txt output/")
cp 'input/foo.txt' 'output/C-7Hteo_D9vJXQ3UfzxbwnXaijM.txt'
cp 'input/subdir/bar.txt' 'output/Ys23Ag_5IOWqZCw9QGaVDdHwH00.txt'
>>> print open("maps/map.jsonp").read()
my_callback({
  "foo.txt": "C-7Hteo_D9vJXQ3UfzxbwnXaijM.txt",
  "subdir/bar.txt": "Ys23Ag_5IOWqZCw9QGaVDdHwH00.txt"
});

Sass

Sass is a meta language on top of CSS.

>>> system("hashedassets -n my_callback maps/map.scss input/*.txt input/*/*.txt output/")
cp 'input/foo.txt' 'output/C-7Hteo_D9vJXQ3UfzxbwnXaijM.txt'
cp 'input/subdir/bar.txt' 'output/Ys23Ag_5IOWqZCw9QGaVDdHwH00.txt'
>>> print open("maps/map.scss").read()
@mixin my_callback($directive, $path) {
         @if $path == "subdir/bar.txt" { #{$directive}: url("Ys23Ag_5IOWqZCw9QGaVDdHwH00.txt"); }
    @else if $path == "foo.txt" { #{$directive}: url("C-7Hteo_D9vJXQ3UfzxbwnXaijM.txt"); }
    @else {
      @warn "Did not find "#{$path}" in list of assets";
      #{$directive}: url($path);
    }
}

PHP

>>> system("hashedassets -n my_callback maps/map.php input/*.txt input/*/*.txt output/")
cp 'input/foo.txt' 'output/C-7Hteo_D9vJXQ3UfzxbwnXaijM.txt'
cp 'input/subdir/bar.txt' 'output/Ys23Ag_5IOWqZCw9QGaVDdHwH00.txt'
>>> print open("maps/map.php").read()
$my_callback = array(
  "subdir/bar.txt" => "Ys23Ag_5IOWqZCw9QGaVDdHwH00.txt",
  "foo.txt" => "C-7Hteo_D9vJXQ3UfzxbwnXaijM.txt",
)

Sed

We can also generate a sed script that does the replacements for us:

>>> system("hashedassets -n my_callback maps/map.sed input/*.txt input/*/*.txt output/")
cp 'input/foo.txt' 'output/C-7Hteo_D9vJXQ3UfzxbwnXaijM.txt'
cp 'input/subdir/bar.txt' 'output/Ys23Ag_5IOWqZCw9QGaVDdHwH00.txt'
>>> print open("maps/map.sed").read()
s/subdir\/bar\.txt/Ys23Ag_5IOWqZCw9QGaVDdHwH00\.txt/g
s/foo\.txt/C-7Hteo_D9vJXQ3UfzxbwnXaijM\.txt/g
<BLANKLINE>

We should also be able to use this directly with sed

>>> with open("replaceme.html", "w") as writeme:
...     writeme.write('<a href=foo.txt>bar</a>')

The script is then applied like this:

>>> system("sed -f maps/map.sed replaceme.html")
<a href=C-7Hteo_D9vJXQ3UfzxbwnXaijM.txt>bar</a>

However, ‘.’ is not treated as wildcard, so the following does not work

>>> with open("replaceme2.html", "w") as writeme:
...     writeme.write('<a href=fooAtxt>bar</a>')
>>> system("sed -f maps/map.sed replaceme2.html")
<a href=fooAtxt>bar</a>

Specifying the type with -t

The type of the map is guessed from the filename, but you can specify it as well:

>>> system("hashedassets -t js cantguessmaptype input/*.txt input/*/*.txt output/")
cp 'input/foo.txt' 'output/C-7Hteo_D9vJXQ3UfzxbwnXaijM.txt'
cp 'input/subdir/bar.txt' 'output/Ys23Ag_5IOWqZCw9QGaVDdHwH00.txt'

Specifying the length of the filename with -l

>>> system("hashedassets -l 10 maps/shortmap.json input/*.txt input/*/*.txt output/")
cp 'input/foo.txt' 'output/C-7Hteo_D9.txt'
cp 'input/subdir/bar.txt' 'output/Ys23Ag_5IO.txt'
>>> system("rm output/C-7Hteo_D9.txt output/Ys23Ag_5IO.txt")

Specifying the digest with -d

Hashedassets uses sha1 by default to hash the input files. You can change that with the -d command line parameter, e.g. by specifying -d md5 to use the md5 digest method.

>>> system("hashedassets -d md5 maps/shortmap.json input/*.txt input/*/*.txt output/")
cp 'input/foo.txt' 'output/rL0Y20zC-Fzt72VPzMSk2A.txt'
cp 'input/subdir/bar.txt' 'output/N7UdGUp1E-RbVvZSTy1R8g.txt'
>>> system("rm output/rL0Y20zC-Fzt72VPzMSk2A.txt output/N7UdGUp1E-RbVvZSTy1R8g.txt")

Keep the directory structure with –keep-dirs

By default hashedassets copies all output files into the root level of the output dir. You can turn this off, with the ‘’–keep-dirs’’ option:

>>> system("hashedassets --keep-dirs maps/preserve.json input/*.txt input/*/*.txt input/*/*/*.txt output/")
cp 'input/foo.txt' 'output/C-7Hteo_D9vJXQ3UfzxbwnXaijM.txt'
mkdir -p output/subdir
cp 'input/subdir/bar.txt' 'output/subdir/Ys23Ag_5IOWqZCw9QGaVDdHwH00.txt'
mkdir -p output/subdir/2nd
cp 'input/subdir/2nd/baz.txt' 'output/subdir/2nd/NdbmnXyjdY2paFzlDw9aJzCKH9w.txt'
>>> system("rm -r output/subdir/")

Verbose mode with -v

If we tell the command to be quiet, it does not print what it is doing:

>>> system("hashedassets -q maps/map2.txt input/*.txt input/*/*.txt output/")

If we tell the command to be more verbose, it logs more information:

>>> system("hashedassets -vvv maps/map3.txt input/*.txt input/*/*.txt output/")
Debug level set to 10
cp 'input/foo.txt' 'output/C-7Hteo_D9vJXQ3UfzxbwnXaijM.txt'
cp 'input/subdir/bar.txt' 'output/Ys23Ag_5IOWqZCw9QGaVDdHwH00.txt'

Re-using a map

The program reads in maps it created in a prior run to only copy files that haven’t changed since. So, the following commands do not copy any files:

>>> system("hashedassets maps/map.scss input/*.txt input/*/*.txt output/")
>>> system("hashedassets maps/map.php input/*.txt input/*/*.txt output/")
>>> system("hashedassets maps/map.js input/*.txt input/*/*.txt output/")
>>> system("hashedassets maps/map.json input/*.txt input/*/*.txt output/")
>>> system("hashedassets maps/map.sed input/*.txt input/*/*.txt output/")
>>> system("hashedassets maps/map.jsonp input/*.txt input/*/*.txt output/")
>>> system("hashedassets maps/map.txt input/*.txt input/*/*.txt output/")

If we touch one of the input files in between, the file will be read but not copied because the hashsum is the same:

>>> system('touch -t200504072214.12 input/foo.txt')
>>> system("hashedassets maps/map.json input/*.txt input/*/*.txt output/")

If we change the file’s content, it will get a new name:

>>> with open("input/foo.txt", "w") as writeme:
...     writeme.write("foofoo")
>>> system("hashedassets maps/map.json input/*.txt input/*/*.txt output/")
rm 'output/C-7Hteo_D9vJXQ3UfzxbwnXaijM.txt'
cp 'input/foo.txt' 'output/QIDaFD7KLKQh0l5O6b8exdew3b0.txt'

If you then list the files in the directory, note that the old file ‘’output/C-7Hteo_D9vJXQ3UfzxbwnXaijM.txt’’ is gone:

>>> system("ls output/")
QIDaFD7KLKQh0l5O6b8exdew3b0.txt
Ys23Ag_5IOWqZCw9QGaVDdHwH00.txt

If we remove one of the created files, it gets recreated:

>>> system("rm output/Ys23Ag_5IOWqZCw9QGaVDdHwH00.txt")
>>> system("hashedassets maps/map.json input/*.txt input/*/*.txt output/")
cp 'input/subdir/bar.txt' 'output/Ys23Ag_5IOWqZCw9QGaVDdHwH00.txt'
>>> system("ls output/")
QIDaFD7KLKQh0l5O6b8exdew3b0.txt
Ys23Ag_5IOWqZCw9QGaVDdHwH00.txt

If a file that is about to be removed because the original content changed, it isn’t recreated:

>>> system("rm output/QIDaFD7KLKQh0l5O6b8exdew3b0.txt")
>>> with open("input/foo.txt", "w") as writeme:
...     writeme.write("foofoofoo")
>>> system("hashedassets maps/map.json input/*.txt input/*/*.txt output/")
cp 'input/foo.txt' 'output/NdbmnXyjdY2paFzlDw9aJzCKH9w.txt'

Error handling

If try to use the software with no arguments the user is reminded to specify at least the mapfile, the source and the destination directory:

>>> system("hashedassets", external=True)
Usage: hashedassets [ options ] MAPFILE SOURCE [...] DEST
<BLANKLINE>
hashedassets: error: You need to specify at least MAPFILE SOURCE and DEST

If the user specifies the –help option, detailed usage information is shown:

>>> system("hashedassets --help", external=True)
Usage: hashedassets [ options ] MAPFILE SOURCE [...] DEST
<BLANKLINE>
Version: ...
<BLANKLINE>
Options:
  --version             show program's version number and exit
  -h, --help            show this help message and exit
  -v, --verbose         increase verbosity level
  -q, --quiet           don't print status messages to stdout
  -n MAPNAME, --map-name=MAPNAME
                        name of the map [default: hashedassets]
  -t MAPTYPE, --map-type=MAPTYPE
                        type of the map. one of scss, php, js, json, sed,
                        jsonp, txt [default: guessed from MAPFILE]
  -l LENGTH, --digest-length=LENGTH
                        length of the generated filenames (without extension)
                        [default: 27]
  -d HASHFUN, --digest=HASHFUN
                        hash function to use. One of sha1, md5 [default: sha1]
  -k, --keep-dirs       Mirror SOURCE dir structure to DEST [default: false]

Project details


Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page