A content-addressable file management system.
HashFS is a content-addressable file management system. What does that mean? Simply, that HashFS manages a directory where files are saved based on the file’s hash.
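To make "saved based on the file's hash" concrete, here is a minimal standard-library sketch of how a content ID can be derived. It is not HashFS's own code; the helper name `content_id` and its `chunk_size` parameter are assumptions made for illustration.

```python
import hashlib
import io


def content_id(stream, algorithm="sha256", chunk_size=8192):
    """Return the hex digest of a binary stream's contents (hypothetical helper)."""
    hasher = hashlib.new(algorithm)
    # Read in chunks so large files never need to fit in memory at once.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hasher.update(chunk)
    return hasher.hexdigest()


# Identical content always maps to the same ID, so duplicates collapse to one entry.
first = content_id(io.BytesIO(b"some content"))
second = content_id(io.BytesIO(b"some content"))
other = content_id(io.BytesIO(b"different content"))
```

Because the ID depends only on the bytes, two uploads of the same file resolve to the same address, while any change to the content yields a different one.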
Typical use cases for this kind of system are ones where:
Install using pip:
pip install hashfs
from hashfs import HashFS
Designate a root folder for HashFS. If the folder doesn’t already exist, it will be created.
# Set the `depth` to the number of subfolders the file's hash should be split when saving.
# Set the `width` to the desired width of each subfolder.
fs = HashFS('temp_hashfs', depth=4, width=1, algorithm='sha256')

# With depth=4 and width=1, files will be saved in the following pattern:
# temp_hashfs/a/b/c/d/efghijklmnopqrstuvwxyz

# With depth=3 and width=2, files will be saved in the following pattern:
# temp_hashfs/ab/cd/ef/ghijklmnopqrstuvwxyz
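The effect of `depth` and `width` can be reproduced with a few lines of plain Python. The sketch below is an illustration of the splitting rule described above, not HashFS's internal implementation; `shard_path` is a hypothetical name.

```python
import os


def shard_path(root, digest, depth=4, width=1):
    """Build a nested path from a hex digest (illustrative, not HashFS's own code)."""
    # Take `depth` leading pieces of `width` characters each as folder names.
    folders = [digest[i * width:(i + 1) * width] for i in range(depth)]
    # Whatever is left of the digest becomes the file name.
    remainder = digest[depth * width:]
    return os.path.join(root, *folders, remainder)


digest = "abcdefghijklmnopqrstuvwxyz"  # stand-in for a real hex digest
path_a = shard_path("temp_hashfs", digest, depth=4, width=1)
path_b = shard_path("temp_hashfs", digest, depth=3, width=2)
```

With the stand-in digest, `path_a` and `path_b` correspond to the two patterns shown in the comments above.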
NOTE: The algorithm value should be a valid string argument to hashlib.new().
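If the algorithm name comes from configuration or user input, it may be worth checking it before constructing HashFS. The following is a small sketch using only `hashlib`; the helper `is_supported_algorithm` is hypothetical and not part of HashFS.

```python
import hashlib


def is_supported_algorithm(name):
    """Return True if hashlib.new(name) can construct a hasher (hypothetical helper)."""
    try:
        hashlib.new(name)
    except (ValueError, TypeError):
        # hashlib.new() raises ValueError for unknown algorithm names.
        return False
    return True


sha256_ok = is_supported_algorithm("sha256")
bogus_ok = is_supported_algorithm("not-a-real-algorithm")
```

The set of available names depends on the Python build and its underlying crypto library, so `hashlib.algorithms_available` is another place to look.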
HashFS supports basic file storage, retrieval, and removal as well as some more advanced features like file repair.
Add content to the folder using either readable objects (e.g. StringIO) or file paths (e.g. 'a/path/to/some/file').
from io import StringIO

some_content = StringIO('some content')

address = fs.put(some_content)

# Or if you'd like to save the file with an extension...
address = fs.put(some_content, '.txt')

# The id of the file (i.e. the hexdigest of its contents).
address.id

# The absolute path where the file was saved.
address.abspath

# The path relative to fs.root.
address.relpath

# Whether the file previously existed.
address.is_duplicate
Get a file’s HashAddress by address ID or path. This address would be identical to the address returned by put().
assert fs.get(address.id) == address
assert fs.get(address.relpath) == address
assert fs.get(address.abspath) == address
assert fs.get('invalid') is None
Get a BufferedReader handler for an existing file by address ID or path.
fileio = fs.open(address.id)

# Or using the full path...
fileio = fs.open(address.abspath)

# Or using a path relative to fs.root
fileio = fs.open(address.relpath)
NOTE: When getting a file that was saved with an extension, it’s not necessary to supply the extension. Extensions are ignored when looking for a file based on the ID or path.
Delete a file by address ID or path.
fs.delete(address.id)
fs.delete(address.abspath)
fs.delete(address.relpath)
NOTE: When a file is deleted, any parent directories above the file will also be deleted if they are empty directories.
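The pruning behaviour in the note above can be pictured with the standard library alone. This is a sketch of the idea under the assumption that pruning stops at the root folder; `delete_and_prune` is a hypothetical helper, not HashFS's actual implementation.

```python
import os
import tempfile


def delete_and_prune(root, path):
    """Remove `path`, then remove empty parent folders up to (not including) `root`."""
    root = os.path.realpath(root)
    path = os.path.realpath(path)
    os.remove(path)
    parent = os.path.dirname(path)
    # Walk upward; stop at the root or at the first folder that still has entries.
    while parent != root and parent.startswith(root + os.sep):
        if os.listdir(parent):
            break
        os.rmdir(parent)
        parent = os.path.dirname(parent)


# Demonstration in a throwaway directory.
demo_root = tempfile.mkdtemp()
nested = os.path.join(demo_root, "a", "b", "c")
os.makedirs(nested)
target = os.path.join(nested, "defgh")
with open(target, "w") as handle:
    handle.write("some content")

delete_and_prune(demo_root, target)
remaining = os.listdir(demo_root)
```

After the call, the `a/b/c` chain is gone because each folder became empty in turn, while the root folder itself is left in place.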
Below are some of the more advanced features of HashFS.
The HashFS files may not always be in sync with its depth, width, or algorithm settings (e.g. if HashFS takes ownership of a directory that wasn’t previously stored using content hashes or if the HashFS settings change). These files can be easily reindexed using repair().
repaired = fs.repair()

# Or if you want to drop file extensions...
repaired = fs.repair(extensions=False)
WARNING: It’s recommended that a backup of the directory be made before repairing just in case something goes wrong.
Instead of actually repairing the files, you can iterate over them for custom processing.
for corrupted_path, expected_address in fs.corrupted():
    # do something
    pass
WARNING: HashFS.corrupted() is a generator so be aware that modifying the file system while iterating could have unexpected results.
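One common way to sidestep this kind of problem with any lazy iterator is to snapshot its results before changing anything on disk. The sketch below shows the pattern with a stand-in generator built on `os.walk`; `iter_files` is hypothetical and only mimics a generator-based API.

```python
import os
import tempfile


def iter_files(root):
    """Lazily yield every file path under `root` (stand-in for a generator API)."""
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(folder, name)


demo_root = tempfile.mkdtemp()
for name in ("one", "two", "three"):
    with open(os.path.join(demo_root, name), "w") as handle:
        handle.write(name)

# Snapshot first: list() exhausts the generator before anything is changed.
snapshot = list(iter_files(demo_root))

# The loop now runs over a fixed list, so renaming cannot disturb the walk.
for path in snapshot:
    os.rename(path, path + ".bak")

renamed = sorted(os.listdir(demo_root))
```

Applied to HashFS, the same idea would be to wrap the call as `list(fs.corrupted())` before moving or deleting files, at the cost of holding all results in memory.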
Iterate over files.
for file in fs.files():
    # do something
    pass

# Or using the class' iter method...
for file in fs:
    # do something
    pass
Iterate over folders that contain files (i.e. ignore the nested subfolders that only contain folders).
for folder in fs.folders():
    # do something
    pass
Compute the size in bytes of all files in the root directory.
total_bytes = fs.size()
Count the total number of files.
total_files = fs.count()

# Or via len()...
total_files = len(fs)
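For intuition, the size and count figures above are the kind of totals one could gather with a single directory walk. The following standard-library sketch is an approximation of that idea, not HashFS's own code; `count_and_size` is a hypothetical helper.

```python
import os
import tempfile


def count_and_size(root):
    """Return (file_count, total_bytes) for all files under `root`."""
    file_count = 0
    total_bytes = 0
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            file_count += 1
            total_bytes += os.path.getsize(os.path.join(folder, name))
    return file_count, total_bytes


# Demonstration: two files of 5 and 3 bytes in nested folders.
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, "a", "b"))
with open(os.path.join(demo_root, "a", "b", "cdef"), "wb") as handle:
    handle.write(b"12345")
with open(os.path.join(demo_root, "a", "ghij"), "wb") as handle:
    handle.write(b"123")

file_count, total_bytes = count_and_size(demo_root)
```

Both totals require visiting every file, so on large stores it can be worth caching the result rather than recomputing it on each request.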
For more details, please see the full documentation at http://hashfs.readthedocs.org.
|File Name|Size|Python Version|File Type|Upload Date|
|hashfs-0.7.0-py2.py3-none-any.whl|14.6 kB|py2.py3|Wheel|Apr 20, 2016|
|hashfs-0.7.0.tar.gz|23.3 kB|–|Source|Apr 20, 2016|