Filesystem operations to index files and hash their contents

License
- OSI Approved :: GNU General Public License v3 (GPLv3)
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

File Ops

File Ops is a set of components and a basic CLI for collecting file statistics from a hard drive. It stores this information in a SQLite database, which you can then query to your heart's content.

With it, you can

Store all of the files and directories of a drive, each with its
- path
- last-modified time
- size
- MD5 hash (files only)

With that information you can quickly

Find the largest files and directories across many external drives
Find duplicate files by comparing hashes

Commands

All commands support a --time flag that reports how long the command took.

Index File Stats (no hash is calculated for files)

Note: The database file is automatically created if it does not exist and defaults to files.db.

fileops index path/to/index database_file.db
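As a rough illustration of what an indexing pass involves, the sketch below walks a directory tree and writes path, modification time, and size to SQLite without hashing anything. It is not File Ops' actual implementation: the function name `index_tree` and the columns `path`, `modified_at`, and `size` are assumptions (only the `files` table and its `hash` and `is_directory` columns appear in the queries in this README).

```python
import os
import sqlite3


def index_tree(root, db_path="files.db"):
    """Record path, mtime and size for everything under root (no hashing)."""
    rows = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            full = os.path.join(dirpath, name)
            # lstat does not follow symlinks, so broken links do not raise.
            rows.append((full, 1, os.lstat(full).st_mtime, None))
        for name in filenames:
            full = os.path.join(dirpath, name)
            info = os.lstat(full)
            rows.append((full, 0, info.st_mtime, info.st_size))

    conn = sqlite3.connect(db_path)  # creates the file if it does not exist
    try:
        with conn:  # commit on success, roll back on error
            # Hypothetical schema; the real column names may differ.
            conn.execute(
                """CREATE TABLE IF NOT EXISTS files (
                       path TEXT PRIMARY KEY,
                       is_directory INTEGER NOT NULL,
                       modified_at REAL,
                       size INTEGER,
                       hash TEXT
                   )"""
            )
            # REPLACE rewrites the whole row, so re-indexing resets hash to NULL.
            conn.executemany(
                "INSERT OR REPLACE INTO files "
                "(path, is_directory, modified_at, size) VALUES (?, ?, ?, ?)",
                rows,
            )
    finally:
        conn.close()
    return len(rows)
```

Directory sizes are left empty here, matching the idea that folder totals are filled in by a separate step.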

Hash files in database

fileops hash database_file.db
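For reference, a minimal sketch of computing a file's MD5 with the standard library. It reads in chunks so large files do not have to fit in memory. The helper name `md5_of_file` and the chunk size are assumptions, and this is not necessarily how the hash command does it.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

MD5 is fine for spotting accidental duplicates, but it is not collision resistant, so it should not be relied on where someone might craft matching files on purpose.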

Cleanup

This removes database entries for files that have been deleted, etc. Follow it with the index command to keep things up to date.

fileops cleanup database_file.db
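One way to picture a cleanup pass, sketched under the assumption that the `files` table has a `path` column holding full paths (a hypothetical name). The function `remove_missing` is illustrative and not part of the package.

```python
import os
import sqlite3


def remove_missing(conn):
    """Delete rows whose path no longer exists on disk; return how many."""
    paths = [row[0] for row in conn.execute("SELECT path FROM files")]
    # lexists is True for broken symlinks too, so those rows are kept.
    missing = [(p,) for p in paths if not os.path.lexists(p)]
    with conn:  # commit on success, roll back on error
        conn.executemany("DELETE FROM files WHERE path = ?", missing)
    return len(missing)
```

Note that an unplugged external drive would make every one of its paths look deleted, so a real tool would probably want to check that the drive is mounted first.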

Calculate Folder Stats

This updates every folder in the database with its total size. It is a slow process.

fileops folder-stats database_file.db
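To make the idea concrete, here is a small sketch that derives folder totals from file rows in a single pass by adding each file's size to every ancestor directory. The function name `folder_sizes` and the (path, size) input shape are assumptions; this is not the command's actual algorithm.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def folder_sizes(files):
    """Map each ancestor directory to the total size of the files beneath it.

    `files` is an iterable of (path, size) pairs using forward slashes.
    """
    totals = defaultdict(int)
    for path, size in files:
        # .parents yields the containing directory, then each one above it.
        for parent in PurePosixPath(path).parents:
            totals[str(parent)] += size
    return dict(totals)
```

For example, `folder_sizes([("/d/a/x.bin", 10), ("/d/b/z.bin", 1)])` credits `/d` with 11 bytes and `/d/a` with 10.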

Find duplicate files by comparing hashes

Use the following query to find the hashes shared by the most files:

SELECT hash, COUNT(*)
FROM files
WHERE is_directory = 0
GROUP BY hash
ORDER BY COUNT(*) DESC
LIMIT 10;

Then you can run a query per hash to list the matching files:

SELECT *
FROM files
WHERE hash = '94bd41953ca5233c5efe121c73959af7';
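The two queries above can also be combined from Python with the standard `sqlite3` module. This is a sketch, not part of File Ops: the function name `duplicate_groups` and the `path` column are assumptions, and the `hash IS NOT NULL` and `HAVING` filters are additions that skip unhashed files and hashes that occur only once.

```python
import sqlite3


def duplicate_groups(conn, limit=10):
    """Return (hash, count, [paths]) for the hashes shared by the most files."""
    top = conn.execute(
        """SELECT hash, COUNT(*) AS copies
           FROM files
           WHERE is_directory = 0 AND hash IS NOT NULL
           GROUP BY hash
           HAVING COUNT(*) > 1
           ORDER BY copies DESC
           LIMIT ?""",
        (limit,),
    ).fetchall()
    groups = []
    for file_hash, copies in top:
        # Second query: every file that carries this hash.
        paths = [
            row[0]
            for row in conn.execute(
                "SELECT path FROM files WHERE hash = ? ORDER BY path",
                (file_hash,),
            )
        ]
        groups.append((file_hash, copies, paths))
    return groups
```

Call it with an open connection, for example `duplicate_groups(sqlite3.connect("files.db"))`.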

Tested on

Mac 10.14.3
Windows 10
Ubuntu 18

Project Goals

Minimal dependencies

Feature Ideas

Ignore file settings

hidden files
certain directories
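A possible shape for this feature, sketched with `os.walk`. The names `walk_filtered`, `skip_dirs`, and `skip_hidden` are hypothetical, and "hidden" here only means a leading dot; the Windows hidden attribute is not checked.

```python
import os


def walk_filtered(root, skip_dirs=frozenset(), skip_hidden=True):
    """Yield file paths under root, skipping hidden entries and named directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = [
            d for d in dirnames
            if d not in skip_dirs and not (skip_hidden and d.startswith("."))
        ]
        for name in filenames:
            if skip_hidden and name.startswith("."):
                continue
            yield os.path.join(dirpath, name)
```

Pruning directories during the walk, rather than filtering paths afterwards, avoids ever reading the contents of the ignored trees.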

Command to join database files

I might run multiple copies of the program on several drives, or computers, for speed. When they are all done, I want to merge the output database files into one for easy querying.
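One way this could work is SQLite's `ATTACH DATABASE`, which lets a single connection read from a second database file. The sketch below assumes both files contain a `files` table with identical columns and no conflicting keys; `merge_into` is a hypothetical helper, not an existing command.

```python
import sqlite3


def merge_into(target_path, source_path):
    """Copy every row of source's `files` table into target's `files` table."""
    conn = sqlite3.connect(target_path)
    try:
        # The attached file becomes addressable as src.<table>.
        conn.execute("ATTACH DATABASE ? AS src", (source_path,))
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO files SELECT * FROM src.files")
        conn.execute("DETACH DATABASE src")
    finally:
        conn.close()
```

If the same path can appear in both files, the insert would need a conflict rule, and without a drive identifier column the merged rows could not be told apart by origin.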

Calculate Directory size faster

Right now it works, but it's rather slow.

Hash Directories

Given two directories on different hard drives, I would like to be able to quickly know if they have the same contents.

If we hash all the files of a directory, can we hash the individual hashes to get a folder hash? Would that work?
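In principle this can work, much like a Merkle tree, as long as the child hashes are combined in a deterministic order. The sketch below is one possible scheme rather than a settled design: `directory_hash` and its (name, hash) input are assumptions, and it includes entry names so that renaming a file changes the folder hash.

```python
import hashlib


def directory_hash(entries):
    """Combine (name, hex_hash) pairs for a directory's children into one MD5.

    Sorting makes the result independent of listing order; including names
    means a rename changes the directory hash.
    """
    digest = hashlib.md5()
    for name, child_hash in sorted(entries):
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")  # separator so adjacent fields cannot run together
        digest.update(child_hash.encode("ascii"))
        digest.update(b"\0")
    return digest.hexdigest()
```

Subdirectories can be fed in the same way, using their own `directory_hash` result as the child hash, so two trees with equal root hashes very likely have the same contents.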

Uniquely Identify Hard Drives

Given a lot of external hard drives, thumb drives, etc, I want to be able to store them all in a single database file and be able to uniquely identify them.

So if I index a thumb drive on my laptop, I want to be able to take it to my PC and update the index there and still know it's for the same thumb drive.

If this is implemented, the file paths should not have the mount path. The hard drive identifier should be a separate column.

E.g. on Windows

e:\projects\python\cli.py

should become

projects\python\cli.py
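The path split shown above can be sketched with `pathlib`, which parses Windows paths on any platform. The helper name `strip_mount` is hypothetical. The drive letter it returns is only the mount point and can change between machines, so a stable drive identifier would still have to come from somewhere else.

```python
from pathlib import PureWindowsPath


def strip_mount(path):
    """Split a Windows path into (drive, path relative to the drive root)."""
    p = PureWindowsPath(path)
    # p.anchor is the drive plus root, e.g. "e:\\" for "e:\\projects".
    return p.drive, str(p.relative_to(p.anchor))
```

For example, `strip_mount(r"e:\projects\python\cli.py")` returns `("e:", r"projects\python\cli.py")`.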

Performance Improvements

Need to see if we can improve the performance. Yep.

Grand Ideas

Supplementary UI application

Doing this from the CLI is great, but it would be nice to have a UI application that could instantly show you

Largest Files
Largest Duplicate Files
Find files by name (see Everything Search - https://www.voidtools.com/)
Status Updates as the indexing happens (particularly for hashing)
Re-index command
Re-calculate hashes command

One source for all files, with sym-links everywhere else

Once you have all of the files indexed for a drive, store them all in a directory and sym-link them everywhere else. With this, remove all duplicates via hash comparison.

This wouldn't work for auto-generated files of course and may only be useful for relatively static directories.


Release history

1.0.4

May 25, 2019

1.0.3

May 25, 2019

1.0.2

May 25, 2019

1.0.1

May 25, 2019

1.0.0

May 25, 2019

Download files

Source Distribution

file-ops-1.0.4.tar.gz (13.1 kB view details)

Uploaded May 25, 2019 Source

Built Distribution

file_ops-1.0.4-py3-none-any.whl (30.6 kB view details)

Uploaded May 25, 2019 Python 3

File details

Details for the file file-ops-1.0.4.tar.gz.

File metadata

Download URL: file-ops-1.0.4.tar.gz
Upload date: May 25, 2019
Size: 13.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.4.2 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.7.2

File hashes

Hashes for file-ops-1.0.4.tar.gz
Algorithm	Hash digest
SHA256	`4d9ce2fcf7c7edd096ebf2af6cc37a179b06facdb0fe098071a8fea6facc59a0`
MD5	`a8484217aae17eb75f9382a9b16deb35`
BLAKE2b-256	`1c41cf13499e5b36dc60662eba20c1647e84a7a9f39e90b3c8ade94aad57958d`

File details

Details for the file file_ops-1.0.4-py3-none-any.whl.

File metadata

Download URL: file_ops-1.0.4-py3-none-any.whl
Upload date: May 25, 2019
Size: 30.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.4.2 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.7.2

File hashes

Hashes for file_ops-1.0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`45c75781fd59ddbd10621bfa0bec4cd70bd9d72090dcfca651f237963da25267`
MD5	`46cbad0664c13d0b9d151d21c8b860c1`
BLAKE2b-256	`a2bf59a3a804ab6dac17fef932581b5a8c7a5c191280768f22dd8d152e0ba07b`
