An efficient multiprocessing directory walk and search tool
fswalk
Introduction
fswalk is a simple Python script that recursively walks a filesystem
directory, gathers file metadata, and collects it into a JSON file or
an Elasticsearch database.
It runs several processes, each responsible for listing the files
contained in one subdirectory.
The collected metadata are filename, path, uid, gid, size, atime, ctime, mtime
and temperature*.
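The one-process-per-subdirectory idea described above can be sketched in Python as follows. This is a minimal illustration and not fswalk's actual code: the names scan_dir and walk, the record keys, and the level-by-level scheduling are assumptions made for the example.

```python
import os
from multiprocessing import Pool


def scan_dir(path):
    """List one directory; return (file records, subdirectory paths)."""
    records, subdirs = [], []
    try:
        for entry in os.scandir(path):
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.path)
                continue
            st = entry.stat(follow_symlinks=False)
            # Record keys are hypothetical; they mirror the metadata list above.
            records.append({
                "filename": entry.name,
                "path": entry.path,
                "uid": st.st_uid,
                "gid": st.st_gid,
                "size": st.st_size,
                "atime": st.st_atime,
                "ctime": st.st_ctime,
                "mtime": st.st_mtime,
            })
    except OSError:
        # Unreadable or vanished entries are skipped in this sketch.
        pass
    return records, subdirs


def walk(root, nproc=4):
    """Walk root one directory level at a time with nproc worker processes."""
    records, pending = [], [root]
    with Pool(nproc) as pool:
        while pending:
            results = pool.map(scan_dir, pending)
            pending = []
            for recs, subdirs in results:
                records.extend(recs)
                pending.extend(subdirs)
    return records
```

On platforms that start worker processes by re-importing the main module, walk should be called from under an `if __name__ == "__main__":` guard.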
The output is either a JSON file sent on the fly to stdout, or an Elasticsearch
index. A simple search option is provided to retrieve files by their owner,
group, or part of the name.
The script also provides an option to run a quick analysis of the resulting output file.
Warning: when the results are sent to stdout, because of multiprocessing and to
avoid slowing the walk down, the JSON file is printed with an extra comma (,)
that might break JSON compatibility.
The pyjson5 Python library can read such non-standard JSON files.
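Where pyjson5 is not available, this kind of file can sometimes be repaired with the standard library alone. The sketch below assumes the output is a single JSON array whose only defect is a comma after the last element; that layout is an assumption and is not confirmed by this description. load_relaxed and load_file are hypothetical helper names.

```python
import gzip
import json
import re


def load_relaxed(text):
    """Parse a JSON array that may end with ',]' instead of ']'."""
    # Only the very end of the document is touched, so commas that
    # appear inside file names are never modified.
    repaired = re.sub(r",\s*\]\s*$", "]", text)
    return json.loads(repaired)


def load_file(path):
    """Read a plain-text or gzip-compressed output file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return load_relaxed(handle.read())
```

If the extra comma sits somewhere else in the real output, this repair does not apply and pyjson5 remains the documented route.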
*: temperature is a calculated integer value from 1 to 7 based on
max(mtime, atime, ctime). 1 is the coldest (> 5 years) and 7 the hottest (< 7 days).
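A temperature of this kind could be computed as in the sketch below. Only the two end points (under 7 days is 7, over 5 years is 1) come from the description above; the intermediate boundaries in BOUNDS_DAYS and the function name temperature are invented for illustration and may differ from what fswalk uses.

```python
import time

DAY = 86400  # seconds

# Upper age limits in days for temperatures 7, 6, 5, 4, 3 and 2.
# Only 7 days and 5 years (1825 days) are documented; the values
# in between are hypothetical.
BOUNDS_DAYS = [7, 30, 90, 365, 730, 1825]


def temperature(atime, ctime, mtime, now=None):
    """Return 7 (hottest) down to 1 (coldest) from the newest timestamp."""
    now = time.time() if now is None else now
    age_days = (now - max(atime, ctime, mtime)) / DAY
    for index, bound in enumerate(BOUNDS_DAYS):
        if age_days < bound:
            return 7 - index
    return 1
```

For example, a file last touched three days ago gets 7, and one untouched for six years gets 1.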
Sample graphs that may be generated with the output produced
Installation
Requirements:
- python >= 3.5
- python packages: requests, pyjson5, elasticsearch
Installing the current stable release:
$ pip install fswalk
Installing the latest devel snapshot:
$ pip install git+https://github.com/bzizou/fs_walk.git
Example
Start a walk of the /home/bzizou directory with 8 processes, excluding
the .snapshot subdirectory and getting the result as a gzipped JSON file:
bzizou@f-dahu:~/git/fs_walk$ fswalk -p /home/bzizou -x '^/home/bzizou/\.snapshot/' -n 8 |gzip > /tmp/out.gz
Analyze the output from the resulting file:
bzizou@f-dahu:~/git/fs_walk$ fswalk -a /tmp/out.gz
User                 Size            Count
=================================================================
bzizou               2749804131      11125
root                 1030651826      1351
1000                 390705282       476
11610                726417          7

Group                Size            Count
=================================================================
realuser             2749795275      11119
root                 1030660332      1356
1000                 390705282       476
2222                 726417          7
staff                350             1

TOTAL SIZE: 4171887656
TOTAL FILES: 12959
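A per-owner or per-group summary like the one above can be reproduced from the records with a simple aggregation. The sketch below is an illustration only: the function name summarize and the record keys "owner", "group" and "size" are assumptions, since the JSON field names are not listed in this description.

```python
from collections import defaultdict


def summarize(records, key):
    """Return (name, total_size, file_count) rows, largest total first."""
    totals = defaultdict(lambda: [0, 0])  # name -> [size, count]
    for record in records:
        entry = totals[record[key]]
        entry[0] += record["size"]
        entry[1] += 1
    rows = [(name, size, count) for name, (size, count) in totals.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Calling summarize(records, "owner") and summarize(records, "group") would give the two tables, and summing the size column gives the total.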
Same directory scan, but with the results indexed into an Elasticsearch database:
bzizou@f-dahu:~/git/fs_walk$ fswalk -p /home/bzizou -x '^/home/bzizou/\.snapshot/' -n 8 --elastic-host=http://localhost:9200 --elastic-index=fs_walk_home -g
Search for all files with the "povray" string in their path name that belong to the user whose uid is 10000:
bzizou@f-dahu:~/git/fs_walk$ fswalk --elastic-host=http://localhost:9200 --elastic-index=fs_walk_home --search="10000:*:povray:*"
/home/bzizou/povray/OAR.cigri.14068.1251218.stderr
/home/bzizou/povray/OAR.cigri.14068.1251220.stderr
/home/bzizou/povray/OAR.cigri.14068.1251224.stderr
/home/bzizou/povray/OAR.cigri.14068.1251231.stderr
/home/bzizou/povray/OAR.cigri.14068.1251231.stdout
/home/bzizou/povray/OAR.cigri.14068.1251233.stderr
/home/bzizou/povray/OAR.cigri.14068.1251233.stdout
/home/bzizou/povray/OAR.cigri.14068.1251234.stderr
/home/bzizou/povray/OAR.cigri.14068.1251234.stdout
/home/bzizou/povray/OAR.cigri.14068.1251237.stderr
/home/bzizou/povray/OAR.cigri.14068.1251237.stdout
/home/bzizou/povray/OAR.cigri.14068.1251238.stderr
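The search string follows the [uid]:[gid]:[path_glob]:[hostname] syntax. The sketch below shows one way such a string could be split and applied to in-memory records. It is not how fswalk queries Elasticsearch: the helper names, the record keys, and the choice to treat the path part as a "contains" pattern are assumptions inferred from the example above.

```python
from fnmatch import fnmatchcase

FIELDS = ("uid", "gid", "path", "hostname")


def parse_search(search):
    """Split 'uid:gid:path_glob:hostname'; empty parts match anything."""
    parts = search.split(":", 3)
    if len(parts) != 4:
        raise ValueError("expected uid:gid:path_glob:hostname")
    return {field: part or "*" for field, part in zip(FIELDS, parts)}


def matches(record, query):
    """Return True when a record satisfies every part of the query."""
    for field in FIELDS:
        pattern = query[field]
        if field == "path":
            # Assumption: the path pattern may match anywhere in the path.
            pattern = "*" + pattern + "*"
        if not fnmatchcase(str(record[field]), pattern):
            return False
    return True
```

With this sketch, parse_search("10000:*:povray:*") selects records whose uid is 10000 and whose path contains "povray", whatever the group and host.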
Usage
Usage: fswalk [options]

Options:
  -h, --help            show this help message and exit
  -p PATH, --path=PATH  Path to scan
  -n NPROC, --nproc=NPROC
                        Number of process to launch
  -x EXCLUDE_EXPR, --exclude=EXCLUDE_EXPR
                        Regular expression for path exclusion
  -a ANALYZE_FILE, --analyze=ANALYZE_FILE
                        Creates a summary based on a previously generated json
                        file
  -s SEARCH_STRING, --search=SEARCH_STRING
                        Search a subset of files with syntax:
                        [uid]:[gid]:[path_glob]:[hostname] (--analyze or
                        --elastic-host needed)
  --numeric             Output numeric uid/gid instead of names
  --hostname=HOSTNAME   Overwrite the value of the hostname string. Defaults
                        to local hostname.
  -e ELASTIC_HOST, --elastic-host=ELASTIC_HOST
                        Use an elasticsearch server for output. 'Ex:
                        http://localhost:9200'
  -P HTAUTH, --http-credentials=HTAUTH
                        File containing http credentials for elasticsearch if
                        necessary. Syntax: <user>:<passwd>
  --elastic-index=ELASTIC_INDEX
                        Name of the elasticsearch index
  --elastic-bulk-size=MAX_BULK_SIZE
                        Size of the elastic indexing bulks
  -g, --elastic-purge-index
                        Purge the elasticsearch index before indexing
  --no-check-certificate
                        Don't check certificates files when using SSL
The ANALYZE_FILE parameter may be a gzip-compressed JSON file or a plain-text JSON file.