tikatree
Directory tree metadata parser using Apache Tika
tikatree parses all files in a directory and creates:
- _metadata.json - A single file with the metadata from each file that was parsed
- _file_tree.json and _file_tree.csv - A list of all files and directories with some basic information. The JSON file is easy to read, and the CSV file can be imported into Excel and similar tools
- _directory_tree.txt - A graphical representation of the directory
- .sfv - A Simple File Verification file with CRC32 checksums
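To illustrate what the .sfv output holds, here is a minimal sketch that computes CRC32 checksums with Python's standard library and formats them as SFV-style lines. The function names `crc32_of_file` and `sfv_lines` are hypothetical and are not taken from tikatree; the exact layout tikatree writes may differ.

```python
import zlib
from pathlib import Path


def crc32_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the CRC32 of a file as 8 uppercase hex digits."""
    checksum = 0
    with path.open("rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            checksum = zlib.crc32(chunk, checksum)
    return f"{checksum & 0xFFFFFFFF:08X}"


def sfv_lines(root: Path) -> list[str]:
    """Build one 'relative/path CRC32' line per file under root (hypothetical helper)."""
    lines = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        lines.append(f"{path.relative_to(root).as_posix()} {crc32_of_file(path)}")
    return lines
```

Calling `sfv_lines(Path("some_directory"))` returns the lines that could then be written to a .sfv file.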
Installation
pip install tikatree
tikatree uses tika-python for interacting with Apache Tika. You may need to refer to the tika-python documentation if you have any issues with Tika.
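As a rough idea of what the tika-python layer does, the sketch below collects metadata for every file under a directory. It assumes tika-python's `parser.from_file`, which returns a dictionary with a "metadata" key, and it needs Java plus a reachable Tika server at run time. The helper names `parse_with_tika` and `collect_metadata` are hypothetical; this is not tikatree's own code.

```python
from pathlib import Path
from typing import Callable


def parse_with_tika(path: str) -> dict:
    """Parse one file with tika-python and return its metadata dictionary."""
    # Imported lazily so this module still loads when tika is not installed.
    from tika import parser

    parsed = parser.from_file(path)
    # The "metadata" entry can be missing or None if parsing failed.
    return parsed.get("metadata") or {}


def collect_metadata(root: Path, parse: Callable[[str], dict] = parse_with_tika) -> dict:
    """Map each file's relative path to the metadata returned by `parse`."""
    result = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        result[path.relative_to(root).as_posix()] = parse(str(path))
    return result
```

The `parse` argument is injectable so the directory walk can be tried with a stand-in function before a Tika server is available.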
Usage
Open a command line and run tikatree <directory>. By default it creates all output files at or above that directory. To target multiple directories, separate them with spaces on the command line.
usage: tikatree [-h] [-v] [-d] [-e EXCLUDE [EXCLUDE ...]] [-f] [-m] [-s] [-y] DIRECTORY [DIRECTORY ...]

A directory tree metadata parser using Apache Tika, by default it runs arguments: -d, -f, -m, -s

positional arguments:
  DIRECTORY             directory(s) to parse

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -d, --directorytree   create directory tree
  -e EXCLUDE [EXCLUDE ...], --exclude EXCLUDE [EXCLUDE ...]
                        directory(s) to exclude, includes subdirectories
  -f, --filetree        creates a json and csv file tree
  -m, --metadata        parse metadata
  -s, --sfv             create sfv file
  -y, --yes             automatically overwrite older files
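If you want to drive tikatree from a script, the options above can be assembled into a command line. This is a minimal sketch that uses only the flags listed in the help text; `build_command` and `run_tikatree` are hypothetical helpers, and placing the directories before `-e` is a choice made here so the exclude values are not mistaken for directories to parse.

```python
import subprocess
from typing import Sequence


def build_command(
    directories: Sequence[str],
    exclude: Sequence[str] = (),
    overwrite: bool = False,
) -> list[str]:
    """Assemble a tikatree command line from the documented options."""
    if not directories:
        raise ValueError("at least one directory is required")
    # Positional directories go first so -e cannot swallow them.
    command = ["tikatree", *directories]
    if overwrite:
        command.append("-y")  # automatically overwrite older files
    if exclude:
        command += ["-e", *exclude]  # directories to exclude, with subdirectories
    return command


def run_tikatree(directories: Sequence[str], **options) -> None:
    """Run tikatree and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_command(directories, **options), check=True)
```

For example, `run_tikatree(["photos", "docs"], exclude=["photos/raw"], overwrite=True)` would run tikatree on two directories while skipping one subdirectory.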
Example
I've included some output examples in the output_examples folder.
Windows Fixes
Parsing files too quickly can cause connection errors to Apache Tika. To work around this, run these commands in PowerShell as an administrator:
# Registry key that holds the TCP/IP parameters
$KeyPath = "HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
# Raise the highest port number available for outgoing connections
Set-ItemProperty -Path $KeyPath -Name "MaxUserPort" -Value 65534
# Keep closed connections in TIME_WAIT for only 30 seconds
Set-ItemProperty -Path $KeyPath -Name "TcpTimedWaitDelay" -Value 30
Part of the Keep Dreaming Project
Main Repository
Project
GitHub Mirror
Contributing