Skip to main content

Directory tree metadata parser using Apache Tika

Project description

tikatree

Directory tree metadata parser using Apache Tika

tikatree parses all files in a directory and creates a:

  • _metadata.json - A single file with the metdata from each file that was parsed
  • _file_tree.json and _file_tree.csv - A list of all files and directories with some basic information. One file that's easy to read and another for importing into excel and things like that
  • _directory_tree.txt - A graphical representation of the directory
  • .sfv - A CRC32 checksum

Installation

pip install tikatree

tikatree uses tika-python for interacting with Apache Tika. You may need to refer to the tika-python documentation if you have any issues with Tika.

Usage

Open up a command line and type tikatree <directory>, by default it'll create all files at or above that directory. You can target multiple directories, just put a space in between each one on the command line.

usage: tikatree [-h] [-v] [-d] [-e EXCLUDE [EXCLUDE ...]] [-f] [-k] [-m] [-nm] [-s] [-y] DIRECTORY [DIRECTORY ...]

A directory tree metadata parser using Apache Tika, by default it runs arguments: -d, -f, -m, -s

positional arguments:
  DIRECTORY             directory(s) to parse

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  -d, --directorytree   create directory tree
  -e EXCLUDE [EXCLUDE ...], --exclude EXCLUDE [EXCLUDE ...]
                        directory(s) to exclude, includes subdirectories
  -f, --filetree        creates a json and csv file tree
  -k, --kill            kill Tika process after each directory parsed
  -m, --metadata        parse metadata
  -nm, --newmetadata    create individual metadata files in a 'tikatree' directory
  -s, --sfv             create sfv file
  -y, --yes             automatically overwrite older files

Example

I've included some output examples in the output_examples folder.

Windows Fixes

When parsing files too fast there can be connection errors to Apache Tika. In order to get around this run these commands in Powershell as Admin

$KeyPath = "HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
Set-ItemProperty -Path $KeyPath -Name "MaxUserPort" -Value 65534
$KeyPath = "HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
Set-ItemProperty -Path $KeyPath -Name "TcpTimedWaitDelay" -Value 30
$KeyPath = "HKLM:\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
Set-ItemProperty -Path $KeyPath -Name "StrictTimeWaitSeqCheck" -Value 1

Part of the Keep Dreaming Project

Main Repository

Project

GitHub Mirror

Contributing

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tikatree-0.1.1.tar.gz (8.1 kB view details)

Uploaded Source

Built Distribution

tikatree-0.1.1-py3-none-any.whl (7.6 kB view details)

Uploaded Python 3

File details

Details for the file tikatree-0.1.1.tar.gz.

File metadata

  • Download URL: tikatree-0.1.1.tar.gz
  • Upload date:
  • Size: 8.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.48.0 CPython/3.8.6

File hashes

Hashes for tikatree-0.1.1.tar.gz
Algorithm Hash digest
SHA256 566bf2784aa0a44b2c420fc51748cb3831f3c1f3fb3da366586d712b5ce1480d
MD5 5963d3111dd7ccc65e529da7aa1cdbc4
BLAKE2b-256 fcde9ac32e55b8d2a905427f4578860230f1a0dc9808cff65210cb1c7818d00e

See more details on using hashes here.

File details

Details for the file tikatree-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: tikatree-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 7.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.48.0 CPython/3.8.6

File hashes

Hashes for tikatree-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 a998d71141720479fba0208e8ffff046644274a6c5db9f1b22806b13b6d425b8
MD5 3bd7b1195c07f173aba6b66d5c1c3ec6
BLAKE2b-256 de157dfb84d4c145e64cd38f9725941c417121af2ffdcc1fb51c35d32535812e

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page