A Python script to index a large folder structure into a parquet file, along with metadata

Project description

folder-indexer-py

Description

This script is useful for searching for files in backups stored on a relatively slow disk, especially when you aren't sure exactly which files you are searching for.

Use tools like DBeaver and DuckDB to query and explore the generated index.
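As an illustration of the kind of query you might run, here is a minimal sketch that builds a DuckDB SQL statement for a case-insensitive file-name search. The column names (`file_name`, `file_path`, `file_size_bytes`, `date_modified`) come from the column list in this README. The helper names `quote_sql_string` and `build_file_search` are hypothetical and are not part of folder_indexer. The sketch only builds the SQL text and its parameters, so it runs without DuckDB installed.

```python
from __future__ import annotations


def quote_sql_string(value: str) -> str:
    """Quote a string as a SQL literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_file_search(
    parquet_path: str, name_fragment: str, min_size_bytes: int = 0
) -> tuple[str, list]:
    """Return (sql, params) for a case-insensitive file-name search in the index.

    Hypothetical helper: the column names come from the README, but the query
    shape is only an illustration.
    """
    sql = (
        "SELECT file_path, file_size_bytes, date_modified\n"
        f"FROM read_parquet({quote_sql_string(parquet_path)})\n"
        "WHERE file_name ILIKE ? AND file_size_bytes >= ?\n"
        "ORDER BY file_size_bytes DESC"
    )
    # Wrap the fragment in % wildcards so it matches anywhere in the file name.
    params = [f"%{name_fragment}%", min_size_bytes]
    return sql, params


if __name__ == "__main__":
    query, query_params = build_file_search(
        "/path/to/output/folder/file_index.parquet", "invoice", 1_000_000
    )
    print(query)
    print(query_params)
```

With the `duckdb` Python package installed, the pair can be executed with `duckdb.connect().execute(sql, params).fetchall()`. In DBeaver, the same `SELECT` can be pasted into a DuckDB connection with the `?` placeholders replaced by literal values.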

Usage

uv tool install folder_indexer

folder_indexer -i /path/to/input/folder -o /path/to/output/folder
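Conceptually, the indexer has to visit every entry under the `-i` folder. The following is a minimal standard-library sketch of such a traversal, assuming `os.walk` without following symlinks. It is not the package's actual implementation. The `entry_kind` values (`"file"`, `"directory"`, `"symlink"`) are hypothetical labels, and `date_created` is omitted because creation time is platform-dependent.

```python
from __future__ import annotations

import os
import stat
import sys
from datetime import datetime, timezone


def iter_entries(root: str):
    """Yield one dict per file, directory, or symlink under root (symlinks are not followed)."""
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            full_path = os.path.join(dirpath, name)
            try:
                st = os.lstat(full_path)
            except OSError:
                # Skip unreadable entries instead of aborting a long crawl of a backup disk.
                continue
            mode = st.st_mode
            if stat.S_ISLNK(mode):
                kind = "symlink"
            elif stat.S_ISDIR(mode):
                kind = "directory"
            else:
                # Regular files, plus any special files (FIFOs, devices), land here.
                kind = "file"
            yield {
                "file_path": full_path,
                "folder_path": dirpath,
                "file_name": name,
                "file_size_bytes": st.st_size if kind == "file" else None,
                "entry_kind": kind,
                "date_modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
            }


if __name__ == "__main__" and len(sys.argv) > 1:
    for entry in iter_entries(sys.argv[1]):
        print(entry["entry_kind"], entry["file_path"])
```

Using a generator keeps memory use flat on very large trees, because entries can be streamed to a writer in batches instead of being held in one list.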

Indexed Metadata and Output

The output parquet file (file_index.parquet) has the following columns:

  • file_path
  • folder_path
  • file_name
  • file_size_bytes
  • entry_kind
  • md5_hex
  • sha256_base64
  • date_created
  • date_modified
  • magic_file_type_1
  • first_100_bytes
  • last_100_bytes
  • timestamp_crawled
  • indexing_start_timestamp
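Several of these columns depend on the file contents. On a slow disk it makes sense to collect them in a single sequential read. Here is a minimal sketch that produces `md5_hex`, `sha256_base64`, `first_100_bytes`, and `last_100_bytes` in one pass. It assumes `sha256_base64` is the standard Base64 encoding of the raw SHA-256 digest and that files shorter than 100 bytes are returned whole in both byte columns. Neither assumption is confirmed by this README. `content_metadata` is a hypothetical name.

```python
from __future__ import annotations

import base64
import hashlib

CHUNK_SIZE = 1024 * 1024  # 1 MiB reads keep memory flat for very large files
EDGE_BYTES = 100  # matches the first_100_bytes / last_100_bytes column names


def content_metadata(path: str) -> dict:
    """Hash a file and capture its first and last 100 bytes in one sequential pass.

    Hypothetical helper; assumes sha256_base64 is standard Base64 of the raw digest.
    """
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    head = b""
    tail = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            md5.update(chunk)
            sha256.update(chunk)
            if len(head) < EDGE_BYTES:
                head += chunk[: EDGE_BYTES - len(head)]
            # Carry only the trailing EDGE_BYTES across chunk boundaries.
            tail = (tail + chunk)[-EDGE_BYTES:]
    return {
        "md5_hex": md5.hexdigest(),
        "sha256_base64": base64.b64encode(sha256.digest()).decode("ascii"),
        "first_100_bytes": head,
        "last_100_bytes": tail,
    }
```

Reading sequentially instead of seeking to the end for the tail avoids an extra random access, which tends to be costly on spinning backup disks. The hashes require reading the whole file anyway.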

Download files

Source Distribution

folder_indexer-0.2.3.tar.gz (21.4 kB)

Built Distribution

folder_indexer-0.2.3-py3-none-any.whl (8.1 kB, Python 3)

File details

Details for the file folder_indexer-0.2.3.tar.gz.

File metadata

  • Download URL: folder_indexer-0.2.3.tar.gz
  • Size: 21.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.9.5

File hashes

Hashes for folder_indexer-0.2.3.tar.gz
Algorithm Hash digest
SHA256 2bbc6af9b7fc265c2ca00ebdc65778373eac964aa2205c7e9da8d1149c0a68ce
MD5 f9ec9d9b9bb77f5654b8f5d3808c4c98
BLAKE2b-256 c049acce5c89a93c00c7e65b8e581b1dd8ec31b9aaf1a0a64728151c8c7d01b3
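To check a manually downloaded source distribution against the SHA256 digest listed above, a short standard-library script is enough. This is a minimal sketch. The function names are illustrative, and the expected digest is copied from the table above.

```python
import hashlib
import sys

# SHA256 digest listed above for folder_indexer-0.2.3.tar.gz.
EXPECTED_SHA256 = "2bbc6af9b7fc265c2ca00ebdc65778373eac964aa2205c7e9da8d1149c0a68ce"


def sha256_of(path: str) -> str:
    """Return the hex SHA256 digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__" and len(sys.argv) > 1:
    ok = verify(sys.argv[1])
    print("OK" if ok else "MISMATCH")
    sys.exit(0 if ok else 1)
```

For installs through pip, the same digest can instead be pinned in a requirements file using pip's hash-checking mode, for example `folder_indexer==0.2.3 --hash=sha256:<digest>`. That line must list the hash of every distribution file pip might select, including the wheel.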

File details

Details for the file folder_indexer-0.2.3-py3-none-any.whl.

File hashes

Hashes for folder_indexer-0.2.3-py3-none-any.whl
Algorithm Hash digest
SHA256 23e0c9cd247ff9a70751521b959b11a05b6166d03e880b0d06f0c3f2bf1477f5
MD5 5472c3601540b57b79535b9b341c51d5
BLAKE2b-256 bcddcb93d947b6508241f1d46cb1ce93a56b0ca5cb3af4c4190b63b33d8e04b5
