
File Database

Fast indexing and querying of local files using pandas and pyarrow.

Features

  • Recursively index files in specified directories
  • Regex-based exclusion of file and folder patterns
  • Optional BLAKE2b hashing for deduplication
  • Feather-format output for fast queries
  • Simple CLI powered by click
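The first three features can be pictured with a small standard-library sketch. It is not the package's own code: index_files, blake2b_file and find_duplicates are hypothetical names, the exclusion patterns are assumed to be Python regexes tested with re.search against bare folder and file names, and the record fields (dir, name, suffix, size, mod) are guesses modeled on the columns used in the query examples.

```python
import hashlib
import os
import re
from collections import defaultdict
from pathlib import Path


def blake2b_file(path, chunk_size=1 << 20):
    """Return the hex BLAKE2b digest of a file, read in 1 MiB chunks."""
    h = hashlib.blake2b()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def index_files(roots, excluded_dirs=(), excluded_files=(), hash_files=False):
    """Yield one dict per file under roots, skipping regex-excluded names."""
    dir_res = [re.compile(p) for p in excluded_dirs]
    file_res = [re.compile(p) for p in excluded_files]
    for root in roots:
        for dirpath, dirnames, filenames in os.walk(root):
            # Prune excluded folders in place so os.walk never descends into them.
            dirnames[:] = [d for d in dirnames
                           if not any(r.search(d) for r in dir_res)]
            for name in filenames:
                if any(r.search(name) for r in file_res):
                    continue
                path = Path(dirpath) / name
                st = path.stat()
                record = {"dir": dirpath, "name": name, "suffix": path.suffix,
                          "size": st.st_size, "mod": st.st_mtime}
                if hash_files:
                    record["hash"] = blake2b_file(path)
                yield record


def find_duplicates(records):
    """Map digest -> paths for content shared by 2+ files (needs hash_files=True)."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["hash"]].append(os.path.join(rec["dir"], rec["name"]))
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

With excluded_dirs=[r"^\.git$"], for example, a .git folder anywhere in the tree is pruned before os.walk enters it, and files with identical bytes end up in the same find_duplicates group.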

Usage

# Index files
file-db index --config config.yaml

# Query files
file-db query --expr "size > 1e7 and suffix == '.csv'"
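The --expr string reads like pandas DataFrame.query syntax. Assuming that is how it is evaluated, and that the index has size and suffix columns, the filter behaves as below. The three rows are toy values standing in for the Feather file, which pandas can load with pd.read_feather when pyarrow is installed.

```python
import pandas as pd

# Toy stand-in for the index; a real one might come from something like
# pd.read_feather("G:/sams64gb.fdb-feather") (requires pyarrow).
df = pd.DataFrame(
    {
        "name": ["big.csv", "small.csv", "big.parquet"],
        "suffix": [".csv", ".csv", ".parquet"],
        "size": [25_000_000, 1_024, 40_000_000],
    }
)

# Assumption: the --expr value is handed to DataFrame.query more or less unchanged.
expr = "size > 1e7 and suffix == '.csv'"
hits = df.query(expr)
print(hits["name"].tolist())  # ['big.csv']
```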

Query Language

Query Examples

Verbose listing of the 10 most recent files, dropping the dir column and adding the create column, limited to file names matching the regex "q?md$" (intended to pick out the qmd and md suffixes):

verbose recent top 10 select !dir,create ! q?md$
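A hypothetical parser for this query shows one way the tokens could be read. The grammar is inferred from the single example above, not taken from the package: bare verbose and recent flags, top followed by a count, select followed by a comma-separated list where a ! prefix drops a column, and a lone ! followed by a name regex. Because each keyword consumes only its own arguments, clause order does not matter in this sketch.

```python
import re


def parse_query(text):
    """Parse e.g. 'verbose recent top 10 select !dir,create ! q?md$' into a dict.

    Hypothetical grammar; regexes containing spaces are not supported.
    """
    tokens = text.split()
    spec = {"verbose": False, "recent": False, "top": None,
            "add_cols": [], "drop_cols": [], "name_regex": None}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("verbose", "recent"):
            spec[tok] = True
        elif tok == "top":
            i += 1
            spec["top"] = int(tokens[i])
        elif tok == "select":
            i += 1
            for col in tokens[i].split(","):
                # '!col' removes a column from the output, 'col' adds one.
                if col.startswith("!"):
                    spec["drop_cols"].append(col[1:])
                else:
                    spec["add_cols"].append(col)
        elif tok == "!":
            i += 1
            spec["name_regex"] = re.compile(tokens[i])
        else:
            raise ValueError(f"unrecognised token: {tok!r}")
        i += 1
    return spec


spec = parse_query("verbose recent top 10 select !dir,create ! q?md$")
names = ["notes.qmd", "README.md", "data.csv", "run.cmd"]
print([n for n in names if spec["name_regex"].search(n)])
# ['notes.qmd', 'README.md', 'run.cmd']
```

If names are matched with re.search, q?md$ accepts any name ending in md, so run.cmd slips through; anchoring on the dot, as in \.q?md$, limits the match to the .md and .qmd suffixes.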

Minimal config file

The included/excluded entries must not be left blank: give each key a value, using [] for an empty list.

project: Samsung 64GB compact flash
hostname: KOLMOGOROV
database: G:/sams64gb.fdb-feather
included_dirs:
- G:/
excluded_dirs: []
excluded_files: []
hash_files: true
hash_workers: 6
last_indexed: 0 
timezone: Europe/London
tablefmt: mixed_grid
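One reading of the rule above is that each list key must hold a list, never a blank value: in YAML, a key written with nothing after it (such as excluded_files:) loads as null, i.e. None in Python, not as an empty list. Assuming the file is read into a dict (for example with PyYAML's yaml.safe_load), a hypothetical validate_config could catch that before indexing starts. The key names come from the minimal config above; requiring a non-empty included_dirs and a positive integer hash_workers are assumptions.

```python
LIST_KEYS = ("included_dirs", "excluded_dirs", "excluded_files")


def validate_config(cfg):
    """Return a list of problems with a loaded config mapping ([] if it looks OK)."""
    problems = []
    for key in LIST_KEYS:
        value = cfg.get(key)
        if value is None:
            # Missing, or written as a bare 'key:' line, which YAML loads as null.
            problems.append(f"{key} is missing or blank; use [] for an empty list")
        elif not isinstance(value, list):
            problems.append(f"{key} must be a list, not {type(value).__name__}")
    included = cfg.get("included_dirs")
    if isinstance(included, list) and not included:
        problems.append("included_dirs must name at least one directory")
    workers = cfg.get("hash_workers", 1)
    if not isinstance(workers, int) or workers < 1:
        problems.append("hash_workers must be a positive integer")
    return problems


example = {
    "included_dirs": ["G:/"],
    "excluded_dirs": [],
    "excluded_files": None,  # what a blank 'excluded_files:' line loads as
    "hash_workers": 6,
}
print(validate_config(example))
# ['excluded_files is missing or blank; use [] for an empty list']
```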

To Do

  • Utility to create a blank config file (possibly with prompts)
  • Incremental updates: check for deleted files and whether a file is already logged (changed spec); use sets and set differences, e.g. df = df[~df["col"].isin(seen)]
  • Create job and python path
  • Make query > file work
  • Add duplicates and hardlinks keywords
  • Parse queries in an order-independent manner
  • GT should have a repr_text version!
  • Directory sizes
  • Recent directories (recent changes to dirs)
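The incremental-update item can be sketched with pandas set logic. This is an illustration under assumptions, not the package's design: the stored index and the fresh scan are both taken to be DataFrames keyed by a hypothetical path column, and a file counts as changed when its mod (modification time) column differs between the two.

```python
import pandas as pd


def incremental_update(old, current):
    """Merge a fresh scan into a stored index.

    Both frames are assumed to share a 'path' key and a 'mod' (mtime) column.
    Returns the updated index and the sets of deleted, added and changed paths.
    """
    old_paths = set(old["path"])
    cur_paths = set(current["path"])
    deleted = old_paths - cur_paths  # logged before, gone now
    added = cur_paths - old_paths    # never logged before
    # Paths present in both scans whose modification time moved are "changed".
    both = old.merge(current, on="path", suffixes=("_old", "_new"))
    changed = set(both.loc[both["mod_old"] != both["mod_new"], "path"])
    # df[~df[col].isin(seen)] idiom: drop stale rows (deleted or changed).
    kept = old[~old["path"].isin(deleted | changed)]
    fresh = current[current["path"].isin(added | changed)]
    updated = pd.concat([kept, fresh], ignore_index=True)
    return updated, deleted, added, changed
```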

Regex look aheads/look backs
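As a hedged illustration, assuming exclusion and name patterns are Python re patterns applied with re.search: a negative lookahead can carve an exception out of a broad rule, and a negative lookbehind can reject a match based on the character just before it. Both patterns below are hypothetical examples, not rules the package ships with.

```python
import re

# Negative lookahead: match dotfiles, except exactly ".gitignore".
dotfiles_except_gitignore = re.compile(r"^\.(?!gitignore$)")

# Negative lookbehind: names ending in "md" unless the preceding char is "c".
md_not_cmd = re.compile(r"(?<!c)md$")

names = [".env", ".gitignore", "notes.qmd", "README.md", "run.cmd"]
print([n for n in names if dotfiles_except_gitignore.search(n)])  # ['.env']
print([n for n in names if md_not_cmd.search(n)])  # ['notes.qmd', 'README.md']
```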

Download files

Source Distribution

file_database-0.2.0.tar.gz (18.6 kB)

Built Distribution

file_database-0.2.0-py3-none-any.whl (21.5 kB)

File details

Details for the file file_database-0.2.0.tar.gz.

File metadata

  • Download URL: file_database-0.2.0.tar.gz
  • Size: 18.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.13

File hashes

Hashes for file_database-0.2.0.tar.gz
Algorithm     Hash digest
SHA256        b1ebdbe7f90274e055cf44b732fc5fef79e4f82c852e2d4db6aa3befc52a4c6a
MD5           ac0724d6ae97b1e4190d62727765ea19
BLAKE2b-256   31f4f593ca587eddc78deecbb7a25a36d41a7b49caf77b9d4be783f312d6cf6d

File details

Details for the file file_database-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: file_database-0.2.0-py3-none-any.whl
  • Size: 21.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.13

File hashes

Hashes for file_database-0.2.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        e2199b67319da61e534bf675b650e7bcb98672b4d7d88c59acf2ccc32e1abbf5
MD5           88c5fa604ecbceed76a78e8eb8e66bc4
BLAKE2b-256   2bf7fa2b1c4f3b0f4df1c54e8752292b30a46d954e0b7f575d1ee8f4d7ba3516
