Fast index and query of local files.
File Database
Fast index and query of local files using pandas and pyarrow.
Features
- Recursively index files in specified directories
- Regex-based exclusion of file and folder patterns
- Optional BLAKE2b hashing for deduplication
- Feather-format output for fast query
- Simple CLI powered by click
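The first three features can be sketched with the standard library alone. This is a minimal illustration, not the package's implementation: the function names `index_tree` and `blake2b_digest`, the row keys, and the choice to match patterns against bare file and folder names are all assumptions made for the example.

```python
import hashlib
import os
import re
from pathlib import Path


def blake2b_digest(path, chunk_size=1 << 20):
    """Return the BLAKE2b hex digest of a file, reading it in chunks."""
    h = hashlib.blake2b()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def index_tree(root, exclude_dirs=(), exclude_files=(), hash_files=False):
    """Walk root recursively; return one dict per file that passes the filters."""
    dir_res = [re.compile(p) for p in exclude_dirs]
    file_res = [re.compile(p) for p in exclude_files]
    rows = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded folders in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames
                       if not any(r.search(d) for r in dir_res)]
        for name in filenames:
            if any(r.search(name) for r in file_res):
                continue
            path = Path(dirpath) / name
            stat = path.stat()
            rows.append({
                "name": name,
                "dir": dirpath,
                "suffix": path.suffix,
                "size": stat.st_size,
                "mod": stat.st_mtime,
                # Hashing is optional because it reads every byte of every file.
                "hash": blake2b_digest(path) if hash_files else None,
            })
    return rows
```

Rows with equal `hash` values are candidate duplicates. A list of dicts like this can be passed to `pandas.DataFrame` and written with `DataFrame.to_feather`, which needs pyarrow installed.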
Usage
# Index files
file-db index --config config.yaml
# Query files
file-db query --expr "size > 1e7 and suffix == '.csv'"
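The `--expr` string above has the same shape as a pandas `DataFrame.query` expression. Assuming the index is an ordinary DataFrame with `size` and `suffix` columns (the in-memory rows below are made up for illustration; a real index would presumably be loaded with `pandas.read_feather`), the filter can be sketched as:

```python
import pandas as pd

# Hypothetical index rows; the real column set is not documented here.
df = pd.DataFrame({
    "name": ["big.csv", "small.csv", "big.bin"],
    "suffix": [".csv", ".csv", ".bin"],
    "size": [25_000_000, 1_200, 40_000_000],
})

expr = "size > 1e7 and suffix == '.csv'"

# engine="python" avoids depending on the optional numexpr package.
hits = df.query(expr, engine="python")
print(hits["name"].tolist())
```

Whether the CLI hands the expression to `DataFrame.query` unchanged is an assumption; the sketch only shows what such an expression selects.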
Query Language
Query Examples
Show the 10 most recent files, drop the dir column but add the create column, and keep only files whose names match the regex "q?md$" (names ending in qmd or md):
verbose recent top 10 select !dir,create ! q?md$
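To see what the regex in that query selects, here is a small check with Python's `re` module. It assumes the pattern is applied as an unanchored search against the file name, which the text above does not confirm; the file names are invented.

```python
import re

names = ["notes.md", "report.qmd", "build.cmd", "data.csv"]

# The pattern from the query: optional "q", then "md" at the end of the name.
loose = re.compile(r"q?md$")
loose_hits = [n for n in names if loose.search(n)]

# Requiring the dot restricts matches to the .md and .qmd suffixes.
strict = re.compile(r"\.q?md$")
strict_hits = [n for n in names if strict.search(n)]

print(loose_hits)
print(strict_hits)
```

Under that assumption the pattern as written also matches names such as `build.cmd`, since it only requires the name to end in `md`. Adding the escaped dot gives the stricter suffix match.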
Minimal config file
included/excluded files must not be empty.
project: Samsung 64GB compact flash
hostname: KOLMOGOROV
database: G:/sams64gb.fdb-feather
included_dirs:
- G:/
excluded_dirs: []
excluded_files: []
excluded_files:
hash_files: true
hash_workers: 6
last_indexed: 0
timezone: Europe/London
tablefmt: mixed_grid
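One way to check a config like this after parsing it into a dict is a small dataclass. The class name `IndexConfig`, the default values, and the validation rules are assumptions for illustration; in particular, "must not be empty" is read here as "the key must hold a list, even if that list is `[]`".

```python
from dataclasses import dataclass, field


@dataclass
class IndexConfig:
    """Hypothetical typed view of the config keys shown above."""
    project: str
    hostname: str
    database: str
    included_dirs: list
    excluded_dirs: list = field(default_factory=list)
    excluded_files: list = field(default_factory=list)
    hash_files: bool = False
    hash_workers: int = 1
    last_indexed: int = 0
    timezone: str = "UTC"
    tablefmt: str = "mixed_grid"

    def __post_init__(self):
        # A YAML key with no value parses to None, so check for real lists.
        for key in ("included_dirs", "excluded_dirs", "excluded_files"):
            if not isinstance(getattr(self, key), list):
                raise ValueError(f"{key} must be a list, use [] for none")
        if not self.included_dirs:
            raise ValueError("included_dirs must name at least one directory")


raw = {
    "project": "Samsung 64GB compact flash",
    "hostname": "KOLMOGOROV",
    "database": "G:/sams64gb.fdb-feather",
    "included_dirs": ["G:/"],
    "excluded_dirs": [],
    "excluded_files": [],
    "hash_files": True,
    "hash_workers": 6,
    "last_indexed": 0,
    "timezone": "Europe/London",
    "tablefmt": "mixed_grid",
}
config = IndexConfig(**raw)
```

The `raw` dict stands in for the parsed YAML so the sketch needs no YAML library.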
To Do
- Util to create a blank config file (maybe with prompts)
- Incremental updates: check for deletes and if the file is logged (changed spec), use set and set diff and df = df[~df["col"].isin(seen)]
- Create job and python path
- query > file work
- add duplicates and hardlinks keywords
- parse queries in an order-independent manner
- GT should have a repr_text version!
- directory sizes?!
- recent directories (recent changes to dirs)
- Regex lookaheads/lookbehinds
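The incremental-update item in the list above mentions set differences and an `isin` filter. A possible shape for that step is sketched below; the function name `apply_incremental`, the `path` and `mod` columns, and the use of modification time to detect changes are assumptions, not the package's design.

```python
import pandas as pd


def apply_incremental(old, on_disk):
    """Merge a fresh scan (path -> mtime) into a previous index DataFrame."""
    old_mod = dict(zip(old["path"], old["mod"]))
    indexed = set(old_mod)
    current = set(on_disk)

    deleted = indexed - current   # logged before, gone from disk now
    added = current - indexed     # on disk now, never logged
    changed = {p for p in indexed & current if on_disk[p] != old_mod[p]}

    # Drop stale rows, then append fresh rows for new and changed files.
    seen = deleted | changed
    kept = old[~old["path"].isin(seen)]
    refresh = sorted(added | changed)
    fresh = pd.DataFrame({"path": refresh,
                          "mod": [on_disk[p] for p in refresh]})
    return pd.concat([kept, fresh], ignore_index=True)


old = pd.DataFrame({"path": ["a.txt", "b.txt", "c.txt"], "mod": [1, 2, 3]})
on_disk = {"a.txt": 1, "b.txt": 5, "d.txt": 4}
updated = apply_incremental(old, on_disk)
print(updated)
```

In this example `c.txt` is dropped as deleted, `b.txt` is refreshed because its modification time changed, and `d.txt` is added.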