AI-powered intelligent file organizer — find duplicates, track versions, identify the real final draft

FileWise

FileWise is an AI-powered file organizer that finds semantically similar files, traces how a document evolved across versions, and identifies the "real final draft."

Unlike hash-based deduplication tools (czkawka, fdupes, rdfind), FileWise uses embedding similarity to recognize different versions of the same document, even when content has been edited, renamed, or scattered across directories.

Fully local — the embedding model runs on your machine. No data ever leaves your disk.
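The contrast with hash-based tools can be illustrated with a minimal sketch. It is not FileWise's code: the bag-of-words vector below is a toy stand-in for a real embedding, and the helper names (`sha256_hex`, `bag_of_words`, `cosine`) are hypothetical. It only shows why a one-word edit defeats an exact hash while a similarity score stays high.

```python
import hashlib
import math
import re
from collections import Counter


def sha256_hex(text: str) -> str:
    """Content hash, as an exact-match deduplicator would compute it."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def bag_of_words(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


v1 = "The project budget for 2025 covers hardware, travel, and two new hires."
v2 = "The project budget for 2025 covers hardware, travel, and three new hires."

# The hashes differ because a single word changed.
same_hash = sha256_hex(v1) == sha256_hex(v2)
# The similarity stays close to 1.0 because most words are shared.
similarity = cosine(bag_of_words(v1), bag_of_words(v2))
print(same_hash, round(similarity, 2))
```

A real embedding model also captures paraphrases that share few exact words, which word counts cannot do.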

Quick Start

git clone https://github.com/Maobuchiyugutou/FileWise.git
cd FileWise
pip install -e ".[all]"

# Scan a directory
filewise scan ~/Documents

# AI-powered analysis — find similar files and version chains
filewise analyze ~/Documents

# Compare two files
filewise diff proposal_v1.md proposal_v2.md

# Smart rename — add version prefixes based on analysis
filewise rename ~/Documents            # dry-run
filewise rename ~/Documents --apply    # apply renames

# Natural language search — find files by describing what you want
filewise search "budget proposal" ~/Documents

# Find files similar to a specific file
filewise find-similar draft.md

Commands

| Command | Description |
| --- | --- |
| `filewise scan <dir>` | List files by format, show supported/unsupported counts |
| `filewise analyze <dir>` | Full AI pipeline: find similar files and version chains |
| `filewise diff <A> <B>` | Line-level content comparison between two files |
| `filewise rename <dir>` | Rename files to show version order (`--apply` to execute) |
| `filewise search <query> <dir>` | Natural language search with auto mode detection |
| `filewise find-similar <file>` | Find files semantically similar to a given file |
| `filewise evaluate <dir>` | Run algorithm accuracy tests against ground-truth scenarios |
| `filewise info` | System info and supported formats |

How It Works

Scan files → Extract text → Split into chunks → Generate embeddings
    → Cluster by similarity → Infer version chains → Display results
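As an illustration of the "Split into chunks" step, here is a minimal sketch of paragraph-first chunking. The function name `chunk_paragraph_first` and the `max_chars` parameter are assumptions made for this example and are not taken from FileWise's code. The idea is to keep whole paragraphs together and fall back to fixed-size cuts only for oversized paragraphs.

```python
import re


def chunk_paragraph_first(text: str, max_chars: int = 500) -> list[str]:
    """Pack whole paragraphs into chunks of at most max_chars characters.

    A paragraph longer than max_chars is cut into fixed-size pieces.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        if len(para) > max_chars:
            # Flush the pending chunk, then hard-split the oversized paragraph.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(para[i:i + max_chars] for i in range(0, len(para), max_chars))
        elif current and len(current) + 2 + len(para) > max_chars:
            # Adding this paragraph would overflow, so start a new chunk.
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


doc = "Intro paragraph.\n\nSecond paragraph with details.\n\n" + "x" * 70
for chunk in chunk_paragraph_first(doc, max_chars=50):
    print(len(chunk), repr(chunk[:20]))
```

Each chunk would then be embedded separately, so a long document contributes several vectors rather than one.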

Version Chain Algorithm

Three-stage, multi-signal approach:

1. Clustering (DBSCAN + hierarchical refinement): group files by content similarity only, ignoring file names.
2. Ordering: determine version direction using:
   - Content containment (primary): how much of A appears in B?
   - Filename dates: dates such as 2025-04-17 extracted from filenames
   - Version patterns: v1 → v2, draft → final, 第1版 → 第2版 (Chinese for "1st edition" → "2nd edition")
   - Modification time (secondary)
3. Chain construction: topological sort with confidence tiers (HIGH / MEDIUM / LOW).
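The ordering and chain-construction stages can be sketched as follows. This is a simplified illustration under stated assumptions, not FileWise's implementation: containment is measured over 3-word shingles, the rule "the file that is more fully contained in the other is the older one" is an assumption, and confidence tiers are left out. The names `shingles`, `containment`, and `order_versions` are hypothetical.

```python
from graphlib import TopologicalSorter
from itertools import combinations


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Set of n-word windows; shorter texts yield a single shingle."""
    words = text.lower().split()
    if len(words) < n:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(a: str, b: str) -> float:
    """Fraction of a's shingles that also occur in b (0.0 to 1.0)."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa) if sa else 0.0


def order_versions(files: dict[str, str]) -> list[str]:
    """Order files oldest-first, assuming later versions contain earlier ones.

    Raises graphlib.CycleError if the pairwise comparisons contradict each other.
    """
    # Maps each file to the set of files that must come before it.
    graph: dict[str, set[str]] = {name: set() for name in files}
    for x, y in combinations(files, 2):
        x_in_y = containment(files[x], files[y])
        y_in_x = containment(files[y], files[x])
        if x_in_y > y_in_x:
            graph[y].add(x)  # x precedes y
        elif y_in_x > x_in_y:
            graph[x].add(y)  # y precedes x
    return list(TopologicalSorter(graph).static_order())


files = {
    "final.md": "alpha beta gamma delta epsilon zeta eta theta",
    "draft.md": "alpha beta gamma delta",
    "v2.md": "alpha beta gamma delta epsilon zeta",
}
print(order_versions(files))
```

Ties (equal containment in both directions) add no edge here; a fuller version would consult the secondary signals listed above before giving up on a pair.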

Special cases handled: very short files (substring matching), heavily rewritten documents (filename signal boost), format variants (same content, different extensions).
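The filename signals mentioned above (dates, version numbers, stage words) could be extracted along these lines. The regular expressions, the `STAGE_RANK` table, and the function name `filename_signals` are illustrative assumptions; the patterns FileWise actually recognizes may differ.

```python
import re
from datetime import date

DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
# "v" followed by digits, but not when the "v" is part of a longer word.
VERSION_RE = re.compile(r"(?<![A-Za-z])v(\d+)", re.IGNORECASE)
CN_VERSION_RE = re.compile(r"第(\d+)版")
# Hypothetical ranking of stage keywords; not FileWise's actual table.
STAGE_RANK = {"draft": 0, "final": 1}


def filename_signals(name: str) -> dict:
    """Extract ordering hints from a filename; missing hints stay None."""
    signals = {"date": None, "version": None, "stage": None}
    m = DATE_RE.search(name)
    if m:
        try:
            signals["date"] = date(*map(int, m.groups()))
        except ValueError:
            pass  # digits shaped like a date that are not a valid one
    m = VERSION_RE.search(name) or CN_VERSION_RE.search(name)
    if m:
        signals["version"] = int(m.group(1))
    lowered = name.lower()
    for word, rank in STAGE_RANK.items():
        if word in lowered:
            signals["stage"] = rank
    return signals


for name in ["proposal_v2.md", "report_2025-04-17.md", "方案_第2版.docx", "thesis_final.docx"]:
    print(name, filename_signals(name))
```

When content containment is inconclusive, as with heavily rewritten documents, hints like these give the ordering step something else to compare.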

Evaluation

The evaluation suite runs 21 scenarios covering typical edge cases, and all of them pass (100% accuracy):

filewise evaluate tests/eval_scenarios
# 17 version chain scenarios (100%) + 4 search scenarios (R@5=100%)
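R@5 in the comment above presumably stands for recall at 5: the share of relevant files that appear among the top five search results. A minimal sketch of that metric follows; the function name `recall_at_k` and the sample file names are made up for illustration and do not come from the evaluation scenarios.

```python
def recall_at_k(ranked: list[str], relevant: set[str], k: int = 5) -> float:
    """Share of relevant files that appear in the top-k ranked results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


# Hypothetical search result, best match first.
ranked = ["budget_v2.md", "notes.txt", "budget_v1.md", "todo.md", "readme.md", "budget_final.md"]
relevant = {"budget_v1.md", "budget_v2.md", "budget_final.md"}

print(recall_at_k(ranked, relevant, k=5))
print(recall_at_k(ranked, relevant, k=6))
```

In this made-up example one relevant file is ranked sixth, so it counts toward recall at 6 but not toward recall at 5.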

Supported Formats

| Category | Extensions |
| --- | --- |
| Documents | `.pdf`, `.docx`, `.doc`, `.odt` |
| Text | `.txt`, `.md`, `.markdown`, `.rst`, `.log` |
| Code | `.py`, `.js`, `.ts`, `.go`, `.rs`, `.java`, `.c`, `.cpp`, `.h`, `.sh`, `.sql` |
| Config/Data | `.json`, `.yaml`, `.yml`, `.toml`, `.xml`, `.csv`, `.tsv` |
| Web | `.html`, `.css` |

Tech Stack

| Layer | Choice |
| --- | --- |
| Embedding | `sentence-transformers` + `BAAI/bge-small-zh-v1.5` (Chinese/English) |
| Vector Store | `ChromaDB` (persistent, incremental) |
| Clustering | `scikit-learn` (DBSCAN + hierarchical refinement) |
| Document Parsing | `python-docx`, `PyPDF2` (with text cache) |
| CLI | `typer` + `rich` |
| CI | GitHub Actions (pytest + ruff on every push) |
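To give a feel for what the clustering layer does, the sketch below hand-rolls DBSCAN over cosine distance in plain Python. It is written for illustration only: it is neither scikit-learn's implementation nor FileWise's code, the `eps` and `min_samples` values are arbitrary, the vectors are made-up 2-D stand-ins for embeddings, and the hierarchical refinement step is not shown.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def dbscan(vectors: list[list[float]], eps: float, min_samples: int) -> list[int]:
    """Return one cluster label per vector; -1 marks noise.

    A point is a core point if at least min_samples points (itself included)
    lie within eps of it.
    """
    n = len(vectors)
    neighbors = [
        [j for j in range(n) if cosine_distance(vectors[i], vectors[j]) <= eps]
        for i in range(n)
    ]
    labels = [-1] * n
    visited = [False] * n
    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        if len(neighbors[i]) < min_samples:
            continue  # not a core point; a later cluster may still claim it
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # core or border point joins the cluster
            if not visited[j]:
                visited[j] = True
                if len(neighbors[j]) >= min_samples:
                    queue.extend(neighbors[j])  # expand through core points only
        cluster += 1
    return labels


vectors = [
    [1.0, 0.0], [0.99, 0.1], [0.98, 0.15],  # three near-identical directions
    [0.0, 1.0], [0.1, 0.99],                # a second group
    [-1.0, -1.0],                           # an outlier
]
labels = dbscan(vectors, eps=0.05, min_samples=2)
print(labels)
```

DBSCAN suits this task because the number of document groups is not known in advance and unrelated files can simply be left as noise.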

Roadmap

File scanner
Multi-format document parser
Text chunking (paragraph-first)
Embedding generation
Vector storage (ChromaDB, persistent)
Similarity clustering (DBSCAN + hierarchical)
Version chain inference (multi-signal scoring)
Content diff
Format variant detection (same-name, different extension)
Smart rename (version-aware file renaming)
Natural language search (semantic + keyword hybrid)
File-anchored similarity search (find-similar)
Evaluation framework (21 scenarios, 100% accuracy)
CI/CD pipeline (GitHub Actions)
Incremental indexing (watchdog — auto-detect file changes)
TUI interface (Textual, Yazi-like)
PyPI package (pip install filewise)

Requirements

Python 3.10+
~100MB disk for embedding model (downloaded on first use, cached locally)

This version

0.1.0

May 8, 2026

Download files

Source Distribution

filewise_ai-0.1.0.tar.gz (56.0 kB view details)

Uploaded May 8, 2026 Source

Built Distribution

filewise_ai-0.1.0-py3-none-any.whl (39.5 kB view details)

Uploaded May 8, 2026 Python 3

File details

Details for the file filewise_ai-0.1.0.tar.gz.

File metadata

Download URL: filewise_ai-0.1.0.tar.gz
Upload date: May 8, 2026
Size: 56.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.0

File hashes

Hashes for filewise_ai-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`59ea2bb53b64d19dcef968442a793e6d12020088fe058e602230dc80cba082ab`
MD5	`b9925446d4399ead94ce2a5a26ec69ab`
BLAKE2b-256	`4b7563ea29b84d405d7c621564edcfbf9f4fc8a1bafc1e312b7d4627d6c28dfa`

File details

Details for the file filewise_ai-0.1.0-py3-none-any.whl.

File metadata

Download URL: filewise_ai-0.1.0-py3-none-any.whl
Upload date: May 8, 2026
Size: 39.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.0

File hashes

Hashes for filewise_ai-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3aa4b1fa123de4becf9706d90bdedb7f67c5d3108150cfcb474d4d09036a82c2`
MD5	`e9c9d9b7016318b0e882552a1c0c7d9c`
BLAKE2b-256	`64f82af1e3e8a9414cf687e09c3e15210107b3231ca69cb7a213eb9bb75bb922`

filewise-ai 0.1.0