Overview
Given a set of files, this tool finds the top k according to criteria you specify, using an LLM as the judge.

It's intended for small datasets and small k; e.g., finding the top 10 out of a few hundred files. Comparisons are done pairwise: the LLM is given two documents at a time and asked to pick the better one according to the specified criteria. A single-elimination tournament determines the overall "best" file, with additional rounds to determine the runners-up. For a dataset of n files, the tool invokes the LLM approximately (n-1) + (k-1)*log_2(n) times.
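The tournament structure and the call-count estimate above can be sketched in Python. The `judge` function below is a hypothetical stand-in for the LLM: it just compares placeholder scores attached to each document.

```python
import math
import random

def judge(a, b):
    # Hypothetical stand-in for the LLM judge: picks the "better" of two
    # documents. Here a document is a (name, score) tuple.
    return a if a[1] >= b[1] else b

def single_elimination(docs, judge):
    """Run one single-elimination bracket. Returns the winner and the
    number of pairwise comparisons made (always len(docs) - 1, since
    every comparison eliminates exactly one document)."""
    comparisons = 0
    pool = list(docs)
    while len(pool) > 1:
        nxt = []
        for i in range(0, len(pool) - 1, 2):
            nxt.append(judge(pool[i], pool[i + 1]))
            comparisons += 1
        if len(pool) % 2:          # odd one out gets a bye this round
            nxt.append(pool[-1])
        pool = nxt
    return pool[0], comparisons

def estimated_calls(n, k):
    # Rough total from the formula above: (n - 1) comparisons to find the
    # winner, plus about log2(n) more per additional place.
    return (n - 1) + (k - 1) * math.ceil(math.log2(n))

docs = [(f"doc{i}", random.random()) for i in range(300)]
winner, made = single_elimination(docs, judge)
print(made)                        # prints 299, i.e. n - 1
print(estimated_calls(300, 10))    # prints 380
```

The runner-up rounds (not shown) only need to re-run the documents that lost directly to the previous winner, which is where the extra log_2(n) comparisons per place come from.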
The supported model APIs are Ollama and Anthropic (Claude).
Current limitations / known issues:
- Only text files are supported (no images/PDFs/etc)
- Models that always output chain-of-thought, e.g. DeepSeek-R1, are not supported
- Improperly formatted model output (which is especially likely to be an issue with small models) results in an unrecoverable error
Usage
First, put all the files you want to rank into one folder. Basic usage of the tool looks like this:
```shell
rank-files 'Each document is a book review. The best document is the book review that contains the most thoughtful original content, as opposed to just summarizing or quoting the book.' path/to/input-folder -k 10
```
However, you'll probably need some extra setup, depending on which model and model provider you want to use.
Ollama
By default the tool assumes you have Ollama running locally. To use a remote Ollama instance, set the `OLLAMA_HOST` environment variable to the appropriate URL.

You must have whatever model you want to use installed in Ollama ahead of time. By default the tool tries to use `gemma3:4b`, which you can install via `ollama pull gemma3:4b`. However, this model may not be powerful enough for use cases like the one in the example above. You can set the `RANK_FILES_MODEL` environment variable to use a different model, e.g.:
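For example (the host URL below is a placeholder; substitute your own server's address):

```shell
# Point the tool at a remote Ollama server instead of localhost.
export OLLAMA_HOST=http://ollama.example.com:11434
rank-files 'Each document is a book review. The best document is the book review that contains the most thoughtful original content, as opposed to just summarizing or quoting the book.' path/to/input-folder -k 10
```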
```shell
RANK_FILES_MODEL=llama3.3:70b rank-files 'Each document is a book review. The best document is the book review that contains the most thoughtful original content, as opposed to just summarizing or quoting the book.' path/to/input-folder -k 10
```
Claude
Alternatively, you can use Claude by setting the `ANTHROPIC_API_KEY`, `RANK_FILES_PROVIDER`, and `RANK_FILES_MODEL` environment variables. Remember, this costs money, and the number of API invocations grows with the size of your dataset; make sure you know what you're doing.
```shell
ANTHROPIC_API_KEY=... RANK_FILES_PROVIDER=anthropic RANK_FILES_MODEL='claude-3-5-haiku-latest' rank-files 'Each document is a book review. The best document is the book review that contains the most thoughtful original content, as opposed to just summarizing or quoting the book.' path/to/input-folder -k 10
```
Caching
You will notice a file named `rank-files-cache.sqlite3` created in the current directory when you run the tool. It stores hashes of prompts along with the responses received for them, so that the tool won't ask the same model to compare the same two files twice.

This means that if the tool is interrupted, no important work is lost: you can rerun it with the same parameters (the criteria must be exactly the same) and it will use the cached results for any comparisons that were already performed.

If you want the cache to go somewhere else, set the `RANK_FILES_CACHE` environment variable to the desired path and filename; or set it to `:memory:` if you don't want a persistent cache at all.
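The caching scheme described above can be sketched with Python's standard `sqlite3` and `hashlib` modules. The table layout here is an assumption for illustration, not the tool's actual schema:

```python
import hashlib
import sqlite3

def open_cache(path="rank-files-cache.sqlite3"):
    # Pass ":memory:" to skip persistence entirely.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS responses "
        "(prompt_hash TEXT PRIMARY KEY, response TEXT)"
    )
    return conn

def cached_compare(conn, prompt, call_model):
    """Return the model's response for this prompt, consulting the cache
    first so identical comparisons are never re-sent to the model."""
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    row = conn.execute(
        "SELECT response FROM responses WHERE prompt_hash = ?", (key,)
    ).fetchone()
    if row is not None:
        return row[0]                      # cache hit: no model call
    response = call_model(prompt)          # cache miss: ask the model
    conn.execute("INSERT INTO responses VALUES (?, ?)", (key, response))
    conn.commit()
    return response

conn = open_cache(":memory:")
calls = []
fake_model = lambda p: calls.append(p) or "Document A"
first = cached_compare(conn, "compare A vs B", fake_model)
second = cached_compare(conn, "compare A vs B", fake_model)
print(first, second, len(calls))   # prints: Document A Document A 1
```

Hashing the full prompt means the cache key captures the model's instructions, the criteria, and both documents, which is why rerunning with even slightly different criteria misses the cache.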
File details
Details for the file rank_files-0.1.0.tar.gz.
File metadata
- Download URL: rank_files-0.1.0.tar.gz
- Upload date:
- Size: 20.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.9
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `bd7d3ded602f869cb5c27a2ef835a8625a9b231791fa93b7bed028b89083cbef` |
| MD5 | `107e687f0fd4b7bd84958837f63f7782` |
| BLAKE2b-256 | `a49a5cd412a42998d5f57707738ba28f29cfaa63a2c0e6ad3b6529ddac4fb871` |
File details
Details for the file rank_files-0.1.0-py3-none-any.whl.
File metadata
- Download URL: rank_files-0.1.0-py3-none-any.whl
- Upload date:
- Size: 12.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.9
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `41e1c980059a209218589a8c7bfcf81a7cbff1eff37d641e7fbe388cbbdc8c0c` |
| MD5 | `b100c780a66c65b1ecaefda1fd995f4c` |
| BLAKE2b-256 | `f5e8c72d3ce69061860bf93d8aa12bdd3cff4e8dca7853fc383d3331ecad8001` |