Overview

Given a set of files, this tool finds the top k according to criteria you specify, using an LLM as the judge.

It's intended for small datasets and small k; e.g., finding the top 10 out of a few hundred files. Comparisons are done pairwise: the LLM is given two documents at a time and asked to pick the better one according to the specified criteria. A single-elimination tournament is used to determine the overall "best" file, with additional rounds to determine the runners-up. For a dataset of n files, the tool has to invoke the LLM approximately (n-1) + (k-1)*log_2(n) times: n-1 comparisons for the initial tournament (every file except the winner loses exactly once), plus roughly log_2(n) comparisons for each of the k-1 runners-up.
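One way to get this comparison count is a winner tree that replays only the path of the file just removed. The following is a minimal sketch of that idea, not necessarily how rank-files is implemented; the names top_k and better are hypothetical, and a plain numeric comparison stands in for the LLM judge.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def top_k(items: List[T], k: int, better: Callable[[T, T], bool]) -> List[T]:
    """Return the best k items, best first, using pairwise comparisons only.

    `better(a, b)` is a hypothetical judge that returns True if a beats b.
    """
    n = len(items)
    if n == 0 or k <= 0:
        return []

    # Winner tree stored as an array: leaves live at positions size..2*size-1.
    size = 1
    while size < n:
        size *= 2
    tree: List[Optional[int]] = [None] * (2 * size)
    for i in range(n):
        tree[size + i] = i  # store indices into `items`

    def play(a: Optional[int], b: Optional[int]) -> Optional[int]:
        # An empty slot is a bye: no comparison is needed.
        if a is None:
            return b
        if b is None:
            return a
        return a if better(items[a], items[b]) else b

    # Initial single-elimination tournament: n-1 real comparisons.
    for node in range(size - 1, 0, -1):
        tree[node] = play(tree[2 * node], tree[2 * node + 1])

    ranked: List[T] = []
    wanted = min(k, n)
    while True:
        winner = tree[1]
        ranked.append(items[winner])
        if len(ranked) == wanted:
            return ranked
        # Remove the winner and replay only the matches on its path to the root,
        # which costs at most about log_2(n) further comparisons.
        node = size + winner
        tree[node] = None
        node //= 2
        while node >= 1:
            tree[node] = play(tree[2 * node], tree[2 * node + 1])
            node //= 2


scores = [3, 9, 1, 7, 5, 8, 2, 6]
print(top_k(scores, 3, lambda a, b: a > b))  # prints [9, 8, 7]
```

With a real LLM judge the comparisons are not guaranteed to be consistent (A may beat B, B beat C, and C beat A), so the result is a ranking under the tournament's rules rather than a provably sorted list.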

The supported model APIs are Ollama and Anthropic (Claude).

Current limitations / known issues:

  • Only text files are supported (no images, PDFs, etc.)
  • Models that always output chain-of-thought, e.g. DeepSeek-R1, are not supported
  • Improperly formatted model output (which is especially likely to be an issue with small models) results in an unrecoverable error

Usage

First, put all the files you want to rank into one folder. Basic usage of the tool looks like this:

rank-files 'Each document is a book review. The best document is the book review that contains the most thoughtful original content, as opposed to just summarizing or quoting the book.' path/to/input-folder -k 10

However, you'll probably need extra setup based on which model and model provider you want to use.

Ollama

By default the tool assumes Ollama is running locally. You can use a remote Ollama instance by setting the OLLAMA_HOST environment variable to the appropriate URL.

You must have whatever model you want to use installed in Ollama ahead of time. By default the tool tries to use gemma3:4b, which you can install via ollama pull gemma3:4b. However, this model may not be powerful enough for use cases like the one in the example above. You can set the RANK_FILES_MODEL environment variable to use a different model, e.g.:

RANK_FILES_MODEL=llama3.3:70b rank-files 'Each document is a book review. The best document is the book review that contains the most thoughtful original content, as opposed to just summarizing or quoting the book.' path/to/input-folder -k 10

Claude

Alternatively, you can use Claude by setting ANTHROPIC_API_KEY, RANK_FILES_PROVIDER, and RANK_FILES_MODEL. Remember, this costs money, and the number of API invocations grows with both the number of files and k (approximately (n-1) + (k-1)*log_2(n), as noted in the overview); make sure you know what you're doing.

ANTHROPIC_API_KEY=... RANK_FILES_PROVIDER=anthropic RANK_FILES_MODEL='claude-3-5-haiku-latest' rank-files 'Each document is a book review. The best document is the book review that contains the most thoughtful original content, as opposed to just summarizing or quoting the book.' path/to/input-folder -k 10
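Before pointing the tool at a paid API, it may help to estimate how many requests a run will make. The sketch below simply evaluates the approximate formula from the overview; the function name estimated_llm_calls is hypothetical and is not part of rank-files, and the real count can differ somewhat (for example, when cached results are reused).

```python
import math


def estimated_llm_calls(n: int, k: int) -> int:
    """Approximate number of pairwise comparisons: (n-1) + (k-1)*log_2(n), rounded up."""
    if n < 2 or k < 1:
        return 0  # nothing to compare
    return math.ceil((n - 1) + (k - 1) * math.log2(n))


# Rough request counts for a few dataset sizes with k = 10.
for n in (100, 300, 1000):
    print(n, estimated_llm_calls(n, k=10))
```

Multiplying the estimate by the typical size of two documents plus the criteria text gives a rough upper bound on the input sent to the API over a whole run.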

Caching

You will notice a file named rank-files-cache.sqlite3 created in the current directory when you run the tool. This stores hashes of prompts and the responses received for them, so that the tool won't ask the same model to compare the same two files twice.

This means that if the tool is interrupted, no important work is lost: you can rerun it with the same parameters (the criteria must be exactly the same) and it will use the cached results for any comparisons that were already performed.

If you want the cache to go somewhere else, set the RANK_FILES_CACHE environment variable to the desired path and filename; or set it to :memory: if you don't want a cache file written to disk at all.
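To illustrate why the criteria must match exactly for cached results to be reused, here is a minimal sketch of a prompt-hash cache built on SQLite. The table layout and the names open_cache, cached_call, and ask_model are assumptions made for illustration; the actual schema of rank-files-cache.sqlite3 is not documented here.

```python
import hashlib
import sqlite3
from typing import Callable


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a cache database; ':memory:' keeps it off disk."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS responses ("
        "prompt_hash TEXT PRIMARY KEY, response TEXT NOT NULL)"
    )
    return conn


def cached_call(
    conn: sqlite3.Connection,
    model: str,
    prompt: str,
    ask_model: Callable[[str], str],
) -> str:
    """Return the stored response for (model, prompt), calling the model only on a miss."""
    # The key covers both model and prompt, so any change to the criteria text
    # (which is part of the prompt) produces a different hash and a cache miss.
    key = hashlib.sha256(f"{model}\n{prompt}".encode("utf-8")).hexdigest()
    row = conn.execute(
        "SELECT response FROM responses WHERE prompt_hash = ?", (key,)
    ).fetchone()
    if row is not None:
        return row[0]
    response = ask_model(prompt)
    conn.execute(
        "INSERT INTO responses (prompt_hash, response) VALUES (?, ?)",
        (key, response),
    )
    conn.commit()
    return response


# Hypothetical stand-in for a real model call, used only for this demo.
def fake_model(prompt: str) -> str:
    return "A"


cache = open_cache()
first = cached_call(cache, "gemma3:4b", "Compare A and B", fake_model)
second = cached_call(cache, "gemma3:4b", "Compare A and B", fake_model)  # served from cache
print(first == second)
```

Committing after every insert is what makes an interrupted run resumable in this sketch: each finished comparison is already on disk when the next one starts.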
