Match recall segments with story segments.
rMatch
Automatic recall & story matching tool.
Quick start
Command line
pip install rmatch
# single recall file
rmatch story.txt recall.txt --matcher anthropic
# directory of recall files (one per subject)
rmatch story.txt recalls/ --matcher anthropic
# estimate API cost without sending requests
rmatch story.txt recalls/ --matcher openai --model-name gpt-5.4 --dry-run
Python API
from rmatch import Matcher
matcher = Matcher(matcher_name="anthropic", api_key="your_api_key", model_name="claude-haiku-4-5")
matches = matcher.match(
    story_segments=["The cat sat on the mat.", "It purred softly."],
    recall_segments=["A cat was on a mat."],
)
# [(0, [0])] — recall segment 0 matched story segment 0
Or use run_matching to load files, run matching, and save results in one call:
from rmatch.match import run_matching
results = run_matching(
    story_file="story.txt",
    recall_file="recalls/",
    matcher_name="anthropic",
    api_key="your_api_key",
)
Setup API keys
API keys are resolved in this order (first match wins):
- api_key argument passed directly in Python
- .env file in the current working directory
- Environment variables already set in your shell
Set them as environment variables:
export ANTHROPIC_API_KEY="your_api_key" # for --matcher anthropic (default)
export OPENAI_API_KEY="your_api_key" # for --matcher openai
export HF_TOKEN="your_hf_token" # for --matcher huggingface
Or put a .env file in your working directory:
ANTHROPIC_API_KEY="your_api_key"
OPENAI_API_KEY="your_api_key"
HF_TOKEN="your_hf_token"
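The resolution order above can be sketched in plain Python. This is only an illustration, not rMatch's actual code: `resolve_api_key` and `read_dotenv` are hypothetical helpers, and the .env parsing is deliberately minimal (simple KEY=VALUE lines only).

```python
import os
from pathlib import Path


def read_dotenv(path):
    """Parse simple KEY=VALUE lines (hypothetical stand-in for a real .env parser)."""
    values = {}
    if not Path(path).is_file():
        return values
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_api_key(env_var, api_key=None, dotenv_path=".env", environ=None):
    """Return the first key found: argument, then .env file, then environment."""
    environ = os.environ if environ is None else environ
    if api_key:
        return api_key
    dotenv_values = read_dotenv(dotenv_path)
    if env_var in dotenv_values:
        return dotenv_values[env_var]
    return environ.get(env_var)
```

Under these assumptions, a key in .env shadows a key of the same name that is already exported in the shell, and an explicit `api_key` argument shadows both.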
Output format
A JSON file with:
{
  "matcher_name": "anthropic",
  "story_name": "story",
  "story_segmentation": "lines",
  "recall_segmentation": "lines",
  "matches": {
    "sub-001": [[0, [3, 7]], [1, [12]]],
    "sub-002": [[0, [1]], [1, [5, 6]]]
  }
}
Each entry in matches maps a subject ID to a list of [recall_segment_id, [matched_story_segment_ids...]] pairs.
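As a small sketch of how this output might be consumed, the snippet below parses the example above and collects, per subject, the set of story segments that were recalled at all. The helper name `recalled_story_segments` is made up for illustration; only the JSON layout comes from this README.

```python
import json

# Inline copy of the example output shown above.
results_json = """
{
  "matcher_name": "anthropic",
  "story_name": "story",
  "story_segmentation": "lines",
  "recall_segmentation": "lines",
  "matches": {
    "sub-001": [[0, [3, 7]], [1, [12]]],
    "sub-002": [[0, [1]], [1, [5, 6]]]
  }
}
"""


def recalled_story_segments(results):
    """Map each subject ID to the sorted story segment IDs it recalled."""
    recalled = {}
    for subject_id, pairs in results["matches"].items():
        story_ids = set()
        for _recall_id, matched_story_ids in pairs:
            story_ids.update(matched_story_ids)
        recalled[subject_id] = sorted(story_ids)
    return recalled


results = json.loads(results_json)
recalled = recalled_story_segments(results)
print(recalled)  # {'sub-001': [3, 7, 12], 'sub-002': [1, 5, 6]}
```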
Running local models
You can run any model capable of text-generation from https://huggingface.co/models.
To speed up inference, you can install flash-attn:
MAX_JOBS=8 uv pip install flash-attn --no-build-isolation
Note that some models may not support flash-attn.
If flash-attn is not automatically disabled for such a model, pass the flag --no-flash-attn.
Benchmarking
Requires rBench:
# outside of this dir
git clone git@github.com:GabrielKP/rBench.git
Add to .env or environment:
BENCHMARK_ROOT="path/to/rBench"
Run:
uv run src/rmatch/evaluate.py {alice,monthiversary,memsearch}
API / Documentation
Input formats
Story file — a .txt or .json file containing the story segments to match against.
- .txt: one segment per line (blank lines are ignored).
- .json: must contain a "segments" array of strings. Optionally includes "segmentation_method".
{
  "segmentation_method": "sentences",
  "segments": [
    "The cat sat on the mat.",
    "It purred softly."
  ]
}
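To make the two story formats concrete, here is a rough reader for both. It is a sketch under assumptions, not rMatch's loader: `load_story_segments` is a hypothetical name, and stripping surrounding whitespace from .txt lines is a guess about the real behavior.

```python
import json
from pathlib import Path


def load_story_segments(path):
    """Return (segments, segmentation_method) for a story .txt or .json file."""
    path = Path(path)
    if path.suffix == ".txt":
        lines = path.read_text(encoding="utf-8").splitlines()
        # One segment per line; blank lines are ignored.
        return [line.strip() for line in lines if line.strip()], None
    if path.suffix == ".json":
        data = json.loads(path.read_text(encoding="utf-8"))
        # "segments" is required, "segmentation_method" is optional.
        return list(data["segments"]), data.get("segmentation_method")
    raise ValueError(f"unsupported story file type: {path.suffix!r}")
```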
Recall file — a .txt file, a .json file, or a directory of either.
- .txt file: one recall segment per line. The filename stem is used as the subject ID.
- .json file: must contain a "recalls" object mapping subject IDs to segment arrays.
- Directory: all .txt or all .json files inside are loaded (mixing formats is not allowed). Each .txt file becomes one subject; .json files are merged.
{
  "segmentation_method": "clauses",
  "recalls": {
    "sub-001": ["A cat was on a mat.", "It was purring."],
    "sub-002": ["There was a cat on something."]
  }
}
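The recall rules can be sketched the same way. Again this is an illustration rather than the package's implementation: `load_recalls` is a hypothetical helper, skipping blank lines in .txt recalls is an assumption, and "merged" is interpreted here as a plain dictionary update across .json files.

```python
import json
from pathlib import Path


def load_recalls(path):
    """Return {subject_id: [recall segments]} from a file or a directory."""
    path = Path(path)
    if path.is_dir():
        files = sorted(p for p in path.iterdir() if p.suffix in {".txt", ".json"})
        if len({p.suffix for p in files}) > 1:
            raise ValueError("mixing .txt and .json recall files is not allowed")
    else:
        files = [path]

    recalls = {}
    for file in files:
        if file.suffix == ".txt":
            # The filename stem is the subject ID.
            lines = file.read_text(encoding="utf-8").splitlines()
            recalls[file.stem] = [line.strip() for line in lines if line.strip()]
        elif file.suffix == ".json":
            data = json.loads(file.read_text(encoding="utf-8"))
            recalls.update(data["recalls"])
        else:
            raise ValueError(f"unsupported recall file type: {file.suffix!r}")
    return recalls
```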
CLI reference
rmatch STORY_FILE RECALL_FILE [options]
General options
- STORY_FILE (positional, required): Path to the story .txt or .json file.
- RECALL_FILE (positional, required): Path to a recall .txt/.json file or a directory of them.
- -M, --matcher (str): Which matcher backend to use. One of: anthropic, openai, reranker, huggingface. Default: anthropic.
- -m, --model-name (str): Override the matcher's default model (see defaults below).
- -f, --overwrite: Overwrite the output file if it already exists.
LLM matcher options (anthropic, openai, huggingface)
- --window-size (int): Number of surrounding recall segments (before and after) to include as context for each target segment. Set to 0 to disable context. Default: 5.
- --prompt (str): Prompt type. Default: primary. See Prompts.
- --dry-run: anthropic & openai only. Estimate token usage and cost without making API calls.
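One way to picture what the window size controls is the slice below. It is a sketch of the idea only: `context_window` is a hypothetical function, and how rMatch actually assembles context into a prompt is not shown here.

```python
def context_window(recall_segments, target_index, window_size=5):
    """Return (before, target, after) for one target recall segment."""
    start = max(0, target_index - window_size)
    end = min(len(recall_segments), target_index + window_size + 1)
    before = recall_segments[start:target_index]
    after = recall_segments[target_index + 1:end]
    return before, recall_segments[target_index], after


segments = ["a", "b", "c", "d", "e"]
print(context_window(segments, 2, window_size=1))  # (['b'], 'c', ['d'])
print(context_window(segments, 2, window_size=0))  # ([], 'c', [])
```

With a window size of 0 the target segment is presented without any neighbors, and near the start or end of a recall the window is simply truncated.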
Self-hosted / HuggingFace options
- -q, --quantization (str): Load the model in reduced precision: 4bit (NF4) or 8bit. Requires bitsandbytes.
- -bs, --batch-size (int): Number of prompts to process in parallel. Default: 4.
- --max-new-tokens (int): Maximum tokens the model may generate per prompt. Default: 64.
- --verbose-errors: Print the raw model output when parsing fails. Useful for debugging prompt issues.
Reranker options
- --device (str): PyTorch device for the reranker model (e.g. cpu, cuda, mps). Default: auto.
- --threshold (float): Minimum similarity score for a story segment to be considered a match. Default: 0.09.
- --top-k (int): Number of top-scoring story candidates to evaluate per recall segment. Default: 5.
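The interplay of the threshold and top-k options might look roughly like this for a single recall segment. The function `select_matches` is hypothetical, the scores are made-up numbers, and treating a score equal to the threshold as a match is an assumption.

```python
def select_matches(scores, threshold=0.09, top_k=5):
    """Keep the top_k highest-scoring story indices whose score reaches the threshold."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    candidates = ranked[:top_k]
    return sorted(i for i in candidates if scores[i] >= threshold)


# One made-up similarity score per story segment.
scores = [0.02, 0.5, 0.1, 0.08, 0.9, 0.3, 0.2, 0.15]
print(select_matches(scores))                  # [1, 4, 5, 6, 7]
print(select_matches(scores, top_k=2))         # [1, 4]
print(select_matches(scores, threshold=0.25))  # [1, 4, 5]
```

Note that story segment 2 (score 0.1) clears the default threshold but is still dropped in the first call, because it is not among the five best candidates.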
Default models
- anthropic: claude-opus-4-6
- openai: gpt-4.1
- reranker: BAAI/bge-reranker-v2-m3
- huggingface: meta-llama/Llama-3.2-1B-Instruct
Python API
Matcher (main entry point)
from rmatch import Matcher
matcher = Matcher(matcher_name="anthropic", model_name=None, **kwargs)
matches = matcher.match(story_segments, recall_segments)
Matcher(matcher_name, **kwargs) is a factory — it returns the appropriate subclass based on matcher_name. All keyword arguments are forwarded to the subclass constructor.
Constructor arguments:
- model_name (str): Override the default model. Applies to all matchers.
- window_size (int): Context window radius around the target recall segment. Default: 5. Applies to: anthropic, openai, huggingface.
- prompt_type (str): Prompt type. Default: "primary". Applies to: anthropic, openai, huggingface. See Prompts.
- dry_run (bool): Estimate cost without calling the API. Applies to: anthropic, openai.
- api_key (str): API key. Falls back to .env, then environment variables. Applies to: anthropic, openai, huggingface.
- device (str): PyTorch device string. Applies to: reranker.
- threshold (float): Score threshold for matches. Default: 0.09. Applies to: reranker.
- top_k (int): Top-k candidates per recall segment. Default: 5. Applies to: reranker.
- quantization (str): "4bit" or "8bit". Applies to: huggingface.
- batch_size (int): Batch size for inference. Default: 4. Applies to: huggingface.
- max_new_tokens (int): Max generated tokens. Default: 64. Applies to: huggingface.
- verbose_errors (bool): Log raw output on parse failures. Applies to: huggingface.
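For readers curious how a constructor can act as a factory and forward keyword arguments, the following is one possible pattern. It is not taken from rMatch's source: the classes are redefined here purely for illustration, the registry mechanism and the subclass names `AnthropicMatcher` and `RerankerMatcher` are assumptions, and only the default values come from this README.

```python
class Matcher:
    """Hypothetical base class that dispatches on matcher_name."""

    _registry = {}
    default_model = None

    def __init_subclass__(cls, name=None, **kwargs):
        super().__init_subclass__(**kwargs)
        if name is not None:
            Matcher._registry[name] = cls  # register subclass under its name

    def __new__(cls, matcher_name=None, **kwargs):
        target = cls
        if cls is Matcher:  # only the base class acts as a factory
            if matcher_name not in Matcher._registry:
                raise ValueError(f"unknown matcher: {matcher_name!r}")
            target = Matcher._registry[matcher_name]
        return super().__new__(target)

    def __init__(self, matcher_name=None, model_name=None, **kwargs):
        self.matcher_name = matcher_name
        self.model_name = model_name or self.default_model


class AnthropicMatcher(Matcher, name="anthropic"):
    default_model = "claude-opus-4-6"

    def __init__(self, matcher_name=None, model_name=None, window_size=5, **kwargs):
        super().__init__(matcher_name, model_name)
        self.window_size = window_size


class RerankerMatcher(Matcher, name="reranker"):
    default_model = "BAAI/bge-reranker-v2-m3"

    def __init__(self, matcher_name=None, model_name=None, threshold=0.09, top_k=5, **kwargs):
        super().__init__(matcher_name, model_name)
        self.threshold = threshold
        self.top_k = top_k


m = Matcher(matcher_name="reranker", top_k=3)
print(type(m).__name__, m.model_name, m.threshold, m.top_k)
# RerankerMatcher BAAI/bge-reranker-v2-m3 0.09 3
```

Because the object returned by `__new__` is still an instance of `Matcher`, Python calls the subclass `__init__` with the original keyword arguments, which is what makes the forwarding work in this sketch.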
matcher.match(story_segments, recall_segments)
- story_segments (list[str]): Ordered list of story segments (the ground-truth story elements).
- recall_segments (list[str]): Ordered list of a single participant's recall segments.
Returns list[tuple[int, list[int]]] — one entry per recall segment:
[
    (0, [2, 5]),  # recall segment 0 matched story segments 2 and 5
    (1, []),      # recall segment 1 had no matches
    (2, [0]),     # recall segment 2 matched story segment 0
]
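A common follow-up is to flip this structure around and ask which recall segments cover each story segment. The sketch below does that for the example above; `story_to_recall` is a hypothetical helper, and the story length of 6 is assumed for illustration.

```python
def story_to_recall(matches, n_story_segments):
    """Invert matcher output: story segment ID -> list of recall segment IDs."""
    inverse = {story_id: [] for story_id in range(n_story_segments)}
    for recall_id, story_ids in matches:
        for story_id in story_ids:
            inverse[story_id].append(recall_id)
    return inverse


matches = [(0, [2, 5]), (1, []), (2, [0])]
inverse = story_to_recall(matches, n_story_segments=6)
print(inverse)  # {0: [2], 1: [], 2: [0], 3: [], 4: [], 5: [0]}

# Fraction of story segments that were recalled at least once.
recalled_fraction = sum(1 for ids in inverse.values() if ids) / len(inverse)
print(recalled_fraction)  # 0.5
```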
run_matching (file-level convenience)
from rmatch.match import run_matching
results = run_matching(
    story_file,                # Path: story .txt or .json
    recall_file,               # Path: recall file or directory
    matcher_name,              # str: "anthropic", "openai", "reranker", "huggingface"
    story_name=None,           # str | None: override auto-detected story name
    story_segmentation=None,   # str | None: override detected segmentation method
    recall_segmentation=None,  # str | None: override detected segmentation method
    overwrite=False,           # bool: overwrite existing output file
    **kwargs,                  # forwarded to the Matcher constructor (model_name, window_size, etc.)
)
Loads story and recall files, runs matching for every subject, and saves a JSON results file. Returns the output dictionary.
Prompts
All LLM matchers share the same set of prompt templates. The default is primary.
| Prompt | Full story | Segmented story | Chain of thought | Notes |
|---|---|---|---|---|
| primary | yes | yes | yes | Default; most complete prompt. |
| primary_no_story | no | yes | yes | Useful for long stories where the full text would exceed the context window. |
| primary_no_cot | yes | yes | no | Ablation: removes chain-of-thought reasoning. |
| primary_no_story_no_cot | no | yes | no | Ablation: minimal prompt with only segments and recall window. |
| secondary | yes | yes | yes | Alternative prompt wording with XML-structured output. |