Fraction
Persistent memory layer for LLM agents and AI applications. Zero API costs, sub-100ms ingestion, fully offline.
Fraction supports two extraction modes:
- LLMLingua-2 (default) — learned token compression, zero API cost, fully offline
- LLM extraction — any LLM provider via litellm (OpenAI, Anthropic, Ollama, etc.)
Both modes use the same hybrid retrieval layer: vector similarity + BM25 + entity graph + temporal boost, merged via Reciprocal Rank Fusion.
Installation
```bash
pip install fractionally

# Download the spaCy model for entity extraction
python -m spacy download en_core_web_sm

# Optional: install litellm for LLM-based extraction (supports any provider)
pip install fractionally[llm]
```
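To check that the entity-extraction model installed correctly, you can load it directly; this is plain spaCy usage, not a Fraction API:

```python
# Sanity check: raises OSError if the model is missing or misinstalled.
import spacy

spacy.load("en_core_web_sm")
```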
Quick Start
```python
from fraction import Memory

m = Memory()

# Add memories
m.add("I love hiking in the Rocky Mountains.", user_id="alice")
m.add("My favorite book is Dune by Frank Herbert.", user_id="alice")
m.add("I'm allergic to peanuts.", user_id="alice")

# Search memories
results = m.search("outdoor activities", user_id="alice")
for r in results["results"]:
    print(f"{r['memory']} (score: {r['score']:.3f})")

# Memories auto-persist to ~/.fraction/
```
Features
- Two extraction modes — LLMLingua-2 (free, offline) or LLM-based (any provider via litellm)
- Zero API cost (default mode) — compression + embedding + retrieval run locally
- Sub-100ms ingestion — LLMLingua-2 compression + USearch indexing
- Deterministic — same input always produces same memory (LLMLingua mode)
- Hybrid retrieval — vector similarity + BM25 keywords + entity graph, merged via Reciprocal Rank Fusion
- Auto-persistence — memories survive process restarts
- Scoping — isolate memories by user_id, agent_id, or run_id (see the sketch below)
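A minimal sketch of scoping. Only user_id appears in this README's examples; passing agent_id and run_id as analogous keyword arguments is an assumption here:

```python
# Memories written under one scope stay invisible to another.
from fraction import Memory

m = Memory()
m.add("Prefers dark mode.", user_id="alice")
m.add("Scratch note for this run.", user_id="alice", run_id="run-42")  # run_id assumed

m.search("preferences", user_id="alice")  # sees alice's memories
m.search("preferences", user_id="bob")    # nothing from alice leaks here
```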
API
Memory (recommended)
High-level client with automatic persistence.
```python
from fraction import Memory, FractionConfig

# Default storage: ~/.fraction/
m = Memory()

# Custom storage directory
m = Memory(data_dir="./my_project_memory")

# With custom config
m = Memory(config=FractionConfig(compression_rate=0.5, top_k=5))

# With LLM-based extraction (any litellm-supported provider)
m = Memory(config=FractionConfig(compressor_type="llm", llm_model="gpt-4o-mini"))

# Use Anthropic, Ollama, or any other provider
m = Memory(config=FractionConfig(compressor_type="llm", llm_model="anthropic/claude-sonnet-4-20250514"))
m = Memory(config=FractionConfig(compressor_type="llm", llm_model="ollama/llama3"))

# Context manager
with Memory(data_dir="./temp") as m:
    m.add("some fact", user_id="u1")
```
Write Operations
```python
# Add memory from text
result = m.add("I moved to Berlin in 2023.", user_id="alice")
# {"results": [{"id": "a1b2c3", "memory": "moved Berlin 2023.", "event": "ADD"}]}

# Add from conversation messages
m.add([
    {"role": "user", "content": "I just got a golden retriever!"},
    {"role": "assistant", "content": "That's great! What's their name?"},
    {"role": "user", "content": "His name is Oliver."},
], user_id="alice")

# Update a memory
m.update(memory_id, "I moved to Munich in 2024.")

# Delete
m.delete(memory_id)
m.delete_all(user_id="alice")
```
Read Operations
```python
# Search with hybrid retrieval
results = m.search("where does alice live?", user_id="alice", limit=5)

# Get a specific memory
memory = m.get(memory_id)

# List all memories for a user
all_memories = m.get_all(user_id="alice")

# View change history
history = m.history(memory_id)
```
Fraction (low-level)
Direct access to the compression + retrieval pipeline. Use this for benchmarks or when you need manual control over persistence.
```python
from fraction import Fraction, FractionConfig

config = FractionConfig(
    vector_store_path="./my_index.usearch",
    metadata_path="./my_meta.json",
    compression_rate=0.6,
)

f = Fraction(config)
f.add("some text", user_id="alice")
results = f.search("query", user_id="alice")

f.save()  # manual persistence
f.load()  # manual loading
```
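Because persistence is manual at this level, a typical round trip saves in one process and reloads in the next. This sketch assumes load() restores whatever save() wrote to the paths in config:

```python
# Process 1: build and save the index.
f = Fraction(config)
f.add("some text", user_id="alice")
f.save()

# Process 2 (later): same config, explicit load, then query.
f2 = Fraction(config)
f2.load()
results = f2.search("query", user_id="alice")
```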
How It Works
Write Path
```
text → LLMLingua-2 compress → spaCy NER → embed (BGE) → USearch index + entity graph
```
- Token compression — LLMLingua-2 (BERT-sized, ~110M params) scores token importance and retains the top 60%
- Entity extraction — spaCy NER extracts named entities (people, places, orgs) without LLM calls
- Embedding — Sentence-Transformers (BGE-base) generates 768-dim vectors locally
- Indexing — USearch HNSW index for fast approximate nearest neighbor search
- Relevance gate — filler turns with no entities and little substantive content are skipped (a minimal sketch follows this list)
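The gate is easy to picture. This is a sketch of the idea only, not Fraction's actual implementation (the exact heuristics are internal); it uses the same spaCy model and the min_content_words default from the configuration section below:

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def passes_relevance_gate(text: str, min_content_words: int = 3) -> bool:
    """Keep a turn if it names an entity or carries enough content words."""
    doc = nlp(text)
    if doc.ents:  # any named entity makes the turn worth storing
        return True
    # Count non-stopword, non-punctuation tokens as "content words".
    content = [t for t in doc if not (t.is_stop or t.is_punct or t.is_space)]
    return len(content) >= min_content_words

passes_relevance_gate("yeah, thanks!")               # False: filler turn
passes_relevance_gate("I moved to Berlin in 2023.")  # True: entity + content
```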
Read Path
```
query → embed → vector search + BM25 + graph traversal + temporal boost → RRF rerank → results
```
Four retrieval signals are merged via Reciprocal Rank Fusion (sketched after the list):
- Vector similarity — semantic matching via USearch
- BM25 keywords — exact term matching on raw text
- Entity graph — multi-hop traversal through entity relationships
- Temporal boost — date-aware scoring for time-based queries
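RRF itself is simple: each signal contributes 1/(k + rank) per result, and the summed scores decide the final order. A minimal sketch follows; k=60 is the conventional constant from the original RRF formulation, and Fraction's internal constants and weighting may differ:

```python
from collections import defaultdict

def rrf_merge(ranked_lists, k: int = 60):
    """Merge ranked id lists: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# e.g. fuse the vector, BM25, graph, and temporal rankings for one query
fused = rrf_merge([
    ["m1", "m2", "m3"],  # vector similarity
    ["m2", "m1"],        # BM25
    ["m3", "m2"],        # entity graph
    ["m2"],              # temporal boost
])
```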
Benchmarks
Evaluated on LoCoMo (1540 questions across 10 multi-session conversations):
LLMLingua-2 mode (no LLM, zero API cost)
| Metric | Fraction | mem0 | supermemory |
|---|---|---|---|
| BLEU-1 | 0.41 | ~0.35 | ~0.38 |
| F1 | 0.44 | ~0.40 | ~0.45 |
| LLM Judge (1-5) | 3.66 | ~3.2 | ~3.5 |
| LLM Judge (0/1) | 0.62 | ~0.669 | — |
| add() latency (p50) | 449ms | 708ms | — |
| search() latency (p50) | 160ms | ~200ms | — |
| API cost (memory ops) | $0 | per-call | per-call |
LLM extraction mode (using gpt-4o-mini)
| Metric | Fraction (LLM) | Fraction (LLMLingua) |
|---|---|---|
| BLEU-1 | 0.40 | 0.41 |
| F1 | 0.43 | 0.44 |
| LLM Judge (1-5) | 3.60 | 3.66 |
| LLM Judge (0/1) | 0.61 | 0.62 |
| add() latency (p50) | 1385ms | 449ms |
| search() latency (p50) | 160ms | 160ms |
Configuration
All options with defaults:
```python
FractionConfig(
    # Compression
    compressor_type="llmlingua2",   # "llmlingua2" | "self_info" | "ensemble" | "llm"
    compression_rate=0.6,           # retain 60% of tokens
    adaptive_compression=True,      # skip compression for very short texts

    # LLM extraction (when compressor_type="llm") — uses litellm
    llm_model="gpt-4o-mini",        # any litellm model string
    llm_api_key=None,               # falls back to provider env vars
    llm_api_base=None,              # custom API base (for self-hosted/ollama)

    # Relevance gate
    relevance_gate=True,            # skip filler turns
    min_content_words=3,            # minimum content words to store

    # Embedder
    embedder_model="BAAI/bge-base-en-v1.5",

    # Retrieval
    top_k=10,                       # default results per search
    use_bm25=True,                  # enable keyword search
    use_graph=True,                 # enable entity graph traversal
    rerank=True,                    # enable RRF reranking
    duplicate_threshold=0.95,       # cosine similarity for dedup
)
```
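For example, to fall back to pure vector search with stricter dedup (field names as listed above; the effect on retrieval quality is workload-dependent):

```python
from fraction import Memory, FractionConfig

m = Memory(config=FractionConfig(
    use_bm25=False,            # keyword signal off
    use_graph=False,           # entity-graph traversal off
    duplicate_threshold=0.90,  # merge near-duplicates more aggressively
))
```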
License
MIT