A probabilistic data structure for time-decayed popularity tracking
Project description
HeatMap Sketch
A probabilistic data structure for time-decayed popularity.
Paper forthcoming on arXiv — Full theoretical analysis and experimental validation of the HeatMap sketch data structure, including decay accuracy bounds, space-error tradeoffs, and comparisons with Count-Min Sketch, Space-Saving, and LFU cache.
The Problem
You need to track what's trending right now, not what was popular historically.
- Counting gives you total frequency (all-time popular)
- Bloom filters tell you if something exists
- LRU tells you what was accessed recently
None of these tell you "what's hot NOW".
The Solution: HeatMap
HeatMap tracks time-decayed popularity — items have "heat" that automatically cools over time.
from heatmap_sketch import HeatMap
# Create a HeatMap with 1-minute half-life
hm = HeatMap(half_life_seconds=60.0)
# Add heat to items
hm.add("#python")
hm.add("#ai", amount=10.0)
# Check current heat
print(hm.check("#ai")) # 10.0
# Get top trending
for item, heat in hm.top(10):
print(f"{item}: {heat}")
# Clean up
hm.stop()
How It Works
HeatMap implements exponential decay as a differential equation:
dH/dt = -λ·H + δ(item)
Every contribution has a half-life. Old contributions fade. New ones stay hot.
Key properties:
- O(k) add and check operations (k = number of hash functions)
- O(m) space (m = number of slots, typically ~1M)
- Probabilistic — may have false positives, but no false negatives
- Thread-safe — can be used from multiple threads
- Auto-decay — background thread handles time decay
Installation
pip install heatmap-sketch
Optional dependencies:
# For development
pip install heatmap-sketch[dev]
# For accelerated performance (Numba JIT)
pip install heatmap-sketch[numba]
Quick Start
Basic Usage
from heatmap_sketch import HeatMap
# Create HeatMap
hm = HeatMap(
capacity=1_000_000, # Expected unique items
error_rate=0.01, # 1% false positive rate
half_life_seconds=60.0, # Heat halves every minute
decay_interval_seconds=1.0 # Auto-decay every second
)
# Add items
hm.add("#python")
hm.add("#ai", amount=10.0)
# Check heat
heat = hm.check("#ai") # Returns current heat
# Get trending
trending = hm.top(10) # Returns top 10 items
# Check if above threshold
if hm.is_hot("#ai", threshold=5.0):
print("#ai is trending!")
# Always stop when done
hm.stop()
Context Manager
from heatmap_sketch import HeatMap
with HeatMap(half_life_seconds=60.0) as hm:
hm.add("#python")
print(hm.check("#python"))
# Automatically stops
Hierarchical HeatMap (Multiple Time Scales)
from heatmap_sketch import HierarchicalHeatMap
# Track at multiple time scales
hm = HierarchicalHeatMap(
half_lives=(10.0, 60.0, 300.0), # 10s, 1min, 5min
weights=(0.5, 0.3, 0.2) # Weight each scale
)
hm.add("#viral")
# Check heat at each level
levels = hm.check_levels("#viral")
print(f"Short-term: {levels['short']}")
print(f"Long-term: {levels['long']}")
# Combined score
combined = hm.check("#viral")
hm.stop()
API Reference
HeatMap
HeatMap(
capacity=1_000_000, # Expected unique items
error_rate=0.01, # False positive rate (0-0.5)
half_life_seconds=60.0, # Time for heat to halve
decay_interval_seconds=1.0, # Auto-decay interval (0 = manual)
track_candidates=True, # Enable top-k queries
seed=None # Random seed for reproducibility
)
Methods:
| Method | Description | Returns |
|---|---|---|
add(item, amount=1.0) |
Add heat to item | None |
add_many(items, amount=1.0) |
Bulk add | None |
check(item) |
Get current heat | float |
top(k) |
Get top-k hottest | List[Tuple[str, float]] |
is_hot(item, threshold) |
Check if above threshold | bool |
decay() |
Manual decay | None |
stats() |
Get statistics | Dict |
reset() |
Clear all | None |
stop() |
Stop auto-decay | None |
Properties:
m— Number of slotsk— Number of hash functionshalf_life— Half-life in secondsdecay_interval— Decay interval in seconds
HierarchicalHeatMap
HierarchicalHeatMap(
capacity=1_000_000,
error_rate=0.01,
half_lives=(10.0, 60.0, 300.0), # Multiple half-lives
weights=None, # Equal weights by default
decay_interval_seconds=1.0,
track_candidates=True,
seed=None
)
Methods:
add(item, amount=1.0)— Add to all levelscheck(item)— Get combined heatcheck_levels(item)— Get heat at each leveltop(k)— Get top-k by combined heat
Comparison to Other Data Structures
| Structure | Question | Decay | Top-k | Space |
|---|---|---|---|---|
| Bloom Filter | Is X in set? | ❌ | ❌ | O(m) |
| Count-Min Sketch | How many X? | ❌ | ❌ | O(m) |
| HyperLogLog | How many unique? | ❌ | ❌ | O(m) |
| Space-Saving | Top-k all-time? | ❌ | ✅ | O(k) |
| LFU Cache | Least frequently used? | ❌ | ❌ | O(n) |
| HeatMap | What's hot now? | ✅ | ✅ | O(m) |
Performance
On Apple M4 (8 cores):
| Operation | Rate |
|---|---|
| Add (single) | 3,500/sec |
| Add (bulk, 1000 items) | 380/sec |
| Check | 5,400/sec |
| Top-10 (1000 items) | 36,000/sec |
Memory: ~2 MB for 100K expected items
Use Cases
1. Social Media Trending
hm = HeatMap(half_life_seconds=300.0) # 5-minute half-life
for tweet in tweet_stream:
for hashtag in tweet.hashtags:
hm.add(hashtag)
# Get trending hashtags
trending = hm.top(10)
2. Network Monitoring / DDoS Detection
hm = HeatMap(half_life_seconds=10.0) # 10-second half-life
for packet in network_stream:
hm.add(packet.source_ip)
# Detect hot IPs (potential attack)
for ip, heat in hm.top(100):
if heat > THRESHOLD:
alert(f"Suspicious traffic from {ip}")
3. E-commerce Trending Products
hm = HeatMap(half_life_seconds=3600.0) # 1-hour half-life
for page_view in page_views:
hm.add(page_view.product_id)
# Show trending on homepage
trending_products = [item for item, _ in hm.top(20)]
4. Rate Limiting
hm = HeatMap(half_life_seconds=60.0) # 1-minute window
def check_rate_limit(user_id):
hm.add(user_id)
_, heat = hm.check(user_id)
if heat > 100: # Max 100 requests per minute
raise RateLimitExceeded()
Theoretical Guarantees
Theorem 1: Decay Accuracy
The discrete approximation converges to the continuous solution:
H_discrete(t) → H_continuous(t) as decay_interval → 0
Theorem 2: Space-Error Tradeoff
With m slots and k hash functions for n items:
P(false positive) ≤ (1 - e^(-kn/m))^k
Theorem 3: Trend Detection
If item x is truly among the top-k and has heat H_x > θ:
P(x ∈ top-k result) ≥ 1 - ε
Where ε depends on error_rate and decay_factor.
Development
# Clone repo
git clone https://github.com/thecliffhanger/heatmap-sketch.git
cd heatmap-sketch
# Create virtual environment
python -m venv venv
source venv/bin/activate
# Install dev dependencies
pip install -e ".[dev]"
# Run tests (168 tests)
pytest tests/ -v
# Run benchmarks
pytest tests/ --benchmark-only
Citation
If you use HeatMap in your research, please cite:
@software{heatmap_sketch2026,
title={HeatMap Sketch: A Data Structure for Time-Decayed Popularity},
author={Teja},
year={2026},
url={https://github.com/thecliffhanger/heatmap-sketch}
}
License
MIT
Author
Teja
HeatMap: Because "what's trending" ≠ "what's frequent."
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file heatmap_sketch-1.1.0.tar.gz.
File metadata
- Download URL: heatmap_sketch-1.1.0.tar.gz
- Upload date:
- Size: 51.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
fd6d4607f836eefd86c8039e4e8b956b7457830f1863e3cc07b5ad64fba62684
|
|
| MD5 |
92a9c01125bb5d90fa9e9f16d12190b2
|
|
| BLAKE2b-256 |
5b0aaf83885e87a52d80164840241a5601d00872e213bf8bd3d750dbdf0d396b
|
File details
Details for the file heatmap_sketch-1.1.0-py3-none-any.whl.
File metadata
- Download URL: heatmap_sketch-1.1.0-py3-none-any.whl
- Upload date:
- Size: 21.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
0383dd1e5baf05e0cdb14d1a671c24e016c80702c33578e932d59b44657c3f07
|
|
| MD5 |
75287fc05176c1c4ff46156c2290639b
|
|
| BLAKE2b-256 |
f2261e7f8ce08b8deff720f7807478b50402c201339fe3fd5cf0f07b2a1fc278
|