Skip to main content

A probabilistic data structure for time-decayed popularity tracking

Project description

HeatMap Sketch

PyPI version Python versions License: MIT Tests

A probabilistic data structure for time-decayed popularity.

Paper forthcoming on arXiv — Full theoretical analysis and experimental validation of the HeatMap sketch data structure, including decay accuracy bounds, space-error tradeoffs, and comparisons with Count-Min Sketch, Space-Saving, and LFU cache.

The Problem

You need to track what's trending right now, not what was popular historically.

  • Counting gives you total frequency (all-time popular)
  • Bloom filters tell you if something exists
  • LRU tells you what was accessed recently

None of these tell you "what's hot NOW".

The Solution: HeatMap

HeatMap tracks time-decayed popularity — items have "heat" that automatically cools over time.

from heatmap_sketch import HeatMap

# Create a HeatMap with 1-minute half-life
hm = HeatMap(half_life_seconds=60.0)

# Add heat to items
hm.add("#python")
hm.add("#ai", amount=10.0)

# Check current heat
print(hm.check("#ai"))  # 10.0

# Get top trending
for item, heat in hm.top(10):
    print(f"{item}: {heat}")

# Clean up
hm.stop()

How It Works

HeatMap implements exponential decay as a differential equation:

dH/dt = -λ·H + δ(item)

Every contribution has a half-life. Old contributions fade. New ones stay hot.

Key properties:

  • O(k) add and check operations (k = number of hash functions)
  • O(m) space (m = number of slots, typically ~1M)
  • Probabilistic — may have false positives, but no false negatives
  • Thread-safe — can be used from multiple threads
  • Auto-decay — background thread handles time decay

Installation

pip install heatmap-sketch

Optional dependencies:

# For development
pip install heatmap-sketch[dev]

# For accelerated performance (Numba JIT)
pip install heatmap-sketch[numba]

Quick Start

Basic Usage

from heatmap_sketch import HeatMap

# Create HeatMap
hm = HeatMap(
    capacity=1_000_000,      # Expected unique items
    error_rate=0.01,         # 1% false positive rate
    half_life_seconds=60.0,  # Heat halves every minute
    decay_interval_seconds=1.0  # Auto-decay every second
)

# Add items
hm.add("#python")
hm.add("#ai", amount=10.0)

# Check heat
heat = hm.check("#ai")  # Returns current heat

# Get trending
trending = hm.top(10)  # Returns top 10 items

# Check if above threshold
if hm.is_hot("#ai", threshold=5.0):
    print("#ai is trending!")

# Always stop when done
hm.stop()

Context Manager

from heatmap_sketch import HeatMap

with HeatMap(half_life_seconds=60.0) as hm:
    hm.add("#python")
    print(hm.check("#python"))
# Automatically stops

Hierarchical HeatMap (Multiple Time Scales)

from heatmap_sketch import HierarchicalHeatMap

# Track at multiple time scales
hm = HierarchicalHeatMap(
    half_lives=(10.0, 60.0, 300.0),  # 10s, 1min, 5min
    weights=(0.5, 0.3, 0.2)          # Weight each scale
)

hm.add("#viral")

# Check heat at each level
levels = hm.check_levels("#viral")
print(f"Short-term: {levels['short']}")
print(f"Long-term: {levels['long']}")

# Combined score
combined = hm.check("#viral")

hm.stop()

API Reference

HeatMap

HeatMap(
    capacity=1_000_000,       # Expected unique items
    error_rate=0.01,          # False positive rate (0-0.5)
    half_life_seconds=60.0,   # Time for heat to halve
    decay_interval_seconds=1.0,  # Auto-decay interval (0 = manual)
    track_candidates=True,    # Enable top-k queries
    seed=None                 # Random seed for reproducibility
)

Methods:

Method Description Returns
add(item, amount=1.0) Add heat to item None
add_many(items, amount=1.0) Bulk add None
check(item) Get current heat float
top(k) Get top-k hottest List[Tuple[str, float]]
is_hot(item, threshold) Check if above threshold bool
decay() Manual decay None
stats() Get statistics Dict
reset() Clear all None
stop() Stop auto-decay None

Properties:

  • m — Number of slots
  • k — Number of hash functions
  • half_life — Half-life in seconds
  • decay_interval — Decay interval in seconds

HierarchicalHeatMap

HierarchicalHeatMap(
    capacity=1_000_000,
    error_rate=0.01,
    half_lives=(10.0, 60.0, 300.0),  # Multiple half-lives
    weights=None,  # Equal weights by default
    decay_interval_seconds=1.0,
    track_candidates=True,
    seed=None
)

Methods:

  • add(item, amount=1.0) — Add to all levels
  • check(item) — Get combined heat
  • check_levels(item) — Get heat at each level
  • top(k) — Get top-k by combined heat

Comparison to Other Data Structures

Structure Question Decay Top-k Space
Bloom Filter Is X in set? O(m)
Count-Min Sketch How many X? O(m)
HyperLogLog How many unique? O(m)
Space-Saving Top-k all-time? O(k)
LFU Cache Least frequently used? O(n)
HeatMap What's hot now? O(m)

Performance

On Apple M4 (8 cores):

Operation Rate
Add (single) 3,500/sec
Add (bulk, 1000 items) 380/sec
Check 5,400/sec
Top-10 (1000 items) 36,000/sec

Memory: ~2 MB for 100K expected items

Use Cases

1. Social Media Trending

hm = HeatMap(half_life_seconds=300.0)  # 5-minute half-life

for tweet in tweet_stream:
    for hashtag in tweet.hashtags:
        hm.add(hashtag)

# Get trending hashtags
trending = hm.top(10)

2. Network Monitoring / DDoS Detection

hm = HeatMap(half_life_seconds=10.0)  # 10-second half-life

for packet in network_stream:
    hm.add(packet.source_ip)

# Detect hot IPs (potential attack)
for ip, heat in hm.top(100):
    if heat > THRESHOLD:
        alert(f"Suspicious traffic from {ip}")

3. E-commerce Trending Products

hm = HeatMap(half_life_seconds=3600.0)  # 1-hour half-life

for page_view in page_views:
    hm.add(page_view.product_id)

# Show trending on homepage
trending_products = [item for item, _ in hm.top(20)]

4. Rate Limiting

hm = HeatMap(half_life_seconds=60.0)  # 1-minute window

def check_rate_limit(user_id):
    hm.add(user_id)
    _, heat = hm.check(user_id)
    
    if heat > 100:  # Max 100 requests per minute
        raise RateLimitExceeded()

Theoretical Guarantees

Theorem 1: Decay Accuracy

The discrete approximation converges to the continuous solution:

H_discrete(t) → H_continuous(t) as decay_interval → 0

Theorem 2: Space-Error Tradeoff

With m slots and k hash functions for n items:

P(false positive) ≤ (1 - e^(-kn/m))^k

Theorem 3: Trend Detection

If item x is truly among the top-k and has heat H_x > θ:

P(x ∈ top-k result) ≥ 1 - ε

Where ε depends on error_rate and decay_factor.

Development

# Clone repo
git clone https://github.com/thecliffhanger/heatmap-sketch.git
cd heatmap-sketch

# Create virtual environment
python -m venv venv
source venv/bin/activate

# Install dev dependencies
pip install -e ".[dev]"

# Run tests (168 tests)
pytest tests/ -v

# Run benchmarks
pytest tests/ --benchmark-only

Citation

If you use HeatMap in your research, please cite:

@software{heatmap_sketch2026,
  title={HeatMap Sketch: A Data Structure for Time-Decayed Popularity},
  author={Teja},
  year={2026},
  url={https://github.com/thecliffhanger/heatmap-sketch}
}

License

MIT

Author

Teja


HeatMap: Because "what's trending" ≠ "what's frequent."

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

heatmap_sketch-1.1.0.tar.gz (51.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

heatmap_sketch-1.1.0-py3-none-any.whl (21.1 kB view details)

Uploaded Python 3

File details

Details for the file heatmap_sketch-1.1.0.tar.gz.

File metadata

  • Download URL: heatmap_sketch-1.1.0.tar.gz
  • Upload date:
  • Size: 51.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.5

File hashes

Hashes for heatmap_sketch-1.1.0.tar.gz
Algorithm Hash digest
SHA256 fd6d4607f836eefd86c8039e4e8b956b7457830f1863e3cc07b5ad64fba62684
MD5 92a9c01125bb5d90fa9e9f16d12190b2
BLAKE2b-256 5b0aaf83885e87a52d80164840241a5601d00872e213bf8bd3d750dbdf0d396b

See more details on using hashes here.

File details

Details for the file heatmap_sketch-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: heatmap_sketch-1.1.0-py3-none-any.whl
  • Upload date:
  • Size: 21.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.5

File hashes

Hashes for heatmap_sketch-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 0383dd1e5baf05e0cdb14d1a671c24e016c80702c33578e932d59b44657c3f07
MD5 75287fc05176c1c4ff46156c2290639b
BLAKE2b-256 f2261e7f8ce08b8deff720f7807478b50402c201339fe3fd5cf0f07b2a1fc278

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page