CLI tool that scans a WhatsApp bot inbox for duplicate message files caused by parallel LLM processing
whatsapp-dedup-guard
CLI tool that scans a WhatsApp bot inbox for duplicate message files and removes the noise.
Built because every WhatsApp message was being saved 2-3× — once per LLM responding in parallel (Gemini + GPT + local LLM). The inbox looked busy. It wasn't.
What it does
```text
INBOX/whatsapp/
├── 20260527-041538-...-2ac234681b402aa8d891-mpnjwwxl.md ← ORIGINAL
├── 20260527-041539-...-2ac234681b402aa8d891-mpnjwxbu.md ← DUPLICATE
├── 20260527-041634-...-2a4e7a869b127b936cc6-mpnjy3l0.md ← ORIGINAL
└── 20260527-041635-...-2a4e7a869b127b936cc6-mpnjy4pg.md ← DUPLICATE
```
`whatsapp_dedup_guard.py` reads the `Message ID:` header of each file, groups the files by that ID, and identifies the extra copies. It keeps the earliest copy and flags or deletes the rest.
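The grouping step can be sketched as below. This is a minimal illustration, not the tool's actual code: it assumes each file's headers have already been read into a dict, and it assumes "earliest" means the smallest `Saved At` timestamp (the README does not say whether the tool orders copies by that header or by the timestamp in the filename). `find_duplicates`, `parse_saved_at`, and the record keys are hypothetical names.

```python
from collections import defaultdict
from datetime import datetime


def parse_saved_at(value: str) -> datetime:
    # "2026-05-27T04:15:38.937Z": swap the trailing Z for an explicit UTC
    # offset so datetime.fromisoformat also accepts it on Python 3.10.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def find_duplicates(records: list[dict[str, str]]) -> dict[str, list[str]]:
    """Map each Message ID to the filenames of its extra (non-earliest) copies.

    Each record is a dict with the hypothetical keys "file", "message_id"
    and "saved_at". Only IDs that have more than one copy appear in the result.
    """
    groups: dict[str, list[dict[str, str]]] = defaultdict(list)
    for rec in records:
        # Normalise case: header IDs are uppercase, filename IDs lowercase.
        groups[rec["message_id"].upper()].append(rec)

    duplicates: dict[str, list[str]] = {}
    for msg_id, copies in groups.items():
        if len(copies) < 2:
            continue
        copies.sort(key=lambda r: parse_saved_at(r["saved_at"]))
        # The first (earliest) copy is kept; every later copy is an extra.
        duplicates[msg_id] = [r["file"] for r in copies[1:]]
    return duplicates
```

Sorting by a parsed `datetime` rather than by the raw string keeps the ordering correct even if a writer ever emits a different timestamp precision or offset.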
Quick start
```bash
# Scan for duplicates (read-only, no changes)
python3 whatsapp_dedup_guard.py scan /path/to/whatsapp/inbox/

# Full report with details
python3 whatsapp_dedup_guard.py report /path/to/whatsapp/inbox/

# Stats: total files, unique IDs, duplicate rate
python3 whatsapp_dedup_guard.py stats /path/to/whatsapp/inbox/

# Mark duplicates with a "Duplicate: true" header (dry-run first)
python3 whatsapp_dedup_guard.py mark --dry-run /path/to/whatsapp/inbox/
python3 whatsapp_dedup_guard.py mark /path/to/whatsapp/inbox/
```
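The `mark` command's dry-run pattern can be illustrated with a small sketch. It assumes (the README does not specify this) that the `Duplicate: true` line is inserted right after the `Status:` header and that an already-marked file is left untouched; `mark_duplicate` is a hypothetical helper, not the tool's API.

```python
from pathlib import Path

DUPLICATE_HEADER = "Duplicate: true"
HEADER_LINES = 15  # headers are expected within the first 15 lines


def mark_duplicate(path: Path, dry_run: bool = True) -> bool:
    """Add a "Duplicate: true" header line to one inbox file.

    Returns True if the file needs (or just received) the header. With
    dry_run=True nothing is written, in the spirit of `mark --dry-run`.
    """
    lines = path.read_text(encoding="utf-8").splitlines(keepends=True)
    if any(line.strip() == DUPLICATE_HEADER for line in lines[:HEADER_LINES]):
        return False  # already marked, so re-running is harmless
    # Assumption: place the header right after "Status:", else after the title line.
    insert_at = next(
        (i + 1 for i, line in enumerate(lines[:HEADER_LINES]) if line.startswith("Status:")),
        min(1, len(lines)),
    )
    if not dry_run:
        if insert_at and not lines[insert_at - 1].endswith("\n"):
            lines[insert_at - 1] += "\n"  # keep the new header on its own line
        lines.insert(insert_at, DUPLICATE_HEADER + "\n")
        path.write_text("".join(lines), encoding="utf-8")
    return True
```

Returning the same answer in both modes lets a dry run report exactly which files a real run would change, and the early `return False` makes repeated runs idempotent.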
File structure
```text
whatsapp-dedup-guard/
├── whatsapp_dedup_guard.py # CLI tool: stdlib only, no dependencies
└── README.md
```
Expected inbox file format
Files must have these headers in the first 15 lines:
```text
# WhatsApp Task - DD/MM/YYYY, H:MM:SS
Status: new
Source: WhatsApp
Sender: 972XXXXXXXXX@c.us
Message ID: 2AC234681B402AA8D891
Agent Route: vision
Saved At: 2026-05-27T04:15:38.937Z
```
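Reading these headers can be sketched as follows, assuming each header is a `Key: value` line and that only `Message ID` and `Saved At` are strictly required for deduplication (that choice of required keys is an assumption). `read_headers` and `REQUIRED_KEYS` are hypothetical names.

```python
from pathlib import Path

REQUIRED_KEYS = ("Message ID", "Saved At")  # assumption: the minimum dedup needs


def read_headers(path: Path, max_lines: int = 15) -> dict[str, str]:
    """Parse "Key: value" header lines from the first max_lines lines of a file."""
    headers: dict[str, str] = {}
    with path.open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh):
            if lineno >= max_lines:
                break
            # partition splits on the first colon only, so values such as
            # "2026-05-27T04:15:38.937Z" keep their own colons.
            key, sep, value = line.partition(":")
            # Skip the "# WhatsApp Task ..." title and lines without a colon.
            if not sep or key.startswith("#"):
                continue
            headers.setdefault(key.strip(), value.strip())
    missing = [k for k in REQUIRED_KEYS if k not in headers]
    if missing:
        raise ValueError(f"{path.name}: missing header(s) {', '.join(missing)}")
    return headers
```

Raising `ValueError` for a missing header gives the caller one place to turn a malformed file into the "parse failure" exit code.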
Exit codes
| Code | Meaning |
|---|---|
| 0 | Clean (no duplicates found) |
| 1 | Duplicates detected |
| 2 | Error (bad path, parse failure) |
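Use in CI/cron: `python3 whatsapp_dedup_guard.py scan "$DIR" || alert "duplicates in inbox"`, where `alert` stands for your own notification command. Because `||` fires on any non-zero exit, this line alerts on errors (exit code 2) as well as on duplicates (exit code 1).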
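One way to produce these exit codes is sketched below, together with the numbers the `stats` command reports. It assumes "duplicate rate" means extra copies divided by total files; the README does not define the formula. `Stats`, `summarize`, `run_scan`, and the `load_ids` callable are hypothetical names, not the tool's API.

```python
import sys
from typing import Callable, Iterable, NamedTuple

EXIT_CLEAN, EXIT_DUPLICATES, EXIT_ERROR = 0, 1, 2


class Stats(NamedTuple):
    total_files: int
    unique_ids: int
    duplicates: int  # files beyond the first copy of each Message ID

    @property
    def duplicate_rate(self) -> float:
        # Assumption: rate = extra copies / total files.
        return self.duplicates / self.total_files if self.total_files else 0.0


def summarize(message_ids: Iterable[str]) -> Stats:
    ids = list(message_ids)
    unique = len(set(ids))
    return Stats(len(ids), unique, len(ids) - unique)


def run_scan(load_ids: Callable[[], list[str]]) -> int:
    """Return the documented exit code: 0 clean, 1 duplicates, 2 error."""
    try:
        stats = summarize(load_ids())
    except (OSError, ValueError) as exc:  # bad path or header parse failure
        print(f"error: {exc}", file=sys.stderr)
        return EXIT_ERROR
    print(f"{stats.total_files} files, {stats.unique_ids} unique IDs, "
          f"duplicate rate {stats.duplicate_rate:.1%}")
    return EXIT_DUPLICATES if stats.duplicates else EXIT_CLEAN
```

A real entry point would end with `sys.exit(run_scan(...))` so that the shell sees the code and the `||` pattern works.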
Requirements
- Python 3.10+
- stdlib only (`argparse`, `os`, `sys`, `datetime`); no pip install needed
Why this exists
Built as part of RLASAF12's AI agent system. The WhatsApp bot was routing each message to multiple LLMs simultaneously, so every message was saved 2-3× under different filenames. With a duplicate rate above 15% across 152 files, the inbox was unreliable. This tool fixes that.
Built by Ben (nightly builder) · 2026-05-28