
CLI tool that scans a WhatsApp bot inbox for duplicate message files caused by parallel LLM processing


whatsapp-dedup-guard

CLI tool that scans a WhatsApp bot inbox for duplicate message files and removes the noise.

Built because every WhatsApp message was being saved 2-3×, once per LLM responding in parallel (Gemini + GPT + local LLM). The inbox looked busy. It wasn't.


What it does

 INBOX/whatsapp/
  ├── 20260527-041538-...-2ac234681b402aa8d891-mpnjwwxl.md  ← ORIGINAL
  ├── 20260527-041539-...-2ac234681b402aa8d891-mpnjwxbu.md  ← DUPLICATE
  ├── 20260527-041634-...-2a4e7a869b127b936cc6-mpnjy3l0.md  ← ORIGINAL
  └── 20260527-041635-...-2a4e7a869b127b936cc6-mpnjy4pg.md  ← DUPLICATE

whatsapp_dedup_guard.py parses the Message ID: field from each file header, groups files by that ID, and identifies the extras. It keeps the earliest copy and flags or deletes the rest.


Quick start

# Scan for duplicates (read-only, no changes)
python3 whatsapp_dedup_guard.py scan /path/to/whatsapp/inbox/

# Full report with details
python3 whatsapp_dedup_guard.py report /path/to/whatsapp/inbox/

# Stats: total files, unique IDs, duplicate rate
python3 whatsapp_dedup_guard.py stats /path/to/whatsapp/inbox/

# Mark duplicates with Duplicate: true header (dry-run first)
python3 whatsapp_dedup_guard.py mark --dry-run /path/to/whatsapp/inbox/
python3 whatsapp_dedup_guard.py mark /path/to/whatsapp/inbox/
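The following sketch shows what `mark --dry-run` versus `mark` might do to a single file. It rests on assumptions. The `Duplicate: true` line is inserted directly after `Message ID:`, but the README does not say where it goes. Marking is skipped when a `Duplicate:` header already exists, so reruns are harmless. `mark_duplicate` is a hypothetical helper, not part of the published tool.

```python
from pathlib import Path

HEADER_LINES = 15  # assumption: headers live in the first 15 lines, per the README


def mark_duplicate(path: Path, dry_run: bool = True) -> bool:
    """Insert 'Duplicate: true' after the 'Message ID:' header.

    Returns True when the file needs (dry run) or received (real run) the
    header, False when it is already marked or has no Message ID header.
    """
    lines = path.read_text(encoding="utf-8").splitlines(keepends=True)
    header = lines[:HEADER_LINES]
    if any(line.startswith("Duplicate:") for line in header):
        return False  # already marked, so reruns change nothing
    for i, line in enumerate(header):
        if line.startswith("Message ID:"):
            if not dry_run:
                if not line.endswith("\n"):
                    lines[i] = line + "\n"  # keep the new header on its own line
                lines.insert(i + 1, "Duplicate: true\n")
                path.write_text("".join(lines), encoding="utf-8")
            return True
    return False  # no Message ID header, nothing to mark
```

In practice this would only be called on the extra copies in each group, never on the earliest file.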

File structure

whatsapp-dedup-guard/
├── whatsapp_dedup_guard.py   # CLI tool (stdlib only, no dependencies)
└── README.md

Expected inbox file format

Files must have these headers in the first 15 lines:

# WhatsApp Task - DD/MM/YYYY, H:MM:SS

Status: new
Source: WhatsApp
Sender: 972XXXXXXXXX@c.us
Message ID: 2AC234681B402AA8D891
Agent Route: vision
Saved At: 2026-05-27T04:15:38.937Z
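This sketch reads the header block above into a dictionary. It assumes headers are simple `Key: Value` lines split at the first colon, because values such as `Saved At` contain colons of their own. It also assumes the `#` title line is skipped. `parse_headers`, `missing_headers`, `saved_at`, and the `REQUIRED` tuple are illustrative names. The trailing `Z` is rewritten to `+00:00` because `datetime.fromisoformat` only accepts `Z` from Python 3.11 onward, while the README targets 3.10+.

```python
from datetime import datetime

# The header keys listed in the expected format above.
REQUIRED = ("Status", "Source", "Sender", "Message ID", "Agent Route", "Saved At")


def parse_headers(text: str, max_lines: int = 15) -> dict[str, str]:
    """Collect 'Key: Value' pairs from the first max_lines lines of a message file."""
    headers: dict[str, str] = {}
    for line in text.splitlines()[:max_lines]:
        if line.startswith("#") or ":" not in line:
            continue  # skip the '# WhatsApp Task - ...' title and blank lines
        key, value = line.split(":", 1)  # split once: values may contain ':'
        headers[key.strip()] = value.strip()
    return headers


def missing_headers(headers: dict[str, str]) -> list[str]:
    """Return the required keys that are absent, in REQUIRED order."""
    return [key for key in REQUIRED if key not in headers]


def saved_at(headers: dict[str, str]) -> datetime:
    """Parse 'Saved At' (ISO 8601 with a trailing Z) in a way that works on 3.10."""
    return datetime.fromisoformat(headers["Saved At"].replace("Z", "+00:00"))
```

A file whose `missing_headers` result is non-empty is a natural candidate for the "parse failure" exit code.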

Exit codes

Code  Meaning
0     Clean: no duplicates found
1     Duplicates detected
2     Error (bad path, parse failure)

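Use in CI/cron, where alert is a placeholder for your own notification command:

python3 whatsapp_dedup_guard.py scan "$DIR" || alert "duplicates in inbox"

Note that || also fires on exit code 2, so an error is reported the same way as duplicates.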


Requirements

  • Python 3.10+
  • stdlib only (argparse, os, sys, datetime); no pip install needed

Why this exists

Built as part of RLASAF12's AI agent system. The WhatsApp bot was routing messages to multiple LLMs simultaneously, causing each message to be saved 2-3× under different filenames. With a duplicate rate of 15% or more across 152 files, the inbox was unreliable. This tool fixes that.


Built by Ben (nightly builder) · 2026-05-28



Download files


Source Distribution

whatsapp_dedup_guard-0.1.0.tar.gz (7.2 kB)

Uploaded Source

Built Distribution


whatsapp_dedup_guard-0.1.0-py3-none-any.whl (7.7 kB)

Uploaded Python 3

File details

Details for the file whatsapp_dedup_guard-0.1.0.tar.gz.

File metadata

  • Download URL: whatsapp_dedup_guard-0.1.0.tar.gz
  • Upload date:
  • Size: 7.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for whatsapp_dedup_guard-0.1.0.tar.gz
Algorithm Hash digest
SHA256 116d4d8f178b1fb17e53efa925d0da48c26f5b08a493b713a4db1218d8f05bae
MD5 2b4513ebabf97f5d61099b3628221b3b
BLAKE2b-256 9d06b161e0d4054ac63d1ca39c69e0f9ca2a598ef20d40c333fbef22425c9cb5


Provenance

The following attestation bundles were made for whatsapp_dedup_guard-0.1.0.tar.gz:

Publisher: publish.yml on RLASAF12/whatsapp-dedup-guard

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file whatsapp_dedup_guard-0.1.0-py3-none-any.whl.

File hashes

Hashes for whatsapp_dedup_guard-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 1b58ab85cf64043a7ca88ce4f12cf140757c47ab2b726a480433642a6cda5da1
MD5 f881b7ff0071ebe9e4321e5f3ca221e0
BLAKE2b-256 67dc8728e48d497bbad59fb018a53342b784ef20715e61c45abca6439cf65df4


Provenance

The following attestation bundles were made for whatsapp_dedup_guard-0.1.0-py3-none-any.whl:

Publisher: publish.yml on RLASAF12/whatsapp-dedup-guard

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
