whatsapp-link-parser
Extract, classify, and enrich links from WhatsApp chat exports. Works as a CLI tool or a Python library.
Raw .txt → Parse → Extract → Classify → Enrich → Export (CSV/JSON)
Installation
pip install whatsapp-link-parser
Quick start
The CLI is available as both whatsapp-links and wa-links.
wa-links import chat.txt --group "Goa Trip 2025"
wa-links enrich "Goa Trip 2025"
wa-links export "Goa Trip 2025"
Output CSV:
sender,date,link,domain,type,title,description,context
Arjun,2025-10-12,https://www.youtube.com/watch?v=K3FnLas09mw,youtube.com,youtube,Best Beaches in South Goa 2025,...
Meera,2025-10-14,https://www.airbnb.co.in/rooms/52841379,airbnb.co.in,travel,Beachside Villa in Palolem,...
CLI reference
wa-links import <file> --group "Name" # import a chat export
wa-links enrich "Name" # fetch page titles and descriptions
wa-links export "Name" # export to CSV
wa-links stats "Name" # show statistics
wa-links groups # list imported groups
wa-links contacts "Name" [--resolve] # list or resolve contacts
wa-links reset "Name" --yes # delete all data for a group
Export filters:
wa-links export "Name" --type youtube --format json
wa-links export "Name" --sender "Alice" --after 2025-10-01 --before 2025-11-01
wa-links export "Name" --domain airbnb
wa-links export "Name" --no-exclude # include Zoom/Meet links too
wa-links export "Name" --dedup # one row per URL, adds "Times Shared" column
| Flag | Description |
|---|---|
| --output | Output file path |
| --type | Filter by link type (youtube, travel, food, ...) |
| --sender | Filter by sender name (substring match) |
| --after / --before | Date range (YYYY-MM-DD) |
| --domain | Filter by domain (substring match) |
| --format | csv (default) or json |
| --no-exclude | Disable default domain exclusions |
| --dedup | Deduplicate by URL; adds Times Shared count column |
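To make the --dedup behaviour concrete, here is a minimal sketch of collapsing rows to one per URL while counting how often each was shared. The function name `dedup_rows` and the choice to keep the first occurrence of each URL are assumptions for illustration; the tool's actual implementation may differ.

```python
from collections import Counter


def dedup_rows(rows):
    """Keep one row per URL and attach a share count.

    Hypothetical helper: keeps the first row seen for each URL (an assumption).
    """
    counts = Counter(row["link"] for row in rows)
    seen = set()
    result = []
    for row in rows:
        url = row["link"]
        if url in seen:
            continue  # a later share of a URL we already emitted
        seen.add(url)
        result.append({**row, "Times Shared": counts[url]})
    return result


rows = [
    {"sender": "Arjun", "link": "https://example.com/a"},
    {"sender": "Meera", "link": "https://example.com/a"},
    {"sender": "Meera", "link": "https://example.com/b"},
]
deduped = dedup_rows(rows)
for row in deduped:
    print(row["link"], row["Times Shared"])
```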
Library usage
from wa_link_parser import parse_chat_file, extract_links, fetch_metadata, export_links
messages = parse_chat_file("chat.txt")
for msg in messages:
    for link in extract_links(msg.raw_text):
        print(f"{msg.sender}: {link.url} ({link.link_type})")
title, description = fetch_metadata("https://example.com")
export_links("Goa Trip 2025") # default exclusions
export_links("Goa Trip 2025", exclude_domains=[]) # no exclusions
export_links("Goa Trip 2025", exclude_domains=["zoom.us"]) # custom list
| Function | Description |
|---|---|
| parse_chat_file(path) | Parse a .txt export → ParsedMessage list |
| extract_links(text) | Extract URLs from text → ExtractedLink list |
| classify_url(url) | Classify a URL by domain → link type string |
| normalize_url(url) | Strip tracking params and canonicalize a URL |
| fetch_metadata(url) | Fetch page title and OG description |
| enrich_links(group_id) | Enrich all unenriched links for a group |
| export_links(group, ...) | Export with filters, exclusions, and optional dedup |
| filter_excluded_domains(links, ...) | Filter by domain exclusion list |
| reset_exclusion_cache() | Clear cached exclusions (for testing) |
Data classes: ParsedMessage (timestamp, sender, raw_text, is_system), ExtractedLink (url, domain, link_type, raw_url), ImportStats.
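As an illustration of what stripping tracking parameters and canonicalizing a URL can involve, the sketch below uses only the standard library. It is not the library's normalize_url: the name `strip_tracking_params`, the list of tracking keys, and the decision to drop the fragment are all assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed tracking parameters; the library's real list is not documented here.
TRACKING_PREFIXES = ("utm_",)
TRACKING_KEYS = {"fbclid", "gclid"}


def strip_tracking_params(url):
    """Hypothetical helper: lowercase the host, drop tracking params and the fragment."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.lower().startswith(TRACKING_PREFIXES)
        and key.lower() not in TRACKING_KEYS
    ]
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, urlencode(kept), "")
    )


cleaned = strip_tracking_params(
    "https://WWW.YouTube.com/watch?v=K3FnLas09mw&utm_source=wa#t=1"
)
print(cleaned)
```

Normalizing before storage is what makes URL-based deduplication meaningful, since two shares of the same page often differ only in tracking parameters.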
Configuration
Link types
Create link_types.json in your working directory to add or override domain mappings:
{
  "tiktok.com": "tiktok",
  "substack.com": "newsletter"
}
Built-in types: youtube, google_maps, document, instagram, twitter, spotify, reddit, linkedin, article, notion, github, stackoverflow, shopping, food, travel, general.
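The override mechanism can be pictured as a dictionary merge followed by a domain lookup. The sketch below is an assumption-laden illustration, not the library's classify_url: the `BUILTIN` subset, the helper names, the parent-domain matching, and the fallback to general are all hypothetical.

```python
import json
from urllib.parse import urlsplit

# Illustrative subset only; the real built-in mapping is larger.
BUILTIN = {"youtube.com": "youtube", "github.com": "github"}


def load_mappings(overrides_json=None):
    """Merge user overrides (the contents of link_types.json) over the built-ins."""
    mappings = dict(BUILTIN)
    if overrides_json:
        mappings.update(json.loads(overrides_json))
    return mappings


def classify(url, mappings):
    """Match the host or any parent domain; fall back to 'general' (assumed)."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in mappings:
            return mappings[candidate]
    return "general"


mappings = load_mappings('{"tiktok.com": "tiktok", "substack.com": "newsletter"}')
print(classify("https://www.tiktok.com/@someone", mappings))
```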
Domain exclusions
The export command filters out ephemeral links by default (video calls, email, URL shorteners). To adjust the list, create exclusions.json in your working directory:
["calendly.com", "!bit.ly"]
Plain entries are added to the exclusion list; prefix an entry with ! to remove a built-in default. All links are still stored in the database; exclusions apply only at export time.
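One way to read the exclusions.json semantics is as a set merge, sketched below. The `DEFAULTS` set is an illustrative guess (the full built-in list is not given here), and `resolve_exclusions` is a hypothetical name, not part of the library.

```python
# Illustrative defaults only; the actual built-in list is an assumption.
DEFAULTS = {"zoom.us", "bit.ly"}


def resolve_exclusions(defaults, overrides):
    """Apply exclusions.json entries: '!domain' removes a default, others are added."""
    result = set(defaults)
    for entry in overrides:
        if entry.startswith("!"):
            result.discard(entry[1:])  # no error if the domain was never a default
        else:
            result.add(entry)
    return result


effective = resolve_exclusions(DEFAULTS, ["calendly.com", "!bit.ly"])
print(sorted(effective))
```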
Supported formats
The parser auto-detects 7 WhatsApp export formats:
| Format | Example |
|---|---|
| Indian (bracket, tilde) | [20/10/2025, 10:29:01 AM] ~ Sender: text |
| US (bracket, short year) | [1/15/25, 3:45:30 PM] Sender: text |
| International (no bracket, 24h) | 20/10/2025, 14:30 - Sender: text |
| US (no bracket, 12h) | 1/15/25, 3:45 PM - Sender: text |
| European (short year, 24h) | 20/10/25, 14:30 - Sender: text |
| German (dots) | 20.10.25, 14:30 - Sender: text |
| Bracket (no tilde, full year) | [20/10/2025, 10:29:01 AM] Sender: text |
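Format detection of this kind is typically done by trying a list of regular expressions in order. The sketch below covers the examples in the table with two simplified patterns; the pattern names and `parse_line` are hypothetical, and it does not attempt to resolve day/month order, which a real parser must also handle.

```python
import re

# Bracketed timestamp, optional seconds, optional AM/PM, optional "~ " before the sender.
BRACKET = re.compile(
    r"^\[(?P<date>\d{1,2}/\d{1,2}/\d{2,4}), "
    r"(?P<time>\d{1,2}:\d{2}(?::\d{2})?(?: [AP]M)?)\] "
    r"(?:~ )?(?P<sender>[^:]+): (?P<text>.*)$"
)
# No brackets: slash or dot dates, " - " between the timestamp and the sender.
DASH = re.compile(
    r"^(?P<date>\d{1,2}[/.]\d{1,2}[/.]\d{2,4}), "
    r"(?P<time>\d{1,2}:\d{2}(?: [AP]M)?) - "
    r"(?P<sender>[^:]+): (?P<text>.*)$"
)


def parse_line(line):
    """Return the named groups of the first matching pattern, or None."""
    for pattern in (BRACKET, DASH):
        match = pattern.match(line)
        if match:
            return match.groupdict()
    return None


print(parse_line("[20/10/2025, 10:29:01 AM] ~ Sender: text"))
print(parse_line("20.10.25, 14:30 - Sender: text"))
```

A line that matches no pattern (returning None here) would usually be treated as a continuation of the previous message.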
Storage
SQLite (WAL mode). Defaults to wa_links.db in the current directory; override with:
export WA_LINKS_DB_PATH=/path/to/wa_links.db
Imports are idempotent — reimporting the same file won't create duplicates.
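Idempotent imports on SQLite are commonly built from a uniqueness constraint plus INSERT OR IGNORE. The sketch below shows that pattern under assumptions: the `messages` table, its columns, and the helper names are hypothetical and do not reflect the package's real schema. Only the WA_LINKS_DB_PATH variable, the wa_links.db default, and WAL mode come from the text above.

```python
import os
import sqlite3


def open_db(path=None):
    """Open the database, honouring WA_LINKS_DB_PATH, and enable WAL mode."""
    path = path or os.environ.get("WA_LINKS_DB_PATH", "wa_links.db")
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    # Hypothetical schema: the UNIQUE constraint is what makes re-imports no-ops.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               group_name TEXT NOT NULL,
               timestamp  TEXT NOT NULL,
               sender     TEXT NOT NULL,
               raw_text   TEXT NOT NULL,
               UNIQUE (group_name, timestamp, sender, raw_text)
           )"""
    )
    return conn


def import_messages(conn, group, messages):
    """Insert messages, skipping duplicates; return the group's row count."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO messages VALUES (?, ?, ?, ?)",
            [(group, ts, sender, text) for ts, sender, text in messages],
        )
    row = conn.execute(
        "SELECT COUNT(*) FROM messages WHERE group_name = ?", (group,)
    ).fetchone()
    return row[0]


conn = open_db(":memory:")  # in-memory database for the demo
batch = [
    ("2025-10-12T10:29:01", "Arjun", "https://example.com/a"),
    ("2025-10-14T14:30:00", "Meera", "https://example.com/b"),
]
first = import_messages(conn, "Goa Trip 2025", batch)
second = import_messages(conn, "Goa Trip 2025", batch)  # re-import adds nothing
print(first, second)
```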
License
MIT License
Copyright (c) 2025 Sreeram Ramasubramanian
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.