Use the power of pandas to search through your WhatsApp messages (Doesn't require root access!)
Project description
Finding certain messages in your WhatsApp message is sometimes hard, but Pandas makes it easier. Doesn't require root access!
Tested with Google Pixel 6 / Windows 10 / Python 3.9.13
pip install a-pandas-ex-whatsapp-to-df
Follow these steps if you want to avoid rooting your cell phone:
-
Open WhatsApp on your Android device.
-
Tap the three-dot overflow menu button and navigate to "Settings > Chats > Chat backup".
-
Tap End-to-end encrypted backup, then tap "Turn on".
-
Create a 64-digit key and save it! You will need it! DON'T CREATE a password!
-
Tap Create and wait for WhatsApp to create an encrypted backup.
-
Copy the backup from your Android device to your computer (File path on my device: storage/emulated/0/Android/media/com.whatsapp/WhatsApp/Databases/msgstore.db.crypt15 - You can localize the file using https://pypi.org/project/a-pandas-ex-adb-to-df )
from a_pandas_ex_whatsapp_to_df import pd_add_whatsapp_to_df, decrypt_file
import pandas as pd
pd_add_whatsapp_to_df()
encrypteddb = r"F:\w32\Databases1\msgstore.db.crypt15"
decrypteddb = r"F:\newtest\msgstorexxxxxxxxxxxxxxxxx.db"
decryptkey = "THE 64-DIGIT KEY THAT YOU HAVE CREATED"
# this function will decrypt your WhatsApp database
# using https://github.com/ElDavoo/WhatsApp-Crypt14-Crypt15-Decrypter
# if the folder WhatsApp-Crypt14-Crypt15-Decrypter doesn't exist,
# the script will execute
# "git clone https://github.com/ElDavoo/WhatsApp-Crypt14-Crypt15-Decrypter.git"
# which means git must be installed and added to the path variable.
decrypt_file(
encrypteddb, decrypteddb, decryptkey,
)
df = pd.Q_whatsapp_to_df(sql_database=decrypteddb)
#output:
[I] Crypt15 / Raw key loaded
[I] Database header parsed
[I] Done
message_template 59
message_template_button 56
message_location 85
message_quoted_location 16
message_mentions 5377
message_media 99775
message_vcard 561
message_vcard_jid 529
message_streaming_sidecar 370
message_quoted_media 18289
message_quoted 114225
message_quoted_mentions 3088
message_thumbnail 60438
message_link 23190
message_quoted_vcard 73
message_text 12181
message_quoted_text 2077
message_send_count 56992
receipt_device 78638
message_system 21142
message_system_group 13739
message_system_value_change 2416
message_system_number_change 1660
message_system_photo_change 48
message_system_chat_participant 8134
receipt_user 359648
message_revoked 5151
messages_hydrated_four_row_template 59
message_system_block_contact 1
message_ephemeral_setting 33
message_view_once_media 13
mms_thumbnail_metadata 74
message_system_initial_privacy_provider 3304
message_privacy_state 15
message_system_business_state 18
message_ephemeral 37
played_self_receipt 204
message_system_linked_group_call 10
audio_data 6809
...
#You should get a DataFrame containing the following columns:
['_id_x',
'chat_row_id',
'from_me',
'key_id',
'sender_jid_row_id',
'status',
'broadcast',
'recipient_count',
'participant_hash',
'origination_flags',
'origin',
'timestamp',
'received_timestamp',
'receipt_server_timestamp',
'message_type',
'text_data',
'starred',
'lookup_tables',
'sort_id',
'message_add_on_flags',
'_id_y',
'key_remote_jid',
'remote_resource',
'receipt_device_timestamp',
'read_device_timestamp',
'played_device_timestamp',
'phone_number',
'user_id',
'_id',
'user',
'server',
'agent',
'type',
'raw_string',
'device',
'group',
'message_row_id',
'video_call',
'group_jid_row_id',
'is_joinable_group_call',
'conversation__id',
'conversation_jid_row_id',
'conversation_hidden',
'conversation_subject',
'conversation_created_timestamp',
'conversation_display_message_row_id',
'conversation_last_message_row_id',
'conversation_last_read_message_row_id',
'conversation_last_read_receipt_sent_message_row_id',
'conversation_last_important_message_row_id',
'conversation_archived',
'conversation_sort_timestamp',
'conversation_mod_tag',
'conversation_gen',
'conversation_spam_detection',
'conversation_unseen_earliest_message_received_time',
'conversation_unseen_message_count',
'conversation_unseen_missed_calls_count',
'conversation_unseen_row_count',
'conversation_plaintext_disabled',
'conversation_vcard_ui_dismissed',
'conversation_change_number_notified_message_row_id',
'conversation_show_group_description',
'conversation_ephemeral_expiration',
'conversation_last_read_ephemeral_message_row_id',
'conversation_ephemeral_setting_timestamp',
'conversation_unseen_important_message_count',
'conversation_ephemeral_disappearing_messages_initiator',
'conversation_group_type',
'conversation_last_message_reaction_row_id',
'conversation_last_seen_message_reaction_row_id',
'conversation_unseen_message_reaction_count',
'conversation_growth_lock_level',
'conversation_growth_lock_expiration_ts',
'conversation_last_read_message_sort_id',
'conversation_display_message_sort_id',
'conversation_last_message_sort_id',
'conversation_last_read_receipt_sent_message_sort_id',
'message_forwarded',
'message_template_message_row_id',
'message_template_content_text_data',
'message_template_footer_text_data',
'message_template_template_id',
'message_template_csat_trigger_expiration_ts',
'message_template_button__id',
'message_template_button_message_row_id',
'message_template_button_text_data',
'message_template_button_extra_data',
'message_template_button_button_type',
'message_template_button_used',
'message_template_button_selected_index',
'message_template_button_otp_button_type',
'message_location_message_row_id',
'message_location_chat_row_id',
'message_location_latitude',
'message_location_longitude',
'message_location_place_name',
'message_location_place_address',
'message_location_url',
'message_location_live_location_share_duration',
'message_location_live_location_sequence_number',
'message_location_live_location_final_latitude',
'message_location_live_location_final_longitude',
'message_location_live_location_final_timestamp',
'message_location_map_download_status',
'message_quoted_location_message_row_id',
'message_quoted_location_latitude',
'message_quoted_location_longitude',
'message_quoted_location_place_name',
'message_quoted_location_place_address',
'message_quoted_location_url',
'message_quoted_location_thumbnail',
'message_mentions__id',
'message_mentions_message_row_id',
'message_mentions_jid_row_id',
'message_media_message_row_id',
'message_media_chat_row_id',
'message_media_autotransfer_retry_enabled',
'message_media_multicast_id',
'message_media_media_job_uuid',
'message_media_transferred',
'message_media_transcoded',
'message_media_file_path',
'message_media_file_size',
'message_media_suspicious_content',
'message_media_trim_from',
'message_media_trim_to',
'message_media_face_x',
'message_media_face_y',
'message_media_media_key',
'message_media_media_key_timestamp',
'message_media_width',
'message_media_height',
'message_media_has_streaming_sidecar',
'message_media_gif_attribution',
'message_media_thumbnail_height_width_ratio',
'message_media_direct_path',
'message_media_first_scan_sidecar',
'message_media_first_scan_length',
'message_media_message_url',
'message_media_mime_type',
'message_media_file_length',
'message_media_media_name',
'message_media_file_hash',
'message_media_media_duration',
'message_media_page_count',
'message_media_enc_file_hash',
'message_media_partial_media_hash',
'message_media_partial_media_enc_hash',
'message_media_is_animated_sticker',
'message_media_original_file_hash',
'message_media_mute_video',
'message_vcard__id',
'message_vcard_message_row_id',
'message_vcard_vcard',
'message_vcard_jid__id',
'message_vcard_jid_vcard_jid_row_id',
'message_vcard_jid_vcard_row_id',
'message_vcard_jid_message_row_id',
'message_streaming_sidecar_message_row_id',
'message_streaming_sidecar_sidecar',
'message_streaming_sidecar_chunk_lengths',
'message_streaming_sidecar_timestamp',
'message_quoted_media_message_row_id',
'message_quoted_media_media_job_uuid',
'message_quoted_media_transferred',
'message_quoted_media_file_path',
'message_quoted_media_file_size',
'message_quoted_media_media_key',
'message_quoted_media_media_key_timestamp',
'message_quoted_media_width',
'message_quoted_media_height',
'message_quoted_media_direct_path',
'message_quoted_media_message_url',
'message_quoted_media_mime_type',
'message_quoted_media_file_length',
'message_quoted_media_media_name',
'message_quoted_media_file_hash',
'message_quoted_media_media_duration',
'message_quoted_media_page_count',
'message_quoted_media_enc_file_hash',
'message_quoted_media_thumbnail',
'message_quoted_message_row_id',
'message_quoted_chat_row_id',
'message_quoted_parent_message_chat_row_id',
'message_quoted_from_me',
'message_quoted_sender_jid_row_id',
'message_quoted_key_id',
'message_quoted_timestamp',
'message_quoted_message_type',
'message_quoted_text_data',
'message_quoted_payment_transaction_id',
'message_quoted_lookup_tables',
'message_quoted_origin',
'message_quoted_mentions__id',
'message_quoted_mentions_message_row_id',
'message_quoted_mentions_jid_row_id',
'message_thumbnail_message_row_id',
'message_thumbnail_thumbnail',
'message_link__id',
'message_link_chat_row_id',
'message_link_message_row_id',
'message_link_link_index',
'message_quoted_vcard__id',
'message_quoted_vcard_message_row_id',
'message_quoted_vcard_vcard',
'message_text_message_row_id',
'message_text_description',
'message_text_page_title',
'message_text_url',
'message_text_font_style',
'message_text_text_color',
'message_text_background_color',
'message_text_preview_type',
'message_text_invite_link_group_type',
'message_quoted_text_message_row_id',
'message_quoted_text_thumbnail',
'message_send_count_message_row_id',
'message_send_count_send_count',
'receipt_device__id',
'receipt_device_message_row_id',
'receipt_device_receipt_device_jid_row_id',
'receipt_device_receipt_device_timestamp',
'receipt_device_primary_device_version',
'message_system_message_row_id',
'message_system_action_type',
'message_system_group_message_row_id',
'message_system_group_is_me_joined',
'message_system_value_change_message_row_id',
'message_system_value_change_old_data',
'message_system_number_change_message_row_id',
'message_system_number_change_old_jid_row_id',
'message_system_number_change_new_jid_row_id',
'message_system_photo_change_message_row_id',
'message_system_photo_change_new_photo_id',
'message_system_photo_change_old_photo',
'message_system_photo_change_new_photo',
'message_system_chat_participant_message_row_id',
'message_system_chat_participant_user_jid_row_id',
'receipt_user__id',
'receipt_user_message_row_id',
'receipt_user_receipt_user_jid_row_id',
'receipt_user_receipt_timestamp',
'receipt_user_read_timestamp',
'receipt_user_played_timestamp',
'message_revoked_message_row_id',
'message_revoked_revoked_key_id',
'message_revoked_admin_jid_row_id',
'messages_hydrated_four_row_template_message_row_id',
'messages_hydrated_four_row_template_message_template_id',
'message_ephemeral_setting_message_row_id',
'message_ephemeral_setting_setting_duration',
'message_ephemeral_setting_setting_reason',
'message_ephemeral_setting_user_jid_row_id_csv',
'message_view_once_media_message_row_id',
'message_view_once_media_state',
'mms_thumbnail_metadata_message_row_id',
'mms_thumbnail_metadata_direct_path',
'mms_thumbnail_metadata_media_key',
'mms_thumbnail_metadata_media_key_timestamp',
'mms_thumbnail_metadata_enc_thumb_hash',
'mms_thumbnail_metadata_thumb_hash',
'mms_thumbnail_metadata_thumb_width',
'mms_thumbnail_metadata_thumb_height',
'mms_thumbnail_metadata_transferred',
'mms_thumbnail_metadata_micro_thumbnail',
'mms_thumbnail_metadata_insert_timestamp',
'message_system_initial_privacy_provider_message_row_id',
'message_system_initial_privacy_provider_privacy_provider',
'message_system_initial_privacy_provider_verified_biz_name',
'message_system_initial_privacy_provider_biz_state_id',
'message_privacy_state_message_row_id',
'message_privacy_state_host_storage',
'message_privacy_state_actual_actors',
'message_privacy_state_privacy_mode_ts',
'message_privacy_state_business_name',
'message_system_business_state_message_row_id',
'message_system_business_state_privacy_message_type',
'message_system_business_state_business_name',
'message_ephemeral_message_row_id',
'message_ephemeral_duration',
'message_ephemeral_expire_timestamp',
'message_ephemeral_keep_in_chat',
'played_self_receipt_message_row_id',
'played_self_receipt_to_jid_row_id',
'played_self_receipt_participant_jid_row_id',
'played_self_receipt_message_id',
'message_system_linked_group_call_message_row_id',
'message_system_linked_group_call_call_id',
'message_system_linked_group_call_is_video_call',
'audio_data_message_row_id',
'audio_data_waveform']
_id_x ... audio_data_waveform
0 2101 ... NaN
1 517004 ... NaN
2 441187 ... NaN
3 441119 ... NaN
4 441115 ... NaN
... ... ...
866976 803337 ... NaN
866977 803333 ... b'\x00\x00F\x00\x00\x14%\x1c\x004\x00\x00\r\x1...
866978 803332 ... NaN
866979 803331 ... b'\x00\x00\x00\x008\x06&;*\x00\x00\x16\x1178)\...
866980 803330 ... NaN
[866981 rows x 292 columns]
If this is too much information, you can exclude tables from being parsed
Parameters:
sql_database: str
The file path to your decrypted SQL Database
databases_to_add: tuple
The SQL tables to include in the output DataFrame
default = (
"message_template",
"message_template_button",
"message_location",
"message_quoted_location",
"message_mentions",
"message_media",
"message_vcard",
"message_vcard_jid",
"message_streaming_sidecar",
"message_quoted_media",
"message_quoted",
"message_quoted_mentions",
"message_thumbnail",
"message_link",
"message_quoted_vcard",
"message_text",
"message_quoted_text",
"message_send_count",
"receipt_device",
"message_system",
"message_system_group",
"message_system_value_change",
"message_system_number_change",
"message_system_photo_change",
"message_system_chat_participant",
"receipt_user",
"message_revoked",
"messages_hydrated_four_row_template",
"message_system_block_contact",
"message_ephemeral_setting",
"message_view_once_media",
"mms_thumbnail_metadata",
"message_system_initial_privacy_provider",
"message_privacy_state",
"message_system_business_state",
"message_ephemeral",
"played_self_receipt",
"message_system_linked_group_call",
"audio_data",
)
optimize_dtypes:bool
Optimize dtypes at the end of the conversion to save memory
default = True
Returns
df: pd.DataFrame
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file a_pandas_ex_whatsapp_to_df-0.10.tar.gz
.
File metadata
- Download URL: a_pandas_ex_whatsapp_to_df-0.10.tar.gz
- Upload date:
- Size: 15.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.13
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | e53edcda8bebf0ba70ea87f53890a71b2d3c7948ce735a5ad818d1e3a9c9e659 |
|
MD5 | fcbc3d445ef3b4a8656a2628776278d6 |
|
BLAKE2b-256 | 6e32a06ca31faaf627244bad9fc405d9a865f9398dcdc922d3b98c558f7970df |
File details
Details for the file a_pandas_ex_whatsapp_to_df-0.10-py3-none-any.whl
.
File metadata
- Download URL: a_pandas_ex_whatsapp_to_df-0.10-py3-none-any.whl
- Upload date:
- Size: 15.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.13
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 4250537a1cd492ab63779aab51e531a89a82ed800d41d25fe89a7aa62ba27b0e |
|
MD5 | 5a0ef54549868a816e005ec597ae289d |
|
BLAKE2b-256 | 329087b6ab8b180c804b8471336be797aa2e770c181c3253c3913f368e998798 |