
Use the power of pandas to search through your WhatsApp messages (Doesn't require root access!)

Project description

Finding a particular message in your WhatsApp history can be hard, but pandas makes it easier. Doesn't require root access!

Tested with Google Pixel 6 / Windows 10 / Python 3.9.13
pip install a-pandas-ex-whatsapp-to-df

Follow these steps if you want to avoid rooting your cell phone:

  • Open WhatsApp on your Android device.

  • Tap the three-dot overflow menu button and navigate to "Settings > Chats > Chat backup".

  • Tap End-to-end encrypted backup, then tap "Turn on".

  • Choose the 64-digit key option and save the key; you will need it later to decrypt the backup. DON'T create a password instead!

  • Tap Create and wait for WhatsApp to create an encrypted backup.

  • Copy the backup from your Android device to your computer. (File path on my device: storage/emulated/0/Android/media/com.whatsapp/WhatsApp/Databases/msgstore.db.crypt15. You can locate the file using https://pypi.org/project/a-pandas-ex-adb-to-df )
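The last two steps above (saving the key and copying the backup) can be sketched in Python. This is only an illustration, not part of this package: the helper names `normalize_key`, `build_pull_command` and `pull_backup` are hypothetical, the remote path is the one reported above and may differ on your device, and the sketch assumes that the 64-digit key is made of hexadecimal characters and that `adb` is installed with USB debugging enabled.

```python
import string
import subprocess

# Path reported in the steps above; it may be different on other devices.
REMOTE_BACKUP = (
    "/storage/emulated/0/Android/media/com.whatsapp/"
    "WhatsApp/Databases/msgstore.db.crypt15"
)


def normalize_key(raw_key: str) -> str:
    """Remove spaces/dashes and check for 64 hexadecimal characters.

    Assumption: the key shown by WhatsApp consists of hex digits.
    """
    key = "".join(ch for ch in raw_key if ch not in " -\n\t").lower()
    if len(key) != 64 or any(ch not in string.hexdigits for ch in key):
        raise ValueError("expected a key of 64 hexadecimal characters")
    return key


def build_pull_command(local_path: str, remote_path: str = REMOTE_BACKUP) -> list:
    """Build the `adb pull <remote> <local>` command as an argument list."""
    return ["adb", "pull", remote_path, local_path]


def pull_backup(local_path: str) -> None:
    """Copy the encrypted backup to the computer (needs adb on the PATH)."""
    subprocess.run(build_pull_command(local_path), check=True)
```

Calling `pull_backup("msgstore.db.crypt15")` would copy the file into the current folder, and `normalize_key(...)` catches a mistyped key before any decryption is attempted.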

from a_pandas_ex_whatsapp_to_df import pd_add_whatsapp_to_df, decrypt_file
import pandas as pd

# Adds the Q_whatsapp_to_df function to pandas
pd_add_whatsapp_to_df()

encrypteddb = r"F:\w32\Databases1\msgstore.db.crypt15"  # backup copied from the phone
decrypteddb = r"F:\newtest\msgstorexxxxxxxxxxxxxxxxx.db"  # where the decrypted database is written
decryptkey = "THE 64-DIGIT KEY THAT YOU HAVE CREATED"

# This function decrypts your WhatsApp database
# using https://github.com/ElDavoo/WhatsApp-Crypt14-Crypt15-Decrypter
# If the folder WhatsApp-Crypt14-Crypt15-Decrypter doesn't exist,
# the script will execute
# "git clone https://github.com/ElDavoo/WhatsApp-Crypt14-Crypt15-Decrypter.git",
# which means git must be installed and added to the PATH variable.
decrypt_file(encrypteddb, decrypteddb, decryptkey)

df = pd.Q_whatsapp_to_df(sql_database=decrypteddb)



#output:
[I] Crypt15 / Raw key loaded
[I] Database header parsed
[I] Done
message_template 59
message_template_button 56
message_location 85
message_quoted_location 16
message_mentions 5377
message_media 99775
message_vcard 561
message_vcard_jid 529
message_streaming_sidecar 370
message_quoted_media 18289
message_quoted 114225
message_quoted_mentions 3088
message_thumbnail 60438
message_link 23190
message_quoted_vcard 73
message_text 12181
message_quoted_text 2077
message_send_count 56992
receipt_device 78638
message_system 21142
message_system_group 13739
message_system_value_change 2416
message_system_number_change 1660
message_system_photo_change 48
message_system_chat_participant 8134
receipt_user 359648
message_revoked 5151
messages_hydrated_four_row_template 59
message_system_block_contact 1
message_ephemeral_setting 33
message_view_once_media 13
mms_thumbnail_metadata 74
message_system_initial_privacy_provider 3304
message_privacy_state 15
message_system_business_state 18
message_ephemeral 37
played_self_receipt 204
message_system_linked_group_call 10
audio_data 6809
...





#You should get a DataFrame containing the following columns:



['_id_x',

 'chat_row_id',

 'from_me',

 'key_id',

 'sender_jid_row_id',

 'status',

 'broadcast',

 'recipient_count',

 'participant_hash',

 'origination_flags',

 'origin',

 'timestamp',

 'received_timestamp',

 'receipt_server_timestamp',

 'message_type',

 'text_data',

 'starred',

 'lookup_tables',

 'sort_id',

 'message_add_on_flags',

 '_id_y',

 'key_remote_jid',

 'remote_resource',

 'receipt_device_timestamp',

 'read_device_timestamp',

 'played_device_timestamp',

 'phone_number',

 'user_id',

 '_id',

 'user',

 'server',

 'agent',

 'type',

 'raw_string',

 'device',

 'group',

 'message_row_id',

 'video_call',

 'group_jid_row_id',

 'is_joinable_group_call',

 'conversation__id',

 'conversation_jid_row_id',

 'conversation_hidden',

 'conversation_subject',

 'conversation_created_timestamp',

 'conversation_display_message_row_id',

 'conversation_last_message_row_id',

 'conversation_last_read_message_row_id',

 'conversation_last_read_receipt_sent_message_row_id',

 'conversation_last_important_message_row_id',

 'conversation_archived',

 'conversation_sort_timestamp',

 'conversation_mod_tag',

 'conversation_gen',

 'conversation_spam_detection',

 'conversation_unseen_earliest_message_received_time',

 'conversation_unseen_message_count',

 'conversation_unseen_missed_calls_count',

 'conversation_unseen_row_count',

 'conversation_plaintext_disabled',

 'conversation_vcard_ui_dismissed',

 'conversation_change_number_notified_message_row_id',

 'conversation_show_group_description',

 'conversation_ephemeral_expiration',

 'conversation_last_read_ephemeral_message_row_id',

 'conversation_ephemeral_setting_timestamp',

 'conversation_unseen_important_message_count',

 'conversation_ephemeral_disappearing_messages_initiator',

 'conversation_group_type',

 'conversation_last_message_reaction_row_id',

 'conversation_last_seen_message_reaction_row_id',

 'conversation_unseen_message_reaction_count',

 'conversation_growth_lock_level',

 'conversation_growth_lock_expiration_ts',

 'conversation_last_read_message_sort_id',

 'conversation_display_message_sort_id',

 'conversation_last_message_sort_id',

 'conversation_last_read_receipt_sent_message_sort_id',

 'message_forwarded',

 'message_template_message_row_id',

 'message_template_content_text_data',

 'message_template_footer_text_data',

 'message_template_template_id',

 'message_template_csat_trigger_expiration_ts',

 'message_template_button__id',

 'message_template_button_message_row_id',

 'message_template_button_text_data',

 'message_template_button_extra_data',

 'message_template_button_button_type',

 'message_template_button_used',

 'message_template_button_selected_index',

 'message_template_button_otp_button_type',

 'message_location_message_row_id',

 'message_location_chat_row_id',

 'message_location_latitude',

 'message_location_longitude',

 'message_location_place_name',

 'message_location_place_address',

 'message_location_url',

 'message_location_live_location_share_duration',

 'message_location_live_location_sequence_number',

 'message_location_live_location_final_latitude',

 'message_location_live_location_final_longitude',

 'message_location_live_location_final_timestamp',

 'message_location_map_download_status',

 'message_quoted_location_message_row_id',

 'message_quoted_location_latitude',

 'message_quoted_location_longitude',

 'message_quoted_location_place_name',

 'message_quoted_location_place_address',

 'message_quoted_location_url',

 'message_quoted_location_thumbnail',

 'message_mentions__id',

 'message_mentions_message_row_id',

 'message_mentions_jid_row_id',

 'message_media_message_row_id',

 'message_media_chat_row_id',

 'message_media_autotransfer_retry_enabled',

 'message_media_multicast_id',

 'message_media_media_job_uuid',

 'message_media_transferred',

 'message_media_transcoded',

 'message_media_file_path',

 'message_media_file_size',

 'message_media_suspicious_content',

 'message_media_trim_from',

 'message_media_trim_to',

 'message_media_face_x',

 'message_media_face_y',

 'message_media_media_key',

 'message_media_media_key_timestamp',

 'message_media_width',

 'message_media_height',

 'message_media_has_streaming_sidecar',

 'message_media_gif_attribution',

 'message_media_thumbnail_height_width_ratio',

 'message_media_direct_path',

 'message_media_first_scan_sidecar',

 'message_media_first_scan_length',

 'message_media_message_url',

 'message_media_mime_type',

 'message_media_file_length',

 'message_media_media_name',

 'message_media_file_hash',

 'message_media_media_duration',

 'message_media_page_count',

 'message_media_enc_file_hash',

 'message_media_partial_media_hash',

 'message_media_partial_media_enc_hash',

 'message_media_is_animated_sticker',

 'message_media_original_file_hash',

 'message_media_mute_video',

 'message_vcard__id',

 'message_vcard_message_row_id',

 'message_vcard_vcard',

 'message_vcard_jid__id',

 'message_vcard_jid_vcard_jid_row_id',

 'message_vcard_jid_vcard_row_id',

 'message_vcard_jid_message_row_id',

 'message_streaming_sidecar_message_row_id',

 'message_streaming_sidecar_sidecar',

 'message_streaming_sidecar_chunk_lengths',

 'message_streaming_sidecar_timestamp',

 'message_quoted_media_message_row_id',

 'message_quoted_media_media_job_uuid',

 'message_quoted_media_transferred',

 'message_quoted_media_file_path',

 'message_quoted_media_file_size',

 'message_quoted_media_media_key',

 'message_quoted_media_media_key_timestamp',

 'message_quoted_media_width',

 'message_quoted_media_height',

 'message_quoted_media_direct_path',

 'message_quoted_media_message_url',

 'message_quoted_media_mime_type',

 'message_quoted_media_file_length',

 'message_quoted_media_media_name',

 'message_quoted_media_file_hash',

 'message_quoted_media_media_duration',

 'message_quoted_media_page_count',

 'message_quoted_media_enc_file_hash',

 'message_quoted_media_thumbnail',

 'message_quoted_message_row_id',

 'message_quoted_chat_row_id',

 'message_quoted_parent_message_chat_row_id',

 'message_quoted_from_me',

 'message_quoted_sender_jid_row_id',

 'message_quoted_key_id',

 'message_quoted_timestamp',

 'message_quoted_message_type',

 'message_quoted_text_data',

 'message_quoted_payment_transaction_id',

 'message_quoted_lookup_tables',

 'message_quoted_origin',

 'message_quoted_mentions__id',

 'message_quoted_mentions_message_row_id',

 'message_quoted_mentions_jid_row_id',

 'message_thumbnail_message_row_id',

 'message_thumbnail_thumbnail',

 'message_link__id',

 'message_link_chat_row_id',

 'message_link_message_row_id',

 'message_link_link_index',

 'message_quoted_vcard__id',

 'message_quoted_vcard_message_row_id',

 'message_quoted_vcard_vcard',

 'message_text_message_row_id',

 'message_text_description',

 'message_text_page_title',

 'message_text_url',

 'message_text_font_style',

 'message_text_text_color',

 'message_text_background_color',

 'message_text_preview_type',

 'message_text_invite_link_group_type',

 'message_quoted_text_message_row_id',

 'message_quoted_text_thumbnail',

 'message_send_count_message_row_id',

 'message_send_count_send_count',

 'receipt_device__id',

 'receipt_device_message_row_id',

 'receipt_device_receipt_device_jid_row_id',

 'receipt_device_receipt_device_timestamp',

 'receipt_device_primary_device_version',

 'message_system_message_row_id',

 'message_system_action_type',

 'message_system_group_message_row_id',

 'message_system_group_is_me_joined',

 'message_system_value_change_message_row_id',

 'message_system_value_change_old_data',

 'message_system_number_change_message_row_id',

 'message_system_number_change_old_jid_row_id',

 'message_system_number_change_new_jid_row_id',

 'message_system_photo_change_message_row_id',

 'message_system_photo_change_new_photo_id',

 'message_system_photo_change_old_photo',

 'message_system_photo_change_new_photo',

 'message_system_chat_participant_message_row_id',

 'message_system_chat_participant_user_jid_row_id',

 'receipt_user__id',

 'receipt_user_message_row_id',

 'receipt_user_receipt_user_jid_row_id',

 'receipt_user_receipt_timestamp',

 'receipt_user_read_timestamp',

 'receipt_user_played_timestamp',

 'message_revoked_message_row_id',

 'message_revoked_revoked_key_id',

 'message_revoked_admin_jid_row_id',

 'messages_hydrated_four_row_template_message_row_id',

 'messages_hydrated_four_row_template_message_template_id',

 'message_ephemeral_setting_message_row_id',

 'message_ephemeral_setting_setting_duration',

 'message_ephemeral_setting_setting_reason',

 'message_ephemeral_setting_user_jid_row_id_csv',

 'message_view_once_media_message_row_id',

 'message_view_once_media_state',

 'mms_thumbnail_metadata_message_row_id',

 'mms_thumbnail_metadata_direct_path',

 'mms_thumbnail_metadata_media_key',

 'mms_thumbnail_metadata_media_key_timestamp',

 'mms_thumbnail_metadata_enc_thumb_hash',

 'mms_thumbnail_metadata_thumb_hash',

 'mms_thumbnail_metadata_thumb_width',

 'mms_thumbnail_metadata_thumb_height',

 'mms_thumbnail_metadata_transferred',

 'mms_thumbnail_metadata_micro_thumbnail',

 'mms_thumbnail_metadata_insert_timestamp',

 'message_system_initial_privacy_provider_message_row_id',

 'message_system_initial_privacy_provider_privacy_provider',

 'message_system_initial_privacy_provider_verified_biz_name',

 'message_system_initial_privacy_provider_biz_state_id',

 'message_privacy_state_message_row_id',

 'message_privacy_state_host_storage',

 'message_privacy_state_actual_actors',

 'message_privacy_state_privacy_mode_ts',

 'message_privacy_state_business_name',

 'message_system_business_state_message_row_id',

 'message_system_business_state_privacy_message_type',

 'message_system_business_state_business_name',

 'message_ephemeral_message_row_id',

 'message_ephemeral_duration',

 'message_ephemeral_expire_timestamp',

 'message_ephemeral_keep_in_chat',

 'played_self_receipt_message_row_id',

 'played_self_receipt_to_jid_row_id',

 'played_self_receipt_participant_jid_row_id',

 'played_self_receipt_message_id',

 'message_system_linked_group_call_message_row_id',

 'message_system_linked_group_call_call_id',

 'message_system_linked_group_call_is_video_call',

 'audio_data_message_row_id',

 'audio_data_waveform']





         _id_x  ...                                audio_data_waveform
0         2101  ...                                                NaN
1       517004  ...                                                NaN
2       441187  ...                                                NaN
3       441119  ...                                                NaN
4       441115  ...                                                NaN
        ...  ...                                                ...
866976  803337  ...                                                NaN
866977  803333  ...  b'\x00\x00F\x00\x00\x14%\x1c\x004\x00\x00\r\x1...
866978  803332  ...                                                NaN
866979  803331  ...  b'\x00\x00\x00\x008\x06&;*\x00\x00\x16\x1178)\...
866980  803330  ...                                                NaN

[866981 rows x 292 columns]
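Once the DataFrame is loaded, searching it is ordinary pandas. The following is a minimal sketch, not part of this package: `search_messages` is a hypothetical helper, the tiny `demo` frame only stands in for the real result, and the sketch assumes that `text_data` holds the message text and that `timestamp` is in milliseconds since the Unix epoch.

```python
import pandas as pd


def search_messages(df: pd.DataFrame, pattern: str, case: bool = False) -> pd.DataFrame:
    """Return the rows whose text_data contains `pattern` as a plain substring."""
    # Cast to string dtype so that optimized (e.g. categorical) columns also work.
    text = df["text_data"].astype("string")
    mask = text.str.contains(pattern, case=case, na=False, regex=False)
    hits = df.loc[mask, ["timestamp", "from_me", "text_data"]].copy()
    # Assumption: timestamps are stored as milliseconds since the epoch.
    hits["sent_at"] = pd.to_datetime(hits["timestamp"], unit="ms")
    return hits.sort_values("timestamp")


# Stand-in data using three of the column names listed above.
demo = pd.DataFrame(
    {
        "timestamp": [1660000000000, 1660000060000, 1660000120000],
        "from_me": [1, 0, 1],
        "text_data": ["See you at the airport", None, "AIRPORT pickup at 9"],
    }
)
result = search_messages(demo, "airport")
print(result)
```

With the real data you would call `search_messages(df, "airport")`; rows without text (media, system messages) are skipped because of `na=False`.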

If this is too much information, you can limit which tables are parsed by passing your own databases_to_add tuple.

Parameters:

	sql_database: str

		The file path to your decrypted SQL Database

	databases_to_add: tuple

		The SQL tables to include in the output DataFrame

		default = (

		"message_template",

		"message_template_button",

		"message_location",

		"message_quoted_location",

		"message_mentions",

		"message_media",

		"message_vcard",

		"message_vcard_jid",

		"message_streaming_sidecar",

		"message_quoted_media",

		"message_quoted",

		"message_quoted_mentions",

		"message_thumbnail",

		"message_link",

		"message_quoted_vcard",

		"message_text",

		"message_quoted_text",

		"message_send_count",

		"receipt_device",

		"message_system",

		"message_system_group",

		"message_system_value_change",

		"message_system_number_change",

		"message_system_photo_change",

		"message_system_chat_participant",

		"receipt_user",

		"message_revoked",

		"messages_hydrated_four_row_template",

		"message_system_block_contact",

		"message_ephemeral_setting",

		"message_view_once_media",

		"mms_thumbnail_metadata",

		"message_system_initial_privacy_provider",

		"message_privacy_state",

		"message_system_business_state",

		"message_ephemeral",

		"played_self_receipt",

		"message_system_linked_group_call",

		"audio_data",

		)

	optimize_dtypes: bool

		Optimize dtypes at the end of the conversion to save memory

		default = True

Returns

	df: pd.DataFrame
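As an illustration of the parameters above, the call below restricts parsing to a few tables. It is a sketch under assumptions: the wrapper `load_slim_dataframe` and the tuple `TEXT_ONLY_TABLES` are hypothetical names, the table selection is arbitrary, and it assumes the function accepts `databases_to_add` and `optimize_dtypes` as keyword arguments in the same way as `sql_database`.

```python
# Arbitrary selection taken from the default tuple listed above.
TEXT_ONLY_TABLES = ("message_text", "message_quoted", "message_link")


def load_slim_dataframe(decrypted_db: str):
    """Load the decrypted database, joining only a few extra tables."""
    # Imported lazily so this sketch can be read without the package installed.
    import pandas as pd
    from a_pandas_ex_whatsapp_to_df import pd_add_whatsapp_to_df

    pd_add_whatsapp_to_df()
    return pd.Q_whatsapp_to_df(
        sql_database=decrypted_db,
        databases_to_add=TEXT_ONLY_TABLES,
        optimize_dtypes=True,
    )
```

Fewer tables should mean fewer columns than the 292 shown above and a faster conversion, at the cost of losing the details stored in the skipped tables.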

