Parser and analytics tools for WhatsApp group chats

whatstk

Get the Desktop App

whatstk is a Python module for WhatsApp chat group analysis and distributed under the GPL-3.0 license.

If you find this project interesting, please star it to keep us motivated!

Installation

Tested on Python 3.7

pip install whatstk

Getting Started

Make sure to first obtain the chat to be analyzed. Export it as a txt file using your phone (more info on this here).

See the docs for more on how to use it.

Obtain a dataframe from your chat log file

Load your chat using the WhatsAppChat class. The example below uses the chat file example.txt.

from whatstk import WhatsAppChat

filename = 'chats/example.txt'
chat = WhatsAppChat.from_txt(filename)

Once you have your WhatsAppChat object, you can access the loaded data through its df attribute, i.e. chat.df.

chat.df.info()

Output:

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 18 entries, 2016-08-06 13:23:00 to 2016-10-31 12:23:00
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   username  18 non-null     object
 1   message   18 non-null     object
dtypes: object(2)
memory usage: 432.0+ bytes

Note 1: By default, the header format is auto-detected. If auto-detection does not work, use the hformat argument to specify your header format. In our example, it would be hformat = '%d.%m.%y, %H:%M - %name:'. More on this here.

Note 2: If your chat uses a 12h clock, parsing may not work as expected. If that is the case, please report it in the issues section.

Plot the cumulative messages sent by day

Once you have your WhatsAppChat object, you can get the number of interventions (messages sent) per user per day using the function interventions() from whatstk.analysis with the date_mode argument set to 'date'. With this, some minor processing, plotly, and the vis function from whatstk.plot, you can produce insightful plots.

from plotly.offline import plot
from whatstk.analysis import interventions
from whatstk.plot import vis

# Count the messages sent per user per day
counts = interventions(chat=chat, date_mode='date', msg_length=False)
# Accumulate the daily counts over time
counts_cumsum = counts.cumsum()

# Plot result
plot(vis(counts_cumsum, 'cumulative number of messages sent per day'))
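
To make clear what the cumulative count represents, here is a minimal plain-Python sketch of the same idea: count the messages per user per day, then accumulate those counts over time. It uses neither whatstk nor pandas. The messages list and the cumulative_counts helper are hypothetical and only illustrate the computation, not how whatstk implements it.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical sample data: one (day, username) pair per message sent.
messages = [
    (date(2016, 8, 6), "Ash Ketchum"),
    (date(2016, 8, 6), "Brock"),
    (date(2016, 8, 6), "Misty"),
    (date(2016, 8, 6), "Ash Ketchum"),
    (date(2016, 8, 6), "Misty"),
    (date(2016, 8, 6), "Brock"),
    (date(2016, 8, 7), "Prof. Oak"),
]


def cumulative_counts(messages):
    """Return {day: {user: messages sent up to and including that day}}."""
    # Step 1: count messages per user for each day.
    per_day = defaultdict(Counter)
    for day, user in messages:
        per_day[day][user] += 1

    # Step 2: walk the days in order and keep a running total per user.
    running = Counter()
    result = {}
    for day in sorted(per_day):
        running.update(per_day[day])
        result[day] = dict(running)
    return result


totals = cumulative_counts(messages)
```

In this sketch a user only appears in the totals from the first day they send a message.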

What's the header?

The chat file syntax can differ between devices, operating systems, and language settings, which sometimes makes it hard for WhatsTK to parse the data correctly.

The header appears for each message sent in the chat. It contains a timestamp and the name of the user that sent the message.

See it for yourself by opening the exported chat file. You will find that the messages have a format similar to the one below:

15.04.2016, 15:04 - You created group “Sample Group”
06.08.2016, 13:18 - Messages you send to this group are now secured with end-to-end encryption. Tap for more info.
06.08.2016, 13:23 - Ash Ketchum: Hey guys!
06.08.2016, 13:25 - Brock: Hey Ash, good to have a common group!
06.08.2016, 13:30 - Misty: Hey guys! Long time haven't heard anything from you
06.08.2016, 13:45 - Ash Ketchum: Indeed. I think having a whatsapp group nowadays is a good idea
06.08.2016, 14:30 - Misty: Definetly
06.08.2016, 17:25 - Brock: I totally agree
07.08.2016, 11:45 - Prof. Oak: Kids, shall I design a smart poke-ball?

In this example, the header is day.month.year, hour:minutes - username:, which corresponds to the header format %d.%m.%y, %H:%M - %name:. However, in your case it may be something else. Check the table below to see the codes for each header unit.

Date unit code   Definition
%y               Year
%m               Month of the year (1-12)
%d               Day of the month (1-31)
%H               Hour, 24h clock (0-23)
%P               Hour, 12h clock (1-12)
%M               Minutes (0-59)
%S               Seconds (0-59)
%name            Name of user
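
To illustrate how a header format relates to the lines in the chat file, the following is a rough sketch that turns a format string into a regular expression and uses it to split a line into timestamp, username, and message. This is not whatstk's implementation: CODE_PATTERNS, hformat_to_regex, and parse_line are hypothetical names, and only the codes used in the example header are handled.

```python
import re
from datetime import datetime

# Hypothetical mapping from header codes to regex groups (illustration only).
CODE_PATTERNS = {
    "%name": r"(?P<name>[^:]+)",
    "%y": r"(?P<year>\d{2,4})",
    "%m": r"(?P<month>\d{1,2})",
    "%d": r"(?P<day>\d{1,2})",
    "%H": r"(?P<hour>\d{1,2})",
    "%M": r"(?P<minute>\d{2})",
}


def hformat_to_regex(hformat):
    """Build a regex matching a header followed by the message text."""
    # Split into codes and literal text; the capturing group keeps the codes.
    tokens = re.split(r"(%name|%[ymdHM])", hformat)
    pattern = "".join(CODE_PATTERNS.get(tok, re.escape(tok)) for tok in tokens)
    return re.compile("^" + pattern + r"\s*(?P<message>.*)$")


def parse_line(line, regex):
    """Return (timestamp, username, message), or None if no header matches."""
    match = regex.match(line)
    if match is None:
        # Lines without a username, such as system messages, end up here.
        return None
    parts = match.groupdict()
    year = int(parts["year"])
    if year < 100:
        year += 2000  # assumption: two-digit years belong to the 2000s
    timestamp = datetime(year, int(parts["month"]), int(parts["day"]),
                         int(parts["hour"]), int(parts["minute"]))
    return timestamp, parts["name"], parts["message"]


regex = hformat_to_regex("%d.%m.%y, %H:%M - %name:")
parsed = parse_line("06.08.2016, 13:23 - Ash Ketchum: Hey guys!", regex)
system = parse_line("15.04.2016, 15:04 - You created group “Sample Group”", regex)
```

Here parsed holds the timestamp, username, and message of the first user message, while system is None because that line has no username header.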

Contribute

We are very open to collaborators. Feel free to fork the project and open a pull request with your updates! For other issues, bugs, or suggestions, please open an issue or text me.

Source Distribution

whatstk-0.2.4.tar.gz (25.5 kB)
