Parser and analytics tools for WhatsApp group chats
whatstk: python whatsapp parser and analysis tool
whatstk is a Python module for WhatsApp group chat analysis, distributed under the GPL-3.0 license.
:star: Please star our project if you found it interesting to keep us motivated :smiley:!
Installation
pip install whatstk
Getting Started
First, obtain the chat you want to analyze by exporting it as a txt file from your phone (more info on this here).
See the docs for more details on how to use whatstk.
Obtain a dataframe from your chat log file
Load your chat using the WhatsAppChat object. The example below uses the chat example.txt.
from whatstk import WhatsAppChat
filename = 'chats/example.txt'
chat = WhatsAppChat.from_txt(filename)
Once you have your WhatsAppChat object, you can access the loaded data through its df attribute, i.e. chat.df.
chat.df.info()
See results
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 18 entries, 2016-08-06 13:23:00 to 2016-10-31 12:23:00
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   username  18 non-null     object
 1   message   18 non-null     object
dtypes: object(2)
memory usage: 432.0+ bytes
Note 1: By default, the header auto-detect feature is used. If it does not work, use the hformat argument to specify your header format. In our example, it would be: hformat = '%d.%m.%y, %H:%M - %name:'. More on this here.
Note 2: If your chat uses a 12h clock, parsing may not work as expected. If that is your case, please report it in the issues section.
Plot the cumulative messages sent by day
Once you have your WhatsAppChat object, you can easily get the number of interventions per user per, say, day using the function interventions() with the date_mode argument set to 'date'. With this, some minor processing, plotly, and the vis function from whatstk.plot, you can get really insightful plots.
from whatstk.analysis import interventions
counts = interventions(chat=chat, date_mode='date', msg_length=False)
counts_cumsum = counts.cumsum()
# Plot result
from plotly.offline import plot
from whatstk.plot import vis
plot(vis(counts_cumsum, 'cumulative number of messages sent per day'))
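To make the idea of cumulative counts concrete, here is a minimal sketch that uses only the standard library. It is not how whatstk computes them; the `messages` list and the `cumulative_counts` helper are hypothetical names introduced for illustration. It counts messages per user per day and then accumulates those counts over time, which is what the cumulative sum above does on the interventions table.

```python
from collections import Counter
from datetime import date
from itertools import accumulate

# Hypothetical (day, username) pairs, as could be read off a parsed chat.
messages = [
    (date(2016, 8, 6), "Ash Ketchum"),
    (date(2016, 8, 6), "Brock"),
    (date(2016, 8, 6), "Misty"),
    (date(2016, 8, 6), "Ash Ketchum"),
    (date(2016, 8, 6), "Misty"),
    (date(2016, 8, 6), "Brock"),
    (date(2016, 8, 7), "Prof. Oak"),
]


def cumulative_counts(messages):
    """Return the sorted days and {username: cumulative message count per day}."""
    days = sorted({day for day, _ in messages})
    users = sorted({user for _, user in messages})
    # (day, user) -> number of messages; missing pairs count as 0.
    per_day = Counter(messages)
    cumulative = {
        user: list(accumulate(per_day[(day, user)] for day in days))
        for user in users
    }
    return days, cumulative


days, cumulative = cumulative_counts(messages)
for user, totals in cumulative.items():
    print(user, totals)
```

Each list has one entry per day, so a user who has not written yet starts at 0 and the values never decrease.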
What's the header?
The chat file syntax can differ between devices, operating systems and language settings, which sometimes makes it hard to parse the data correctly and get whatstk to work.
The header appears for each message sent in the chat. It contains a timestamp and the name of the user that sent the message.
See it for yourself: open the exported chat file. You will find that the messages have a format similar to the one below:
15.04.2016, 15:04 - You created group “Sample Group”
06.08.2016, 13:18 - Messages you send to this group are now secured with end-to-end encryption. Tap for more info.
06.08.2016, 13:23 - Ash Ketchum: Hey guys!
06.08.2016, 13:25 - Brock: Hey Ash, good to have a common group!
06.08.2016, 13:30 - Misty: Hey guys! Long time haven't heard anything from you
06.08.2016, 13:45 - Ash Ketchum: Indeed. I think having a whatsapp group nowadays is a good idea
06.08.2016, 14:30 - Misty: Definetly
06.08.2016, 17:25 - Brock: I totally agree
07.08.2016, 11:45 - Prof. Oak: Kids, shall I design a smart poke-ball?
In this example, the header is day.month.year, hour:minutes - username: which corresponds to the header format %d.%m.%y, %H:%M - %name:. However, in your case it may be something else. Check the table below to see the codes for each header unit.
| Date Unit Code | Definition |
|---|---|
| %y | Year |
| %m | Month of the year (1-12) |
| %d | Day of the month (1-31) |
| %H | Hour 24h-clock (0-23) |
| %P | Hour 12h-clock (1-12) |
| %M | Minutes (0-59) |
| %S | Seconds (0-59) |
| %name | Name of user |
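As an illustration of how a header format relates to the lines in a chat file, the sketch below translates a format string into a regular expression and uses it to split a line into timestamp, username and message. This is not whatstk's internal implementation: `CODE_PATTERNS`, `hformat_to_regex` and `parse_line` are hypothetical names. The sketch also assumes that the format contains date, hour, minute and name codes, and it leaves out the 12h-clock code %P.

```python
import re
from datetime import datetime

# Hypothetical mapping from header codes to named regex groups.
CODE_PATTERNS = {
    "%name": r"(?P<name>[^:]+)",
    "%y": r"(?P<year>\d{2,4})",
    "%m": r"(?P<month>\d{1,2})",
    "%d": r"(?P<day>\d{1,2})",
    "%H": r"(?P<hour>\d{1,2})",
    "%M": r"(?P<minute>\d{1,2})",
    "%S": r"(?P<second>\d{1,2})",
}


def hformat_to_regex(hformat):
    """Compile a header format such as '%d.%m.%y, %H:%M - %name:' into a regex."""
    # Split into codes and literal text; only the literal parts are escaped.
    tokens = re.split(r"(%name|%[ymdHMS])", hformat)
    parts = [CODE_PATTERNS.get(tok, re.escape(tok)) for tok in tokens]
    return re.compile("^" + "".join(parts) + r"\s?(?P<message>.*)$")


def parse_line(line, pattern):
    """Return (timestamp, username, message), or None if the header does not match."""
    match = pattern.match(line)
    if match is None:
        # For example a system message, which has no 'username:' part.
        return None
    groups = match.groupdict()
    year = int(groups["year"])
    if year < 100:
        year += 2000  # assumption: two-digit years belong to the 2000s
    timestamp = datetime(
        year,
        int(groups["month"]),
        int(groups["day"]),
        int(groups["hour"]),
        int(groups["minute"]),
        int(groups.get("second") or 0),
    )
    return timestamp, groups["name"], groups["message"]


pattern = hformat_to_regex("%d.%m.%y, %H:%M - %name:")
print(parse_line("06.08.2016, 13:23 - Ash Ketchum: Hey guys!", pattern))
# (datetime.datetime(2016, 8, 6, 13, 23), 'Ash Ketchum', 'Hey guys!')
```

Lines without a username, such as the group creation and encryption notices in the sample above, do not match this pattern and come back as None.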
Contributing
We are very open to having collaborators. Feel free to fork the project and open a pull request with your updates! For other issues, bugs or suggestions, please report them in the issues section.
Pull Requests
Make sure to test your code before issuing a pull request:
py.test --cov-report term --cov=whatstk tests/
Note 1: Use --html=testreport.html --cov-report html to generate HTML reports.
Pull requests also trigger the Travis CI pipeline, which runs the tests as well.
Chat with us
Join our Discord group.