A library/command line utility for parsing Facebook chat history
Facebook Chat Archive Parser
A small tool and library for parsing chat history from a Facebook data archive into more usable formats.
What is a “Facebook Chat Archive”?
Facebook Messenger records all your conversation history since your account’s creation. There are two options for history retrieval:
1. Create a scraper that constantly “scrolls up” in the conversation window you’re interested in (or simulates that with API calls), progressively getting more of your chat history.
2. Ask Facebook for a zip archive of all your data.
The second option is the only practical way to obtain everything in a timely manner.
What does Facebook give me in this zip archive?
The zip archive contains everything you’ve ever posted to Facebook, including pictures, videos, and posts, along with your chat messages.
Your chat history comes in a single HTML page titled messages.htm. Unfortunately, the data is unordered, and because the file can be hundreds of megabytes it is impractical to load into a web browser. The only practical way to analyze the content is to parse the file.
UPDATE: As of October 2017, messages.htm just acts as a manifest for the contents of a directory called messages/. The formatting is almost identical to before, but each thread now lives in its own file. This tool needs all of these files (messages.htm and the messages/ directory) to work.
Why would I ever want my Facebook chat history?
Here are some reasons you may want to parse your Facebook chat history:
To make a simulation of your friends using Markov chains.
To keep a record of your conversations when deleting your Facebook account.
To analyze a copy of your conversations for legal reasons.
Here comes the Facebook Chat Archive Parser!
The Facebook Chat Archive Parser is a command line tool (and library for advanced users) used to easily transform your messages.htm file into a useful format.
How do I get it?
Install the Facebook Chat Archive Parser via pip under Python 2.7 or newer:
pip install fbchat-archive-parser
If you already have an older version installed, you can upgrade to the latest with the following command:
pip install --upgrade fbchat-archive-parser
How does it work?
From within the html/ folder, run the fbcap command in your terminal with your messages.htm file as the argument:
fbcap messages ./messages.htm
And watch as the parser sifts through your data!
When it’s done, your conversation history is dumped to stdout. This can be very long.
What if I want JSON?
Simply supply the -f json option to the command line:
fbcap messages ./messages.htm -f json
Or if you want pretty formatted JSON:
fbcap messages ./messages.htm -f pretty-json
The output format is as follows (messages are ordered from oldest to newest).
{
  "threads": [
    {
      "participants": ["participant_0", "...", "participant_n"],
      "messages": [
        {
          "date": "ISO 8601 formatted date",
          "sender": "sender name",
          "message": "message text"
        },
        "..."
      ]
    },
    "..."
  ]
}
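As a minimal sketch of consuming this output from Python, assuming the JSON has been captured as a string with the structure shown above. The sample data and the helper name summarize_threads are illustrative and not part of the tool:

```python
import json

# Illustrative sample that follows the documented structure; real output
# would come from `fbcap messages ./messages.htm -f json`.
SAMPLE = """
{
  "threads": [
    {
      "participants": ["Second User", "Third User"],
      "messages": [
        {"date": "2013-10-04T15:05Z", "sender": "Third User", "message": "1"},
        {"date": "2013-10-04T15:05Z", "sender": "Third User", "message": "2"}
      ]
    }
  ]
}
"""


def summarize_threads(raw_json):
    """Return (participants, message_count) for every thread (hypothetical helper)."""
    data = json.loads(raw_json)
    return [
        (", ".join(thread["participants"]), len(thread["messages"]))
        for thread in data["threads"]
    ]


for participants, count in summarize_threads(SAMPLE):
    print(f"{participants}: {count} messages")
```

For a real archive, the same function could be fed the contents of a file produced by redirecting the command's output.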
How about CSV?
Of course!
fbcap messages ./messages.htm -f csv
thread,sender,date,message
Third User,Third User,2013-10-04T15:05Z,1
Third User,Third User,2013-10-04T15:05Z,2
Third User,Third User,2013-10-04T15:05Z,3
Third User,First User,2013-10-04T15:05Z,4
Third User,Third User,2013-10-04T15:06Z,5
Third User,First User,2013-10-04T15:07Z,6
Third User,First User,2013-10-04T15:07Z,7
Second User,Second User,2013-10-04T15:04Z,X Y Z
Second User,Second User,2013-10-04T15:05Z,X? Y Z!
Second User,Second User,2013-10-04T15:05Z,This is a test
Second User,Second User,2013-10-04T15:05Z,"Yes, it is"
Second User,Second User,2013-10-04T15:05Z,The last message!
"Second User, Third User",Third User,2013-10-04T15:05Z,1
"Second User, Third User",Third User,2013-10-04T15:05Z,2
...
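The CSV output can be read back with Python's standard csv module. The following is a sketch only: it assumes the column names shown in the header row above, uses a shortened copy of the sample rows, and the helper name group_by_thread is made up for illustration:

```python
import csv
import io
from collections import defaultdict

# Shortened copy of the sample output above.
SAMPLE_CSV = '''thread,sender,date,message
Third User,Third User,2013-10-04T15:05Z,1
Third User,First User,2013-10-04T15:05Z,4
Second User,Second User,2013-10-04T15:05Z,"Yes, it is"
"Second User, Third User",Third User,2013-10-04T15:05Z,1
'''


def group_by_thread(csv_text):
    """Map each thread name to its list of (sender, date, message) rows."""
    threads = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        threads[row["thread"]].append((row["sender"], row["date"], row["message"]))
    return dict(threads)


grouped = group_by_thread(SAMPLE_CSV)
for name, rows in grouped.items():
    print(name, len(rows))
```

Using csv.DictReader rather than splitting on commas matters here, because thread names and messages that contain commas are quoted in the output.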
What about YAML?
For sure!
fbcap messages ./messages.htm -f yaml
user: First User
threads:
- participants:
  - Second User
  - Third User
  messages:
  - date: 2013-10-04T22:05-07:00
    message: '1'
    sender: Third User
  - date: 2013-10-04T22:05-07:00
    message: '2'
    sender: Third User
  - date: 2013-10-04T22:05-07:00
    message: '3'
    sender: Third User
...
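The sample outputs show timestamps either with a Z suffix or with a numeric UTC offset. A small sketch of parsing both forms in Python follows. It assumes minute-resolution timestamps exactly as they appear in the samples and Python 3.7 or newer; parse_timestamp is a hypothetical helper, not something the tool provides:

```python
from datetime import datetime, timedelta, timezone


def parse_timestamp(value):
    """Parse 'YYYY-MM-DDTHH:MM' followed by 'Z' or a +/-HH:MM offset."""
    # On Python 3.7+, %z accepts both 'Z' and offsets written with a colon.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M%z")


utc_stamp = parse_timestamp("2013-10-04T15:05Z")
local_stamp = parse_timestamp("2013-10-04T22:05-07:00")

print(utc_stamp.isoformat())
# Convert the offset timestamp to UTC for comparison across threads.
print(local_stamp.astimezone(timezone.utc).isoformat())
```

Both results are timezone-aware datetime objects, so they can be compared and sorted directly.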
What if I want to see some statistics?
The stats subcommand reports statistics about your Facebook chat history and can output them in several formats (json, pretty-json, text, or yaml).
fbcap stats ./messages.htm -f text
See the --help menu for instructions on how to control what appears in the stats.
$ fbcap stats --help
Usage: fbcap stats [OPTIONS] PATH

  Analysis of Facebook chat history.

Options:
  -f, --format [json|pretty-json|text|yaml]
                                  Format to output stats as (default: text).
  -c, --count-size INTEGER        Number of most frequent words to include in
                                  output (-1 for no limit / default 10)
  -l, --length INTEGER            Number threads to include in the output
                                  [--fmt text only] (-1 for no limit / default
                                  10)
  -r, --resolve                   [BETA] Resolve profile IDs to names by
                                  connecting to Facebook
  -p, --noprogress                Do not show progress output
  -n, --nocolor                   Do not colorize output
  -u, --utc                       Use UTC timestamps in the output
  -z, --timezones TEXT            Timezone disambiguators
                                  (TZ=OFFSET,[TZ=OFFSET[...]])
  --help                          Show this message and exit.
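To illustrate the idea behind the --count-size option, here is a rough sketch of counting the most frequent words across messages. This is not how fbcap computes its statistics: the tokenization rule, the function name most_frequent_words, and the sample data are all assumptions made for the example, which operates on the thread structure of the JSON output:

```python
import re
from collections import Counter


def most_frequent_words(threads, count_size=10):
    """Count words across all messages; count_size=-1 means no limit."""
    counter = Counter()
    for thread in threads:
        for message in thread["messages"]:
            # Assumed tokenization: lowercase runs of letters and apostrophes.
            counter.update(re.findall(r"[a-z']+", message["message"].lower()))
    limit = None if count_size == -1 else count_size
    return counter.most_common(limit)


# Illustrative data shaped like the "threads" list of the JSON output.
threads = [
    {
        "participants": ["Second User"],
        "messages": [
            {"date": "2013-10-04T15:05Z", "sender": "Second User", "message": "This is a test"},
            {"date": "2013-10-04T15:05Z", "sender": "Second User", "message": "Yes, it is"},
        ],
    }
]

print(most_frequent_words(threads, count_size=1))
```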
How do I get any of the above into a file?
Use standard file redirects.
fbcap messages ./messages.htm > my_file.txt
Can I get each conversation into a separate file?
Use the -d option to write the output to a directory instead.
fbcap messages ./messages.htm -d some/random/directory
This will create a file per conversation titled thread_#.ext where # is the conversation number and ext is the extension of the format (e.g. json). A manifest.txt file is also created, which lists the participants in each thread number for navigational/search purposes.
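A sketch of collecting the per-thread files in numeric order is shown below. It assumes that # is a plain integer and that the files may sit in a subdirectory of the one you passed to -d (the help text mentions that a subdirectory is created). The function name list_thread_files is hypothetical, and the demonstration uses empty placeholder files rather than real output:

```python
import re
import tempfile
from pathlib import Path


def list_thread_files(directory, ext="json"):
    """Return thread_<n>.<ext> files under directory, sorted by thread number."""
    pattern = re.compile(r"thread_(\d+)\." + re.escape(ext) + r"$")
    found = []
    # Search recursively in case the files were written to a subdirectory.
    for path in Path(directory).rglob(f"thread_*.{ext}"):
        match = pattern.match(path.name)
        if match:
            found.append((int(match.group(1)), path))
    return [path for _, path in sorted(found)]


# Demonstration with placeholder files standing in for real fbcap output.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "output"
    out.mkdir()
    for n in (10, 2, 1):
        (out / f"thread_{n}.json").write_text("{}")
    names = [p.name for p in list_thread_files(tmp)]
    print(names)
```

Sorting on the parsed integer avoids the lexical ordering in which thread_10 would come before thread_2.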
What if I only want to parse out a specific conversation?
You can use the -t option to specify a particular conversation/thread you want to output. Just provide a comma-separated set of names. If you don’t remember a last name (or the first name), the system will try to compensate.
fbcap messages ./messages.htm -t second
fbcap messages ./messages.htm -t second,third
What happens to my messages that are pictures?
As of January 2018, Facebook seems to be including referenced images in download archives. Image messages will be converted to text references in the following format: (image reference: messages/photos/<picture id>.jpg)
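If you post-process the output, these references can be pulled out of message text with a regular expression. The sketch below is derived only from the format quoted above. It assumes the picture id contains no slashes or whitespace, and the names IMAGE_REF and extract_image_paths are illustrative:

```python
import re

# Pattern based on the documented reference format; the picture id is
# assumed to contain no slash or whitespace.
IMAGE_REF = re.compile(r"\(image reference: (messages/photos/[^\s/]+\.jpg)\)")


def extract_image_paths(message_text):
    """Return every image path referenced in a message's text."""
    return IMAGE_REF.findall(message_text)


# 12345 is a made-up picture id for demonstration.
example = "(image reference: messages/photos/12345.jpg)"
print(extract_image_paths(example))
```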
What else can I do?
Take a look at the help options to find out more!
$ fbcap messages --help
Usage: fbcap messages [OPTIONS] PATH

  Conversion of Facebook chat history.

Options:
  -f, --format [csv|json|pretty-json|text|yaml]
                                  Format to convert to.
  -t, --thread TEXT               Only include threads involving exactly the
                                  following comma-separated participants in
                                  output (-t 'Billy,Steve Smith')
  -d, --directory PATH            Write all output as a file per thread into a
                                  directory (subdirectory will be created)
  -r, --resolve                   [BETA] Resolve profile IDs to names by
                                  connecting to Facebook
  -p, --noprogress                Do not show progress output
  -n, --nocolor                   Do not colorize output
  -u, --utc                       Use UTC timestamps in the output
  -z, --timezones TEXT            Timezone disambiguators
                                  (TZ=OFFSET,[TZ=OFFSET[...]])
  --help                          Show this message and exit.
Troubleshooting
Why do some names appear as <some number>@facebook.com?
Facebook sometimes substitutes IDs for names, seemingly at random. This has become much less common recently. Nevertheless, if you run into it, the parser can resolve the names via Facebook with the --resolve flag. Keep in mind that this is a beta feature and may not work perfectly.
$ fbcap messages ./messages.htm -t second --resolve
Facebook username/email: facebook_username
Facebook password:
This requires your Facebook credentials to get accurate results. This is a direct connection between your computer and Facebook. Your credentials are not relayed through any servers. Please look at the code if you are feeling paranoid or skeptical :)
Why are some of my chat threads missing?
This is a mysterious issue on Facebook’s end. From anecdotal evidence, it seems that what gets returned in your chat archive is generally conversations with people who you have most recently talked to. Fortunately, it always seems to be the complete history for each conversation and nothing gets truncated.
Facebook appears to have largely fixed this on their end, so missing threads are now far less common.
Why are repeated names not showing?
Multiple users with identical names in a group chat are shown as a single user. This is because Facebook’s message files present only the names and do not distinguish between such users.
This cannot be remedied unless Facebook fixes the problem.