Library to parse the Discord GDPR export
discord_data
Library to parse information from the Discord data export; see more info here.
You have to request the export manually, and it can take Discord a while to deliver it to you.
This supports both the old CSV and new JSON formats for messages.
Install:
Requires python3.8+. To install with pip, run:
pip install discord_data
Single Export
This takes the messages and activity directories as arguments, like:
>>> from discord_data import parse_messages, parse_activity
>>> next(parse_messages("./discord/october_2020/messages"))
Message(mid='747951969171275807', dt=datetime.datetime(2020, 8, 25, 22, 54, 5, 726000, tzinfo=datetime.timezone.utc), channel=Channel(cid='464051583559139340', name='general', server_name='Dream World'), content='<:NotLikeThis:237729324885606403>', attachments='')
>>> next(parse_activity("./discord/october_2020/activity"))
Activity(event_id='AQICfXBljgG+pYXCTRrwzy6MqgAAAAA=', event_type='start_listening', region_info=RegionInfo(city='cityNameHere', country_code='US', region_code='CA', time_zone='America/Los_Angeles'), fingerprint=Fingerprint(os='Mac OS X', os_version='16.1.0', browser='Discord Client', ip='216.58.195.78', isp=None, device=None, distro=None), timestamp=datetime.datetime(2016, 11, 26, 7, 8, 47))
Each of these returns a Generator, so they only read from the (giant) JSON files as needed. If you want to process all the data, you can call list on it to consume the whole generator:
from discord_data import parse_messages, parse_activity
msg = list(parse_messages("./discord/october_2020/messages"))
acts = list(parse_activity("./discord/october_2020/activity"))
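If you only want a quick look at the first few items, you don't have to consume the whole generator. The sketch below uses itertools.islice for that. Note that fake_parse_messages and take are hypothetical names made up for this example; fake_parse_messages just stands in for a real parse_messages(...) call so the snippet runs without an export on disk.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def fake_parse_messages(count: int) -> Iterator[str]:
    # Hypothetical stand-in for parse_messages(...): yields items one at a time
    for i in range(count):
        yield f"message {i}"


def take(results: Iterable, n: int) -> List:
    """Return at most the first n items without consuming the rest."""
    return list(islice(results, n))


messages = fake_parse_messages(1_000_000)
first_three = take(messages, 3)
print(first_three)
```

The rest of the generator is left untouched, so you can keep iterating over it afterwards.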
The raw activity data includes lots of additional fields; this only includes items I thought would be useful. If you want to parse the JSON blobs yourself, you can do so with: from discord_data import parse_raw_activity
If you just want to quickly load the parsed data into a REPL:
python3 -m discord_data ./discord/october_2020
That drops you into a python shell with access to activity and messages variables, which hold the parsed data.
Or, to dump it to JSON:
python3 -m discord_data ./discord/october_2020 -o json > discord_data.json
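Once the messages are loaded, the fields shown in the Message output above (channel.name, channel.server_name) are enough for simple summaries. This is a rough sketch of counting messages per channel. The namedtuples here are stand-ins that copy only some of the fields visible in that output, and messages_per_channel is a hypothetical helper, not something the library provides.

```python
from collections import Counter, namedtuple

# Stand-ins mirroring only fields visible in the Message output above;
# with real data you would pass the result of parse_messages(...) instead
Channel = namedtuple("Channel", ["cid", "name", "server_name"])
Message = namedtuple("Message", ["mid", "channel", "content"])


def messages_per_channel(messages):
    """Count messages keyed by (server_name, channel name)."""
    counts = Counter()
    for msg in messages:
        counts[(msg.channel.server_name, msg.channel.name)] += 1
    return counts


general = Channel(cid="1", name="general", server_name="Dream World")
random_ = Channel(cid="2", name="random", server_name="Dream World")
sample = [
    Message(mid="10", channel=general, content="hello"),
    Message(mid="11", channel=general, content="hi again"),
    Message(mid="12", channel=random_, content="something else"),
]

counts = messages_per_channel(sample)
for (server, channel), n in counts.most_common():
    print(server, channel, n)
```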
Merge Exports
Exports seem to be complete, but when a server or channel is deleted, all messages in that channel are deleted permanently. I'd recommend doing an export periodically to make sure you don't lose anything.
I recommend you organize your exports like this:
discord
├── march_2021
│   ├── account
│   ├── activity
│   ├── messages
│   ├── programs
│   ├── README.txt
│   └── servers
└── october_2020
    ├── account
    ├── activity
    ├── messages
    ├── programs
    ├── README.txt
    └── servers
The discord folder at the top would be the export_dir keyword argument to the merge_activity and merge_messages functions, which call the underlying parse functions.
You can choose to supply the arguments with export_dir or paths:
# locates the corresponding `messages` directories in the folder structure
list(merge_messages(export_dir="./discord"))
# supply a list of the message directories yourself
list(merge_messages(paths=["./discord/march_2021/messages", "./discord/october_2020/messages"]))
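If you'd rather build the paths list yourself, something like the following could work with the folder layout recommended above. This is only a sketch: find_message_dirs is a hypothetical helper written for this example, and it is not necessarily how export_dir is resolved internally.

```python
import tempfile
from pathlib import Path
from typing import List


def find_message_dirs(export_dir: str) -> List[Path]:
    """Hypothetical helper: collect <export_dir>/<export>/messages directories."""
    return sorted(
        child / "messages"
        for child in Path(export_dir).iterdir()
        if (child / "messages").is_dir()
    )


# Build a throwaway copy of the recommended layout to try the helper on
with tempfile.TemporaryDirectory() as tmp:
    for export in ("march_2021", "october_2020"):
        (Path(tmp) / export / "messages").mkdir(parents=True)
    found = [p.relative_to(tmp).as_posix() for p in find_message_dirs(tmp)]

print(found)
```

With a real export, the result of find_message_dirs("./discord") could be passed as the paths argument.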
If the format for the discord export changes, the parse/merge functions will still work; they just might yield errors (Exception objects) as part of their output. To skip those, you can do:
import logging
from discord_data import merge_messages

logger = logging.getLogger(__name__)

for msg in merge_messages(export_dir="./discord"):
    if isinstance(msg, Exception):
        logger.warning(msg)
        continue
    # do something with msg
    print(msg.content)
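If you want to keep the errors around for later inspection instead of just logging them, you could separate them from the parsed items in one pass. This is a small sketch of that idea; split_results is a hypothetical helper, and the mixed list stands in for the output of a merge or parse call.

```python
from typing import Any, Iterable, List, Tuple


def split_results(results: Iterable[Any]) -> Tuple[List[Any], List[Exception]]:
    """Separate successfully parsed items from yielded Exception objects."""
    items: List[Any] = []
    errors: List[Exception] = []
    for res in results:
        if isinstance(res, Exception):
            errors.append(res)
        else:
            items.append(res)
    return items, errors


# Stand-in for the output of merge_messages(...): items mixed with errors
mixed = ["first message", ValueError("unexpected column"), "second message"]
items, errors = split_results(mixed)
print(len(items), "items,", len(errors), "errors")
```

Note that this consumes the whole generator and holds everything in memory, so it gives up the lazy reading described earlier.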
Created to be used as part of HPI