
Library to parse the Discord GDPR export


discord_data

Library to parse information from the Discord data export; see more info here.

You have to request the data export manually, and it can take a while for Discord to deliver it to you.

This supports both the old CSV and new JSON formats for messages.

Install:

Requires python3.8+. To install with pip, run:

pip install discord_data

Single Export

This takes the messages and activity directories as arguments, like:

>>> from discord_data import parse_messages, parse_activity
>>> next(parse_messages("./discord/october_2020/messages"))
Message(mid='747951969171275807', dt=datetime.datetime(2020, 8, 25, 22, 54, 5, 726000, tzinfo=datetime.timezone.utc), channel=Channel(cid='464051583559139340', name='general', server_name='Dream World'), content='<:NotLikeThis:237729324885606403>', attachments='')
>>> next(parse_activity("./discord/october_2020/activity"))
Activity(event_id='AQICfXBljgG+pYXCTRrwzy6MqgAAAAA=', event_type='start_listening', region_info=RegionInfo(city='cityNameHere', country_code='US', region_code='CA', time_zone='America/Los_Angeles'), fingerprint=Fingerprint(os='Mac OS X', os_version='16.1.0', browser='Discord Client', ip='216.58.195.78', isp=None, device=None, distro=None), timestamp=datetime.datetime(2016, 11, 26, 7, 8, 47))

Each of these returns a Generator, so they only read from the (giant) JSON files as needed. If you want to process all the data, you can call list on it to consume the whole generator:

from discord_data import parse_messages, parse_activity
msg = list(parse_messages("./discord/october_2020/messages"))
acts = list(parse_activity("./discord/october_2020/activity"))
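Because the parse functions return generators, simple statistics can also be computed in a single pass without building a list first. The following is a minimal sketch of that idea. The Channel and Message classes are stand-ins that mirror the fields shown in the output above, so the example runs without an export on disk; count_by_server and fake_messages are hypothetical helpers, not part of discord_data, and treating a missing server_name as "<unknown>" is an assumption.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable, Iterator, NamedTuple, Optional, Union


class Channel(NamedTuple):
    # stand-in mirroring the Channel fields shown in the example output
    cid: str
    name: Optional[str]
    server_name: Optional[str]


class Message(NamedTuple):
    # stand-in mirroring the Message fields shown in the example output
    mid: str
    dt: datetime
    channel: Channel
    content: str
    attachments: str


def count_by_server(messages: Iterable[Union[Message, Exception]]) -> Counter:
    """Count messages per server in one pass over the iterable."""
    counts: Counter = Counter()
    for msg in messages:
        if isinstance(msg, Exception):
            # the parsers may yield errors as part of their output; skip them
            continue
        counts[msg.channel.server_name or "<unknown>"] += 1
    return counts


def fake_messages() -> Iterator[Message]:
    # hypothetical sample data standing in for parse_messages(...)
    general = Channel(cid="1", name="general", server_name="Dream World")
    direct = Channel(cid="2", name=None, server_name=None)
    when = datetime(2020, 8, 25, tzinfo=timezone.utc)
    yield Message("10", when, general, "hello", "")
    yield Message("11", when, general, "world", "")
    yield Message("12", when, direct, "hi", "")


counts = count_by_server(fake_messages())
print(counts.most_common())
```

With a real export, the same function could be passed parse_messages("./discord/october_2020/messages") in place of fake_messages().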

The raw activity data includes lots of additional fields; the parsed Activity objects only include the items I thought would be useful. If you want to parse the JSON blobs yourself, you can do so with parse_raw_activity (from discord_data import parse_raw_activity).

If you just want to quickly load the parsed data into a REPL:

python3 -m discord_data ./discord/october_2020

That drops you into a Python shell with access to the activity and messages variables, which hold the parsed data.

Or, to dump it to JSON:

python3 -m discord_data ./discord/october_2020 -o json > discord_data.json

Merge Exports

Exports seem to be complete, but when a server or channel is deleted, all messages in that channel are deleted permanently, so I'd recommend periodically doing an export to make sure you don't lose anything.

I recommend you organize your exports like this:

discord
├── march_2021
│   ├── account
│   ├── activity
│   ├── messages
│   ├── programs
│   ├── README.txt
│   └── servers
└── october_2020
    ├── account
    ├── activity
    ├── messages
    ├── programs
    ├── README.txt
    └── servers
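With exports laid out like this, the per-export subdirectories can be found with a simple glob. The sketch below uses only pathlib and builds a temporary copy of the layout to demonstrate; find_export_subdirs is a hypothetical helper written for illustration, and it is an assumption that the library locates directories in a similar way.

```python
import tempfile
from pathlib import Path
from typing import List


def find_export_subdirs(export_dir: str, name: str) -> List[Path]:
    """Return every <export_dir>/<export>/<name> directory, sorted by path."""
    return sorted(p for p in Path(export_dir).glob(f"*/{name}") if p.is_dir())


# recreate a small version of the layout above in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    for export in ("march_2021", "october_2020"):
        for sub in ("activity", "messages"):
            (Path(tmp) / export / sub).mkdir(parents=True)
    found = find_export_subdirs(tmp, "messages")
    # store paths relative to the top-level folder for readable output
    relative = [p.relative_to(tmp).as_posix() for p in found]

print(relative)
```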

The discord folder at the top would be the export_dir keyword argument to the merge_activity and merge_messages functions, which call the underlying parse functions.

You can supply the input with either export_dir or paths:

from discord_data import merge_messages

# locates the corresponding `messages` directories in the folder structure
list(merge_messages(export_dir="./discord"))
# supply a list of the message directories yourself
list(merge_messages(paths=["./discord/march_2021/messages", "./discord/october_2020/messages"]))
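Since consecutive exports overlap, a merge presumably has to drop messages that appear in more than one export. The following is a minimal sketch of one way to do that, keyed on the message ID (mid). This is an assumption about the approach, not necessarily how merge_messages is implemented; dedupe_by_mid and the Msg stand-in are hypothetical names.

```python
from itertools import chain
from typing import Iterable, Iterator, NamedTuple, Set


class Msg(NamedTuple):
    # stand-in with just the fields needed for the example
    mid: str
    content: str


def dedupe_by_mid(*exports: Iterable[Msg]) -> Iterator[Msg]:
    """Yield each message once, keeping the first occurrence of every mid."""
    seen: Set[str] = set()
    for msg in chain.from_iterable(exports):
        if msg.mid in seen:
            continue
        seen.add(msg.mid)
        yield msg


# hypothetical sample data: message "2" appears in both exports
october = [Msg("1", "first"), Msg("2", "second")]
march = [Msg("2", "second"), Msg("3", "third")]

merged = list(dedupe_by_mid(october, march))
print([m.mid for m in merged])
```

Keeping only a set of IDs means the messages themselves are still streamed lazily; only the IDs seen so far are held in memory.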

If the format of the Discord export changes, the parse/merge functions will still work, but they might yield errors as part of their output. To ignore those, you can do:

import logging
from discord_data import merge_messages

logger = logging.getLogger(__name__)

for msg in merge_messages(export_dir="./discord"):
    if isinstance(msg, Exception):
        logger.warning(msg)
        continue
    # do something with msg
    print(msg.content)
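If you would rather collect the errors than log them as you go, the same isinstance check can be used to separate results from errors. The sketch below shows the pattern on stand-in data; split_errors and fake_results are hypothetical helpers, not part of discord_data, and the error message is made up for illustration.

```python
from typing import Iterable, Iterator, List, Tuple, TypeVar, Union

T = TypeVar("T")


def split_errors(
    results: Iterable[Union[T, Exception]],
) -> Tuple[List[T], List[Exception]]:
    """Separate successfully parsed items from yielded exceptions."""
    items: List[T] = []
    errors: List[Exception] = []
    for res in results:
        if isinstance(res, Exception):
            errors.append(res)
        else:
            items.append(res)
    return items, errors


def fake_results() -> Iterator[Union[str, Exception]]:
    # hypothetical stand-in for a parser that yields errors instead of raising
    yield "first message"
    yield ValueError("unexpected field in export")
    yield "second message"


items, errors = split_errors(fake_results())
print(len(items), "items,", len(errors), "errors")
```

Note that this consumes the whole generator and holds every item in memory, so for very large exports the loop above is the lighter option.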

Created to be used as part of HPI.
