Library to parse the Discord GDPR export

discord_data

Library to parse information from the Discord data export.

You have to request the export manually, and it can take a while for Discord to deliver it to you.

This supports both the old CSV and new JSON formats for messages.

Install:

Requires python3.9+. To install with pip, run:

pip install discord_data

Single Export

The parse functions take the messages and activity directories as arguments, like:

>>> from discord_data import parse_messages, parse_activity
>>> next(parse_messages("./discord/october_2020/messages"))
Message(mid='747951969171275807', dt=datetime.datetime(2020, 8, 25, 22, 54, 5, 726000, tzinfo=datetime.timezone.utc), channel=Channel(cid='464051583559139340', name='general', server_name='Dream World'), content='<:NotLikeThis:237729324885606403>', attachments='')
>>> next(parse_activity("./discord/october_2020/activity"))
Activity(event_id='AQICfXBljgG+pYXCTRrwzy6MqgAAAAA=', event_type='start_listening', region_info=RegionInfo(city='cityNameHere', country_code='US', region_code='CA', time_zone='America/Los_Angeles'), fingerprint=Fingerprint(os='Mac OS X', os_version='16.1.0', browser='Discord Client', ip='216.58.195.78', isp=None, device=None, distro=None), timestamp=datetime.datetime(2016, 11, 26, 7, 8, 47))

Each of these returns a Generator, so they only read from the (giant) JSON files as needed. If you want to process all the data, you can call list on it to consume the whole generator:

from discord_data import parse_messages, parse_activity
msg = list(parse_messages("./discord/october_2020/messages"))
acts = list(parse_activity("./discord/october_2020/activity"))
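Because the parse functions are generators, you can also aggregate over them without building a list first. The following is a minimal sketch, not part of the library: count_by_server is a hypothetical helper, and it only relies on the channel.server_name attribute visible in the Message output above. The stand-in objects exist so the sketch runs without an export on disk.

```python
from collections import Counter
from types import SimpleNamespace
from typing import Iterable


def count_by_server(messages: Iterable) -> Counter:
    """Tally messages per server, consuming the iterable one item at a time."""
    counts: Counter = Counter()
    for msg in messages:
        # channel.server_name is taken from the Message repr shown above
        counts[msg.channel.server_name] += 1
    return counts


# Hypothetical stand-in objects (assumption: real Message objects expose
# the same channel.server_name attribute)
fake_messages = (
    SimpleNamespace(channel=SimpleNamespace(server_name=name))
    for name in ["Dream World", "Dream World", "Other"]
)
print(count_by_server(fake_messages).most_common(1))
```

With a real export, you would pass parse_messages("./discord/october_2020/messages") in place of fake_messages.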

The raw activity data includes lots of additional fields; the parsed Activity only includes items I thought would be useful. If you want to parse the JSON blobs yourself, you can do so with from discord_data import parse_raw_activity.

If you just want to quickly load the parsed data into a REPL:

python3 -m discord_data ./discord/october_2020

That drops you into a Python shell with access to the activity and messages variables, which hold the parsed data.

Or, to dump it to JSON:

python3 -m discord_data ./discord/october_2020 -o json > discord_data.json

Merge Exports

Exports seem to be complete, but when a server or channel is deleted, all messages in that channel are deleted permanently, so I'd recommend periodically doing an export to make sure you don't lose anything.

I recommend you organize your exports like this:

discord
├── march_2021
│   ├── account
│   ├── activity
│   ├── messages
│   ├── programs
│   ├── README.txt
│   └── servers
└── october_2020
    ├── account
    ├── activity
    ├── messages
    ├── programs
    ├── README.txt
    └── servers

The discord folder at the top would be the export_dir keyword argument to the merge_activity and merge_messages functions, which call the underlying parse functions.

You can supply either export_dir or paths:

from discord_data import merge_messages

# locates the corresponding `messages` directories in the folder structure
list(merge_messages(export_dir="./discord"))
# supply a list of the message directories yourself
list(merge_messages(paths=["./discord/march_2021/messages", "./discord/october_2020/messages"]))
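If you want to build the paths list yourself, for example to skip a broken export, a sketch like the one below may help. find_message_dirs is a hypothetical helper, not something the library provides, and it assumes the layout recommended above (one subdirectory per export, each containing a messages directory). How export_dir is resolved internally may differ.

```python
import tempfile
from pathlib import Path
from typing import List


def find_message_dirs(export_dir: str) -> List[Path]:
    """Return every <export_dir>/<export_name>/messages directory, sorted by path."""
    root = Path(export_dir)
    return sorted(p for p in root.glob("*/messages") if p.is_dir())


# Demo on a throwaway directory tree that mirrors the recommended layout
with tempfile.TemporaryDirectory() as tmp:
    for export in ("march_2021", "october_2020"):
        (Path(tmp) / export / "messages").mkdir(parents=True)
        (Path(tmp) / export / "activity").mkdir()
    print([p.relative_to(tmp).as_posix() for p in find_message_dirs(tmp)])
```

The resulting paths, converted with str, could then be passed as merge_messages(paths=[...]).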

If the format of the Discord export changes, the parse/merge functions will still work, but they might yield errors as part of their output. To skip those, you can do:

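import logging

from discord_data import merge_messages

logger = logging.getLogger(__name__)

for msg in merge_messages(export_dir="./discord"):
    # errors are yielded as Exception instances alongside parsed messages
    if isinstance(msg, Exception):
        logger.warning(msg)
        continue
    # do something with msg
    print(msg.content)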
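If you would rather collect the errors and inspect them afterwards, one option is to split the output into two lists. This is only a sketch: split_errors is a hypothetical helper, not part of the library, and it relies on nothing beyond the isinstance check shown above.

```python
from typing import Iterable, List, Tuple


def split_errors(results: Iterable) -> Tuple[List, List[Exception]]:
    """Separate successfully parsed items from yielded Exception instances."""
    items: List = []
    errors: List[Exception] = []
    for res in results:
        if isinstance(res, Exception):
            errors.append(res)
        else:
            items.append(res)
    return items, errors


# Stand-in data: strings play the role of parsed messages here
items, errors = split_errors(["first", ValueError("bad row"), "second"])
print(len(items), len(errors))
```

With real data, you would call split_errors(merge_messages(export_dir="./discord")). Note that this consumes the whole generator, so everything is held in memory.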

Created to be used as part of HPI

