Skip to main content

CERN Data Handling Framework is a small framework to work with the CERN Anonymized Mattermost Data set

Project description

CERN Data Handling Framework

DOI

Introduction ☕️

Dataset Description

Mattermost is an open-source communication platform similar to slack that is widely used at CERN. The CERN Anonymized Mattermost Dataset includes Mattermost data from January 2018 to November 2021 with 20794 CERN users, 2367 Mattermost teams, 12773 Mattermost channels, 151 CERN buildings, and 163 CERN organizational units. The data set states the relationship between Mattermost teams, Mattermost channels, and CERN users, and holds various information such as channel creation, channel deletion times, user channel joining and leave times, and user-specific information such as building and organizational units. To hide identifiable information (e.g. Team Name, User Name, Channel Name, etc.), the dataset was anonymized. The anonymization was done by omitting some attributes, hashing string values, and removing connections between users/teams/channels.

Dataset License: CC BY-NC Creative Commons Attribution Non-Commercial Licence

Dataset Link: CERN Anonymized Mattermost Data | Zenodo

@dataset{jakovljevic_igor_2022_6319684,
  author       = {Jakovljevic, Igor and
                  Wagner, Andreas and
                  Gütl, Christian and
                  Pobaschnig, Martin and
                  Mönnich, Adrian},
  title        = {CERN Anonymized Mattermost Data},
  month        = mar,
  year         = 2022,
  publisher    = {Zenodo},
  version      = 1,
  doi          = {10.5281/zenodo.6319684},
  url          = {https://doi.org/10.5281/zenodo.6319684}
}

Getting Started 🏁

Setup Repository 💻


1. Clone the repository


$ git clone https://github.com/mpobaschnig/cdhf

2. Retrieving the Dataset


Retrieve Mattermost Data (mmdata.json) from Zenodo. To retrieve the dataset execute:

$ bash cdhf/init.sh

Or, you can manually create the input/ directory in the root folder, then download the mmdata.json into the input directory.

3. Jupyter Notebook


Create the jupyter notebook (undefined.ipyb) file in the root level directory.

4. Conclusion


In the end it should look like this:

.
├── cdhf
│   ├── __init__.py
│   ├── init.sh
│   ├── LICENSE
│   ├── README.md
│   └── src
│       └── mattermost
│           ├── channel_member_history_entry.py
│           ├── channel_member.py
│           ├── channel.py
│           ├── data.py
│           ├── __init__.py
│           ├── team_member.py
│           ├── team.py
│           └── user_data.py
├── input
│   └── mmdata.json
└── undefined.ipynb

Working with the Framework and Jupyter Notebooks 💻


Then include this file in the notebook from the root level

from cdhf import Data

Create the Data object to work with the data set:

data = Data("path/to/Mattermost/JSON/file")

data.load_all()

print(len(data.teams))

Documentation 🖨️

API documentation is available at https://mpobaschnig.github.io/cdhf/.


Citation ✍️

If you happen to mention or use this project as part of one of your scientific works, please cite the following paper:

  • Jakovljevic, I., Pobaschnig, M., Gütl, C. and Wagner, A., 2022. Privacy Aware Identification of User Clusters in Large Organisations based on Anonymized Mattermost User and Channel Information. In: DATA ANALYTICS 2021, The Tenth International Conference on Data Analytics.
@inproceedings{DataAnalytics2022,
  author  = { Jakovljevic, I., Pobaschnig, M., Gütl, C. AND Wagner, A. },
  year    = { 2022 },
  month   = { 11 },
  title   = { Privacy Aware Identification of User Clusters in Large Organisations based on Anonymized Mattermost User and Channel Information }
}

Latest publications 📚

  • Jakovljevic, I., Gütl, C., Wagner, A. and Nussbaumer, A. Compiling Open Datasets in Context of Large Organizations while Protecting User Privacy and Guaranteeing Plausible Deniability. In Proceedings of the 11th International Conference on Data Science, Technology and Applications (DATA 2022)
@article{Data2022,
    author  = { Jakovljevic, I., Gütl, C., Wagner, A. and Nussbaumer, A. },
    title   = { Compiling Open Datasets in Context of Large Organizations while Protecting User Privacy and Guaranteeing Plausible Deniability },
    journal = { In Proceedings of the 11th International Conference on Data Science, Technology and Applications (DATA 2022) },
    year    = { 2022 }
}

Involved institutions 🏫

Contributors from the following institutions were involved in the development of this project:


Visual Exploration & Analysis 👁️‍🗨️

In case you would like to visually explore the CERN Mattermost dataset without any programming you can use Collaboration Spotting X.

It is a web-based visual network analytics application which includes various convenient features which enable exploration of network datasets on the fly.

To get started with exploring the CERN Mattermost dataset read the instructions of CSX.


Acknowledgements 🙏

We would like to express our gratitude to CERN, for allowing us to publish the dataset as open data and use it for research purposes.

Project details


Release history Release notifications | RSS feed

This version

1.1

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cdhf-1.1.tar.gz (12.2 kB view details)

Uploaded Source

Built Distribution

cdhf-1.1-py3-none-any.whl (14.0 kB view details)

Uploaded Python 3

File details

Details for the file cdhf-1.1.tar.gz.

File metadata

  • Download URL: cdhf-1.1.tar.gz
  • Upload date:
  • Size: 12.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.10.4

File hashes

Hashes for cdhf-1.1.tar.gz
Algorithm Hash digest
SHA256 c1ea4341fd7c9f94c5fe5a821d52e336826a26753e9f10b5209ffed9955798d1
MD5 ec2cd99ac225e43b278b5bf4f088624e
BLAKE2b-256 e08e1080feecc5e68530e5f5fa16ea7f72f7b903f0c51855da2e87a201fb3bbc

See more details on using hashes here.

File details

Details for the file cdhf-1.1-py3-none-any.whl.

File metadata

  • Download URL: cdhf-1.1-py3-none-any.whl
  • Upload date:
  • Size: 14.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.10.4

File hashes

Hashes for cdhf-1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 5f7dd98d0983ec21dd5419acebb70ec556d37f1af280cd323d79158f8292ced7
MD5 e17ee081adf160fc72e8a457f60400f2
BLAKE2b-256 f1c66d4387d3e392345825f316eec078fa116896e565cc9d973c816c93d90d86

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page