SocialED
A Python Library for Social Event Detection
The field of Social Event Detection represents a pivotal area of research within the broader domains of artificial intelligence and natural language processing. Its objective is the automated identification and analysis of events from social media platforms such as Twitter and Facebook. Such events encompass a wide range of occurrences, including natural disasters and viral phenomena.
To facilitate research in this area, we present SocialED, an open-source Python library designed to support the development and evaluation of social event detection algorithms.
SocialED stands out for:
- Broad spectrum of over 10 social event detection algorithms, including classic techniques like Latent Dirichlet Allocation (LDA), word-embedding methods such as Word2Vec and GloVe, and deep learning models such as BiLSTM, BERT, and graph neural networks like KPGNN.
- Unified APIs, comprehensive documentation, and practical examples that enable users to format their data consistently, ensuring smooth integration with all social event detectors within SocialED.
- Customizable and modular components that empower users to tailor detection algorithms to meet specific requirements, facilitating the setup of social event detection workflows.
- Rich utility functions that streamline the process of building and executing social event detection tasks.
- Reliable implementation featuring unit tests, cross-platform continuous integration, as well as code coverage and maintainability assessments.
SocialED includes 10+ social event detection algorithms.
For consistency and accessibility, SocialED is developed on top of `DGL <https://www.dgl.ai/>`_
and `PyTorch <https://pytorch.org/>`_, and follows the API design of `PyOD <https://github.com/yzhao062/pyod>`_
and `PyGOD <https://github.com/pygod-team/pygod>`_.
See the example below for detecting social events with SocialED in just a few lines!
Folder Structure
::

   .
   ├── LICENSE
   ├── MANIFEST.in
   ├── README.rst
   ├── docs
   ├── SocialED
   │   ├── __init__.py
   │   ├── datasets
   │   ├── detector
   │   └── metrics
   ├── requirements.txt
   ├── setup.cfg
   └── setup.py
Installation
It is recommended to use pip for installation.
Please make sure the latest version is installed, as SocialED is updated frequently:

.. code-block:: bash

   pip install SocialED            # normal install
   pip install --upgrade SocialED  # or update if needed
Alternatively, you can clone the repository and install from source:

.. code-block:: bash

   # Set up the environment
   conda create -n SocialED python=3.8
   conda activate SocialED

   # Installation
   git clone https://github.com/RingBDStack/SocialED.git
   cd SocialED
   pip install -r requirements.txt
   pip install .
Required Dependencies:
- python>=3.8
- numpy>=1.24.3
- scikit-learn>=1.2.2
- scipy>=1.10.1
- networkx>=2.3
- torch>=2.3.0
- torch_geometric>=2.5.3
- dgl>=0.6.0
API Cheatsheet & Reference ^^^^^^^^^^^^^^^^^^^^^^^^^^
Full API reference: https://socialed.readthedocs.io

API cheatsheet for all detectors:
- preprocess(): Preprocess the dataset.
- fit(): Fit the detector with train data.
- detector(): Initialize and configure the detection model, preparing it for training and prediction tasks.
- evaluate(predictions, groundtruth): Assess the performance of the detector by comparing its predictions with the ground-truth labels.
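The evaluate(predictions, groundtruth) entry compares predicted events with the true ones. Because the event IDs a detector assigns are arbitrary, such comparisons are commonly made with permutation-invariant clustering metrics. Below is a minimal sketch with scikit-learn, assuming NMI, AMI and ARI as the metrics; evaluate_clustering is a hypothetical helper, not SocialED's evaluate(), which may report different scores.

.. code-block:: python

   from sklearn.metrics import (
       adjusted_mutual_info_score,
       adjusted_rand_score,
       normalized_mutual_info_score,
   )


   def evaluate_clustering(predictions, ground_truth):
       """Score predicted event IDs against ground-truth event labels.

       Hypothetical helper, not part of SocialED. All three metrics ignore
       how the cluster IDs are numbered.
       """
       return {
           "NMI": normalized_mutual_info_score(ground_truth, predictions),
           "AMI": adjusted_mutual_info_score(ground_truth, predictions),
           "ARI": adjusted_rand_score(ground_truth, predictions),
       }


   # Same partition with different cluster IDs: every score is 1.0
   # (up to floating-point rounding).
   ground_truth = [0, 0, 1, 1, 2, 2]
   predictions = [5, 5, 3, 3, 9, 9]
   scores = evaluate_clustering(predictions, ground_truth)
   print(scores)

Relabeling the clusters (5, 3, 9 instead of 0, 1, 2) leaves the scores unchanged, which is why metrics of this kind suit event detection, where cluster IDs carry no meaning of their own.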
Usage & Example
.. code-block:: python

   from SocialED.detector import KPGNN
   from SocialED.data import Event2012_Dataset

   # Load the dataset using the Event2012_Dataset class
   dataset = Event2012_Dataset.load_data()

   # Create a KPGNN instance with the loaded dataset
   model = KPGNN(dataset)

   # Run the KPGNN instance
   model.preprocess()
   model = model.fit()
   model.detection()
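The example above needs the Event2012 data. To show the same preprocess(), fit(), detection() call order in a self-contained way, here is a toy stand-in. ToyDetector is hypothetical and not part of SocialED: it swaps in TF-IDF features and k-means clustering instead of any of the library's algorithms, and its fit() returns the instance so that model = model.fit() works as in the example.

.. code-block:: python

   from sklearn.cluster import KMeans
   from sklearn.feature_extraction.text import TfidfVectorizer


   class ToyDetector:
       """Hypothetical stand-in mirroring the preprocess/fit/detection order."""

       def __init__(self, texts, n_events=2, seed=0):
           self.texts = texts
           self.n_events = n_events
           self.seed = seed

       def preprocess(self):
           # Turn raw messages into L2-normalized TF-IDF vectors.
           vectorizer = TfidfVectorizer(stop_words="english")
           self.features = vectorizer.fit_transform(self.texts)
           return self

       def fit(self):
           # Cluster the message vectors; each cluster stands for one event.
           self.model = KMeans(
               n_clusters=self.n_events, n_init=10, random_state=self.seed
           )
           self.model.fit(self.features)
           return self

       def detection(self):
           # One predicted event ID per message.
           return self.model.labels_.tolist()


   texts = [
       "earthquake hits the coast tonight",
       "strong earthquake shakes coast towns",
       "final match goes to penalties",
       "penalties decide the cup final match",
   ]
   model = ToyDetector(texts)
   model.preprocess()
   model = model.fit()
   predictions = model.detection()
   print(predictions)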
Collected Algorithms
More than 10 methods are implemented in this library. An overview of their characteristics is given below.
Algorithm Descriptions
- LDA: Latent Dirichlet Allocation (LDA) is a generative statistical model that allows sets of observations to be explained by unobserved groups. It is particularly useful for discovering the hidden thematic structure in large text corpora.
- BiLSTM: Bi-directional Long Short-Term Memory (BiLSTM) networks enhance the capabilities of traditional LSTMs by processing sequences in both forward and backward directions. This bidirectional approach is effective for tasks like sequence classification and time series prediction.
- Word2Vec: Word2Vec is a family of models that generate word embeddings by training shallow neural networks to predict the context of words. These embeddings capture semantic relationships between words, making them useful for various natural language processing tasks.
- GloVe: Global Vectors for Word Representation (GloVe) generates word embeddings from aggregated global word-word co-occurrence statistics of a corpus. The resulting vectors capture meaning through how often word pairs co-occur in the training text.
- WMD: Word Mover's Distance (WMD) measures the semantic distance between two documents by computing the minimum distance that words from one document need to travel to match words from another document. This method is grounded in the concept of word embeddings.
- BERT: Bidirectional Encoder Representations from Transformers (BERT) is a transformer-based model that pre-trains deep bidirectional representations by conditioning on both left and right context in all layers. BERT has achieved state-of-the-art results in many NLP tasks.
- SBERT: Sentence-BERT (SBERT) modifies the BERT network to generate semantically meaningful sentence embeddings that can be compared using cosine similarity. It is particularly useful for sentence clustering and semantic search.
- EventX: EventX is designed for online event detection in social media streams, processing tweets in real-time to identify emerging events by clustering similar content. This framework is optimized for high-speed data environments.
- CLKD: Cross-lingual Knowledge Distillation (CLKD) targets event detection in multilingual social streams. It encodes messages with a graph neural network that uses cross-lingual word embeddings, and distills knowledge learned from English data into models for lesser-spoken languages to compensate for scarce annotated data.
- MVGAN: Multi-View Graph Attention Network (MVGAN) leverages multiple views of social data to enhance event detection accuracy. This offline algorithm applies graph attention to learn message representations from each view and fuses them into a single representation.
- KPGNN: Knowledge-Preserving Graph Neural Network (KPGNN) is designed for incremental social event detection. It utilizes rich semantics and structural information in social messages to continuously detect events and extend its knowledge base. KPGNN outperforms baseline models, with potential for future research in event analysis and causal discovery in social data.
- FinEvent: Fine-Grained Event Detection (FinEvent) uses a reinforced, incremental, and cross-lingual architecture for social event detection. It employs multi-agent reinforcement learning and density-based clustering (DRL-DBSCAN) to improve performance in various detection tasks. Future work will extend RL-guided GNNs to event correlation and evolution.
- QSGNN: Quality-Aware Self-Improving Graph Neural Network (QSGNN) improves open set social event detection with a pairwise loss and orthogonal constraint for training. It uses similarity distributions for pseudo labels and a quality-aware strategy to reduce noise, achieving state-of-the-art results in both closed and open set scenarios.
- ETGNN: Evidential Temporal-aware Graph Neural Network (ETGNN) enhances social event detection by integrating uncertainty and temporal information using Evidential Deep Learning and Dempster-Shafer theory. It employs a novel temporal-aware GNN aggregator, outperforming other methods.
- HCRC: Hybrid Graph Contrastive Learning for Social Event Detection (HCRC) captures comprehensive semantic and structural information from social messages. Using hybrid graph contrastive learning and reinforced incremental clustering, HCRC outperforms baselines across various experimental settings.
- UCLSED: Uncertainty-Guided Class Imbalance Learning Framework (UCLSED) enhances model generalization in imbalanced social event detection tasks. It uses an uncertainty-guided contrastive learning loss to handle uncertain classes and combines multi-view architectures with Dempster-Shafer theory for robust uncertainty estimation, improving detection accuracy.
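LDA describes each message as a mixture over latent topics. One simple way to turn those mixtures into event labels, assumed here and not necessarily how SocialED's LDA detector works, is to assign each message to its most probable topic. The sketch uses scikit-learn with a hypothetical four-message corpus and two topics.

.. code-block:: python

   from sklearn.decomposition import LatentDirichletAllocation
   from sklearn.feature_extraction.text import CountVectorizer

   messages = [
       "earthquake hits the coast tonight",
       "strong earthquake shakes coast towns",
       "final match goes to penalties",
       "penalties decide the cup final match",
   ]

   # Bag-of-words counts are the input LDA expects.
   counts = CountVectorizer(stop_words="english").fit_transform(messages)

   # Each row of doc_topic is a topic distribution for one message.
   lda = LatentDirichletAllocation(n_components=2, random_state=0)
   doc_topic = lda.fit_transform(counts)

   # Treat the most probable topic as the message's event ID.
   event_ids = doc_topic.argmax(axis=1)
   print(event_ids)

On a corpus this small the topic assignments are not reliable; the point is the shape of the pipeline: counts, per-message topic mixture, then argmax.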
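Word2Vec's skip-gram variant learns embeddings by predicting the words around each center word. The sketch below shows only how (center, context) training pairs are extracted with a fixed window; skipgram_pairs is a hypothetical helper, and the shallow network that Word2Vec trains on these pairs is omitted.

.. code-block:: python

   def skipgram_pairs(tokens, window=2):
       """Return (center, context) pairs within `window` positions."""
       pairs = []
       for i, center in enumerate(tokens):
           lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
           for j in range(lo, hi):
               if j != i:  # a word is not its own context
                   pairs.append((center, tokens[j]))
       return pairs


   pairs = skipgram_pairs("strong earthquake hits coast".split(), window=1)
   print(len(pairs))  # 6 (center, context) pairs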
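GloVe is trained on global word-word co-occurrence statistics. The counting step can be sketched as below, using a symmetric context window in which a word d positions away contributes 1/d, as described in the GloVe paper; cooccurrence_counts is a hypothetical helper, and fitting the vectors to these counts is omitted.

.. code-block:: python

   from collections import Counter


   def cooccurrence_counts(sentences, window=2):
       """Weighted, symmetric co-occurrence counts X[(w_i, w_j)]."""
       counts = Counter()
       for tokens in sentences:
           for i, center in enumerate(tokens):
               for d in range(1, window + 1):
                   j = i + d
                   if j >= len(tokens):
                       break
                   # Nearer context words count more (weight 1/d);
                   # record both directions so X stays symmetric.
                   counts[(center, tokens[j])] += 1.0 / d
                   counts[(tokens[j], center)] += 1.0 / d
       return counts


   sentences = [["earthquake", "hits", "coast"], ["earthquake", "near", "coast"]]
   X = cooccurrence_counts(sentences, window=2)
   print(X[("earthquake", "coast")])  # 1.0 (0.5 from each sentence)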
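WMD, as described above, is a transportation problem: each document is a normalized bag of words, and the distance is the cheapest way to move one document's word mass onto the other's, where moving word i to word j costs the Euclidean distance between their embeddings. A minimal sketch of exact WMD as a linear program with SciPy's linprog, assuming toy 2-D vectors as hypothetical stand-ins for pretrained word embeddings:

.. code-block:: python

   import numpy as np
   from scipy.optimize import linprog


   def word_movers_distance(words_a, words_b, embeddings):
       """Exact WMD between two token lists via a transportation LP."""
       vocab_a, vocab_b = sorted(set(words_a)), sorted(set(words_b))
       # Normalized bag-of-words weights for each document.
       d_a = np.array([words_a.count(w) for w in vocab_a], dtype=float)
       d_b = np.array([words_b.count(w) for w in vocab_b], dtype=float)
       d_a, d_b = d_a / d_a.sum(), d_b / d_b.sum()
       # Travel cost between every pair of words.
       cost = np.array([[np.linalg.norm(embeddings[u] - embeddings[v])
                         for v in vocab_b] for u in vocab_a])
       n, m = cost.shape
       # Flow T[i, j] >= 0, flattened row-major; rows sum to d_a, columns to d_b.
       a_eq = []
       for i in range(n):
           row = np.zeros((n, m))
           row[i, :] = 1.0
           a_eq.append(row.ravel())
       for j in range(m):
           col = np.zeros((n, m))
           col[:, j] = 1.0
           a_eq.append(col.ravel())
       result = linprog(cost.ravel(), A_eq=np.array(a_eq),
                        b_eq=np.concatenate([d_a, d_b]),
                        bounds=(0, None), method="highs")
       if not result.success:
           raise RuntimeError(result.message)
       return result.fun


   # Hypothetical toy embeddings.
   embeddings = {
       "quake": np.array([1.0, 0.0]), "earthquake": np.array([1.0, 0.1]),
       "coast": np.array([0.0, 1.0]), "shore": np.array([0.1, 1.0]),
       "match": np.array([-1.0, 0.0]),
   }
   doc_a, doc_b, doc_c = ["earthquake", "coast"], ["quake", "shore"], ["match", "coast"]
   wmd_ab = word_movers_distance(doc_a, doc_b, embeddings)
   wmd_ac = word_movers_distance(doc_a, doc_c, embeddings)
   print(wmd_ab, wmd_ac)

doc_a and doc_b share no words yet come out close, because each word has a near neighbor in embedding space; this is the property that makes WMD useful for short, paraphrased social messages.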
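KPGNN and several other GNN-based detectors above exploit the structural information in social messages by operating on a message graph. A minimal sketch of one common construction, assumed here rather than taken from SocialED: each message is a node, and two messages are linked when they share a user, a hashtag, or a named entity. The message fields and the build_message_graph helper are hypothetical.

.. code-block:: python

   from collections import defaultdict
   from itertools import combinations


   def build_message_graph(messages):
       """Link messages that share a user, hashtag, or entity.

       `messages` is a list of dicts with hypothetical keys "user",
       "hashtags" and "entities". Returns an adjacency dict of sets.
       """
       # Map every shared element to the messages that contain it.
       index = defaultdict(set)
       for i, msg in enumerate(messages):
           index[("user", msg["user"])].add(i)
           for tag in msg["hashtags"]:
               index[("hashtag", tag.lower())].add(i)
           for ent in msg["entities"]:
               index[("entity", ent)].add(i)
       adjacency = {i: set() for i in range(len(messages))}
       # Connect every pair of messages grouped under the same element.
       for members in index.values():
           for a, b in combinations(sorted(members), 2):
               adjacency[a].add(b)
               adjacency[b].add(a)
       return adjacency


   messages = [
       {"user": "u1", "hashtags": ["#Quake"], "entities": ["Tokyo"]},
       {"user": "u2", "hashtags": ["#quake"], "entities": []},
       {"user": "u3", "hashtags": [], "entities": ["Madrid"]},
       {"user": "u3", "hashtags": ["#final"], "entities": []},
   ]
   graph = build_message_graph(messages)
   print(graph)  # {0: {1}, 1: {0}, 2: {3}, 3: {2}}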
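FinEvent clusters message embeddings with a density-based method whose parameters are searched by reinforcement learning (DRL-DBSCAN). The sketch below shows only the plain DBSCAN step from scikit-learn, with hand-picked eps and min_samples on hypothetical 2-D embeddings; the RL-based parameter search is omitted.

.. code-block:: python

   import numpy as np
   from sklearn.cluster import DBSCAN

   # Hypothetical message embeddings: two dense groups and one outlier.
   embeddings = np.array([
       [0.00, 0.00], [0.05, 0.02], [0.02, 0.06],  # event A
       [1.00, 1.00], [1.04, 0.97], [0.98, 1.05],  # event B
       [3.00, -2.00],                             # isolated message
   ])

   # Points within eps of each other form dense regions; label -1 marks noise.
   labels = DBSCAN(eps=0.2, min_samples=2).fit_predict(embeddings)
   print(labels.tolist())  # [0, 0, 0, 1, 1, 1, -1]

Unlike k-means, DBSCAN does not need the number of events in advance and can leave a message unassigned (-1), which suits streams where new events keep appearing; the price is sensitivity to eps and min_samples, the parameters FinEvent learns to set.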
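ETGNN and UCLSED both rely on Evidential Deep Learning and Dempster-Shafer theory. The sketch below shows the general technique, not either paper's code: non-negative evidence for K classes is mapped to belief masses and an uncertainty mass (alpha = evidence + 1, S = sum(alpha), b_k = e_k / S, u = K / S), and two views are fused with a reduced form of Dempster's rule used in evidential multi-view classification. The two views and their evidence values are hypothetical.

.. code-block:: python

   import numpy as np


   def opinion_from_evidence(evidence):
       """Map evidence for K classes to (belief masses, uncertainty)."""
       evidence = np.asarray(evidence, dtype=float)
       strength = evidence.sum() + len(evidence)  # S = sum(evidence + 1)
       return evidence / strength, len(evidence) / strength


   def combine_opinions(b1, u1, b2, u2):
       """Reduced Dempster's rule for two opinions over the same K classes."""
       # Conflict: mass the two views place on different classes.
       conflict = np.outer(b1, b2).sum() - np.sum(b1 * b2)
       scale = 1.0 - conflict
       belief = (b1 * b2 + b1 * u2 + b2 * u1) / scale
       uncertainty = (u1 * u2) / scale
       return belief, uncertainty


   # Hypothetical evidence from a semantic view and a temporal view.
   b_sem, u_sem = opinion_from_evidence([9, 1, 0])
   b_tmp, u_tmp = opinion_from_evidence([6, 2, 0])
   belief, uncertainty = combine_opinions(b_sem, u_sem, b_tmp, u_tmp)
   print(belief.round(3), round(float(uncertainty), 3))

When the views agree, the fused uncertainty is lower than either view's own; a view with no evidence at all (u = 1) leaves the other view's opinion unchanged.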
Contact
For further inquiries, please contact us via email at: support@socialed.com
Download files
File details
Details for the file SocialED-1.0.2.tar.gz.
File metadata
- Download URL: SocialED-1.0.2.tar.gz
- Upload date:
- Size: 144.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.11.5
File hashes

- SHA256: 70e643307568cda7a43731272edd025a8ada3ac19111de341264c6156ca1b271
- MD5: 1cb7487b07c56d9b34806569c3b05325
- BLAKE2b-256: 06ccf663e92c9bc1ec9fa49ea626fbcf292dd1ac22371d661e8cf9698f5d3a34
File details
Details for the file SocialED-1.0.2-py3-none-any.whl.
File metadata
- Download URL: SocialED-1.0.2-py3-none-any.whl
- Upload date:
- Size: 146.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.11.5
File hashes

- SHA256: d82e5814d6bf3bd7f848a1a040de0fcb695a0603150e6ca225663845c81765c6
- MD5: c152d22271014431f28471e612787f2d
- BLAKE2b-256: 442dba5fc160b2faf4f6760a0bbc1092d6463423feb155b5187d12433da062a1