A Python Library for Social Event Detection
SocialED
Folder Structure
Installation
Manually
# Set up the environment
conda create -n socialED python=3.8
conda activate socialED
# Installation
git clone https://github.com/yukobebryantlakers/socialED.git
cd socialED
pip install -r requirements.txt
pip install socialED
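To confirm that the installation succeeded, the installed version can be queried from Python. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical, and the distribution name `socialED` is taken from the `pip install` command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "socialED" is the name used in the pip command above.
version = installed_version("socialED")
if version is None:
    print("socialED is not installed in this environment")
else:
    print(f"socialED {version} is installed")
```

If the package is reported as missing, check that the `socialED` conda environment is active before re-running `pip install`.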
Usage & Example
from socialED import KPGNN, args_define
from Event2012 import Event2012_Dataset
# Load the dataset using the Event2012_Dataset class
dataset = Event2012_Dataset.load_data()
# Create an instance of the KPGNN class with the loaded dataset
kpgnn = KPGNN(dataset)
# Run the KPGNN instance
kpgnn.preprocess()
model = kpgnn.fit()
kpgnn.detection()
Collected Algorithms
This library implements 19 methods in total. Their characteristics are summarized below.
Algorithm Descriptions
- LDA: Latent Dirichlet Allocation (LDA) is a generative statistical model that allows sets of observations to be explained by unobserved groups. It is particularly useful for discovering the hidden thematic structure in large text corpora.
- BiLSTM: Bi-directional Long Short-Term Memory (BiLSTM) networks enhance the capabilities of traditional LSTMs by processing sequences in both forward and backward directions. This bidirectional approach is effective for tasks like sequence classification and time series detection.
- Word2Vec: Word2Vec is a family of models that generate word embeddings by training shallow neural networks to predict the context of words. These embeddings capture semantic relationships between words, making them useful for various natural language processing tasks.
- GLOVE: Global Vectors for Word Representation (GLOVE) generates word embeddings by aggregating global word-word co-occurrence statistics from a corpus. This approach produces vectors that capture meaning effectively, based on the frequency of word pairs in the training text.
- WMD: Word Mover's Distance (WMD) measures the semantic distance between two documents by computing the minimum distance that words from one document need to travel to match words from another document. This method is grounded in the concept of word embeddings.
- BERT: Bidirectional Encoder Representations from Transformers (BERT) is a transformer-based model that pre-trains deep bidirectional representations by conditioning on both left and right context in all layers. BERT has achieved state-of-the-art results in many NLP tasks.
- SBERT: Sentence-BERT (SBERT) modifies the BERT network to generate semantically meaningful sentence embeddings that can be compared using cosine similarity. It is particularly useful for sentence clustering and semantic search.
- EventX: EventX is designed for online event detection in social media streams, processing tweets in real-time to identify emerging events by clustering similar content. This framework is optimized for high-speed data environments.
- CLKD: Cross-Lingual Knowledge Distillation (CLKD) pairs a GNN-based detector with cross-lingual word embeddings and distills knowledge learned on a high-resource language such as English to improve detection in lower-resource languages. This online algorithm targets multilingual message streams.
- MVGAN: Multi-View Graph Attention Network (MVGAN) leverages multiple data views to enhance event detection accuracy. This offline algorithm applies graph attention to each view and fuses the resulting message representations.
- KPGNN: Knowledge-Preserving Graph Neural Network (KPGNN) is designed for incremental social event detection. It utilizes rich semantics and structural information in social messages to continuously detect events and extend its knowledge base. KPGNN outperforms baseline models, with potential for future research in event analysis and causal discovery in social data.
- Finevent: Fine-Grained Event Detection (FinEvent) uses a reinforced, incremental, and cross-lingual architecture for social event detection. It employs multi-agent reinforcement learning and density-based clustering (DRL-DBSCAN) to improve performance in various detection tasks. Future work will extend RL-guided GNNs for event correlation and evolution.
- QSGNN: Quality-Aware Self-Improving Graph Neural Network (QSGNN) addresses open set social event detection. It uses a pairwise loss with an orthogonal constraint for training and reference similarity distributions for pseudo label generation and quality assessment. A quality-aware optimization strategy re-weights contributions to handle noise. QSGNN achieves state-of-the-art results in both closed and open set settings.
- ETGNN: Evidential Temporal-aware Graph Neural Network (ETGNN) addresses social event detection by incorporating uncertainty and temporal information. Using Evidential Deep Learning and Dempster-Shafer theory, it estimates uncertainty for robust multi-view integration and interpretable classification. A novel temporal-aware GNN aggregator is also devised. Experiments demonstrate ETGNN's effectiveness and superiority over methods lacking these considerations.
- HCRC: Hybrid Graph Contrastive Learning for Social Event Detection (HCRC) captures comprehensive semantic and structural information from social messages. Using hybrid graph contrastive learning and reinforced incremental clustering, HCRC outperforms baselines across various experimental settings.
- UCLSED: Uncertainty-Guided Class Imbalance Learning Framework (UCLSED) enhances model generalization in imbalanced social event detection tasks. It uses an uncertainty-guided contrastive learning loss to handle uncertain classes and combines multi-view architectures with Dempster-Shafer theory for robust uncertainty estimation, achieving superior results.
- RPLMSED: Relational Prompt-Based Pre-Trained Language Models for Social Event Detection (RPLMSED) uses pairwise message modeling to address missing and noisy edges in social message graphs. It leverages content and structural information with a clustering constraint to enhance message representation, achieving state-of-the-art performance in various detection tasks.
- HISevent: Structural Entropy-Based Social Event Detection (HISevent) is an unsupervised tool that explores message correlations without the need for labeling or predetermining the number of events. HISevent combines GNN-based methods' advantages with efficient and robust performance, achieving new state-of-the-art results in closed- and open-set settings.
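Several of the methods above are described as online, meaning that messages are assigned to events as they arrive. The sketch below illustrates that idea with plain single-pass threshold clustering over bag-of-words cosine similarity. It is not the EventX, KPGNN, or any other listed algorithm, and it does not use the socialED API; the function names, the toy message stream, and the `threshold` value are assumptions chosen for illustration.

```python
import math
from collections import Counter
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def detect_online(messages: List[str], threshold: float = 0.3) -> List[int]:
    """Assign each message, in arrival order, to an event cluster id."""
    centroids: List[Counter] = []  # summed term counts per event cluster
    labels: List[int] = []
    for text in messages:
        vec = Counter(text.lower().split())
        scores = [cosine(vec, c) for c in centroids]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            centroids[best].update(vec)  # fold the message into an existing event
        else:
            centroids.append(Counter(vec))  # no close match: open a new event
            best = len(centroids) - 1
        labels.append(best)
    return labels


# Hypothetical toy stream: two messages per event.
stream = [
    "earthquake hits city centre",
    "strong earthquake shakes city",
    "team wins championship final",
    "fans celebrate championship final win",
]
print(detect_online(stream))  # → [0, 0, 1, 1]
```

The GNN-based and PLM-based methods in the list replace the bag-of-words vectors with learned message representations, but the contrast between processing a stream incrementally (online) and clustering a fixed corpus at once (offline) is the same.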
We provide their statistics as follows.
Algorithm | Category | Setting | Supervision | Reference |
---|---|---|---|---|
LDA | Topic | Offline | Supervised | (David M. Blei et al. 2003) |
BiLSTM | Deep learning | Offline | Supervised | (Alex Graves et al. 2005) |
Word2Vec | Word embeddings | Offline | Supervised | (Tomas Mikolov et al. 2013) |
GLOVE | Word embeddings | Offline | Supervised | (Jeffrey Pennington et al. 2014) |
WMD | Similarity | Offline | Supervised | (Matt Kusner et al. 2015) |
BERT | PLMs | Offline | Supervised | (J. Devlin et al. 2018) |
SBERT | PLMs | Offline | Supervised | (Nils Reimers et al. 2019) |
EventX | Community detection | Online | Supervised | (BANG LIU et al. 2020) |
CLKD | GNNs | Online | Supervised | (Jiaqian Ren et al. 2021) |
MVGAN | GNNs | Offline | Supervised | (Wanqiu Cui et al. 2021) |
PP-GCN | GNNs | Online | Supervised | (Hao Peng et al. 2021) |
KPGNN | GNNs | Online | Supervised | (Yuwei Cao et al. 2021) |
Finevent | GNNs | Online | Supervised | (Hao Peng et al. 2022) |
QSGNN | GNNs | Online | Supervised | (Jiaqian Ren et al. 2022) |
ETGNN | GNNs | Offline | Supervised | (Jiaqian Ren et al. 2023) |
HCRC | GNNs | Online | Unsupervised | (Yuanyuan Guo et al. 2023) |
UCLsed | GNNs | Offline | Supervised | (Jiaqian Ren et al. 2023) |
RPLMsed | PLMs | Online | Supervised | (Pu Li et al. 2024) |
HISevent | Community detection | Online | Unsupervised | (Yuwei Cao et al. 2024) |
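When choosing a method, it can help to filter the table by its online/offline and supervised/unsupervised columns. The sketch below transcribes the table rows by hand into a small Python structure; the `Algorithm` tuple and the `select` helper are hypothetical and are not part of the socialED API.

```python
from typing import List, NamedTuple


class Algorithm(NamedTuple):
    name: str
    family: str
    setting: str      # "Online" or "Offline"
    supervision: str  # "Supervised" or "Unsupervised"


# Rows transcribed by hand from the table above (not read from socialED).
ALGORITHMS = [
    Algorithm("LDA", "Topic", "Offline", "Supervised"),
    Algorithm("BiLSTM", "Deep learning", "Offline", "Supervised"),
    Algorithm("Word2Vec", "Word embeddings", "Offline", "Supervised"),
    Algorithm("GLOVE", "Word embeddings", "Offline", "Supervised"),
    Algorithm("WMD", "Similarity", "Offline", "Supervised"),
    Algorithm("BERT", "PLMs", "Offline", "Supervised"),
    Algorithm("SBERT", "PLMs", "Offline", "Supervised"),
    Algorithm("EventX", "Community detection", "Online", "Supervised"),
    Algorithm("CLKD", "GNNs", "Online", "Supervised"),
    Algorithm("MVGAN", "GNNs", "Offline", "Supervised"),
    Algorithm("PP-GCN", "GNNs", "Online", "Supervised"),
    Algorithm("KPGNN", "GNNs", "Online", "Supervised"),
    Algorithm("Finevent", "GNNs", "Online", "Supervised"),
    Algorithm("QSGNN", "GNNs", "Online", "Supervised"),
    Algorithm("ETGNN", "GNNs", "Offline", "Supervised"),
    Algorithm("HCRC", "GNNs", "Online", "Unsupervised"),
    Algorithm("UCLsed", "GNNs", "Offline", "Supervised"),
    Algorithm("RPLMsed", "PLMs", "Online", "Supervised"),
    Algorithm("HISevent", "Community detection", "Online", "Unsupervised"),
]


def select(setting: str, supervision: str) -> List[str]:
    """Return the names of algorithms matching both table columns."""
    return [
        a.name
        for a in ALGORITHMS
        if a.setting == setting and a.supervision == supervision
    ]


print(select("Online", "Unsupervised"))  # → ['HCRC', 'HISevent']
```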
Collected Datasets
- ACE2005: The ACE2005 dataset is a comprehensive collection of news articles annotated for entities, relations, and events. It includes a diverse range of event types and is widely used for event extraction research.
- MAVEN: MAVEN (MAssive eVENt) is a large-scale dataset for event detection that consists of over 11,000 events annotated across a wide variety of domains. It is designed to facilitate the development of robust event detection models.
- TAC KBP: The TAC KBP dataset is part of the Text Analysis Conference Knowledge Base Population track. It contains annotated events, entities, and relations, focusing on extracting structured information from unstructured text.
- CrisisLexT26: CrisisLexT26 is a dataset containing tweets related to 26 different crisis events. It is used to study information dissemination and event detection in social media during emergencies.
- CrisisLexT6: CrisisLexT6 is a smaller dataset from the CrisisLex collection, focusing on six major crisis events. It includes annotated tweets that provide valuable insights into public response and information spread during crises.
- Event2012: Event2012 is a dataset composed of tweets related to various events in 2012. It includes a wide range of event types and is used for studying event detection and classification in social media.
- Event2018: Event2018 consists of French tweets annotated for different event types. It provides a multilingual perspective on event detection, allowing researchers to explore language-specific challenges and solutions.
- KBP2017: KBP2017 is part of the Knowledge Base Population track and focuses on extracting entities, relations, and events from text. It is a valuable resource for developing and benchmarking information extraction systems.
- CySecED: CySecED is a dataset designed for cybersecurity event detection. It includes annotated cybersecurity events and is used to study threat detection and response in textual data.
- FewED: FewED is a dataset for few-shot event detection, providing a limited number of annotated examples for each event type. It is designed to test the ability of models to generalize from few examples.
We provide their statistics as follows.
Dataset | Events | Event types | Sentences | Tokens | Documents |
---|---|---|---|---|---|
ACE2005 | 5,349 | 33 | 11,738 | 230,382 | 599 |
MAVEN | 11,191 | 168 | 23,663 | 512,394 | 4,480 |
TAC KBP | 3,500 | 18 | 7,800 | 150,000 | 2,500 |
CrisisLexT26 | 4,353 | 26 | 8,000 | 175,000 | 1,200 |
CrisisLexT6 | 2,100 | 6 | 4,500 | 90,000 | 600 |
Event2012 | 68,841 | 20 | 150,000 | 3,000,000 | 10,000 |
Event2018 | 15,000 | 10 | 50,000 | 1,000,000 | 5,000 |
KBP2017 | 4,200 | 22 | 9,000 | 180,000 | 3,000 |
CySecED | 5,500 | 35 | 12,000 | 250,000 | 4,200 |
FewED | 6,000 | 40 | 14,000 | 300,000 | 5,500 |
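The raw counts are easier to compare once normalized. As a rough sketch, the snippet below copies the figures from the statistics table and derives tokens per sentence and events per event type for each dataset. The `Stats` tuple is a hypothetical container for this example and is not part of the socialED API.

```python
from typing import Dict, NamedTuple


class Stats(NamedTuple):
    events: int
    event_types: int
    sentences: int
    tokens: int
    documents: int


# Values copied from the statistics table above.
DATASETS: Dict[str, Stats] = {
    "ACE2005": Stats(5_349, 33, 11_738, 230_382, 599),
    "MAVEN": Stats(11_191, 168, 23_663, 512_394, 4_480),
    "TAC KBP": Stats(3_500, 18, 7_800, 150_000, 2_500),
    "CrisisLexT26": Stats(4_353, 26, 8_000, 175_000, 1_200),
    "CrisisLexT6": Stats(2_100, 6, 4_500, 90_000, 600),
    "Event2012": Stats(68_841, 20, 150_000, 3_000_000, 10_000),
    "Event2018": Stats(15_000, 10, 50_000, 1_000_000, 5_000),
    "KBP2017": Stats(4_200, 22, 9_000, 180_000, 3_000),
    "CySecED": Stats(5_500, 35, 12_000, 250_000, 4_200),
    "FewED": Stats(6_000, 40, 14_000, 300_000, 5_500),
}

for name, s in DATASETS.items():
    tokens_per_sentence = s.tokens / s.sentences
    events_per_type = s.events / s.event_types
    print(f"{name:<13} {tokens_per_sentence:6.1f} tokens/sentence "
          f"{events_per_type:8.1f} events/type")
```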
How to Contribute
Contributions to this project are welcome. See the contribution guide for more information.
Authors & Acknowledgements
Contact
Reach out to us by submitting an issue report or sending an email to sy2339225@buaa.edu.cn
References