SocialED

A Python Library for Social Event Detection

Folder Structure

Installation

Manually

# Set up the environment
conda create -n socialED python=3.8
conda activate socialED

# Installation
git clone https://github.com/yukobebryantlakers/socialED.git
cd socialED
pip install -r requirements.txt
pip install socialED

Usage & Example

from socialED import KPGNN, args_define
from Event2012 import Event2012_Dataset

# Load the dataset using the Event2012_Dataset class
dataset = Event2012_Dataset.load_data()

# Create an instance of the KPGNN class with the parsed arguments and loaded dataset
kpgnn = KPGNN(dataset)

# Run the KPGNN instance
kpgnn.preprocess()
model = kpgnn.fit()
kpgnn.detection()

Collected Algorithms

This library implements 19 methods in total. An overview of their characteristics follows.

Algorithm Descriptions

  • LDA: Latent Dirichlet Allocation (LDA) is a generative statistical model that allows sets of observations to be explained by unobserved groups. It is particularly useful for discovering the hidden thematic structure in large text corpora.
  • BiLSTM: Bi-directional Long Short-Term Memory (BiLSTM) networks enhance the capabilities of traditional LSTMs by processing sequences in both forward and backward directions. This bidirectional approach is effective for tasks like sequence classification and time series detection.
  • Word2Vec: Word2Vec is a family of models that generate word embeddings by training shallow neural networks to predict the context of words. These embeddings capture semantic relationships between words, making them useful for various natural language processing tasks.
  • GLOVE: Global Vectors for Word Representation (GLOVE) generates word embeddings by aggregating global word-word co-occurrence statistics from a corpus. This approach produces vectors that capture meaning effectively, based on the frequency of word pairs in the training text.
  • WMD: Word Mover's Distance (WMD) measures the semantic distance between two documents by computing the minimum distance that words from one document need to travel to match words from another document. This method is grounded in the concept of word embeddings.
  • BERT: Bidirectional Encoder Representations from Transformers (BERT) is a transformer-based model that pre-trains deep bidirectional representations by conditioning on both left and right context in all layers. BERT has achieved state-of-the-art results in many NLP tasks.
  • SBERT: Sentence-BERT (SBERT) modifies the BERT network to generate semantically meaningful sentence embeddings that can be compared using cosine similarity. It is particularly useful for sentence clustering and semantic search.
  • EventX: EventX is designed for online event detection in social media streams, processing tweets in real-time to identify emerging events by clustering similar content. This framework is optimized for high-speed data environments.
  • CLKD: Cross-Lingual Knowledge Distillation (CLKD) transfers knowledge from a teacher model trained on a high-resource language to a student model for a lower-resource language, so that events can be detected in languages with little annotated data. This online algorithm is GNN-based.
  • MVGAN: Multi-View Graph Attention Network (MVGAN) leverages multiple data views to enhance event detection accuracy. This offline algorithm learns view-specific message representations and fuses them with a graph attention mechanism, improving robustness against noise and incomplete data.
  • KPGNN: Knowledge-Preserving Graph Neural Network (KPGNN) is designed for incremental social event detection. It utilizes rich semantics and structural information in social messages to continuously detect events and extend its knowledge base. KPGNN outperforms baseline models, with potential for future research in event analysis and causal discovery in social data.
  • Finevent: Fine-Grained Event Detection (FinEvent) uses a reinforced, incremental, and cross-lingual architecture for social event detection. It employs multi-agent reinforcement learning and density-based clustering (DRL-DBSCAN) to improve performance in various detection tasks. Future work will extend RL-guided GNNs for event correlation and evolution.
  • QSGNN: Quality-Aware Self-Improving Graph Neural Network (QSGNN) addresses open set social event detection. It uses a pairwise loss with an orthogonal constraint for training and reference similarity distributions for pseudo label generation and quality assessment. A quality-aware optimization strategy re-weights contributions to handle noise. QSGNN achieves state-of-the-art results in both closed and open set settings.
  • ETGNN: Evidential Temporal-aware Graph Neural Network (ETGNN) addresses social event detection by incorporating uncertainty and temporal information. Using Evidential Deep Learning and Dempster-Shafer theory, it estimates uncertainty for robust multi-view integration and interpretable classification. A novel temporal-aware GNN aggregator is also devised. Experiments demonstrate ETGNN's effectiveness and superiority over methods lacking these considerations.
  • HCRC: Hybrid Graph Contrastive Learning for Social Event Detection (HCRC) captures comprehensive semantic and structural information from social messages. Using hybrid graph contrastive learning and reinforced incremental clustering, HCRC outperforms baselines across various experimental settings.
  • UCLSED: Uncertainty-Guided Class Imbalance Learning Framework (UCLSED) enhances model generalization in imbalanced social event detection tasks. It uses an uncertainty-guided contrastive learning loss to handle uncertain classes and combines multi-view architectures with Dempster-Shafer theory for robust uncertainty estimation, achieving superior results.
  • RPLMSED: Relational Prompt-Based Pre-Trained Language Models for Social Event Detection (RPLMSED) uses pairwise message modeling to address missing and noisy edges in social message graphs. It leverages content and structural information with a clustering constraint to enhance message representation, achieving state-of-the-art performance in various detection tasks.
  • HISevent: Structural Entropy-Based Social Event Detection (HISevent) is an unsupervised tool that explores message correlations without the need for labeling or predetermining the number of events. HISevent combines GNN-based methods' advantages with efficient and robust performance, achieving new state-of-the-art results in closed- and open-set settings.
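Several of the methods above (for example SBERT and EventX) rest on the same idea: represent each message as a vector, compare vectors by cosine similarity, and group similar messages into events. The following is a minimal sketch of that idea only, not the library's implementation. It assumes a plain bag-of-words vector in place of a learned embedding, and the names `bow`, `cosine`, `cluster_stream`, and the `threshold` value are hypothetical.

```python
import math
from collections import Counter


def bow(text):
    """Lower-cased bag-of-words vector; a stand-in for a learned embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_stream(messages, threshold=0.5):
    """Assign each message to the most similar cluster, or open a new one."""
    centroids, labels = [], []
    for text in messages:
        vec = bow(text)
        scores = [cosine(vec, c) for c in centroids]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            centroids[best] += vec  # fold the message into the cluster's summed vector
            labels.append(best)
        else:
            centroids.append(vec)  # no cluster is close enough: start a new event
            labels.append(len(centroids) - 1)
    return labels


messages = [
    "earthquake hits city centre",
    "strong earthquake hits the city",
    "team wins championship final",
    "championship final won by home team",
]
labels = cluster_stream(messages)
print(labels)  # [0, 0, 1, 1]
```

Messages are processed one at a time, which mirrors the online setting; the graph-based methods in the list replace the bag-of-words vector with representations learned from message content and structure.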

Their characteristics are summarized below.

=========  ===================  =======  ============  ================================
Algorithm  Category             Mode     Supervision   Reference
=========  ===================  =======  ============  ================================
LDA        Topic                Offline  Supervised    (David M. Blei et al. 2003)
BiLSTM     Deep learning        Offline  Supervised    (Alex Graves et al. 2005)
Word2Vec   Word embeddings      Offline  Supervised    (Tomas Mikolov et al. 2013)
GLOVE      Word embeddings      Offline  Supervised    (Jeffrey Pennington et al. 2014)
WMD        Similarity           Offline  Supervised    (Matt Kusner et al. 2015)
BERT       PLMs                 Offline  Supervised    (J. Devlin et al. 2018)
SBERT      PLMs                 Offline  Supervised    (Nils Reimers et al. 2019)
EventX     Community detection  Online   Supervised    (Bang Liu et al. 2020)
CLKD       GNNs                 Online   Supervised    (Jiaqian Ren et al. 2021)
MVGAN      GNNs                 Offline  Supervised    (Wanqiu Cui et al. 2021)
PP-GCN     GNNs                 Online   Supervised    (Hao Peng et al. 2021)
KPGNN      GNNs                 Online   Supervised    (Yuwei Cao et al. 2021)
Finevent   GNNs                 Online   Supervised    (Hao Peng et al. 2022)
QSGNN      GNNs                 Online   Supervised    (Jiaqian Ren et al. 2022)
ETGNN      GNNs                 Offline  Supervised    (Jiaqian Ren et al. 2023)
HCRC       GNNs                 Online   Unsupervised  (Yuanyuan Guo et al. 2023)
UCLSED     GNNs                 Offline  Supervised    (Jiaqian Ren et al. 2023)
RPLMSED    PLMs                 Online   Supervised    (Pu Li et al. 2024)
HISevent   Community detection  Online   Unsupervised  (Yuwei Cao et al. 2024)
=========  ===================  =======  ============  ================================
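To pick a method by its characteristics, the table can be held as plain data and filtered. The sketch below is illustrative and not part of the library: the `Method` tuple, the `select` helper, and the field names `category`, `mode`, and `supervision` (our labels for the table's three type columns) are assumptions. The rows are copied from the table, with the reference column omitted.

```python
from typing import List, NamedTuple, Optional


class Method(NamedTuple):
    name: str
    category: str     # method family, e.g. "GNNs"
    mode: str         # "Offline" or "Online"
    supervision: str  # "Supervised" or "Unsupervised"


# Rows copied from the table above (reference column omitted).
METHODS = [
    Method("LDA", "Topic", "Offline", "Supervised"),
    Method("BiLSTM", "Deep learning", "Offline", "Supervised"),
    Method("Word2Vec", "Word embeddings", "Offline", "Supervised"),
    Method("GLOVE", "Word embeddings", "Offline", "Supervised"),
    Method("WMD", "Similarity", "Offline", "Supervised"),
    Method("BERT", "PLMs", "Offline", "Supervised"),
    Method("SBERT", "PLMs", "Offline", "Supervised"),
    Method("EventX", "Community detection", "Online", "Supervised"),
    Method("CLKD", "GNNs", "Online", "Supervised"),
    Method("MVGAN", "GNNs", "Offline", "Supervised"),
    Method("PP-GCN", "GNNs", "Online", "Supervised"),
    Method("KPGNN", "GNNs", "Online", "Supervised"),
    Method("Finevent", "GNNs", "Online", "Supervised"),
    Method("QSGNN", "GNNs", "Online", "Supervised"),
    Method("ETGNN", "GNNs", "Offline", "Supervised"),
    Method("HCRC", "GNNs", "Online", "Unsupervised"),
    Method("UCLSED", "GNNs", "Offline", "Supervised"),
    Method("RPLMSED", "PLMs", "Online", "Supervised"),
    Method("HISevent", "Community detection", "Online", "Unsupervised"),
]


def select(mode: Optional[str] = None, supervision: Optional[str] = None) -> List[str]:
    """Return the names of methods matching every filter that is given."""
    return [
        m.name
        for m in METHODS
        if (mode is None or m.mode == mode)
        and (supervision is None or m.supervision == supervision)
    ]


online_unsupervised = select(mode="Online", supervision="Unsupervised")
print(online_unsupervised)  # ['HCRC', 'HISevent']
```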

Collected Datasets

  • ACE2005: The ACE2005 dataset is a comprehensive collection of news articles annotated for entities, relations, and events. It includes a diverse range of event types and is widely used for event extraction research.
  • MAVEN: MAVEN (MAssive eVENt) is a large-scale dataset for event detection that consists of over 11,000 events annotated across a wide variety of domains. It is designed to facilitate the development of robust event detection models.
  • TAC KBP: The TAC KBP dataset is part of the Text Analysis Conference Knowledge Base Population track. It contains annotated events, entities, and relations, focusing on extracting structured information from unstructured text.
  • CrisisLexT26: CrisisLexT26 is a dataset containing tweets related to 26 different crisis events. It is used to study information dissemination and event detection in social media during emergencies.
  • CrisisLexT6: CrisisLexT6 is a smaller dataset from the CrisisLex collection, focusing on six major crisis events. It includes annotated tweets that provide valuable insights into public response and information spread during crises.
  • Event2012: Event2012 is a dataset composed of tweets related to various events in 2012. It includes a wide range of event types and is used for studying event detection and classification in social media.
  • Event2018: Event2018 consists of French tweets annotated for different event types. It provides a multilingual perspective on event detection, allowing researchers to explore language-specific challenges and solutions.
  • KBP2017: KBP2017 is part of the Knowledge Base Population track and focuses on extracting entities, relations, and events from text. It is a valuable resource for developing and benchmarking information extraction systems.
  • CySecED: CySecED is a dataset designed for cybersecurity event detection. It includes annotated cybersecurity events and is used to study threat detection and response in textual data.
  • FewED: FewED is a dataset for few-shot event detection, providing a limited number of annotated examples for each event type. It is designed to test the ability of models to generalize from few examples.

We provide their statistics as follows.

|----------------|---------|-------------|-----------|-----------|-----------|
| Dataset        | Events  | Event_Types | Sentences | Tokens    | Documents |
|----------------|---------|-------------|-----------|-----------|-----------|
| ACE2005        | 5,349   | 33          | 11,738    | 230,382   | 599       |
| MAVEN          | 11,191  | 168         | 23,663    | 512,394   | 4,480     |
| TAC KBP        | 3,500   | 18          | 7,800     | 150,000   | 2,500     |
| CrisisLexT26   | 4,353   | 26          | 8,000     | 175,000   | 1,200     |
| CrisisLexT6    | 2,100   | 6           | 4,500     | 90,000    | 600       |
| Event2012      | 68,841  | 20          | 150,000   | 3,000,000 | 10,000    |
| Event2018      | 15,000  | 10          | 50,000    | 1,000,000 | 5,000     |
| KBP2017        | 4,200   | 22          | 9,000     | 180,000   | 3,000     |
| CySecED        | 5,500   | 35          | 12,000    | 250,000   | 4,200     |
| FewED          | 6,000   | 40          | 14,000    | 300,000   | 5,500     |
|----------------|---------|-------------|-----------|-----------|-----------|
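Derived figures such as average sentence length or events per event type can be computed directly from these statistics. The sketch below copies the numbers from the table into a dictionary; the helper names `tokens_per_sentence` and `events_per_type` are hypothetical and not part of the library.

```python
# Columns: events, event types, sentences, tokens, documents (copied from the table).
DATASETS = {
    "ACE2005": (5349, 33, 11738, 230382, 599),
    "MAVEN": (11191, 168, 23663, 512394, 4480),
    "TAC KBP": (3500, 18, 7800, 150000, 2500),
    "CrisisLexT26": (4353, 26, 8000, 175000, 1200),
    "CrisisLexT6": (2100, 6, 4500, 90000, 600),
    "Event2012": (68841, 20, 150000, 3000000, 10000),
    "Event2018": (15000, 10, 50000, 1000000, 5000),
    "KBP2017": (4200, 22, 9000, 180000, 3000),
    "CySecED": (5500, 35, 12000, 250000, 4200),
    "FewED": (6000, 40, 14000, 300000, 5500),
}


def tokens_per_sentence(name):
    """Average sentence length in tokens for one dataset."""
    _, _, sentences, tokens, _ = DATASETS[name]
    return tokens / sentences


def events_per_type(name):
    """Average number of annotated events per event type."""
    events, event_types, _, _, _ = DATASETS[name]
    return events / event_types


# Dataset with the most annotated events.
largest = max(DATASETS, key=lambda name: DATASETS[name][0])

for name in DATASETS:
    print(f"{name:<13} {tokens_per_sentence(name):6.1f} tokens/sentence "
          f"{events_per_type(name):8.1f} events/type")
```

For Event2012, for instance, this gives 3,000,000 / 150,000 = 20 tokens per sentence.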

How to Contribute

You are welcome to become part of this project. See the contribution guide for more information.

Authors & Acknowledgements

Contact

Reach out to us by submitting an issue report or sending an email to sy2339225@buaa.edu.cn

References

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

SocialED-0.1.1-py3-none-any.whl (10.0 kB)

Uploaded Python 3

File details

Details for the file SocialED-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: SocialED-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 10.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.8.19

File hashes

Hashes for SocialED-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 9ebc391ce22efaa5c365da53a010dfd595366d8965f9e09cddc53c3cef13c142
MD5 a9f5c6304612c8dcda44856c1be9a4c1
BLAKE2b-256 b7ba7b3422e92fb83ef8b036979528208f1f8871330da35241e76ecf0b75d1ce
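A downloaded wheel can be checked against the SHA256 digest listed above using Python's standard `hashlib` module. This is a minimal sketch: the helper names `sha256_of` and `verify` are hypothetical, and the wheel is assumed to sit in the current directory under the file name shown above.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for SocialED-0.1.1-py3-none-any.whl (copied from above).
EXPECTED_SHA256 = "9ebc391ce22efaa5c365da53a010dfd595366d8965f9e09cddc53c3cef13c142"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest matches the expected value."""
    return sha256_of(path) == expected.lower()


# Assumed location of the downloaded wheel; skipped if the file is absent.
wheel = Path("SocialED-0.1.1-py3-none-any.whl")
if wheel.exists():
    print("hash OK" if verify(wheel) else "hash MISMATCH")
```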
