Automated detection of social group appeals in text

Group Appeal Detector

This Python package detects social group mentions in text and classifies the author's stance toward each group as positive, negative, or neutral using fine-tuned transformer models (RoBERTa, DeBERTa, and BERT). It can also group detected mentions into qualitative categories by running k-means clustering on the mentions' vector representations.

Installation

pip install group-appeal-detector

Quick Start

Detect social group mentions and the author's stance toward each group in a single call. Use device="cuda" or device="mps" to run on GPU.

from group_appeal_detector import GroupAppealDetector

sentence = "Our party supports the interests of young people and working families."

detector = GroupAppealDetector(device="cpu")
results = detector.detect(sentence)

for r in results:
    print(r["span"], r["stance"])

Usage

Mention detection and stance classification can be run separately or combined in a single call. To assign detected mentions to qualitative social group categories, cluster them in the embedding space.

Group Mention Detection

Detect group mentions in a single text or a batch. For batches, pass a list of sentences and set batch_size to control how many are processed in parallel: increase it on GPU, and decrease it if you run into memory issues. Results can be returned as a list of dicts or, with as_df=True, as a pandas DataFrame.

# detect mentions in a single sentence
sentence = "Our party supports the interests of young people and working families."
results = detector.detect_mentions(sentence)
for r in results:
    print(r["span"], r["start"], r["end"])

# detect mentions in a batch of sentences
sentence_1 = "Farmers must earn more money."
sentence_2 = "The government must do more to protect the women living in this country."
batch = [sentence_1, sentence_2]
results_df = detector.detect_mentions_batch(batch, batch_size=8, as_df=True)
results_df.head()
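
The start and end fields can be used to mark mentions in the original text. The sketch below assumes they are character offsets with an exclusive end (so that text[start:end] equals the span); that is an assumption about the output format, not documented behavior. The highlight_mentions helper and the hand-built sample records are hypothetical and stand in for real detect_mentions output.

```python
def highlight_mentions(text, mentions, open_tag="[", close_tag="]"):
    """Wrap each mention in brackets, assuming end-exclusive character offsets."""
    pieces = []
    cursor = 0
    # Walk through the mentions from left to right, copying the text in between.
    for m in sorted(mentions, key=lambda m: m["start"]):
        pieces.append(text[cursor:m["start"]])
        pieces.append(open_tag + text[m["start"]:m["end"]] + close_tag)
        cursor = m["end"]
    pieces.append(text[cursor:])
    return "".join(pieces)


sentence = "Our party supports the interests of young people and working families."

# Hand-built records in the assumed shape of the detector output (not real model output).
sample_mentions = [
    {"span": span, "start": sentence.find(span), "end": sentence.find(span) + len(span)}
    for span in ("young people", "working families")
]

highlighted = highlight_mentions(sentence, sample_mentions)
print(highlighted)
```

If the real offsets turn out to be token-based or end-inclusive, only the slicing inside the helper needs to change.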

Stance Classification

Classify the author's stance toward a specific group as positive, negative, or neutral. For batches, pass a list of (text, group) pairs. Results include the predicted stance and the probability for each class.

# classify a single text
sentence = "We must protect the rights of farmers."
target_group = "farmers"
result = detector.classify_stance(sentence, target_group)
print(result["predicted_stance"], result["stance_probs"])

# classify a batch of (text, group) pairs
pairs = [
    ("We must protect the rights of farmers.", "farmers"),
    ("We do a lot for elderly people.", "elderly people"),
]
results_df = detector.classify_stance_batch(pairs, batch_size=8, as_df=True)
results_df.head()
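
The per-class probabilities make it possible to flag low-confidence predictions for manual review. The following sketch assumes that stance_probs is a dict mapping each stance label to its probability; the exact structure is not specified above, so treat this as an assumption. The confident_stance helper, the 0.6 threshold, and the probability values are made up for illustration.

```python
def confident_stance(result, threshold=0.6):
    """Return the most probable stance, or "uncertain" if it falls below the threshold."""
    probs = result["stance_probs"]
    top_label = max(probs, key=probs.get)
    return top_label if probs[top_label] >= threshold else "uncertain"


# Made-up records in the assumed shape of classify_stance output (not real model output).
sample_results = [
    {"predicted_stance": "positive",
     "stance_probs": {"positive": 0.91, "neutral": 0.07, "negative": 0.02}},
    {"predicted_stance": "neutral",
     "stance_probs": {"positive": 0.40, "neutral": 0.45, "negative": 0.15}},
]

labels = [confident_stance(r) for r in sample_results]
print(labels)
```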

Combined Detection

If you need both the location of a social group mention and the author's stance toward it, obtain both in a single call. This works for a single sentence as well as for larger numbers of sentences via batch processing.

# classify a single sentence
sentence = "Our party supports the interests of young people and working families."
results = detector.detect(sentence)
for r in results:
    print(r["span"], r["stance"])

# classify a batch of sentences
sentence_1 = "Farmers must earn more money."
sentence_2 = "The government must do more to protect the women living in this country."
batch = [sentence_1, sentence_2]
results_df = detector.detect_batch(batch, batch_size=8, as_df=True)
results_df.head()
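
Once mentions and stances are available, a common next step is to count how often each group is addressed with each stance. The sketch below assumes a flat list of records with the span and stance keys used in the single-sentence example above; the stance_counts helper and the sample records are hypothetical, and lowercasing the span is just one simple way to merge surface variants.

```python
from collections import Counter, defaultdict


def stance_counts(results):
    """Count stances per (lowercased) group mention."""
    counts = defaultdict(Counter)
    for r in results:
        counts[r["span"].lower()][r["stance"]] += 1
    return counts


# Hand-written records in the assumed output shape (not real model output).
sample = [
    {"span": "Farmers", "stance": "positive"},
    {"span": "farmers", "stance": "positive"},
    {"span": "farmers", "stance": "neutral"},
    {"span": "the women living in this country", "stance": "positive"},
]

counts = stance_counts(sample)
for group, by_stance in counts.items():
    print(group, dict(by_stance))
```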

Clustering

Cluster detected group mentions into categories using GroupMentionClusterer. It performs k-means clustering on vector representations produced by a BERT model fine-tuned via contrastive learning to maximize separability between different social groups. Set n_clusters to the number of clusters the algorithm should produce.

from group_appeal_detector import GroupAppealDetector, GroupMentionClusterer

detector = GroupAppealDetector(device="cpu")

# collect mentions from a corpus
texts = [...]
all_mentions = detector.detect_mentions_batch(texts, batch_size=16, as_df=False)
# flatten the per-sentence lists of mentions into one list of spans
mentions = [m["span"] for sentence_mentions in all_mentions for m in sentence_mentions]

# cluster the mentions
clusterer = GroupMentionClusterer(mentions, device="cpu")
results_df = clusterer.cluster(n_clusters=5, as_df=True)
results_df.head()
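
Conceptually, the clustering step runs k-means on the mention embeddings. The following is a minimal pure-Python sketch of Lloyd's algorithm on toy 2-D vectors. It is not the package's implementation, whose internals are not shown here, and the deterministic "first k points" initialization is a simplification chosen to keep the example reproducible.

```python
import math


def kmeans(points, k, n_iter=20):
    """Plain Lloyd's algorithm; the initial centroids are the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Toy stand-ins for mention embeddings: two well-separated groups.
toy_vectors = [(0.0, 0.1), (5.0, 5.1), (0.1, 0.0), (5.1, 5.0)]
toy_labels, toy_centroids = kmeans(toy_vectors, k=2)
print(toy_labels)
```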

If there is no prior knowledge of a likely number of clusters, use find_optimal_k to determine the best number of clusters before running cluster. This method computes the average silhouette score for each candidate k and returns the k that maximizes this internal validation metric. If desired, inspect how the silhouette score develops over increasing values of k.

# collect mentions from a corpus
texts = [...]
all_mentions = detector.detect_mentions_batch(texts, batch_size=16, as_df=False)
mentions = [m["span"] for sentence_mentions in all_mentions for m in sentence_mentions]
clusterer = GroupMentionClusterer(mentions, device="cpu")

# find the optimal k based on silhouette score
best_k, all_scores = clusterer.find_optimal_k(k_range=(2, 20), metric="silhouette", visualize=True)

# run with best k
results_df = clusterer.cluster(n_clusters=best_k, as_df=True)
results_df.head()
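
For readers unfamiliar with the silhouette score, here is a small pure-Python sketch of the metric on toy data. It uses Euclidean distance and the usual convention of scoring singleton clusters as 0; whether find_optimal_k uses the same distance and conventions is not stated here, so this illustrates the metric rather than the package's code.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient; requires at least two distinct clusters."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)

    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


toy_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good_score = silhouette_score(toy_points, [0, 0, 1, 1])  # compact, separated clusters
bad_score = silhouette_score(toy_points, [0, 1, 0, 1])   # clusters that cut across groups
print(round(good_score, 2), round(bad_score, 2))
```

A partition that matches the natural grouping scores close to 1, while one that cuts across it scores below 0, which is why the k with the highest average score is preferred.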

Alternatively, if a reference dictionary of known social group categories is available, the optimal k can be determined by maximizing the Normalized Mutual Information (NMI) score between cluster assignments and dictionary-based category labels. Pass the dictionary as a pandas DataFrame in which each column represents a category and each row contains example terms. The method then finds all group mentions that match any example term and computes the NMI score from the known categories of the matched terms and their cluster assignments.

Maximizing the NMI score maximizes how well the known social group categories are reproduced in the data. Users can also weigh the silhouette and NMI scores together to balance internal and external validation.

import pandas as pd

dictionary_df = pd.read_csv("social_groups.csv")

# find the optimal k based on nmi score
best_k, all_scores = clusterer.find_optimal_k(
    k_range=(2, 20),
    metric="nmi",
    dictionary_df=dictionary_df,
    visualize=True,
)

# run with best k
results_df = clusterer.cluster(n_clusters=best_k, as_df=True)
results_df.head()
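
To make the NMI criterion concrete, the sketch below computes it in pure Python for toy labels. It normalizes the mutual information by the arithmetic mean of the two entropies, which is one common choice; the normalization used inside find_optimal_k is not specified here, so this is an assumption. The category and cluster labels are invented.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(labels_a, labels_b):
    """NMI with arithmetic-mean normalization of the two entropies."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    # Mutual information: sum over co-occurring label pairs of p(a,b) * log(p(a,b) / (p(a) p(b))).
    mi = sum(
        (n_ab / n) * math.log(n * n_ab / (count_a[a] * count_b[b]))
        for (a, b), n_ab in joint.items()
    )
    mean_entropy = (entropy(labels_a) + entropy(labels_b)) / 2
    return mi / mean_entropy if mean_entropy > 0 else 1.0


# Invented dictionary categories for four matched mentions.
categories = ["age", "age", "occupation", "occupation"]
nmi_aligned = normalized_mutual_information(categories, [1, 1, 0, 0])
nmi_unrelated = normalized_mutual_information(categories, [0, 1, 0, 1])
print(round(nmi_aligned, 3), round(nmi_unrelated, 3))
```

Cluster ids are arbitrary, so a clustering that reproduces the categories under different ids still reaches the maximum score.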

Conceptual Background

The definitions used in this package are largely inspired by the work of Lena Maria Huber and Alona O. Dolinsky, and of Will Horne, Alona O. Dolinsky, and Lena Maria Huber.

A social group is a segment of society or a collection of people who share common sociodemographic traits or attributes that are ascriptive and/or acquired. A reference to a social group in text is called a group mention. A group appeal is an intentional act that associates a political actor with a social group in either a supportive or critical manner.

Models

Group Mention Detection — `maxwlnd/roberta_group_mention_detector`

A RoBERTa-base token classification model fine-tuned on 5,000 manually annotated sentences drawn from parliamentary debates in the UK House of Commons (2010–2019). The training set was augmented with 25% synthetic paraphrases, and the model was trained using the BIO tagging scheme.

Cross-validated performance (95% confidence intervals in brackets):

Metric	Score
F1	0.82 [0.82, 0.83]
Precision	0.80 [0.79, 0.81]
Recall	0.84 [0.83, 0.85]
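
In the BIO tagging scheme used to train the mention detector, each token is tagged as the beginning of a mention (B), inside a mention (I), or outside any mention (O). The sketch below shows how such tags can be decoded into spans. The bare B/I/O tag names, the whitespace tokens, and the bio_to_spans helper are illustrative assumptions; the model's actual label names and subword tokenization may differ.

```python
def bio_to_spans(tokens, tags):
    """Decode parallel token and BIO tag lists into a list of mention strings."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            # A new mention starts; close any mention that is still open.
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            # "O", or a stray "I" without a preceding "B", ends the open mention.
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


tokens = ["We", "support", "young", "people", "and", "working", "families"]
tags = ["O", "O", "B", "I", "O", "B", "I"]
decoded = bio_to_spans(tokens, tags)
print(decoded)
```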

Stance Classification — `maxwlnd/socialgroup_stance_classification_nli`

A DeBERTa-v3-base NLI model fine-tuned for social group stance classification, built on top of MoritzLaurer/deberta-v3-base-zeroshot-v2.0. The zero-shot classifier was further fine-tuned on the social group mentions manually annotated in 5,000 sentences drawn from parliamentary debates in the UK House of Commons (2010–2019). The negative class was oversampled by adding synthetic paraphrases of 25% of all sentences with group mentions.

For each detected group mention, three hypotheses are formulated: positive, negative, and neutral. The model chooses the class with the largest entailment probability as the predicted stance.

Cross-validated performance (95% confidence intervals in brackets):

Metric	Negative	Neutral	Positive	Macro-Avg.
F1	0.76 [0.72, 0.80]	0.80 [0.78, 0.81]	0.89 [0.89, 0.89]	0.81 [0.80, 0.83]
Precision	0.85 [0.77, 0.94]	0.81 [0.79, 0.84]	0.87 [0.86, 0.88]	0.85 [0.82, 0.87]
Recall	0.70 [0.62, 0.77]	0.78 [0.76, 0.80]	0.91 [0.89, 0.92]	0.79 [0.77, 0.82]
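
The NLI-based decision rule described for the stance model (one hypothesis per stance, choose the one with the largest entailment probability) can be sketched as follows. The hypothesis wording is hypothetical, since the actual templates are not given here, and the entailment probabilities are made-up placeholders for model output.

```python
STANCES = ("positive", "negative", "neutral")


def build_hypotheses(group):
    """Build one hypothesis per stance (hypothetical wording, not the package's templates)."""
    return {
        stance: f"The author's stance toward {group} is {stance}."
        for stance in STANCES
    }


def pick_stance(entailment_probs):
    """Choose the stance whose hypothesis has the largest entailment probability."""
    return max(entailment_probs, key=entailment_probs.get)


hypotheses = build_hypotheses("farmers")

# Made-up entailment probabilities standing in for the NLI model's output.
made_up_probs = {"positive": 0.83, "negative": 0.05, "neutral": 0.12}
predicted = pick_stance(made_up_probs)
print(predicted)
```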

Mention Embedding — `maxwlnd/cl_mention_embedding`

A BERT-base model with a linear projection head (dimensionality 128) fine-tuned via contrastive learning to produce embeddings that maximize separability between mentions of different social groups.

Each mention is fed into the model using the following template:

Social group of {mention} is: [MASK].

The model extracts the hidden state at the [MASK] position as the mention representation, passes it through the projection layer and L2-normalizes the embedding to make the distance computation independent of the vectors' magnitude.
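
The two deterministic steps in this pipeline, filling the template and L2-normalizing the projected vector, are easy to illustrate without the model. In the sketch below, the build_prompt and l2_normalize helpers are hypothetical names, and the 2-D vector stands in for the 128-dimensional projection output.

```python
import math

TEMPLATE = "Social group of {mention} is: [MASK]."


def build_prompt(mention):
    """Insert a mention into the template quoted above."""
    return TEMPLATE.format(mention=mention)


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


prompt = build_prompt("young people")
unit = l2_normalize([3.0, 4.0])  # toy stand-in for a projected embedding
print(prompt)
print(unit)
```

For unit-length vectors, the squared Euclidean distance equals 2 minus twice the cosine similarity, so distance-based clustering on these embeddings depends only on the angle between them.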

The model was trained with a triplet loss on the social group dictionary provided by Will Horne, Alona O. Dolinsky, and Lena Maria Huber. Each anchor is a term from a category, paired with a randomly sampled positive from the same category and a hard negative mined from a different category.
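
As a rough illustration of this training objective, the sketch below computes a margin-based triplet loss and picks a hard negative on toy 2-D vectors. The Euclidean distance, the margin of 1.0, and the "closest candidate from another category" mining rule are assumptions for the example; the settings actually used for training are not stated here.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def hardest_negative(anchor, candidates):
    """Pick the candidate from another category that lies closest to the anchor."""
    return min(candidates, key=lambda c: math.dist(anchor, c))


anchor = (0.0, 0.0)             # toy embedding of a term from one category
positive = (0.0, 1.0)           # toy embedding of another term from the same category
other_category = [(3.0, 0.0), (0.5, 0.0)]  # toy embeddings of terms from a different category

hard = hardest_negative(anchor, other_category)
loss_hard = triplet_loss(anchor, positive, hard)
loss_easy = triplet_loss(anchor, positive, (3.0, 0.0))
print(hard, loss_hard, loss_easy)
```

The easy negative already lies farther away than the positive plus the margin and contributes no loss, which is why mining hard negatives gives a more informative training signal.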

License

This project is licensed under the MIT License. See the LICENSE file for details.

Release history

0.1.4 (Apr 10, 2026)
0.1.3 (Apr 10, 2026), this version
0.1.2 (Apr 10, 2026)
0.1.1 (Apr 10, 2026)
0.1.0 (Apr 10, 2026)

Download files

Source Distribution

group_appeal_detector-0.1.3.tar.gz (21.6 kB)

Uploaded Apr 10, 2026 Source

Built Distribution

group_appeal_detector-0.1.3-py3-none-any.whl (15.7 kB)

Uploaded Apr 10, 2026 Python 3

File details

Details for the file group_appeal_detector-0.1.3.tar.gz.

File metadata

Download URL: group_appeal_detector-0.1.3.tar.gz
Upload date: Apr 10, 2026
Size: 21.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for group_appeal_detector-0.1.3.tar.gz
Algorithm	Hash digest
SHA256	`c33613ed1b68856da66b69a4c55faa32608cf5017aa7ca32819bfca46480403b`
MD5	`2039768c2028b130b5d3e1ea66cc65a0`
BLAKE2b-256	`4ddae92435cea4ba61c5743614d98569a569a2ec90586ce0e24d32520f616a9d`

File details

Details for the file group_appeal_detector-0.1.3-py3-none-any.whl.

File metadata

Download URL: group_appeal_detector-0.1.3-py3-none-any.whl
Upload date: Apr 10, 2026
Size: 15.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for group_appeal_detector-0.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6d9880f00312337225d3b87f555e7de01968bc53ded08e593507525d044a2314`
MD5	`70b6b58b90ae6b41fa30df87207d3127`
BLAKE2b-256	`84c0f5b5cdf86c1448eead16fa6c9fbdd86ef208d57579735c0f942b774eb9b5`
