Cluster Affiliation Network Embedding: platform-agnostic user networks from shared narrative participation
cane-networks
CANE (Cluster Affiliation Network Embedding) builds user-user networks from social media content without relying on follower graphs, reposts, or any platform-specific metadata. Instead of connecting users through behavioral traces, it connects them through shared participation in latent narrative clusters — modeling what people talk about rather than how they interact.
This makes it useful whenever your data spans multiple platforms or API access is limited, since the same method applies to X, Telegram, TikTok, Truth Social, Reddit, or any combination of them without modification.
The method was introduced in [Gerard et al., ICWSM 2025] and applied to cross-platform narrative prediction in [Gerard et al., WWW 2025], where discourse-based networks substantially outperformed behavioral baselines on information operation (IO) detection, ideological stance prediction, and cross-platform emergence forecasting.
Installation
pip install cane-networks
FAISS is optional but strongly recommended for large datasets. Install whichever variant matches your hardware:
pip install cane-networks faiss-cpu # CPU
pip install cane-networks faiss-gpu # GPU
Without FAISS the package falls back to scikit-learn's NearestNeighbors, which works fine at smaller scales.
Usage
The input is a DataFrame with at least two columns: one for user identifiers and one for narrative cluster labels. How you get the cluster labels is up to you — DP-Means over sentence embeddings works well, but any clustering is fine.
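As a minimal sketch of the expected input, a toy frame might look like the following. It assumes one row per post, which is an assumption on top of the two-column requirement stated above. The helper `check_cane_input` is hypothetical and not part of cane-networks; the column names `disc_node_id` and `cluster` are the ones the examples in this README use.

```python
import pandas as pd

REQUIRED_COLUMNS = ("disc_node_id", "cluster")


def check_cane_input(df, required=REQUIRED_COLUMNS):
    """Hypothetical helper: verify the required columns exist and drop incomplete rows."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"missing required column(s): {missing}")
    # Rows without a user or a cluster label cannot contribute to the network
    return df.dropna(subset=list(required)).reset_index(drop=True)


# Toy data: each row is one post, tagged with its author and its narrative cluster
posts = pd.DataFrame(
    {
        "disc_node_id": ["u1", "u1", "u2", "u3", "u3"],
        "cluster": [12, 40, 12, 7, None],  # the last post was never clustered
    }
)
df = check_cane_input(posts)
```

Here `u1` and `u2` both posted in cluster 12, which is the kind of overlap the method turns into an edge.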
Static graph (CANE)
import pandas as pd
from cane import CANE
# df is a pandas DataFrame with at least: disc_node_id (user ID), cluster (narrative label)
model = CANE(similarity_threshold=0.2)
G = model.fit(df)
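To make the idea concrete, here is a rough sketch of the pipeline this README describes (TF-IDF user vectors, cosine similarity, a similarity threshold), written with NumPy and pandas only. It is not the package's implementation: the function name `sketch_affiliation_edges`, the smoothed IDF formula, and the brute-force pairwise comparison are all assumptions made for illustration. The package itself relies on a nearest-neighbor search (FAISS or scikit-learn) rather than comparing every pair.

```python
import numpy as np
import pandas as pd


def sketch_affiliation_edges(df, user_col="disc_node_id", cluster_col="cluster", threshold=0.2):
    """Illustrative only: return (user_a, user_b, cosine_similarity) tuples above threshold."""
    # Rows are users, columns are narrative clusters, values are post counts
    counts = pd.crosstab(df[user_col], df[cluster_col])
    tf = counts.to_numpy(dtype=float)
    n_users = tf.shape[0]

    # Smoothed inverse document frequency (an assumed variant, not the package's)
    users_per_cluster = (tf > 0).sum(axis=0)
    idf = np.log((1 + n_users) / (1 + users_per_cluster)) + 1.0
    weighted = tf * idf

    # L2-normalise rows so that a dot product equals cosine similarity
    norms = np.linalg.norm(weighted, axis=1, keepdims=True)
    unit = weighted / np.where(norms == 0, 1.0, norms)
    sim = unit @ unit.T

    users = counts.index.to_list()
    edges = []
    for i in range(n_users):
        for j in range(i + 1, n_users):
            if sim[i, j] >= threshold:
                edges.append((users[i], users[j], float(sim[i, j])))
    return edges


toy = pd.DataFrame(
    {
        "disc_node_id": ["a", "a", "a", "b", "b", "c"],
        "cluster": [1, 1, 2, 1, 2, 3],
    }
)
edges = sketch_affiliation_edges(toy, threshold=0.2)
```

In the toy data, users `a` and `b` share two clusters and end up connected, while `c` shares none and stays isolated.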
Temporal graph (t-CANE)
t-CANE computes similarities at each time bin and aggregates them across time. Repeated co-engagement across bins strengthens edges; lapsed connections decay.
from cane import tCANE
# df additionally needs a time_bin column, e.g. biweekly periods
df["time_bin"] = df["created_at"].dt.to_period("2W").astype(str)
model = tCANE(method="decay", lambda_=0.2)
G = model.fit(df)
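If you would rather control the bin boundaries explicitly, fixed-width bins can also be built by hand. The sketch below assumes a datetime column named `created_at`, as in the snippet above. `add_fixed_width_bins` is a hypothetical helper, not part of cane-networks; it labels each row with the start date of its non-overlapping 14-day window, counted from the earliest timestamp in the data.

```python
import pandas as pd


def add_fixed_width_bins(df, time_col="created_at", days=14, out_col="time_bin"):
    """Hypothetical helper: assign each row to a non-overlapping window of `days` days."""
    ts = pd.to_datetime(df[time_col])
    origin = ts.min().normalize()  # midnight of the earliest day in the data
    bin_index = (ts - origin).dt.days // days
    out = df.copy()
    # Label every bin by the date on which it starts
    bin_start = origin + pd.to_timedelta(bin_index * days, unit="D")
    out[out_col] = bin_start.dt.strftime("%Y-%m-%d")
    return out


posts = pd.DataFrame(
    {
        "disc_node_id": ["u1", "u2", "u1", "u3"],
        "cluster": [5, 5, 9, 9],
        "created_at": pd.to_datetime(
            ["2024-01-01", "2024-01-10", "2024-01-15", "2024-02-01"]
        ),
    }
)
binned = add_fixed_width_bins(posts)
```

Any string or sortable label works for the bins, as long as posts from the same window get the same value.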
Available aggregation methods: decay, sum, average, max, stability.
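The exact aggregation formulas are not spelled out in this README, so the following is only one plausible reading of the `decay` method: an exponentially decayed running sum, in which each new bin shrinks existing edge weights by `exp(-lambda_)` before adding that bin's similarity. The function `decay_aggregate` and its input format are hypothetical, and the package may weight things differently.

```python
import math


def decay_aggregate(sims_by_bin, lambda_=0.2):
    """Assumed formula, for illustration: w_t = exp(-lambda_) * w_{t-1} + s_t.

    sims_by_bin is a list of dicts {(user_a, user_b): similarity}, oldest bin first.
    """
    keep = math.exp(-lambda_)
    weights = {}
    for sims in sims_by_bin:
        # Fade every edge seen so far, whether or not it recurs in this bin
        weights = {edge: w * keep for edge, w in weights.items()}
        # Reinforce the edges that are active in this bin
        for edge, s in sims.items():
            weights[edge] = weights.get(edge, 0.0) + s
    return weights


bins = [
    {("a", "b"): 0.5, ("a", "c"): 0.5},  # both pairs co-engage in the first bin
    {("a", "b"): 0.5},                   # only a-b recurs in the second bin
]
weights = decay_aggregate(bins, lambda_=0.2)
```

Under this reading, the recurring pair `a`-`b` ends with a larger weight than it started with, while the lapsed pair `a`-`c` has faded below its original similarity.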
Handling large narrative vocabularies
When your corpus has many narrative clusters (say, more than ten thousand), the TF-IDF user vectors become very high-dimensional and sparse. Cosine similarity degrades in this regime because most users share almost no clusters — the network ends up empty or dominated by a handful of high-volume accounts.
The fix is dimensionality reduction via TruncatedSVD before the similarity search. The key question is how many dimensions to use, and suggest_svd_dims answers it in terms of variance retained:
from cane import suggest_svd_dims
recommended_dims, curve = suggest_svd_dims(df, target_variance=0.90)
# → 147 dimensions retain 90.0% of variance
This fits a single SVD on the full matrix and reads off the cumulative explained variance, so you get the entire curve cheaply and can pick whatever retention level makes sense for your use case. Once you have settled on a retention level, pass it to the model via target_variance:
model = CANE(similarity_threshold=0.2, target_variance=0.90)
G = model.fit(df)
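For intuition about what the variance curve represents, here is a NumPy-only sketch of reading a dimension count off cumulative explained variance. It is an illustration under assumptions, not the code behind `suggest_svd_dims`: `dims_for_variance` is a hypothetical helper, it takes a small dense user-by-cluster matrix, and it runs a dense full SVD, which would be far too slow for a real corpus.

```python
import numpy as np


def dims_for_variance(X, target_variance=0.90):
    """Hypothetical helper: smallest number of SVD components reaching target_variance."""
    X = np.asarray(X, dtype=float)
    # Uncentered SVD of the matrix
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    projected = U * S  # coordinates of each row in the SVD basis
    # Variance captured by each component, relative to total column variance
    explained = projected.var(axis=0)
    total = X.var(axis=0).sum()
    curve = np.cumsum(explained) / total
    # First position where the cumulative curve reaches the target
    k = int(np.searchsorted(curve, target_variance)) + 1
    return min(k, len(curve)), curve


# A rank-1 matrix: every row is a multiple of the same pattern
X = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 0.0, 2.0])
dims, curve = dims_for_variance(X, target_variance=0.90)
```

Because the toy matrix has rank 1, a single component already captures essentially all of the variance. Real user-by-cluster matrices produce a much more gradual curve.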
If you're not sure whether you need dimensionality reduction, check the sparsity diagnostic that suggest_svd_dims prints. Sparsity above roughly 0.97 is the signal that reduction will help.
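A quick way to estimate sparsity yourself is sketched below. It assumes that sparsity means the fraction of zero entries in the user-by-cluster count matrix, which may differ from the definition the package uses. `user_cluster_sparsity` is a hypothetical helper, and the dense crosstab it builds is only practical for small samples.

```python
import pandas as pd


def user_cluster_sparsity(df, user_col="disc_node_id", cluster_col="cluster"):
    """Hypothetical helper: fraction of (user, cluster) cells with no posts."""
    counts = pd.crosstab(df[user_col], df[cluster_col])
    nonzero = int((counts.to_numpy() > 0).sum())
    return 1.0 - nonzero / counts.size


toy = pd.DataFrame(
    {
        "disc_node_id": ["a", "a", "b", "c"],
        "cluster": [1, 2, 1, 3],
    }
)
sparsity = user_cluster_sparsity(toy)
```

In the toy frame, 4 of the 9 user-cluster cells are filled, so the sparsity is 5/9.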
Picking a similarity threshold
A threshold that's too high gives you a sparse, disconnected graph; too low and you're connecting users who don't really share much. suggest_threshold shows you the connectivity rate at several candidate values so you can make an informed choice:
model = CANE(target_variance=0.90)
results = model.suggest_threshold(df)
# Threshold → Connectivity rate
# 0.10 → 94.3% nodes connected
# 0.20 → 81.2% nodes connected
# 0.30 → 63.7% nodes connected
# 0.40 → 41.0% nodes connected
# 0.50 → 22.1% nodes connected
For most downstream tasks (IO detection, stance prediction) something in the 0.15–0.30 range tends to work well. Narrative emergence prediction can tolerate lower thresholds since the signal comes from neighbor activity counts rather than community structure.
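The connectivity rate in a sweep like the one above can be approximated from any pairwise similarity matrix. The sketch below assumes that a node counts as connected when it has at least one neighbor at or above the threshold; that definition, and the helper name `connectivity_rates`, are assumptions rather than the package's documented behavior.

```python
import numpy as np


def connectivity_rates(sim, thresholds=(0.1, 0.2, 0.3, 0.4, 0.5)):
    """Hypothetical helper: share of nodes with at least one edge at each threshold."""
    sim = np.asarray(sim, dtype=float).copy()
    np.fill_diagonal(sim, -np.inf)  # ignore self-similarity
    best = sim.max(axis=1)  # strongest link each node has to any other node
    return {t: float((best >= t).mean()) for t in thresholds}


# Symmetric toy similarities among three users
sim = np.array(
    [
        [1.00, 0.35, 0.05],
        [0.35, 1.00, 0.15],
        [0.05, 0.15, 1.00],
    ]
)
rates = connectivity_rates(sim)
```

Plotting these rates against the threshold makes it easier to spot the point where the graph starts to fragment.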
Graph diagnostics
from cane import graph_diagnostics
graph_diagnostics(G, name="My corpus")
# My corpus
# ========================================
# Nodes: 45231
# Edges: 312847
# Connected nodes: 41983 (92.8%)
# ...
Citation
If you use this in your work, please cite:
@article{gerard2025bridging,
title={Bridging the narrative divide: Cross-platform discourse networks in fragmented ecosystems},
author={Gerard, Patrick and Hanley, Hans WA and Luceri, Luca and Ferrara, Emilio},
journal={arXiv preprint arXiv:2505.21729},
year={2025}
}
License
MIT
Download files
File details
Details for the file cane_networks-0.1.0.tar.gz.
File metadata
- Download URL: cane_networks-0.1.0.tar.gz
- Upload date:
- Size: 14.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 74087c474294be7f7a66b194180ad16d17a5da6c98c4ab24100510118024c7a4 |
| MD5 | 842f5d3fe02a6548b698edbda76acbd8 |
| BLAKE2b-256 | 90c5d41c0abe5860decde4fd65b187a77d17212a2f3906673eccd864a8b60884 |
File details
Details for the file cane_networks-0.1.0-py3-none-any.whl.
File metadata
- Download URL: cane_networks-0.1.0-py3-none-any.whl
- Upload date:
- Size: 13.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 37bf472d994ef428bca87dffa52cc6916e6f2914df001cfe6bb69c754ee0aa33 |
| MD5 | 3cbc53d82e4ed281dbbf159779df2049 |
| BLAKE2b-256 | a8571ce01e1e7a1b7e6b848520ea563757bbdfe1ba70a4d01ac61951d1c71d56 |