Cluster Affiliation Network Embedding: platform-agnostic user networks from shared narrative participation

cane-networks

CANE (Cluster Affiliation Network Embedding) builds user-user networks from social media content without relying on follower graphs, reposts, or any platform-specific metadata. Instead of connecting users through behavioral traces, it connects them through shared participation in latent narrative clusters — modeling what people talk about rather than how they interact.

This makes it useful whenever your data spans multiple platforms or API access is limited, since the same method applies to X, Telegram, TikTok, Truth Social, Reddit, or any combination of them without modification.

The method was introduced in Gerard et al. (ICWSM 2025) and applied to cross-platform narrative prediction in Gerard et al. (WWW 2025), where discourse-based networks substantially outperformed behavioral baselines on information operation detection, ideological stance prediction, and cross-platform emergence forecasting.

Installation

pip install cane-networks

FAISS is optional but strongly recommended for large datasets. Install whichever variant matches your hardware:

pip install cane-networks faiss-cpu   # CPU
pip install cane-networks faiss-gpu   # GPU

Without FAISS the package falls back to scikit-learn's NearestNeighbors, which works fine at smaller scales.
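
The optional-dependency pattern described above can be sketched roughly as follows. This is not the package's own code: the helper name cosine_neighbors and the choice of a flat inner-product index are assumptions made for illustration, using only the public FAISS and scikit-learn APIs.

```python
import numpy as np

try:
    import faiss  # optional dependency
    HAVE_FAISS = True
except ImportError:
    HAVE_FAISS = False


def cosine_neighbors(X, k):
    """Return (similarities, indices) of the k most cosine-similar rows per row of X."""
    X = np.ascontiguousarray(X, dtype="float32")
    if HAVE_FAISS:
        Xn = X.copy()
        faiss.normalize_L2(Xn)                  # in-place row normalization
        index = faiss.IndexFlatIP(Xn.shape[1])  # inner product equals cosine on unit vectors
        index.add(Xn)
        sims, idx = index.search(Xn, k)
        return sims, idx
    # Fallback: brute-force cosine search with scikit-learn.
    from sklearn.neighbors import NearestNeighbors
    nn = NearestNeighbors(n_neighbors=k, metric="cosine").fit(X)
    dist, idx = nn.kneighbors(X)
    return 1.0 - dist, idx                      # cosine distance -> similarity


rng = np.random.default_rng(0)
X = rng.random((20, 8))
sims, idx = cosine_neighbors(X, k=3)
```

Either branch returns each row's own index as its closest match, so a caller would normally drop the first column before building edges.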

Usage

The input is a DataFrame with at least two columns: one for user identifiers and one for narrative cluster labels. How you get the cluster labels is up to you — DP-Means over sentence embeddings works well, but any clustering is fine.

Static graph (CANE)

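import pandas as pd
from cane import CANE

# df needs two columns: disc_node_id (user) and cluster (narrative label).
# Toy rows for illustration only; substitute your own data.
df = pd.DataFrame({
    "disc_node_id": ["u1", "u1", "u2", "u2", "u3"],
    "cluster": [0, 1, 0, 1, 2],
})

model = CANE(similarity_threshold=0.2)
G = model.fit(df)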
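
As a rough illustration of the idea behind the static graph, the sketch below builds TF-IDF weighted user-by-cluster vectors in plain Python and links users whose cosine similarity clears a threshold. It is a conceptual sketch only: the smoothed IDF variant, the inclusive threshold, and the helper names (tfidf_vectors, cosine, build_edges) are assumptions made for illustration, not the package's internals.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def tfidf_vectors(rows):
    """rows: iterable of (user, cluster) pairs -> {user: {cluster: weight}}."""
    counts = defaultdict(Counter)
    for user, cluster in rows:
        counts[user][cluster] += 1
    n_users = len(counts)
    # Document frequency: number of users active in each cluster.
    doc_freq = Counter(c for user_counts in counts.values() for c in user_counts)
    vectors = {}
    for user, user_counts in counts.items():
        total = sum(user_counts.values())
        vectors[user] = {
            # Term frequency times a smoothed IDF (an assumed variant).
            c: (n / total) * (math.log((1 + n_users) / (1 + doc_freq[c])) + 1)
            for c, n in user_counts.items()
        }
    return vectors


def cosine(a, b):
    dot = sum(w * b[c] for c, w in a.items() if c in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_edges(rows, similarity_threshold=0.2):
    """Keep user pairs whose cosine similarity is at or above the threshold."""
    vectors = tfidf_vectors(rows)
    edges = {}
    for u, v in combinations(sorted(vectors), 2):
        sim = cosine(vectors[u], vectors[v])
        if sim >= similarity_threshold:
            edges[(u, v)] = sim
    return edges


rows = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"), ("u3", "c")]
edges = build_edges(rows)
```

Here u1 and u2 participate in the same clusters and end up linked, while u3 shares no cluster with anyone and stays isolated. The all-pairs loop is quadratic in the number of users, which is why a nearest-neighbor search is the practical choice at scale.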

Temporal graph (t-CANE)

t-CANE computes similarities at each time bin and aggregates them across time. Repeated co-engagement across bins strengthens edges; lapsed connections decay.

from cane import tCANE

# df additionally needs a time_bin column, e.g. biweekly periods
df["time_bin"] = df["created_at"].dt.to_period("2W").astype(str)

model = tCANE(method="decay", lambda_=0.2)
G = model.fit(df)
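
It is worth inspecting the time_bin labels before fitting, for example with df["time_bin"].nunique(), because pandas may not turn a multiplied period frequency such as "2W" into 14-day-wide buckets. If you want explicit fixed-width bins, one option is to count days from the earliest timestamp. The snippet below is a minimal sketch with toy data; the 14-day width and the label format are choices made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "created_at": pd.to_datetime(
        ["2024-01-01", "2024-01-10", "2024-01-15", "2024-02-01"]
    )
})

BIN_DAYS = 14  # bin width in days (illustrative choice)

# Anchor the bins at midnight of the earliest timestamp.
origin = df["created_at"].min().normalize()

# Integer index of the 14-day window each row falls into.
bin_index = (df["created_at"] - origin).dt.days // BIN_DAYS

# Label each bin by its start date so the labels sort chronologically.
bin_start = origin + pd.to_timedelta(bin_index * BIN_DAYS, unit="D")
df["time_bin"] = bin_start.dt.strftime("%Y-%m-%d")
```

The first two rows fall into the same 14-day window and share a label; the last two each open a new window.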

Available aggregation methods: decay, sum, average, max, stability.
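
To make the difference between these methods concrete, here is a minimal sketch of how per-bin similarities for a single user pair might be combined. The formulas are assumptions for illustration, in particular the exponential weighting used for decay, and the function name aggregate is hypothetical. The stability method is left out because this README does not define it.

```python
import math


def aggregate(sims_by_bin, method="decay", lambda_=0.2):
    """Combine one pair's similarities, oldest bin first (0.0 = no edge in that bin)."""
    if not sims_by_bin:
        return 0.0
    if method == "sum":
        return sum(sims_by_bin)
    if method == "average":
        return sum(sims_by_bin) / len(sims_by_bin)
    if method == "max":
        return max(sims_by_bin)
    if method == "decay":
        last = len(sims_by_bin) - 1
        # Weight each bin by exp(-lambda * age), with age counted back from the latest bin.
        return sum(
            s * math.exp(-lambda_ * (last - t)) for t, s in enumerate(sims_by_bin)
        )
    raise ValueError(f"unknown method: {method}")


recent = aggregate([0.0, 0.0, 0.6, 0.6])  # pair active in the latest bins
lapsed = aggregate([0.6, 0.6, 0.0, 0.0])  # pair active only early on
```

Under this weighting the recently active pair receives a heavier edge than the lapsed one, whereas sum, average, and max cannot tell the two histories apart.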

Handling large narrative vocabularies

When your corpus has many narrative clusters (say, more than ten thousand), the TF-IDF user vectors become very high-dimensional and sparse. Cosine similarity degrades in this regime because most users share almost no clusters — the network ends up empty or dominated by a handful of high-volume accounts.

The fix is dimensionality reduction via TruncatedSVD before the similarity search. The key question is how many dimensions to use, and suggest_svd_dims answers it in terms of variance retained:

from cane import suggest_svd_dims

recommended_dims, curve = suggest_svd_dims(df, target_variance=0.90)
# → 147 dimensions retain 90.0% of variance

This fits a single SVD on the full matrix and reads off the cumulative explained variance, so you get the entire curve cheaply and can pick whatever retention level makes sense for your use case. Once you have chosen a retention level, pass it to CANE via target_variance:

model = CANE(similarity_threshold=0.2, target_variance=0.90)
G = model.fit(df)
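
For intuition, reading a dimension count off the cumulative explained-variance curve can be sketched directly with scikit-learn. This is a sketch under stated assumptions rather than the package's implementation: the random sparse matrix stands in for a real user-by-cluster matrix, and dims_for is a hypothetical helper.

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import TruncatedSVD

# Hypothetical sparse user-by-cluster matrix: 200 users, 50 clusters, ~5% nonzero.
X = sparse.random(200, 50, density=0.05, format="csr", random_state=0)

# One fit with as many components as allowed gives the whole variance curve.
svd = TruncatedSVD(n_components=X.shape[1] - 1, random_state=0)
svd.fit(X)
cumulative = np.cumsum(svd.explained_variance_ratio_)


def dims_for(target_variance, cumulative):
    """Smallest number of components whose cumulative ratio reaches the target."""
    idx = int(np.searchsorted(cumulative, target_variance))
    return min(idx + 1, len(cumulative))  # clamp if the target is never reached


dims = dims_for(0.90, cumulative)
print(f"{dims} dimensions retain {cumulative[dims - 1]:.1%} of variance")
```

Because the curve comes from a single fit, trying another retention level is just another call to dims_for with a different target.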

If you're not sure whether you need it, check the sparsity diagnostic that suggest_svd_dims prints. Sparsity above ~0.97 is the signal that reduction will help.

Picking a similarity threshold

A threshold that's too high gives you a sparse, disconnected graph; too low and you're connecting users who don't really share much. suggest_threshold shows you the connectivity rate at several candidate values so you can make an informed choice:

model = CANE(target_variance=0.90)
results = model.suggest_threshold(df)

# Threshold → Connectivity rate
# 0.10  →  94.3% nodes connected
# 0.20  →  81.2% nodes connected
# 0.30  →  63.7% nodes connected
# 0.40  →  41.0% nodes connected
# 0.50  →  22.1% nodes connected

For most downstream tasks (IO detection, stance prediction) something in the 0.15–0.30 range tends to work well. Narrative emergence prediction can tolerate lower thresholds since the signal comes from neighbor activity counts rather than community structure.
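
The connectivity rate in the table above can be reproduced by hand from a set of pairwise similarities. The sketch below assumes that a node counts as connected when it has at least one edge at or above the threshold; that definition, the function name connectivity_rates, and the toy similarities are illustrative assumptions.

```python
def connectivity_rates(nodes, pair_sims, thresholds):
    """Fraction of nodes with at least one edge at each threshold.

    pair_sims maps (u, v) -> similarity.
    """
    rates = {}
    for t in thresholds:
        connected = set()
        for (u, v), sim in pair_sims.items():
            if sim >= t:
                connected.update((u, v))
        rates[t] = len(connected) / len(nodes)
    return rates


nodes = ["u1", "u2", "u3", "u4"]
pair_sims = {("u1", "u2"): 0.45, ("u2", "u3"): 0.25, ("u3", "u4"): 0.12}
rates = connectivity_rates(nodes, pair_sims, [0.1, 0.2, 0.3, 0.5])

for threshold, rate in rates.items():
    print(f"{threshold:.2f} -> {rate:.1%} nodes connected")
```

Raising the threshold can only remove edges, so the rate never increases as the threshold goes up, which is the pattern the suggest_threshold output shows.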

Graph diagnostics

from cane import graph_diagnostics

graph_diagnostics(G, name="My corpus")
# My corpus
# ========================================
# Nodes:              45231
# Edges:              312847
# Connected nodes:    41983 (92.8%)
# ...

Citation

If you use this in your work, please cite:

@article{gerard2025bridging,
  title={Bridging the narrative divide: Cross-platform discourse networks in fragmented ecosystems},
  author={Gerard, Patrick and Hanley, Hans WA and Luceri, Luca and Ferrara, Emilio},
  journal={arXiv preprint arXiv:2505.21729},
  year={2025}
}

License

MIT

Version 0.1.0, released May 12, 2026
