
A Python library for detecting astroturfing (coordinated inauthentic behavior) in social media posts.

Astrodetection

Astrodetection is a Python library for detecting signs of astroturfing in lists of posts. It has mainly been used on posts from X so far, but it is not limited to that platform.

Installation

Pip

pip install "astrodetection[standard]"

or

pip install "astrodetection[light]"

Conda

  1. Use the YAML file to configure the environment with conda:

    conda create -n astrodetection_env
    conda activate astrodetection_env
    conda env update -f environment_standard.yml
    

Note: the environment_standard.yml configuration file uses the FAISS and fastText libraries for the VIGINUM D3LTA implementation.

If you have compatibility issues, prefer environment_light.yml and use the astrodetection_light module.
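
One way to decide between the two variants is to check whether the optional dependencies can be imported in the current environment. This is only a minimal sketch: it assumes that FAISS and fastText are exposed under the import names `faiss` and `fasttext`, and the helpers `has_modules` and `pick_variant` are hypothetical, not part of astrodetection.

```python
from importlib.util import find_spec


def has_modules(names):
    """Return True if every top-level module in `names` can be located."""
    return all(find_spec(name) is not None for name in names)


def pick_variant(required=("faiss", "fasttext")):
    """Suggest 'standard' when all optional dependencies are present, else 'light'."""
    return "standard" if has_modules(required) else "light"


# Prints which variant the current environment appears able to support
print(pick_variant())
```

If the result is "light", the environment_light.yml file and the astrodetection_light module are the safer choice.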

Usage

You can import the main functions directly:

from astrodetection import semantic_faiss, prepare_input_data, compute_bot_likelihood_metrics, create_network

Or call them through the package namespace:

import pandas as pd
import astrodetection

# Load a single JSON file into a DataFrame
file = "file_path"  # Replace with the path to your JSON file
df = pd.read_json(file)

# Preprocess the DataFrame: keep posts longer than 100 characters
# and drop posts from the 'grok' account
df = df[df['tweet'].str.len() > 100]
df = df[df['username'] != 'grok']
df.index = df.index.astype(str)  # Compatibility with d3lta

# Compute matches and scores
df_filtered, df_emb = astrodetection.prepare_input_data(df, embeddings=df['emb'])

matches, df_cluster = astrodetection.semantic_faiss(
    df_filtered.rename(columns={'tweet': 'original'}),
    min_size_txt=0,
    df_embeddings_use=df_emb,
    threshold_grapheme=0.8,
    threshold_language=0.715,
    threshold_semantic=0.9
)  # function taken from D3LTA

scores = astrodetection.compute_bot_likelihood_metrics(df, matches=matches)

# Create a network
network = astrodetection.create_network(matches, df)
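
The example above expects the input DataFrame to contain at least the `tweet`, `username`, and `emb` columns. The following is a minimal sketch of such an input and of the preprocessing step, using made-up posts and random placeholder embeddings. The embedding dimension (8), the sample data, and the helper name `preprocess` are assumptions for illustration only; real embeddings would come from an embedding model.

```python
import numpy as np
import pandas as pd


def preprocess(df, min_chars=100, excluded_users=("grok",)):
    """Keep posts longer than `min_chars` and drop posts from excluded accounts."""
    long_enough = df["tweet"].str.len() > min_chars
    allowed = ~df["username"].isin(excluded_users)
    out = df[long_enough & allowed].copy()
    out.index = out.index.astype(str)  # string index, as in the usage example
    return out


rng = np.random.default_rng(0)
posts = pd.DataFrame({
    "username": ["alice", "bob", "grok", "carol"],
    "tweet": ["short post", "x" * 120, "y" * 150, "z" * 101],
})
# Placeholder vectors standing in for real text embeddings
posts["emb"] = [rng.random(8).tolist() for _ in range(len(posts))]

clean = preprocess(posts)
print(clean[["username"]])
```

Only the posts from "bob" and "carol" survive: the first post is too short and the third comes from an excluded account.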

New changes

  1. The semantic_faiss function can now detect copypastas based on Levenshtein distance alone, ignoring embeddings, when "skip" is passed as the df_embeddings_use argument.

  2. The compute_bot_likelihood_metrics function now accepts column names as arguments for more customization.
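
To illustrate what copypasta detection based only on Levenshtein distance means, here is a minimal pure-Python sketch. It is not the library's implementation: the functions `levenshtein`, `similarity`, and `find_copypastas` are hypothetical, the sample posts are made up, and the 0.8 cutoff merely mirrors the threshold_grapheme value from the usage example.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def find_copypastas(texts, threshold=0.8):
    """Return index pairs whose normalized similarity reaches the threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(texts), 2)
        if similarity(a, b) >= threshold
    ]


posts = [
    "Vote for the new plan, it will save our city!",
    "Vote for the new plan, it will save our town!",
    "I had a great sandwich for lunch today.",
]
print(find_copypastas(posts))
```

This compares every pair of posts, which is quadratic in the number of posts. It is fine for a sketch but too slow for large collections.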
