
Package to create networks for detecting coordination in social media.


coordinationz

A collection of scripts and a package for analyzing coordination in social media data.

To install the package, clone the git repository and run the following command in the root directory:

pip install .

To install the package in development mode, run the following commands in the root directory:

pip install meson-python ninja numpy
pip install --no-build-isolation -e .

To build a local installation in debug mode, use the following command:

pip install --no-build-isolation -U -e . -Csetup-args=-Dbuildtype=debug

To debug the C code, use gdb:

gdb -ex=run -args python <python file>

Running on INCAS datasets (e.g., phase2a or phase2b)

First, install the package as described above. Next, set up the config.toml file, using config_template.toml as a template:

cp config_template.toml config.toml

Set up the paths for the INCAS datasets and the outputs:

# Location of jsonl files
INCAS_DATASETS = "/mnt/osome/INCAS/datasets" 

# Location where the preprocessed datasets will be stored
PREPROCESSED_DATASETS = "Data/Preprocessed"

# Location of the outputs
NETWORKS = "Outputs/Networks"
FIGURES = "Outputs/Figures"
TABLES = "Outputs/Tables"
CONFIGS = "Outputs/Configs"

The INCAS_DATASETS folder should contain the uncompressed jsonl files.

The files must first be preprocessed by running the following Python script:

python pipeline/preprocess/preprocessINCAS.py <dataname>

where dataname is the name of the dataset, which corresponds to the <INCAS_DATASETS>/<dataname>.jsonl file. In addition to the preprocessed data, the script generates a .txt file with summary information about the dataset.
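To illustrate the input convention (each dataset lives at <INCAS_DATASETS>/<dataname>.jsonl, one JSON object per line), the hypothetical helpers below resolve the file path and stream its records. They only demonstrate the file layout; the fields inside each record are not documented here, and this is not the code used by preprocessINCAS.py.

```python
import json
from pathlib import Path
from typing import Iterator, Union


def dataset_file(incas_datasets: Union[str, Path], dataname: str) -> Path:
    """Return <INCAS_DATASETS>/<dataname>.jsonl, failing early if it is missing."""
    path = Path(incas_datasets) / f"{dataname}.jsonl"
    if not path.is_file():
        # The folder must contain the uncompressed .jsonl files.
        raise FileNotFoundError(f"expected an uncompressed dataset at {path}")
    return path


def iter_records(path: Path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a .jsonl file."""
    with path.open("r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)
```

A quick sanity check before preprocessing could be sum(1 for _ in iter_records(dataset_file("/mnt/osome/INCAS/datasets", "phase2a"))), which counts the records without loading the whole file into memory.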

The parameters of the indicators can be set in the config.toml file.

Currently, the co-hashtag, co-URL, and co-retweet indicators are supported, in addition to the text similarity indicators described below.
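This README does not describe how the co-hashtag indicator is computed. To illustrate the general idea behind this kind of indicator, the sketch below links users whose hashtag usage is similar, using cosine similarity of per-user hashtag-count vectors. The function name, input format, cosine measure, and threshold are all assumptions for illustration; coordinationz's actual implementation (including how it selects suspicious pairs) may differ. The same pattern would apply to co-URL or co-retweet by replacing hashtags with URLs or retweeted post IDs.

```python
import math
from collections import Counter
from itertools import combinations


def cohashtag_edges(posts, threshold=0.5):
    """Return {(user_a, user_b): similarity} for user pairs at or above `threshold`.

    `posts` is an iterable of (user_id, [hashtags]) tuples (hypothetical format).
    """
    # Count how often each user uses each hashtag (case-insensitive).
    vectors = {}
    for user, hashtags in posts:
        vectors.setdefault(user, Counter()).update(h.lower() for h in hashtags)

    # Euclidean norm of each user's count vector, for cosine similarity.
    norms = {u: math.sqrt(sum(c * c for c in v.values())) for u, v in vectors.items()}

    edges = {}
    # All-pairs comparison is O(n^2); fine for a demo, too slow for large datasets.
    for a, b in combinations(sorted(vectors), 2):
        if norms[a] == 0 or norms[b] == 0:
            continue
        shared = vectors[a].keys() & vectors[b].keys()
        dot = sum(vectors[a][h] * vectors[b][h] for h in shared)
        similarity = dot / (norms[a] * norms[b])
        if similarity >= threshold:
            edges[(a, b)] = similarity
    return edges
```

The resulting dictionary is a weighted edge list, i.e., the same kind of user-user network the pipeline stores in the NETWORKS folder, though the actual weights and filtering there are not specified in this README.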

To run the indicators, use the pipeline/indicators.py script:

python pipeline/indicators.py <dataname>

where dataname is the name of the dataset. The indicators to run can be selected with the -i option, e.g., python pipeline/indicators.py <dataname> -i cohashtag coretweet courl.

You can add a suffix to the output file names with the --suffix parameter:

python pipeline/indicators.py <dataname> --suffix <suffix>

If no suffix is provided, a timestamp will be used as the suffix.
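When scripting several runs, the command line can be assembled programmatically. The hypothetical wrapper below builds a call to pipeline/indicators.py using only options that appear in this README (-i, -c, --suffix) and runs it with subprocess; it assumes it is executed from the repository root. With dry_run=True it only returns the argument list. Leaving suffix as None lets the script fall back to its timestamp suffix.

```python
import subprocess
import sys


def indicators_command(dataname, indicators=None, config=None, suffix=None):
    """Build the argument list for pipeline/indicators.py (hypothetical helper)."""
    cmd = [sys.executable, "pipeline/indicators.py", dataname]
    if config is not None:
        cmd += ["-c", config]
    if indicators:
        cmd += ["-i", *indicators]
    if suffix is not None:
        cmd += ["--suffix", suffix]
    return cmd


def run_indicators(dataname, dry_run=False, **options):
    """Run the indicators script, or just return the command when dry_run is True."""
    cmd = indicators_command(dataname, **options)
    if not dry_run:
        # check=True raises CalledProcessError if the script exits with an error.
        subprocess.run(cmd, check=True)
    return cmd
```

For example, run_indicators("phase2a", indicators=["cohashtag", "courl"], suffix="test") runs two indicators and tags the outputs with "test".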

This process generates files in the output directories defined by NETWORKS, TABLES, and CONFIGS.

In particular, the TABLES folder will contain the suspicious pairs of users and clusters in CSV format.

The NETWORKS folder will contain the networks in xnet format, which can be read with the xnetwork package:

pip install xnetwork

and using the following code:

import xnetwork as xn
g = xn.load("network.xnet")

The result is an igraph network. You can convert it to the networkx format by using the following code:

network = g.to_networkx()

The config file used to generate the data is copied to the CONFIGS directory, with a new section containing extra parameters about the run.

Text similarity indicators

The text similarity indicators can be run by adding usctextsimilarity, textsimilarity, or coword to the indicator list. For instance:

python pipeline/indicators.py <dataname> -i cohashtag coretweet courl textsimilarity

The usctextsimilarity and textsimilarity indicators require the faiss and sentence-transformers packages. A GPU is recommended for performance.
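Since usctextsimilarity and textsimilarity depend on optional packages, a script can check for them before adding those indicators to the -i list. The sketch below uses importlib.util.find_spec with the packages' usual import names (faiss and sentence_transformers). The helper names are hypothetical, and coword is treated as having no extra requirements because this README lists none for it.

```python
import importlib.util

BASE_INDICATORS = ["cohashtag", "coretweet", "courl"]
# Indicators that need faiss and sentence-transformers, per this README.
EMBEDDING_INDICATORS = ["usctextsimilarity", "textsimilarity"]
REQUIRED_MODULES = ["faiss", "sentence_transformers"]


def has_modules(names) -> bool:
    """True if every module in `names` can be found (without importing it)."""
    return all(importlib.util.find_spec(name) is not None for name in names)


def select_indicators(extra=("coword", "textsimilarity")):
    """Return BASE_INDICATORS plus the requested extras whose dependencies are met."""
    deps_ok = has_modules(REQUIRED_MODULES)
    selected = list(BASE_INDICATORS)
    for name in extra:
        if name in EMBEDDING_INDICATORS and not deps_ok:
            print(f"Skipping {name}: install faiss and sentence-transformers first")
            continue
        selected.append(name)
    return selected
```

The returned list can be passed directly after -i on the command line.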

Running on IO datasets

Repeat the same steps as for INCAS datasets, but set the IO_DATASETS variable in the config.toml file to the location of the IO datasets. Also, for preprocessing, use the pipeline/preprocess/preprocessIO.py script.
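Because the two dataset families differ mainly in the preprocessing script (and in whether INCAS_DATASETS or IO_DATASETS points at the raw data), the choice can be captured in a small lookup. This is a hypothetical helper; it assumes preprocessIO.py takes the dataset name as its single positional argument, like preprocessINCAS.py, which the README implies but does not state explicitly.

```python
import sys

# Preprocessing script for each dataset family, as listed in this README.
PREPROCESSORS = {
    "INCAS": "pipeline/preprocess/preprocessINCAS.py",
    "IO": "pipeline/preprocess/preprocessIO.py",
}


def preprocess_command(family: str, dataname: str) -> list[str]:
    """Return the command that preprocesses `dataname` for the given family."""
    try:
        script = PREPROCESSORS[family.upper()]
    except KeyError:
        raise ValueError(
            f"unknown dataset family {family!r}; expected one of {sorted(PREPROCESSORS)}"
        ) from None
    return [sys.executable, script, dataname]
```

The returned list can be passed to subprocess.run from the repository root.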

Submitted methodologies

To generate the results submitted for the evaluation datasets, use the following procedures:

First, preprocess the dataset following the preprocessing instructions above.

For the UNION approach:

  • Copy the config_template_union.toml to config_union.toml and set the PATHS accordingly.
  • Run the following command:
python pipeline/indicators.py <dataname> -c config_union.toml -i cohashtag coretweet courl coword -s union

where <dataname> is the filename of the dataset (for the evaluation dataset it should be TA2_full_eval_NO_GT_nat_2024-06-03 or TA2_full_eval_NO_GT_nat+synth_2024-06-03).

  • The results will be stored in Outputs/Tables (or in the TABLES folder defined in the config file).

For the SOFTUNION approach:

  • Copy the config_template_softunion.toml to config_softunion.toml and set the PATHS accordingly.
  • Run the following command:
python pipeline/indicators.py <dataname> -c config_softunion.toml -i cohashtag coretweet courl coword -s softunion

where <dataname> is the filename of the dataset (for the evaluation dataset it should be TA2_full_eval_NO_GT_nat_2024-06-03 or TA2_full_eval_NO_GT_nat+synth_2024-06-03).

  • The results will be stored in Outputs/Tables (or in the TABLES folder defined in the config file).
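The UNION and SOFTUNION procedures differ only in the config file and the -s value, so both can be reproduced in one loop. The sketch below is a convenience wrapper, not part of the package: prepare_configs copies each config_template_<approach>.toml to config_<approach>.toml if it is missing (you still need to edit the paths by hand), and submitted_commands builds the exact commands listed above for both evaluation datasets. It assumes it is run from the repository root, and run_submitted defaults to a dry run that only prints the commands.

```python
import shutil
import subprocess
import sys
from pathlib import Path

EVAL_DATASETS = [
    "TA2_full_eval_NO_GT_nat_2024-06-03",
    "TA2_full_eval_NO_GT_nat+synth_2024-06-03",
]
INDICATORS = ["cohashtag", "coretweet", "courl", "coword"]
APPROACHES = ["union", "softunion"]


def prepare_configs(root="."):
    """Copy config_template_<approach>.toml to config_<approach>.toml if missing."""
    root = Path(root)
    for approach in APPROACHES:
        target = root / f"config_{approach}.toml"
        if not target.exists():
            shutil.copy(root / f"config_template_{approach}.toml", target)
            print(f"Created {target}; edit its paths before running.")


def submitted_commands():
    """Return the indicators.py command for every approach/dataset pair."""
    return [
        [sys.executable, "pipeline/indicators.py", dataname,
         "-c", f"config_{approach}.toml", "-i", *INDICATORS, "-s", approach]
        for approach in APPROACHES
        for dataname in EVAL_DATASETS
    ]


def run_submitted(dry_run=True):
    """Print each command and, unless dry_run is True, execute it."""
    for cmd in submitted_commands():
        print(" ".join(cmd[1:]))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Running run_submitted() first is a cheap way to confirm the dataset names and config files before launching the full runs with run_submitted(dry_run=False).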

Download files

Source Distributions

No source distribution files available for this release.

Built Distributions

coordinationz-0.0.1-cp312-cp312-win_amd64.whl (49.7 kB)

Uploaded CPython 3.12 Windows x86-64

coordinationz-0.0.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (176.3 kB)

Uploaded CPython 3.12 manylinux: glibc 2.17+ x86-64

coordinationz-0.0.1-cp312-cp312-macosx_11_0_arm64.whl (93.7 kB)

Uploaded CPython 3.12 macOS 11.0+ ARM64

coordinationz-0.0.1-cp312-cp312-macosx_10_9_x86_64.whl (100.2 kB)

Uploaded CPython 3.12 macOS 10.9+ x86-64

coordinationz-0.0.1-cp311-cp311-win_amd64.whl (49.7 kB)

Uploaded CPython 3.11 Windows x86-64

coordinationz-0.0.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (176.2 kB)

Uploaded CPython 3.11 manylinux: glibc 2.17+ x86-64

coordinationz-0.0.1-cp311-cp311-macosx_11_0_arm64.whl (93.6 kB)

Uploaded CPython 3.11 macOS 11.0+ ARM64

coordinationz-0.0.1-cp311-cp311-macosx_10_9_x86_64.whl (100.1 kB)

Uploaded CPython 3.11 macOS 10.9+ x86-64

coordinationz-0.0.1-cp310-cp310-win_amd64.whl (49.7 kB)

Uploaded CPython 3.10 Windows x86-64

coordinationz-0.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (176.2 kB)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

coordinationz-0.0.1-cp310-cp310-macosx_11_0_arm64.whl (93.6 kB)

Uploaded CPython 3.10 macOS 11.0+ ARM64

coordinationz-0.0.1-cp310-cp310-macosx_10_9_x86_64.whl (100.1 kB)

Uploaded CPython 3.10 macOS 10.9+ x86-64

coordinationz-0.0.1-cp39-cp39-win_amd64.whl (49.7 kB)

Uploaded CPython 3.9 Windows x86-64

coordinationz-0.0.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (176.2 kB)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64

coordinationz-0.0.1-cp39-cp39-macosx_11_0_arm64.whl (93.6 kB)

Uploaded CPython 3.9 macOS 11.0+ ARM64

coordinationz-0.0.1-cp39-cp39-macosx_10_9_x86_64.whl (100.1 kB)

Uploaded CPython 3.9 macOS 10.9+ x86-64

coordinationz-0.0.1-cp38-cp38-win_amd64.whl (49.6 kB)

Uploaded CPython 3.8 Windows x86-64

coordinationz-0.0.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (179.2 kB)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64

coordinationz-0.0.1-cp38-cp38-macosx_11_0_arm64.whl (95.7 kB)

Uploaded CPython 3.8 macOS 11.0+ ARM64

coordinationz-0.0.1-cp38-cp38-macosx_10_9_x86_64.whl (102.0 kB)

Uploaded CPython 3.8 macOS 10.9+ x86-64
