A toolkit for keyword search and attack methods on user-provided datasets.
ROKSANA: Rewiring Of Keyword Search via Alteration of Network Architecture Toolkit
ROKSANA is a Python toolkit for performing keyword search and attack methods on user-provided datasets.
Features
- Custom Datasets: Bring your own dataset using the PyTorch Geometric (PyG) dataset structure.
- Search Methods: Choose from pre-defined keyword search methods.
- Attack Methods: Utilize pre-defined attack methods or implement your own.
- Result Handling: Save results to files in various formats.
- Leaderboard Integration: Submit your results to the leaderboard.
Installation
pip install roksana
Preparing the Test Set
To evaluate the effectiveness of the search methods, you can prepare a search set consisting of query nodes and their corresponding gold sets. The gold set for each query consists of all nodes in the dataset that share the exact same feature vector as the query node.
Function: prepare_search_set
from roksana.datasets import prepare_search_set
# Assume 'data' is a torch_geometric.data.Data object
queries, gold_sets = prepare_search_set(data, percentage=0.1, seed=42)
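The gold-set definition above can be illustrated without the library. The following is a minimal sketch, not ROKSANA's implementation: `build_gold_sets` is a hypothetical helper, node features are plain tuples rather than a `torch_geometric.data.Data` object, and including the query node in its own gold set is an assumption.

```python
import random
from collections import defaultdict


def build_gold_sets(features, percentage=0.1, seed=42):
    """Sample query nodes and collect, for each, the nodes with an identical feature vector.

    Hypothetical helper for illustration only; not part of ROKSANA.
    """
    # Group node indices by their exact feature vector.
    groups = defaultdict(list)
    for node, vector in enumerate(features):
        groups[tuple(vector)].append(node)

    # Sample a fixed fraction of nodes as queries (at least one), reproducibly.
    rng = random.Random(seed)
    num_queries = max(1, int(len(features) * percentage))
    queries = sorted(rng.sample(range(len(features)), num_queries))

    # Assumption: the gold set includes the query node itself.
    gold_sets = [groups[tuple(features[q])] for q in queries]
    return queries, gold_sets


# Toy feature matrix: nodes 0, 2 and 4 share the same feature vector.
features = [(1, 0, 1), (0, 1, 0), (1, 0, 1), (0, 0, 1), (1, 0, 1)]
queries, gold_sets = build_gold_sets(features, percentage=0.4, seed=42)
for query, gold in zip(queries, gold_sets):
    print(f"query {query}: gold set {gold}")
```

Grouping by feature vector first keeps the lookup for each query constant-time, which matters when many nodes are sampled as queries.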
Attack Methods
ROKSANA provides a suite of attack methods to evaluate the robustness of your search algorithms. Currently, the package includes predefined attack methods that you can leverage out-of-the-box or extend with your custom implementations.
Available Attack Methods
- random: Randomly adds or removes edges connected to the query node.
- viking: Perturbs the feature vectors of the query node.
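To make the idea of a random edge perturbation concrete, here is a minimal sketch on a plain edge list. It is not ROKSANA's `random` attack: `random_edge_attack` and its parameters are hypothetical, the graph is assumed undirected and free of self-loops, and each perturbation simply flips one edge incident to the query node (removing it if present, adding it otherwise).

```python
import random


def random_edge_attack(edges, query_node, num_nodes, perturbations=2, seed=0):
    """Flip `perturbations` undirected edges incident to `query_node`.

    Hypothetical helper for illustration only; not part of ROKSANA.
    """
    rng = random.Random(seed)
    # Store undirected edges as frozensets so (u, v) and (v, u) are the same edge.
    edge_set = {frozenset(e) for e in edges}
    candidates = [n for n in range(num_nodes) if n != query_node]
    flipped = []
    for other in rng.sample(candidates, min(perturbations, len(candidates))):
        edge = frozenset((query_node, other))
        if edge in edge_set:
            edge_set.remove(edge)
            flipped.append(("removed", query_node, other))
        else:
            edge_set.add(edge)
            flipped.append(("added", query_node, other))
    # Return edges as sorted (low, high) tuples for readability.
    return sorted(tuple(sorted(e)) for e in edge_set), flipped


edges = [(0, 1), (1, 2), (2, 3)]
new_edges, flipped = random_edge_attack(
    edges, query_node=1, num_nodes=4, perturbations=2, seed=0
)
print(flipped)
print(new_edges)
```

The returned `flipped` list plays the role of an attack report: it records which edges were added or removed so the perturbation can be inspected or undone.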
Using Attack Methods
from roksana.datasets import load_dataset, prepare_test_set
from roksana.attack_methods import get_attack_method

# Load the Cora dataset
dataset = load_dataset(dataset_name='cora', root='data/')
data = dataset[0]

# Prepare the test set
queries, gold_sets = prepare_test_set(data, percentage=0.1, seed=123)

# Initialize an attack method
attack_method = get_attack_method('predefined_attack1', data=data, perturbations=2)

# Perform attacks on queries
for query_node in queries:
    attack_details = attack_method.attack(query_node=query_node, perturbations=2)
    print(f"Attack on Node {query_node}: {attack_details}")
Evaluation
The Evaluation module in ROKSANA provides tools to assess the effectiveness of attack strategies on your search methods. By computing three key metrics (Hit@k, Recall@k, and Demotion Value), you can quantify how attacks affect the performance and reliability of your search algorithms.
Key Metrics
- Hit@k
  - Definition: Measures whether at least one relevant node (from the gold set) appears in the top-k retrieved nodes.
  - Interpretation: Higher values indicate better performance in retrieving relevant nodes within the top-k results.
- Recall@k
  - Definition: Quantifies the proportion of relevant nodes that are retrieved in the top-k results.
  - Interpretation: Higher values signify that a larger fraction of relevant nodes are captured within the top-k retrieved nodes.
- Demotion Value
  - Definition: Measures the change in the rank of a target node (typically the query node itself) before and after an attack.
  - Interpretation: Positive values indicate that the target node has been ranked lower post-attack, reflecting the attack's effectiveness in degrading its visibility.
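The three definitions above can be written down in a few lines of plain Python. This is a sketch of the metrics as described, not the code in ROKSANA's evaluation module: the function names are hypothetical, rankings are lists of node ids ordered best-first, and the demotion value is assumed to be the 0-based rank after the attack minus the rank before it.

```python
def hit_at_k(ranked_nodes, gold_set, k):
    """Return 1.0 if any gold node appears in the top-k results, else 0.0."""
    return 1.0 if set(ranked_nodes[:k]) & set(gold_set) else 0.0


def recall_at_k(ranked_nodes, gold_set, k):
    """Return the fraction of gold nodes that appear in the top-k results."""
    gold = set(gold_set)
    if not gold:
        return 0.0
    return len(set(ranked_nodes[:k]) & gold) / len(gold)


def demotion_value(ranking_before, ranking_after, target_node):
    """Return rank_after - rank_before; positive means the target was pushed down.

    Assumes the target node appears in both rankings.
    """
    return ranking_after.index(target_node) - ranking_before.index(target_node)


gold = {7, 4}
before = [7, 3, 9, 1, 4]  # ranking returned before the attack
after = [3, 9, 7, 1, 4]   # ranking returned after the attack

print(hit_at_k(before, gold, k=2), hit_at_k(after, gold, k=2))        # 1.0 0.0
print(recall_at_k(before, gold, k=2), recall_at_k(after, gold, k=2))  # 0.5 0.0
print(demotion_value(before, after, target_node=7))                   # 2
```

In this toy example the attack pushes node 7 from rank 0 to rank 2, so Hit@2 drops from 1.0 to 0.0 and the demotion value is 2.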
Using the Evaluation Module
Here's a step-by-step guide to evaluating the impact of an attack on a search method.
1. Load Dataset and Prepare Test Set
from roksana.datasets import load_dataset, prepare_test_set

# Load the Cora dataset
dataset = load_dataset(dataset_name='cora', root='data/')
data = dataset[0]

# Prepare the test set with 10% of nodes as queries
queries, gold_sets = prepare_test_set(data, percentage=0.1, seed=123)
Saving Evaluation Results
ROKSANA provides utility functions to save evaluation results in various formats, including JSON, CSV, and Pickle. These functions are located within the evaluation.utils module.
Available Functions
- save_results_to_json(results: List[Dict[str, Any]], filepath: str) -> None
- save_results_to_csv(results: List[Dict[str, Any]], filepath: str) -> None
- save_results_to_pickle(results: List[Dict[str, Any]], filepath: str) -> None
Usage Example
from roksana.evaluation import save_results_to_csv, save_results_to_json, save_results_to_pickle

# Assuming 'results' is a list of dictionaries containing evaluation metrics
results = [
    {
        'query_node': 0,
        'k': 5,
        'Hit@k_before_attack': 1.0,
        'Hit@k_after_attack': 0.0,
        'Recall@k_before_attack': 0.5,
        'Recall@k_after_attack': 0.3,
        'Demotion_value': 2
    },
    # Add more results as needed
]

# Save results in different formats
save_results_to_csv(results, 'evaluation_results/results.csv')
save_results_to_json(results, 'evaluation_results/results.json')
save_results_to_pickle(results, 'evaluation_results/results.pkl')
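To show what the three formats preserve, the sketch below writes the same kind of results list with Python's standard library and reads it back. It does not call ROKSANA: `save_all` is a hypothetical stand-in, and whether the library's own helpers create missing directories or write the same layout is not something this example assumes.

```python
import csv
import json
import os
import pickle
import tempfile

results = [
    {
        "query_node": 0,
        "k": 5,
        "Hit@k_before_attack": 1.0,
        "Hit@k_after_attack": 0.0,
        "Recall@k_before_attack": 0.5,
        "Recall@k_after_attack": 0.3,
        "Demotion_value": 2,
    },
]


def save_all(results, directory):
    """Write results as JSON, CSV and Pickle (hypothetical stand-in, stdlib only)."""
    os.makedirs(directory, exist_ok=True)
    json_path = os.path.join(directory, "results.json")
    csv_path = os.path.join(directory, "results.csv")
    pkl_path = os.path.join(directory, "results.pkl")
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(results, f, indent=2)
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(results[0].keys()))
        writer.writeheader()
        writer.writerows(results)
    with open(pkl_path, "wb") as f:
        pickle.dump(results, f)
    return json_path, csv_path, pkl_path


with tempfile.TemporaryDirectory() as tmp:
    json_path, csv_path, pkl_path = save_all(
        results, os.path.join(tmp, "evaluation_results")
    )
    with open(json_path, encoding="utf-8") as f:
        from_json = json.load(f)
    with open(csv_path, newline="", encoding="utf-8") as f:
        from_csv = list(csv.DictReader(f))
    with open(pkl_path, "rb") as f:
        from_pickle = pickle.load(f)

print(from_json == results)    # JSON keeps numbers as numbers
print(from_pickle == results)  # Pickle restores the exact Python objects
print(from_csv[0]["k"])        # CSV values come back as strings
```

JSON and Pickle round-trip the numeric types, while CSV returns every value as a string, so metrics read from a CSV file need to be converted back with `int` or `float` before further analysis.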