pip install frankgraphbench

Project description

FranKGraphBench: Knowledge Graph Aware Recommender Systems Framework for Benchmarking

FranKGraphBench is a framework that allows knowledge graph aware recommender systems (KG-aware RSs) to be benchmarked in a reproducible and easy-to-use manner. It was first created during Google Summer of Code 2023 for data integration between DBpedia and several standard RS datasets within a reproducible framework.

Check the docs for more information.

  • This repository was first created for data integration between DBpedia and several standard Recommender Systems datasets, and as a framework for reproducible experiments. For more info, check the project proposal and the project progress, which was updated weekly where possible.

Data Integration Usage

pip

We recommend using a Python 3.8 virtual environment:

pip install pybind11
pip install frankgraphbench

Download the full dataset using the bash scripts located in datasets/:

cd datasets
bash ml-100k.sh # Downloaded to the `datasets/ml-100k` folder
bash ml-1m.sh   # Downloaded to the `datasets/ml-1m` folder

Usage

data_integration [-h] -d DATASET -i INPUT_PATH -o OUTPUT_PATH [-ci] [-cu] [-cr] [-cs] [-map] [-w WORKERS]

Arguments:

  • -h: Shows the help message.
  • -d: Name of a supported dataset. It should match the name of the folder created by the dataset's bash script. For now, check data_integration/dataset2class.py for the list of supported datasets.
  • -i: Input path where the full dataset is placed.
  • -o: Output path where the integrated dataset will be placed.
  • -ci: Use this flag if you want to convert item data.
  • -cu: Use this flag if you want to convert user data.
  • -cr: Use this flag if you want to convert rating data.
  • -cs: Use this flag if you want to convert social link data.
  • -map: Use this flag if you want to map dataset items to DBpedia entities (see the sketch after this list). The item data must already be converted.
  • -w: Number of workers (threads) to be used for parallel queries.
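
For reference, this is roughly the kind of DBpedia lookup the -map step issues, with -w controlling how many such queries run in parallel. A minimal sketch using SPARQLWrapper; the endpoint, query shape, and example title are illustrative assumptions, not the framework's exact matching logic:

# Sketch: look up a movie title on DBpedia, as the -map step does conceptually.
# Assumptions: public DBpedia endpoint, exact English label match.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?film WHERE {
        ?film a dbo:Film ;
              rdfs:label "Toy Story"@en .
    } LIMIT 1
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["film"]["value"])  # e.g., http://dbpedia.org/resource/Toy_Story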

Usage Example:

data_integration -d 'ml-100k' -i 'datasets/ml-100k' -o 'datasets/ml-100k/processed' \
    -ci -cu -cr -map -w 8
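
After the command finishes, the output path should contain the converted CSV files and, with -map, the item-to-DBpedia mapping (the experiment config below refers to them as item.csv, user.csv, rating.csv, and map.csv). A quick sketch to inspect them with pandas; the column layout is dataset-specific, so treat this as illustrative:

# Sketch: sanity-check the integrated output. File names follow the paths
# used by the experiment config in this README; columns vary by dataset.
import pandas as pd

processed = "datasets/ml-100k/processed"
for name in ("item", "user", "rating", "map"):
    df = pd.read_csv(f"{processed}/{name}.csv")
    print(f"{name}.csv: {len(df)} rows; columns: {list(df.columns)}")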

source

Install the required packages in a Python virtual environment:

python3 -m venv venv_data_integration/
source venv_data_integration/bin/activate
pip3 install -r requirements_data_integration.txt 

Download the full dataset using the bash scripts located in datasets/:

cd datasets
bash ml-100k.sh # Downloaded to the `datasets/ml-100k` folder
bash ml-1m.sh   # Downloaded to the `datasets/ml-1m` folder

Usage

python3 data_integration.py [-h] -d DATASET -i INPUT_PATH -o OUTPUT_PATH [-ci] [-cu] [-cr] [-cs] [-map] [-w WORKERS]

Arguments:

  • -h: Shows the help message.
  • -d: Name of a supported dataset. It should match the name of the folder created by the dataset's bash script. For now, check data_integration/dataset2class.py for the list of supported datasets.
  • -i: Input path where the full dataset is placed.
  • -o: Output path where the integrated dataset will be placed.
  • -ci: Use this flag if you want to convert item data.
  • -cu: Use this flag if you want to convert user data.
  • -cr: Use this flag if you want to convert rating data.
  • -cs: Use this flag if you want to convert social link data.
  • -map: Use this flag if you want to map dataset items to DBpedia entities. The item data must already be converted.
  • -w: Number of workers (threads) to be used for parallel queries.

Usage Example:

python3 src/data_integration.py -d 'ml-100k' -i 'datasets/ml-100k' -o 'datasets/ml-100k/processed' \
    -ci -cu -cr -map -w 8

Check the Makefile for more examples.

Supported datasets

Dataset                                #items matched    #items
MovieLens-100k                                   1462      1681
MovieLens-1M                                     3356      3883
LastFM-hetrec-2011                              11815     17632
Douban-Movie-Short-Comments-Dataset               ---        28
Yelp-Dataset                                      ---    150348
Amazon-Video-Games-5                              ---     21106

Framework for reproducible experiments usage

pip

We recommend using a Python 3.8 virtual environment:

pip install pybind11
pip install frankgraphbench

Usage

framework -c 'config_files/test.yml'

Arguments:

  • -c: Experiment configuration file path.

The experiment config file should be a .yaml file like this:

experiment:
  dataset: 
    name: ml-100k
    item:
      path: datasets/ml-100k/processed/item.csv 
      extra_features: [movie_year, movie_title] 
    user:
      path: datasets/ml-100k/processed/user.csv 
      extra_features: [gender, occupation] 
    ratings: 
      path: datasets/ml-100k/processed/rating.csv 
      timestamp: True
    enrich:
      map_path: datasets/ml-100k/processed/map.csv
      enrich_path: datasets/ml-100k/processed/enriched.csv
      remove_unmatched: False
      properties:
        - type: subject
          grouped: True
          sep: "::"
        - type: director
          grouped: True
          sep: "::"

  preprocess:
    - method: filter_kcore
      parameters:
        k: 20
        iterations: 1
        target: user

  split:
    seed: 42
    test:
      method: k_fold
      k: 2
      level: 'user'


  models:
    - name: deepwalk_based
      config:
        save_weights: True
      parameters:
        walk_len: 10
        p: 1.0
        q: 1.0
        n_walks: 50
        embedding_size: 64
        epochs: 1
  
  evaluation:
    k: 5
    relevance_threshold: 3
    metrics: [MAP, nDCG]

  report:
    file: 'experiment_results/ml100k_enriched/run1.csv'

See the config_files/ directory for more examples.
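
As a reference for the evaluation block in the config above (k: 5, relevance_threshold: 3, metrics including nDCG), this is the standard definition of nDCG@k with binary relevance derived from a rating threshold. It is a sketch of the usual formula, not necessarily the framework's exact implementation:

# Sketch: standard nDCG@k with binary relevance (rating >= threshold).
# The framework's internal implementation may differ in details.
import math

def ndcg_at_k(ranked_items, true_ratings, k=5, threshold=3):
    # Gain is 1 for items the user rated at or above the threshold, else 0.
    gains = [1.0 if true_ratings.get(item, 0) >= threshold else 0.0
             for item in ranked_items[:k]]
    # Discounted cumulative gain over the top-k recommendation list.
    dcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))
    # Ideal DCG: all relevant items ranked first.
    n_relevant = sum(1 for r in true_ratings.values() if r >= threshold)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(min(k, n_relevant)))
    return dcg / idcg if idcg > 0 else 0.0

print(ndcg_at_k(["a", "b", "c"], {"a": 5, "c": 4, "d": 3}))  # ~0.704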

source

Install the required packages in a Python virtual environment:

python3 -m venv venv_framework/
source venv_framework/bin/activate
pip3 install -r requirements_framework.txt 

Usage

python3 src/framework.py -c 'config_files/test.yml'

Arguments:

  • -c: Experiment configuration file path.

The experiment config file should be a .yaml file like this:

experiment:
  dataset: 
    name: ml-100k
    item:
      path: datasets/ml-100k/processed/item.csv 
      extra_features: [movie_year, movie_title] 
    user:
      path: datasets/ml-100k/processed/user.csv 
      extra_features: [gender, occupation] 
    ratings: 
      path: datasets/ml-100k/processed/rating.csv 
      timestamp: True
    enrich:
      map_path: datasets/ml-100k/processed/map.csv
      enrich_path: datasets/ml-100k/processed/enriched.csv
      remove_unmatched: False
      properties:
        - type: subject
          grouped: True
          sep: "::"
        - type: director
          grouped: True
          sep: "::"

  preprocess:
    - method: filter_kcore
      parameters:
        k: 20
        iterations: 1
        target: user

  split:
    seed: 42
    test:
      method: k_fold
      k: 2
      level: 'user'


  models:
    - name: deepwalk_based
      config:
        save_weights: True
      parameters:
        walk_len: 10
        p: 1.0
        q: 1.0
        n_walks: 50
        embedding_size: 64
        epochs: 1
  
  evaluation:
    k: 5
    relevance_threshold: 3
    metrics: [MAP, nDCG]

  report:
    file: 'experiment_results/ml100k_enriched/run1.csv'

See the config_files/ directory for more examples.

Chart generation for results usage

pip

We recommend using a Python 3.8 virtual environment:

pip install pybind11
pip install frankgraphbench

After obtaining results from some experiments, you can generate comparison charts.

Usage

chart_generation [-h] -c CHART -p PERFORMANCE_METRIC -f FILES -i INPUT_PATH -o OUTPUT_PATH -n CHART_NAME

Arguments:

  • -h: Shows the help message.
  • -c: Type of chart to generate (e.g., cd-diagram for a critical difference diagram).
  • -p: Name of the performance metric (e.g., MAP@5) within the result files to use for chart generation.
  • -f: List of .csv result files to use for generating the chart, passed as a quoted list (see the example below).
  • -i: Input path where the results .csv files are located.
  • -o: Output path where the generated charts will be placed.
  • -n: Name (including file extension) of the chart file to be generated.

Usage Example:

chart_generation -c 'cd-diagram' -p 'MAP@5' -f "['ml-100k.csv', 'ml-1m.csv', 'lastfm.csv', 'ml-100k_enriched.csv', 'ml-1m_enriched.csv', 'lastfm_enriched.csv']" -i 'experiment_results' -o 'charts' -n 'MAP@5.pdf'

source

Install the required packages in a Python virtual environment:

python3 -m venv venv_chart_generation/
source venv_chart_generation/bin/activate
pip3 install -r requirements_chart_generation.txt 

After obtaining results from some experiments, you can generate comparison charts.

Usage

python3 src/chart_generation.py [-h] -c CHART -p PERFORMANCE_METRIC -f FILES -i INPUT_PATH -o OUTPUT_PATH -n CHART_NAME

Arguments:

  • -h: Shows the help message.
  • -c: Type of chart to generate (e.g., cd-diagram for a critical difference diagram).
  • -p: Name of the performance metric (e.g., MAP@5) within the result files to use for chart generation.
  • -f: List of .csv result files to use for generating the chart, passed as a quoted list (see the example below).
  • -i: Input path where the results .csv files are located.
  • -o: Output path where the generated charts will be placed.
  • -n: Name (including file extension) of the chart file to be generated.

Usage Example:

python3 src/chart_generation.py -c 'cd-diagram' -p 'MAP@5' -f "['ml-100k.csv', 'ml-1m.csv', 'lastfm.csv', 'ml-100k_enriched.csv', 'ml-1m_enriched.csv', 'lastfm_enriched.csv']" -i 'experiment_results' -o 'charts' -n 'MAP@5.pdf'
