
Analysis scripts for the Review Graph Mining Project.


This package provides scripts to analyze datasets and to run a method for mining review graphs.

Installation

Use pip to install this package.

$ pip install --upgrade rgmining-script

dataset command

The dataset command provides a set of functions for inspecting a dataset. These functions fall into two groups: analyzing reviewer information and analyzing product information.

Analyzing reviewer information

To analyze the reviewer information of a dataset, the dataset command provides the following subcommands:

  • retrieve: outputs the IDs of reviewers who reviewed at least one of the given products,

  • active: outputs the IDs of reviewers who reviewed at least a given threshold number of items,

  • reviewer_size: outputs the number of reviews posted by each reviewer who reviewed the target products,

  • filter: outputs the reviews posted by reviewers whose IDs match the given set of IDs.
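
The semantics of retrieve, active, and filter can be sketched in a few lines of Python. This is only an illustration of what the descriptions above say: the tuple layout and the function names are assumptions made for this example, not the package's data model or API.

```python
from collections import defaultdict

# Hypothetical in-memory reviews as (reviewer_id, product_id, rating) tuples.
# This layout is an assumption for illustration only.
reviews = [
    ("r1", "p1", 4.0),
    ("r1", "p2", 5.0),
    ("r2", "p1", 3.0),
    ("r3", "p3", 2.0),
]


def retrieve(reviews, product_ids):
    """IDs of reviewers who reviewed at least one of the given products."""
    targets = set(product_ids)
    return sorted({reviewer for reviewer, product, _ in reviews if product in targets})


def active(reviews, threshold):
    """IDs of reviewers who reviewed at least `threshold` distinct items."""
    items = defaultdict(set)
    for reviewer, product, _ in reviews:
        items[reviewer].add(product)
    return sorted(r for r, products in items.items() if len(products) >= threshold)


def filter_by_reviewers(reviews, reviewer_ids):
    """Reviews posted by reviewers whose IDs are in `reviewer_ids`."""
    keep = set(reviewer_ids)
    return [review for review in reviews if review[0] in keep]


print(retrieve(reviews, ["p1"]))
print(active(reviews, 2))
print(filter_by_reviewers(reviews, ["r3"]))
```

Whether active counts reviews or distinct items is not stated above; the sketch counts distinct items.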

Analyzing product information

To analyze the product information of a dataset, the dataset command provides the following subcommands:

  • average: outputs the average rating score of each product,

  • distinct: outputs the distinct product IDs,

  • popular: outputs the IDs of products whose number of reviews is >= threshold,

  • filter: outputs the reviews posted to products whose IDs match the given set of IDs,

  • variance: outputs the variance of the reviews for each product.
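
As a rough illustration of the per-product statistics listed above, the following sketch groups ratings by product and computes the average, the variance, and the popular products. The tuple layout and function names are assumptions for this example, and the use of population variance is also an assumption; the package may define variance differently.

```python
from collections import defaultdict
from statistics import mean, pvariance

# Hypothetical (reviewer_id, product_id, rating) tuples, for illustration only.
reviews = [
    ("r1", "p1", 4.0),
    ("r2", "p1", 2.0),
    ("r1", "p2", 5.0),
]


def ratings_by_product(reviews):
    """Group rating scores by product ID."""
    grouped = defaultdict(list)
    for _, product, rating in reviews:
        grouped[product].append(rating)
    return grouped


def average(reviews):
    """Average rating score of each product."""
    return {p: mean(rs) for p, rs in ratings_by_product(reviews).items()}


def variance(reviews):
    """Population variance of the ratings of each product (an assumption)."""
    return {p: pvariance(rs) for p, rs in ratings_by_product(reviews).items()}


def popular(reviews, threshold):
    """IDs of products whose number of reviews is >= threshold."""
    grouped = ratings_by_product(reviews)
    return sorted(p for p, rs in grouped.items() if len(rs) >= threshold)


print(average(reviews))
print(variance(reviews))
print(popular(reviews, 2))
```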

Basic usage

The basic usage of this command is

$ dataset <dataset-specifier> <dataset-parameters> reviewer <subcommand>

or

$ dataset <dataset-specifier> <dataset-parameters> product <subcommand>

where dataset-specifier is the name of the dataset to be analyzed. The available names depend on which libraries you have installed; dataset -h prints the list of available dataset names.

dataset-parameters are optional arguments specified with the --dataset-param flag. The flag takes a string that joins a key and a value with a single =, and it can be given multiple times. The parameter keys a dataset accepts are described in the documentation of the load function defined in that dataset.
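
To make the key=value convention concrete, here is a minimal sketch of how a repeatable flag of this shape can be collected with argparse. It is not the package's actual parser: the program name dataset-sketch, the helper parse_params, and the name=value pair are placeholders invented for this example.

```python
import argparse


def parse_params(pairs):
    """Turn ["key=value", ...] into a dict, splitting each string at the first "="."""
    params = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        params[key] = value
    return params


# Hypothetical parser that only mimics the shape of the --dataset-param flag.
parser = argparse.ArgumentParser(prog="dataset-sketch")
parser.add_argument(
    "--dataset-param",
    action="append",  # each occurrence of the flag appends one string
    default=[],
    metavar="KEY=VALUE",
)

args = parser.parse_args(
    ["--dataset-param", "file=path/to/file", "--dataset-param", "name=value"]
)
params = parse_params(args.dataset_param)
print(params)
```

The resulting dictionary could then be handed to a loader as keyword arguments, which matches the idea that the keys are documented on the dataset's load function.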

For example, the file dataset loads reviews from a file in which each line contains one review in JSON format. To load such a dataset, use file as the dataset-specifier and give the file path as a dataset-parameter with the file key, i.e. --dataset-param file="path/to/file".
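
The one-review-per-line layout can be read as sketched below. The field names reviewer, product, and rating are hypothetical; the real schema is whatever the dataset's load function expects, and this reader is not the package's implementation.

```python
import io
import json


def iter_reviews(lines):
    """Yield one parsed JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# In-memory stand-in for a file; the field names are assumptions.
sample = io.StringIO(
    '{"reviewer": "r1", "product": "p1", "rating": 4}\n'
    '{"reviewer": "r2", "product": "p1", "rating": 2}\n'
)

records = list(iter_reviews(sample))
print(len(records), records[0]["product"])
```

With a real file, passing an open file object to iter_reviews would work the same way, since it only iterates over lines.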

See the documentation site for more information about each subcommand.

analyze command

The analyze command loads a dataset and runs a method to find anomalous reviewers and compute a rating summary of each product.

The basic usage of this command is

$ analyze <dataset-specifier> <dataset-parameters> <method-specifier> <method-parameters>

The dataset-specifier and dataset-parameters are the same as those described for the dataset command.

The method-specifier is the name of an installed method. Run analyze -h to see the available method names.

method-parameters are optional arguments specified with the --method-param flag. The flag takes a string that joins a key and a value with a single =, and it can be given multiple times.

The parameter keys a method accepts are described in the documentation of the constructor of the review graph object defined in that method.

For example, Fraud Eagle takes one parameter, epsilon, and you can set its value with --method-param epsilon=0.25.
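
Since method parameters correspond to constructor arguments, a string such as epsilon=0.25 has to become a typed keyword argument at some point. The sketch below shows one plausible way to do that. The class ExampleGraph is a hypothetical stand-in and not Fraud Eagle's review graph class, and converting values with ast.literal_eval is an assumption about how values might be typed, not a description of the package.

```python
import ast


def coerce(value):
    """Interpret a string as a Python literal when possible, else keep it as text."""
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return value


def to_kwargs(pairs):
    """Convert ["key=value", ...] into typed keyword arguments."""
    kwargs = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        kwargs[key] = coerce(value)
    return kwargs


class ExampleGraph:
    """Hypothetical stand-in for a method's review graph class."""

    def __init__(self, epsilon):
        self.epsilon = epsilon


kwargs = to_kwargs(["epsilon=0.25"])
graph = ExampleGraph(**kwargs)
print(graph.epsilon)
```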

See the documentation site for more information.

License

This software is released under the GNU General Public License Version 3; see COPYING for more details.


