
A package to measure diversity in log files


LogDiv: A Python Module for Computing Diversity in Transaction Logs

LogDiv is a Python module for the computation of the diversity of items requested by users in transaction logs.

It takes two inputs:

  1. A log file with transactions.
  2. A file with item attributes.

Computing the diversity of items requested by users is a task of interest in many fields, such as sociology, recommender systems, e-commerce, and media studies. See the examples below.

Getting Started

Prerequisites

LogDiv requires:

  • Python
  • Numpy - Essential
  • Pandas - Essential
  • Matplotlib - Essential
  • pyyaml - Essential
  • scikit-learn - Essential
  • tqdm - Optional: progress bar, only one function requires it
  • Graph-tool - Optional: only one function requires it

$ python3 -m pip install numpy
$ python3 -m pip install pandas
$ python3 -m pip install matplotlib
$ python3 -m pip install pyyaml
$ python3 -m pip install scikit-learn
$ python3 -m pip install tqdm

Installing Graph-tool is more complicated: https://git.skewed.de/count0/graph-tool/wikis/installation-instructions
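
As a quick check before installing LogDiv, the following sketch reports which of the prerequisites listed above cannot be imported in the current environment. The helper name missing_modules is hypothetical and not part of LogDiv. The import names differ from the package names in a few cases: pyyaml imports as yaml, scikit-learn as sklearn, and Graph-tool as graph_tool.

```python
from importlib.util import find_spec

# Import names of the prerequisites listed above.
REQUIRED = ["numpy", "pandas", "matplotlib", "yaml", "sklearn"]
OPTIONAL = ["tqdm", "graph_tool"]


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


print("Missing required modules:", missing_modules(REQUIRED))
print("Missing optional modules:", missing_modules(OPTIONAL))
```

An empty list for the required modules means every essential dependency was found.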

Installing

To install LogDiv, run:

$ pip install logdiv

Specification

Input format

LogDiv needs its inputs in a specific format:

  • A file describing all requests in table format, with the fields:
      • user ID
      • timestamp
      • requested item ID
      • referrer item ID
  • A file describing all pages visited in table format, with the fields:
      • item ID
      • classification 1
      • classification 2
      • ...

YAML file

Scripts that use LogDiv are driven by a YAML file: if you want to change the input files or the features to compute, you only need to modify the YAML file, not the code itself.

A YAML file is similar to a JSON file: once loaded, it takes the form of a dictionary. Suppose, for instance, that one of your functions takes a parameter that needs to change often. You can have the function read that parameter from the dictionary by its key, and then change the value in the YAML file. This reduces mistakes and saves time when you want to change parameters in your code.
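
A minimal sketch of this pattern, assuming pyyaml is installed. The key names (files, session, timeout_minutes) and the function describe_run are hypothetical and do not reflect LogDiv's actual YAML schema. The example YAML files in the datasets directory show the real one.

```python
import yaml  # provided by the pyyaml package

# Hypothetical configuration text; in practice this would live in a .yaml file.
CONFIG_TEXT = """
files:
  requests: requests.csv
  items: items.csv
session:
  timeout_minutes: 30
"""


def load_config(text):
    """Parse YAML text into a plain Python dictionary."""
    return yaml.safe_load(text)


def describe_run(config):
    """Build a summary using only values looked up by key in the config."""
    return "requests={} items={} timeout={} min".format(
        config["files"]["requests"],
        config["files"]["items"],
        config["session"]["timeout_minutes"],
    )


config = load_config(CONFIG_TEXT)
print(describe_run(config))
```

Changing timeout_minutes in the YAML text changes the behaviour of describe_run without touching the function itself.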

Documentation

For details on a LogDiv function:

  • what the function does
  • what it takes as input
  • what it returns

run the following in a Python console:

>>> help(function)
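
In Python, help() displays a function's docstring, so the same text can also be read programmatically. The sketch below uses a hypothetical function, count_requests, which is not part of LogDiv, to show where that text comes from.

```python
import inspect


def count_requests(log):
    """Return the number of requests in `log`.

    Parameters: log, a list of request records.
    Returns: an integer count.
    """
    return len(log)


# inspect.getdoc returns the cleaned docstring that help() would display.
doc = inspect.getdoc(count_requests)
print(doc)
```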

Examples

Two examples are provided to help you get familiar with LogDiv:

  • Example 1 uses a short dataset to show how to use LogDiv
  • Example 2 uses a dataset of more than 100,000 requests to show what kind of results can be obtained

These examples (dataset, script, and YAML file) can be found in the datasets directory. The YAML files are self-explanatory.

Example 1

The following example illustrates the input format of the package: a requests table followed by an item attribute table.

user timestamp requested_item referrer_item
user1 2019-07-03 00:00:00 v1 v4
user1 2019-07-03 00:01:00 v4 v2
user1 2019-07-03 00:01:10 v4 v6
user1 2019-07-03 00:01:20 v4 v6
user1 2019-07-03 00:02:00 v6 v9
user1 2019-07-03 03:00:00 v8 v10
user1 2019-07-03 03:01:00 v8 v5
user2 2019-07-05 12:00:00 v3 v5
user2 2019-07-05 12:00:30 v5 v7
user2 2019-07-05 12:00:45 v7 v9
user2 2019-07-05 12:01:00 v9 v6
user3 2019-07-05 18:00:00 v10 v5
user3 2019-07-05 18:01:15 v10 v7
user3 2019-07-05 18:03:35 v10 v9
user3 2019-07-05 18:06:00 v7 v4
user3 2019-07-05 18:07:22 v5 v2

The item attribute table for the same example:

item class1 class2 class3
v1 x \alpha h
v2 y \beta h
v3 y \beta f
v4 x \beta h
v5 z \gammma f
v6 y \alpha h
v7 z \alpha f
v8 x \gammma f
v9 y \alpha f
v10 z \gammma h
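
To make the notion of diversity concrete, the sketch below computes the Shannon entropy of the class1 labels of the items each user requested in the tables above. This is a generic illustration written with the standard library only. It does not use LogDiv's API and is not necessarily the measure LogDiv computes. The whitespace-separated layout and the names shannon_entropy and diversity are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

# Requests from Example 1: user, date, time, requested item, referrer item.
REQUESTS = """
user1 2019-07-03 00:00:00 v1 v4
user1 2019-07-03 00:01:00 v4 v2
user1 2019-07-03 00:01:10 v4 v6
user1 2019-07-03 00:01:20 v4 v6
user1 2019-07-03 00:02:00 v6 v9
user1 2019-07-03 03:00:00 v8 v10
user1 2019-07-03 03:01:00 v8 v5
user2 2019-07-05 12:00:00 v3 v5
user2 2019-07-05 12:00:30 v5 v7
user2 2019-07-05 12:00:45 v7 v9
user2 2019-07-05 12:01:00 v9 v6
user3 2019-07-05 18:00:00 v10 v5
user3 2019-07-05 18:01:15 v10 v7
user3 2019-07-05 18:03:35 v10 v9
user3 2019-07-05 18:06:00 v7 v4
user3 2019-07-05 18:07:22 v5 v2
"""

# Item table from Example 1, reduced to the item ID and class1 columns.
ITEMS = """
v1 x
v2 y
v3 y
v4 x
v5 z
v6 y
v7 z
v8 x
v9 y
v10 z
"""


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the label distribution in `labels`."""
    counts = Counter(labels)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Map each item ID to its class1 label.
class1 = dict(line.split() for line in ITEMS.strip().splitlines())

# Collect the class1 label of every requested item, grouped by user.
labels_by_user = defaultdict(list)
for line in REQUESTS.strip().splitlines():
    user, _date, _time, requested, _referrer = line.split()
    labels_by_user[user].append(class1[requested])

diversity = {user: shannon_entropy(labels) for user, labels in labels_by_user.items()}
for user, value in sorted(diversity.items()):
    print(user, round(value, 3))
```

Here user3 only requests items of class z, which gives an entropy of 0, while user2 splits requests evenly between classes y and z, which gives exactly 1 bit.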

To run example 1, go to the directory datasets/example1 and run:

$ python3 example_1.py

Example 2

The figure shows the Gephi graph of dataset 2, where each color corresponds to a different media.

The file describing requests has the same structure as the one in example 1.

The file describing pages is more concrete than the one in example 1:

item media continent
item0 Politics Europe
item1 Health Asia
item2 Politics North America

To run example 2, go to the directory datasets/example2 and run:

$ python3 example_2.py
