
Data analysis for Wikipedia user data.

Project description

Introduction

This package implements log retrieval, metrics generation, and data analysis tools used by the Editor Engagement Experiment (E3) team at the Wikimedia Foundation. The modules herein will be used to perform the ETL and analysis operations necessary to process the experimental data generated from E3 projects.

Installation

wmf_user_metrics is packaged with distutils:

$ sudo pip install wmf_user_metrics

Once installed, you will need to modify the configuration file, settings.py, found under $site-packages-home$/e3_analysis/config. Within this file, configure the connections dictionary to point to a replicated production MySQL instance. The ‘db’ setting should be an instance to which ‘user’ has write access. If you are from outside the Wikimedia Foundation, do not have access to these credentials, and would like to work with this package, contact me at rfaulkner@wikimedia.org.

The template configuration file looks like the following:

# Project settings
# ================
__home__ = '/Users/rfaulkner/'
__project_home__ = ''.join([__home__, 'projects/E3_analysis/'])
__web_home__ = ''.join([__project_home__, 'web_interface/'])
__sql_home__ = ''.join([__project_home__, 'SQL/'])
__server_log_local_home__ = ''.join([__project_home__, 'logs/'])
__data_file_dir__ = ''.join([__project_home__, 'data/'])

__web_app_module__ = 'web_interface'
__system_user__ = 'rfaulk'

# Database connection settings
# ============================

connections = {
    'slave': {
        'user' : 'research',
        'host' : '127.0.0.1',
        'db' : 'staging',
        'passwd' : 'xxxx',
        'port' : 3307},
    'slave-2': {
        'user' : 'rfaulk',
        'host' : '127.0.0.1',
        'db' : 'rfaulk',
        'passwd' : 'xxxx',
        'port' : 3307}
}
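
As a minimal sketch of how an entry in the connections dictionary might be checked before use, the helper below looks up one entry by name and verifies that the keys shown in the template are present. The function connection_kwargs and the REQUIRED_KEYS tuple are hypothetical and not part of wmf_user_metrics. The key names happen to match keyword arguments accepted by MySQL clients such as MySQLdb.connect, but whether the package passes them along that way is an assumption.

```python
# Hypothetical helper for illustration; not part of wmf_user_metrics.
REQUIRED_KEYS = ("user", "host", "db", "passwd", "port")

# Same shape as the template configuration; values are placeholders.
connections = {
    "slave": {
        "user": "research",
        "host": "127.0.0.1",
        "db": "staging",
        "passwd": "xxxx",
        "port": 3307,
    },
}


def connection_kwargs(name, settings=connections):
    """Return a copy of one connection entry after checking its keys."""
    if name not in settings:
        raise KeyError("no connection named %r" % name)
    entry = dict(settings[name])  # copy so callers cannot mutate the settings
    missing = [key for key in REQUIRED_KEYS if key not in entry]
    if missing:
        raise ValueError(
            "connection %r is missing: %s" % (name, ", ".join(missing))
        )
    if not isinstance(entry["port"], int):
        raise ValueError("connection %r: 'port' must be an integer" % name)
    return entry


if __name__ == "__main__":
    print(connection_kwargs("slave")["db"])
```

Checking the entry up front turns a misspelled key or a quoted port number into an immediate, readable error rather than a failure inside the database driver.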

Documentation

Once the installation is complete and the configuration has been set, the modules can be imported into the Python environment. The following operational modules are available:

src.etl.data_loader
src.etl.aggregator
src.etl.table_loader
src.etl.log_parser
src.etl.time_series_process_methods
src.etl.wpapi

src.metrics.blocks
src.metrics.bytes_added
src.metrics.edit_count
src.metrics.edit_rate
src.metrics.live_account
src.metrics.metrics_manager
src.metrics.namespace_of_edits
src.metrics.query_calls
src.metrics.revert_rate
src.metrics.survival
src.metrics.time_to_threshold
src.metrics.user_metric
src.metrics.users

src.utils.autovivification
src.utils.multiprocessing_wrapper
src.utils.record_type
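
To confirm that the modules listed above can be found after installation, a small check along the following lines may help. It is only a sketch: is_importable and check_modules are hypothetical helpers, and the three names in MODULES are taken from the list above on the assumption that the package is exposed under the top-level name src.

```python
import importlib.util

# A few names from the module list above; extend as needed.
MODULES = [
    "src.etl.data_loader",
    "src.metrics.edit_count",
    "src.utils.record_type",
]


def is_importable(name):
    """Return True if the import system can locate the module `name`."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or a spec is malformed.
        return False


def check_modules(names):
    """Map each module name to whether it can be located."""
    return {name: is_importable(name) for name in names}


if __name__ == "__main__":
    for name, found in check_modules(MODULES).items():
        print("%-30s %s" % (name, "found" if found else "NOT FOUND"))
```

Note that find_spec imports parent packages while resolving a dotted name, so this check runs any package-level initialization code along the way.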

More complete documentation can be found at:

http://stat1.wikimedia.org/rfaulk/pydocs/_build/
