Data Analysis for Wikipedia User data.
Project description
Introduction
This package implements log retrieval, metrics generation, and data analysis tools used by the Editor Engagement Experiment (E3) team at the Wikimedia Foundation. The modules herein will be used to perform the ETL and analysis operations necessary to process the experimental data generated from E3 projects.
Installation
wmf_user_metrics is packaged with distutils:
$ sudo pip install wmf_user_metrics
Once installed, you will need to modify the configuration file, settings.py, which is found under $site-packages-home$/e3_analysis/config. Within this file, configure the connections dictionary to point to a replicated production MySQL instance containing the . The ‘db’ setting should be an instance to which ‘user’ has write access. If you are from outside the Wikimedia Foundation, do not have access to these credentials, and would like to work with this package, contact me at rfaulkner@wikimedia.org.
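As a rough aid for finding the file, the sketch below builds the expected path from the interpreter's site-packages directory. The helper name settings_path is hypothetical and not part of the package, and the result is only a guess: the package may live elsewhere, for example in a virtualenv or a user site directory.

```python
import os
import sysconfig


def settings_path(site_packages=None):
    """Return the expected location of settings.py (hypothetical helper)."""
    if site_packages is None:
        # "purelib" is where pure-Python packages are normally installed.
        site_packages = sysconfig.get_paths()["purelib"]
    # The e3_analysis/config sub-path is taken from the instructions above.
    return os.path.join(site_packages, "e3_analysis", "config", "settings.py")
```

Calling settings_path() with no argument uses the running interpreter's site-packages; os.path.isfile(settings_path()) then tells you whether the file is actually there.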
The template configuration file looks like the following:
    # Project settings
    # ================
    __home__ = '/Users/rfaulkner/'
    __project_home__ = ''.join([__home__, 'projects/E3_analysis/'])
    __web_home__ = ''.join([__project_home__, 'web_interface/'])
    __sql_home__ = ''.join([__project_home__, 'SQL/'])
    __server_log_local_home__ = ''.join([__project_home__, 'logs/'])
    __data_file_dir__ = ''.join([__project_home__, 'data/'])
    __web_app_module__ = 'web_interface'
    __system_user__ = 'rfaulk'

    # Database connection settings
    # ============================
    connections = {
        'slave': {
            'user' : 'research',
            'host' : '127.0.0.1',
            'db' : 'staging',
            'passwd' : 'xxxx',
            'port' : 3307},
        'slave-2': {
            'user' : 'rfaulk',
            'host' : '127.0.0.1',
            'db' : 'rfaulk',
            'passwd' : 'xxxx',
            'port' : 3307}
    }
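Before pointing the package at a database, it can help to sanity-check an entry of the connections dictionary. The sketch below is illustrative only: get_connection_settings and REQUIRED_KEYS are hypothetical names, not part of wmf_user_metrics, and the dictionary is a copy of the template with placeholder passwords.

```python
# Copy of the template's connections dictionary (placeholder passwords).
connections = {
    'slave': {'user': 'research', 'host': '127.0.0.1', 'db': 'staging',
              'passwd': 'xxxx', 'port': 3307},
    'slave-2': {'user': 'rfaulk', 'host': '127.0.0.1', 'db': 'rfaulk',
                'passwd': 'xxxx', 'port': 3307},
}

# Keys that every entry in the template provides.
REQUIRED_KEYS = ('user', 'host', 'db', 'passwd', 'port')


def get_connection_settings(name, conns=connections):
    """Return a validated copy of one connection entry (hypothetical helper)."""
    if name not in conns:
        raise KeyError("unknown connection: %r" % name)
    entry = conns[name]
    missing = [key for key in REQUIRED_KEYS if key not in entry]
    if missing:
        raise ValueError("connection %r is missing: %s"
                         % (name, ", ".join(missing)))
    if not isinstance(entry['port'], int):
        raise ValueError("connection %r: port must be an integer" % name)
    # Return a copy so callers cannot mutate the shared settings.
    return dict(entry)
```

The returned dictionary uses the same key names as the template, so it could be handed as keyword arguments to a MySQL client library that accepts user, host, db, passwd, and port. Which client the package itself uses is not stated here.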
Documentation
Once the installation is complete and the configuration has been set, the modules can be imported into the Python environment. The available operational modules are the following:
- src.etl.data_loader
- src.etl.aggregator
- src.etl.table_loader
- src.etl.log_parser
- src.etl.time_series_process_methods
- src.etl.wpapi
- src.metrics.blocks
- src.metrics.bytes_added
- src.metrics.edit_count
- src.metrics.edit_rate
- src.metrics.live_account
- src.metrics.metrics_manager
- src.metrics.namespace_of_edits
- src.metrics.query_calls
- src.metrics.revert_rate
- src.metrics.survival
- src.metrics.time_to_threshold
- src.metrics.user_metric
- src.metrics.users
- src.utils.autovivification
- src.utils.multiprocessing_wrapper
- src.utils.record_type
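To confirm that the modules listed above resolve in your environment, a small check along these lines may be useful. It is a minimal sketch: is_importable and import_report are hypothetical helpers, and only standard-library calls are used. Note that find_spec imports parent packages while resolving a dotted name.

```python
import importlib.util


def is_importable(module_name):
    """Return True if module_name can be located on the import path."""
    try:
        # find_spec returns None when a top-level module is absent.
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or has no usable spec.
        return False


def import_report(module_names):
    """Map each module name to whether it can be located."""
    return {name: is_importable(name) for name in module_names}
```

For example, import_report(["src.etl.data_loader", "src.metrics.edit_count"]) returns a dictionary showing which of the two names were found.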
- More complete docs can be found at:
Download files
File details
Details for the file wmf_user_metrics-0.1.1.tar.gz.
File metadata
- Download URL: wmf_user_metrics-0.1.1.tar.gz
- Upload date:
- Size: 38.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | caaf69aa2abcd9c299f1d582fc43131a3f36b83d536ab63e162d204174e1a1d4
MD5 | 14528b4a2fee1231a578280a3e128d35
BLAKE2b-256 | 1d1589fec023b44f2d9da77ed8176749afbc45c2615596ae1f72e1e58871550c
File details
Details for the file wmf_user_metrics-0.1.1.macosx-10.7-intel.tar.gz.
File metadata
- Download URL: wmf_user_metrics-0.1.1.macosx-10.7-intel.tar.gz
- Upload date:
- Size: 78.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 72cfa5f011558a4d07ddeac5db7c5f3ba72c49a2b4efb64fdc3965f4721e68a0
MD5 | 7dac7ac7a1afebc4ef21d547f136a13d
BLAKE2b-256 | 1b7ac7a0e86caff3a23152467a7815c1faf295184aeca2d500ef88d4f154975d
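A downloaded archive can be checked against the published digests with the standard library. The following is a minimal sketch, assuming the file has already been saved locally; file_digests and matches_published are hypothetical helper names.

```python
import hashlib


def file_digests(path, chunk_size=65536):
    """Compute the three digest types listed in the tables for one file."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        # BLAKE2b with a 32-byte (256-bit) digest.
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as handle:
        # Read in chunks so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def matches_published(path, published):
    """Return True if every published digest equals the computed one."""
    actual = file_digests(path)
    return all(actual[name] == digest.lower()
               for name, digest in published.items())
```

Pass matches_published the local path and a dictionary such as {"SHA256": "caaf69aa..."} filled in with the full values from the relevant table; a False result means the download should not be trusted.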