
Wikipedia Analysis Toolkit

Project description


The aim of this project is to perform various kinds of analysis on KnolML files, for researchers working with Wikipedia data.

Analysis1: Controversy Analysis using wiki-links

To measure the relative controversy level of the wiki-links present in a Wikipedia article.

Input Format: python3 script_name input_file_name

Example: python3 script_name 2006_Westchester_County_torna.knolml
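The toolkit's own KnolML parsing is not reproduced here, but the idea can be sketched in plain Python. In this hypothetical simplification, each revision is a wikitext string, and a wiki-link's controversy is approximated by how often it is added or removed between consecutive revisions:

```python
import re
from collections import Counter

# Matches [[Target]] and [[Target|label]]; group 1 is the link target.
WIKILINK_RE = re.compile(r"\[\[([^\]|#]+)(?:[^\]]*)?\]\]")

def extract_wikilinks(wikitext):
    """Return the target page of every [[wiki-link]] in a revision's text."""
    return [m.group(1).strip() for m in WIKILINK_RE.finditer(wikitext)]

def link_churn(revisions):
    """Count how often each wiki-link is added or removed between
    consecutive revisions; links that churn more are a crude proxy
    for controversy."""
    churn = Counter()
    prev = set()
    for text in revisions:
        cur = set(extract_wikilinks(text))
        for link in prev ^ cur:  # symmetric difference = added or removed
            churn[link] += 1
        prev = cur
    return churn
```

This is only a proxy; the actual analysis in the toolkit may weight links differently or use revision metadata.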

Analysis2: Contributions of an author over a given period of time in a wikipedia article

To find the contributions of an author, in terms of words, sentences, bytes, etc., over a given period of time (bounded by start and end dates).

Input Format: python3 script_name input_file_name start_date(YYYY-MM-DD) end_date(YYYY-MM-DD) --flag(sentences/bytes/wikilinks/words)

Example: python3 script_name 2006_Westchester_County_torna.knolml 2000-01-01 2010-01-01 --bytes
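Once the KnolML file is parsed, this analysis reduces to filtering revisions by author and date and summing the chosen metric. A minimal sketch, where `revisions` is a hypothetical, already-parsed stand-in for the KnolML data (a list of dicts with an author, a date, and one delta per metric):

```python
from datetime import date

def author_contribution(revisions, author, start, end, flag="bytes"):
    """Sum one author's contribution between start and end (inclusive).

    `revisions` is assumed to be a list of dicts with keys 'author',
    'date' (datetime.date), and per-metric deltas such as 'bytes'
    and 'words' -- a simplified stand-in for the parsed KnolML.
    """
    return sum(rev[flag]
               for rev in revisions
               if rev["author"] == author and start <= rev["date"] <= end)
```

The `flag` argument mirrors the command-line flag (`--bytes`, `--words`, ...), selecting which metric to total.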

Analysis3: Ranking all the authors based on their contribution to a given paragraph

To rank all the authors of a Wikipedia article by their contribution to a particular paragraph of the article. The paragraph is given as input to the program.

Input Format: python3 script_name input_file_name

Example: python3 script_name 2006_Westchester_County_torna.knolml
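The toolkit's exact ranking method is not shown here; one simple way to sketch the idea is to score each author by how many of the paragraph's word occurrences their added text can account for. Both the function and its inputs are hypothetical simplifications:

```python
from collections import Counter

def rank_authors(paragraph, added_text_by_author):
    """Rank authors by how much of the target paragraph's wording their
    added text accounts for (a crude word-overlap ownership proxy).

    `added_text_by_author` maps each author to the concatenated text
    they added across revisions -- a stand-in for parsed KnolML data.
    """
    target = Counter(paragraph.lower().split())
    scores = {}
    for author, text in added_text_by_author.items():
        added = Counter(text.lower().split())
        # Credit each author for target words they added, capped at the
        # number of times the word occurs in the paragraph.
        scores[author] = sum(min(count, added[word])
                             for word, count in target.items())
    return sorted(scores, key=scores.get, reverse=True)
```

A real implementation would track which author introduced each surviving token rather than relying on bag-of-words overlap.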

Analysis4: Finding knowledge gaps in a Wikipedia article

A Wikipedia article represents knowledge about a set of related topics. For example, the article on IIT Ropar may discuss placements at IIT Ropar in a particular section, yet say nothing about placements in a newly introduced branch, say Biotechnology. Can we write a Python program that detects that information about Biotechnology placements is missing from the IIT Ropar page? More generally, can we detect knowledge gaps in a Wikipedia article?

Steps to find external knowledge gaps:

  1. Select a book from the books folder as the input file for segmentation and run python3 books/[book_name]
  2. The segments will be written to the segmentaion_result.csv file
  3. Next, perform external segmentation using segmentaion_result.csv by running python3
  4. The external knowledge gaps can then be found in the external_gaps.txt file
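The pipeline above works through intermediate files; conceptually, the external-gap step compares the topics the article covers against the topics found in the reference book's segmentation. A minimal sketch of that comparison (the function and its topic-list inputs are hypothetical, not the toolkit's actual code):

```python
def external_gaps(article_topics, reference_topics):
    """Return reference topics (e.g. book segment titles read from the
    segmentation CSV) that the article does not cover -- the 'external'
    knowledge gaps."""
    covered = {t.strip().lower() for t in article_topics}
    return sorted(t for t in reference_topics
                  if t.strip().lower() not in covered)
```

In practice the toolkit matches segments with word2vec similarity rather than exact topic names, so this exact-match version only illustrates the overall shape of the step.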

Steps to train word2vec (optional):

  1. A trained word2vec model (wrdvecs-text8.bin) is already provided; you have to delete it first
  2. Once the trained model is deleted, supply a corpus named text8 and simply run the code
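The text8 corpus is a single long line of space-separated words, which word2vec trainers typically consume as fixed-length "sentences". A minimal stdlib sketch of that preprocessing step (the chunk size is an assumption; the toolkit's own trainer and the binary wrdvecs-text8.bin output are not reproduced here):

```python
def text8_sentences(path, words_per_sentence=1000):
    """Stream the text8 corpus (one long line of space-separated words)
    as fixed-length word lists, the usual preprocessing before feeding
    the corpus to a word2vec trainer."""
    with open(path) as f:
        words = f.read().split()
    for i in range(0, len(words), words_per_sentence):
        yield words[i:i + words_per_sentence]
```

A trainer such as gensim's Word2Vec can then consume these chunks directly as its sentence iterable.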

Files for kml-analysis-parasKumarSahu, version 0.0.3:

  - kml_analysis_parasKumarSahu-0.0.3-py3-none-any.whl (4.7 kB, Wheel, Python py3)
  - kml-analysis-parasKumarSahu-0.0.3.tar.gz (3.3 kB, Source)
