Wikipedia Analysis Toolkit
Knolml-Analysis-Package
The aim of this project is to provide several types of analysis on knolml files for researchers working with Wikipedia data. This pip package supports Python 3.
Analysis1: Controversy Analysis using wiki-links
Measures the relative controversy level of the wiki-links present in a Wikipedia article.
Module: from kml_analysis_pkg import controversy
Input Format: controversy.run(input_file_name)
Example: controversy.run("sample.knolml")
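To make the term "wiki-link" concrete, the following is a minimal sketch of how links of the form [[Target]] or [[Target|label]] could be pulled out of raw wikitext and counted. It only illustrates the input that a controversy measure would work on; it is not the package's own implementation, and the names WIKILINK and extract_wikilinks are hypothetical.

```python
import re
from collections import Counter

# Matches [[Target]], [[Target#Section]] and [[Target|label]];
# only the target title is captured. (Hypothetical helper, not part of the package.)
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def extract_wikilinks(wikitext):
    """Return the link targets found in a wikitext string, in order of appearance."""
    return [match.group(1).strip() for match in WIKILINK.finditer(wikitext)]


sample = (
    "The [[Indian Institute of Technology Ropar|IIT Ropar]] is in "
    "[[Punjab, India|Punjab]]. See [[Rupnagar]] and [[Rupnagar#History]]."
)

links = extract_wikilinks(sample)
counts = Counter(links)  # how often each target is linked
print(counts.most_common())
```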
Analysis2: Contributions of an author over a given period of time in a Wikipedia article
Finds an author's contributions, measured in words, sentences, bytes, or wikilinks, over a given period of time (between a given start date and end date).
Module: from kml_analysis_pkg import author_contribution
Input Format: author_contribution.run(input_file_name, start_date, end_date, measure_option)
Date Format: YYYY-MM-DD
Measure Options: sentences/bytes/wikilinks/words
Example: author_contribution.run("sample.knolml", "2000-01-01", "2010-01-01", "words")
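Since the dates and the measure option are passed as plain strings, it can help to check them before calling author_contribution.run. The sketch below is one possible pre-check under the format rules stated above; check_contribution_args and MEASURE_OPTIONS are hypothetical names, and how the package itself reacts to malformed arguments is not documented here.

```python
from datetime import datetime

# The four measure options listed above.
MEASURE_OPTIONS = {"sentences", "bytes", "wikilinks", "words"}


def check_contribution_args(start_date, end_date, measure_option):
    """Raise ValueError on bad arguments; return the parsed (start, end) dates.

    Hypothetical helper, not part of kml_analysis_pkg.
    """
    # strptime raises ValueError if a string cannot be parsed as YYYY-MM-DD.
    start = datetime.strptime(start_date, "%Y-%m-%d").date()
    end = datetime.strptime(end_date, "%Y-%m-%d").date()
    if start > end:
        raise ValueError("start_date must not be after end_date")
    if measure_option not in MEASURE_OPTIONS:
        raise ValueError(f"measure_option must be one of {sorted(MEASURE_OPTIONS)}")
    return start, end


start, end = check_contribution_args("2000-01-01", "2010-01-01", "words")
print(start, end)
```

If the check passes, the same three strings can be handed to author_contribution.run unchanged.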
Analysis3: Ranking all the authors based on their contribution to a given paragraph
Ranks all the authors of a Wikipedia article by their contribution to a particular paragraph of the article. The paragraph is given as input to the program.
Module: from kml_analysis_pkg import author_para_contribution
Input Format: author_para_contribution.run(input_file_name)
Example: author_para_contribution.run("sample.knolml")
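As a rough illustration of what ranking authors by paragraph contribution can mean, the toy sketch below credits each word of the target paragraph to the author of the earliest revision that contains it. This is an assumed, simplified scheme on made-up data, not the algorithm used by author_para_contribution; rank_authors and the revision list are hypothetical.

```python
from collections import Counter


def rank_authors(revisions, paragraph):
    """Rank authors by how many words of `paragraph` they introduced first.

    `revisions` is a chronological list of (author, full_text) pairs.
    Hypothetical helper, not part of kml_analysis_pkg.
    """
    remaining = Counter(paragraph.lower().split())  # paragraph words not yet credited
    credit = Counter()
    for author, text in revisions:
        seen = Counter(text.lower().split())
        for word in list(remaining):
            n = min(remaining[word], seen[word])
            if n:
                credit[author] += n
                remaining[word] -= n
                if remaining[word] == 0:
                    del remaining[word]
    return credit.most_common()


# Made-up revision history for demonstration only.
revisions = [
    ("alice", "Cats sleep"),
    ("bob", "Cats sleep all day long"),
]
ranking = rank_authors(revisions, "Cats sleep all day long")
print(ranking)
```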
Analysis4: Finding knowledge gaps in a Wikipedia article
A Wikipedia article represents knowledge about a set of related topics. For example, the Wikipedia article on IIT Ropar may discuss placements at IIT Ropar in a particular section, yet that section may contain no information about a newly introduced branch such as Biotechnology. Can a Python program tell that information about Biotechnology placements is missing from the IIT Ropar Wikipedia page? More generally, can it tell that there is a knowledge gap in a Wikipedia article?
Steps to find external knowledge gaps:
Module: from kml_analysis_pkg import external_gaps
Input Format: external_gaps.run(input_file_name, word_vector_model)
Word Vector Model: Pretrained model available at https://github.com/parasKumarSahu/Knolml-Analysis/blob/master/Text-Segmentation/wrdvecs-text8.bin
Example: external_gaps.run("GeneralScience.txt", "wrdvecs-text8.bin")
Hashes for kml-analysis-parasKumarSahu-0.0.19.tar.gz
Algorithm | Hash digest
---|---
SHA256 | c885bb3417c476fd16f2745d3c6c95489aafed4914bc4d0e2755774a5a73ac85
MD5 | b0d5a4b3dce679ada97b475414d7dc52
BLAKE2b-256 | fa413ad3d58f70052669a5e55d2153c68d46c319efb9d64896c0b8219edbe4a8
Hashes for kml_analysis_parasKumarSahu-0.0.19-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | ff4e5cd46c6185db47e9fef6abe228b22ef699f29f7fc685bf0d69af1e6e3fe9
MD5 | 04f7162201789e83d9619442bb22255a
BLAKE2b-256 | a945278b2230b253487e2054a5c1a7b20f037b00786ed9788000b90b2b769a28
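The SHA256 digests listed above can be compared against a downloaded file using Python's standard hashlib module. The following is a minimal sketch; the helper names sha256_of and verify are hypothetical, and the wheel is assumed to have been saved in the current directory under its published file name.

```python
import hashlib
import os

# SHA256 digest published above for the 0.0.19 wheel.
EXPECTED_SHA256 = "ff4e5cd46c6185db47e9fef6abe228b22ef699f29f7fc685bf0d69af1e6e3fe9"
WHEEL = "kml_analysis_parasKumarSahu-0.0.19-py3-none-any.whl"  # assumed local path


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest matches the expected value."""
    return sha256_of(path) == expected.lower()


if os.path.exists(WHEEL):
    print("digest matches" if verify(WHEEL) else "digest MISMATCH")
```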