A package for analyzing Wikipedia deletion discussions.

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Wide-Analysis: Suite for Wikipedia deletion discussion analysis

Introduction

Wide-Analysis is a suite of tools for analyzing Wikipedia deletion discussions. It is designed to help researchers and practitioners understand the dynamics of deletion discussions and develop tools that support the decision-making process in Wikipedia. The suite includes tools for collecting, processing, and analyzing deletion discussions. The package provides the following functionalities:

Data collection and preprocessing: Collecting deletion discussions from Wikipedia and preparing a dataset. This can be done at the article title level or at the date range level.
Model based functionalities: The suite includes a set of language model based tasks:
- Outcome Prediction: Predicting the outcome of a deletion discussion, i.e., the decision reached in the discussion (e.g., keep, delete, or merge). Determined from the complete discussion.
- Stance Detection: Identifying the stance of the participants in the discussion in relation to the deletion decision. Determined from each individual comment in the discussion.
- Policy Prediction: Predicting the policy that is most relevant to the comments of the participants in the discussion. Determined from each individual comment in the discussion.
- Sentiment Prediction: Predicting the sentiment of the participants in the discussion in relation to the deletion decision. Determined from each individual comment in the discussion.
- Offensive Language Detection: Detecting offensive language in the comments of the participants in the discussion. Determined from each individual comment in the discussion.

Get started 🚀

You can install the package from PyPI using the following command:

pip install wide-analysis

After the installation, you can import the package and start using the functionalities.

Create dataset

The dataset creation functionalities return a dataframe. The data collection command takes the following parameters:

mode : str
- The mode of data collection. It can be 'title', 'date_range', 'date', or 'wide_2023' (the modes used in the examples in this description).
start_date : str
- The start date of the data collection. It should be in the format 'YYYY-MM-DD' (for example, '2021-01-01').
end_date : str
- The end date of the data collection. It should be in the format 'YYYY-MM-DD' (for example, '2021-01-01'). If left empty, data is collected for a single date (start_date).
url : str (optional)
- The URL of the Wikipedia deletion discussion log page. Only needed for title based extraction. For example: https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/Log/2021_January_1
title : str
- The title of the Wikipedia article. Only needed for title based extraction. For example: 'COVID-19_pandemic_in_India'
output_path : str
- The path to save the dataset. The dataset will be saved as a CSV file. If not provided, the dataset will be returned as a dataframe.
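
As a rough illustration of how these parameters fit together, the sketch below checks a set of arguments before they are handed to data_collect.collect. The helper name check_collect_args and the per-mode requirements are assumptions inferred from the examples in this description; they are not part of the package.

```python
from datetime import datetime

# Hypothetical helper: not part of wide_analysis.
# The per-mode requirements are inferred from the examples in this description.
REQUIRED = {
    "wide_2023": [],
    "title": ["start_date", "url", "title"],
    "date_range": ["start_date", "end_date"],
    "date": ["start_date"],
}


def check_collect_args(mode, start_date=None, end_date=None, url=None,
                       title=None, output_path=None):
    """Return a kwargs dict for collect(), raising ValueError on obvious mistakes."""
    if mode not in REQUIRED:
        raise ValueError(f"unknown mode: {mode!r}")
    args = {"mode": mode, "start_date": start_date, "end_date": end_date,
            "url": url, "title": title, "output_path": output_path}
    missing = [name for name in REQUIRED[mode] if not args[name]]
    if missing:
        raise ValueError(f"mode {mode!r} needs: {', '.join(missing)}")
    for name in ("start_date", "end_date"):
        if args[name] is not None:
            # Raises ValueError if the string cannot be parsed as year-month-day.
            datetime.strptime(args[name], "%Y-%m-%d")
    return args


kwargs = check_collect_args("date_range",
                            start_date="2024-07-18",
                            end_date="2024-07-20")
# data = data_collect.collect(**kwargs)  # needs wide_analysis and network access
```

Checking arguments up front like this is only a convenience; the package may apply its own validation.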

A dataset can be created in four ways:

Wide-analysis Dataset: If 'wide_2023' is selected as the mode parameter, the data is taken from the existing Wide-analysis dataset available on Hugging Face ('hsuvaskakoty/wide_analysis'), and the function returns a Hugging Face dataset.

from wide_analysis import data_collect
data = data_collect.collect(mode = 'wide_2023', 
                            start_date=None, 
                            end_date=None, 
                            url=None, 
                            title=None, 
                            output_path=None)

Example: To collect the existing 'wide_2023' dataset, the following command can be used:

from wide_analysis import data_collect
data = data_collect.collect(mode = 'wide_2023', 
                            start_date=None, 
                            end_date=None, 
                            url=None, 
                            title=None, 
                            output_path=None)

This will return the existing dataset available on Hugging Face ('hsuvaskakoty/wide_analysis').

Datset loaded successfully as huggingfaece dataset
The dataset has the following columns: {'train': ['text', 'label'], 'validation': ['text', 'label'], 'test': ['text', 'label']}

Article level: Collecting deletion discussions for a specific article.

from wide_analysis import data_collect
data = data_collect.collect(mode = 'title', 
                            start_date='YYYY-MM-DD', 
                            end_date=None, 
                            url='URL for the title', 
                            title='article title', 
                            output_path='save_path')  # or None to skip saving to CSV

Example: To collect the deletion discussions for the article 'Raisul Islam Ador' for the date '2024-07-18', the following command can be used:

from wide_analysis import data_collect
data = data_collect.collect(mode = 'title', 
                            start_date='2024-07-18', 
                            end_date=None, 
                            url='https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/Log/2024_July_15#Raisul_Islam_Ador', 
                            title='Raisul Islam Ador', 
                            output_path= None)

This will return a dataframe with the data for the title 'Raisul Islam Ador' for the date '2024-07-18'. If output_path is provided, the dataframe will be saved as a CSV file at the provided path. The output looks like the following:

Date	Title	URL	Discussion	Label	Confirmation
2024-07-18	Raisul Islam Ador	URL to article text	Deletion discussion here	speedy delete	Please do not modify it.
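
The url values used in this description follow a visible pattern: the daily log page for a date, optionally followed by a section anchor for the article title. The sketch below rebuilds that pattern. The function name afd_log_url is hypothetical, and replacing spaces with underscores is an assumption that holds for the example title but may not cover titles with special characters.

```python
from datetime import date

AFD_LOG_BASE = "https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/Log/"
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def afd_log_url(day, title=None):
    """Build a daily deletion log URL, with an optional '#Title' anchor."""
    # The log pages use the form YYYY_Month_D, with no zero padding on the day.
    url = f"{AFD_LOG_BASE}{day.year}_{MONTHS[day.month - 1]}_{day.day}"
    if title:
        # Assumption: spaces become underscores; other characters are left as-is.
        url += "#" + title.replace(" ", "_")
    return url


print(afd_log_url(date(2024, 7, 15), "Raisul Islam Ador"))
```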

Date range level: Collecting deletion discussions for a specific date range.

from wide_analysis import data_collect
data = data_collect.collect(mode = 'date_range', 
                            start_date='YYYY-MM-DD', 
                            end_date='YYYY-MM-DD', 
                            url=None, 
                            title=None, 
                            output_path='save_path')  # or None to skip saving to CSV

Example: To collect the deletion discussions for the articles in the date range from '2024-07-18' to '2024-07-20', the following command can be used:

from wide_analysis import data_collect
data = data_collect.collect(mode = 'date_range', 
                            start_date='2024-07-18', 
                            end_date='2024-07-20', 
                            url=None, 
                            title=None, 
                            output_path= None)

This will return a dataframe with the data for the articles in the date range from '2024-07-18' to '2024-07-20'. The output has the same format as the article level data collection, with additional rows for each date in the range.
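
To make the date range concrete, the following sketch lists the individual dates that such a range spans. It assumes the end date is included, which this description does not state explicitly, and dates_in_range is a hypothetical helper rather than a package function.

```python
from datetime import date, timedelta


def dates_in_range(start_date, end_date):
    """List every date from start_date to end_date as 'YYYY-MM-DD' strings."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)
    if end < start:
        raise ValueError("end_date is before start_date")
    # Assumption: the range is inclusive of end_date.
    return [(start + timedelta(days=i)).isoformat()
            for i in range((end - start).days + 1)]


print(dates_in_range("2024-07-18", "2024-07-20"))
# ['2024-07-18', '2024-07-19', '2024-07-20']
```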

Date level: Collecting deletion discussions for a specific date.

from wide_analysis import data_collect
data = data_collect.collect(mode = 'date', 
                            start_date='YYYY-MM-DD', 
                            end_date=None, 
                            url=None, 
                            title=None, 
                            output_path= None)

Example: To collect the deletion discussions for the articles discussed on the date '2024-07-18', the following command can be used:

from wide_analysis import data_collect
data = data_collect.collect(mode = 'date', 
                            start_date='2024-07-18', 
                            end_date=None, 
                            url=None, 
                            title=None, 
                            output_path= None)

This will return a dataframe with the data for the articles discussed on the date '2024-07-18'. The output has the same format as the article level data collection, with one row for each article on that date.
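
When output_path is used, the saved file can be read back with standard tools. The sketch below counts outcome labels using only the standard library. It assumes the CSV is comma-separated with a header row matching the column names in the example output (Date, Title, URL, Discussion, Label, Confirmation); an in-memory string stands in for the saved file.

```python
import csv
import io
from collections import Counter

# Stand-in for a file written via output_path. The single row mirrors the
# example output shown for the article level collection.
sample_csv = io.StringIO(
    "Date,Title,URL,Discussion,Label,Confirmation\n"
    "2024-07-18,Raisul Islam Ador,URL to article text,"
    "Deletion discussion here,speedy delete,Please do not modify it.\n"
)

rows = list(csv.DictReader(sample_csv))

# Count how often each outcome label appears in the collected discussions.
label_counts = Counter(row["Label"] for row in rows)
print(label_counts.most_common())
```

For a real file, replacing the StringIO object with open('save_path', newline='') should behave the same way, provided the column layout assumption holds.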

Model based functionalities

We train a set of models and leverage some pretrained task-specific models from Hugging Face for the following tasks: Outcome Prediction, Stance Detection, Policy Prediction, Sentiment Prediction, and Offensive Language Detection. The functionalities return the predictions for each task together with their individual probability scores: a dictionary for outcome prediction, and a list of dictionaries (one per sentence) for the other tasks, as the example outputs show. The model based functionalities take the following parameters:

inp: 'str'
- The url or text of the Wikipedia article deletion discussion.
mode: 'str'
- The mode of the input. It can be 'url' or 'text'. If 'url' is selected, the input should be the URL of the Wikipedia article deletion discussion. If 'text' is selected, the input should be the text of the deletion discussion in the format 'Title: Text', where Title is the title of the article and Text is the deletion discussion. Default is 'url'.
task: 'str'
- The task to be performed. It can be 'outcome', 'stance', 'policy', 'sentiment', or 'offensive'.
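
For mode='text', the input has to follow the 'Title: Text' layout described above. A minimal sketch of building and splitting such a string is shown here; format_text_input and split_text_input are hypothetical helpers, and splitting on the first ': ' is an assumption about how the title can be recovered.

```python
def format_text_input(title, discussion):
    """Join an article title and its discussion as 'Title: Text'."""
    return f"{title.strip()}: {discussion.strip()}"


def split_text_input(text_input):
    """Split 'Title: Text' back into (title, text) at the first ': '."""
    title, _, text = text_input.partition(": ")
    return title, text


text_input = format_text_input(
    "Raisul Islam Ador",
    "None establish his Wikipedia:Notability.",
)
print(text_input)
```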

Note that the model based functionalities are only available for the article level data collection. We also provide an explanation feature for the outcome prediction task, which returns an explanation of the model's prediction using the OpenAI GPT 3.5 model. You will need your own API key for this feature to work.

Outcome Prediction

Apart from the input parameters, the outcome prediction function also contains the following parameters:

openai_access_token: 'str'
- The API key for the OpenAI GPT 4o-mini model. If explanation is True, the function will ask for this API key. Default is None.
explanation: 'bool'
- If True, it will return the explanation of the prediction made by the model. Default is False.

from wide_analysis import analyze
predictions = analyze(inp='URL/text of the article',
                      mode='url or text',
                      task='outcome',
                      openai_access_token=None,
                      explanation=False)

Example: To predict the outcome of the deletion discussion for the article 'Raisul Islam Ador' using discussion url, the following command can be used:

from wide_analysis import analyze
predictions = analyze(inp='https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/Log/2024_July_15#Raisul_Islam_Ador',
                      mode='url',
                      task='outcome',
                      openai_access_token=None,
                      explanation=False)

OR if using text:

from wide_analysis import analyze
text_input = 'Raisul Islam Ador: None establish his Wikipedia:Notability. The first reference is almost identical in wording to his official web site.CambridgeBayWeather (solidly non-human), Uqaqtuq (talk) , Huliva 20:06, 15 July 2024 (UTC) [ reply ] Delete , if not a CSD under G11.' #sample input text
predictions = analyze(inp=text_input, 
                    mode= 'text', 
                    task='outcome', 
                    openai_access_token=None, 
                    explanation=False)

Both of which will return the following output:

{'prediction': 'speedy delete', 'probability': 0.99}

To predict the outcome of the deletion discussion for the article 'Raisul Islam Ador' with explanation, the following command can be used:

from wide_analysis import analyze
predictions = analyze(inp='https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/Log/2024_July_15#Raisul_Islam_Ador',
                    mode='url', 
                    task='outcome',
                    openai_access_token='<OPENAI KEY>',
                    explanation=True)

Returns:

{'prediction': 'speedy delete', 
'probability': 0.99, 
'explanation': 'The article does not establish the notability of the subject. The references are not reliable and the article is not well written. '}
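
Rather than pasting the key into source code, it can be read from the environment. The sketch below builds the keyword arguments for an outcome prediction and turns on explanation only when a key is available. The variable name OPENAI_API_KEY and the helper outcome_kwargs are assumptions for illustration; the package itself only requires that a key be passed as openai_access_token.

```python
import os


def outcome_kwargs(inp, mode="url", env_var="OPENAI_API_KEY"):
    """Build analyze() arguments, enabling explanation only if a key is set."""
    # An empty string is treated the same as a missing variable.
    token = os.environ.get(env_var) or None
    return {
        "inp": inp,
        "mode": mode,
        "task": "outcome",
        "openai_access_token": token,
        "explanation": token is not None,
    }


kwargs = outcome_kwargs(
    "https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/Log/2024_July_15#Raisul_Islam_Ador"
)
# predictions = analyze(**kwargs)  # needs wide_analysis and network access
```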

Stance Detection

from wide_analysis import analyze
predictions = analyze(inp='URL/text of the article', mode='url or text', task='stance')

Example: To predict the stance of the participants in the deletion discussion for the article 'Raisul Islam Ador', the following command can be used:

from wide_analysis import analyze
predictions = analyze(inp='https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/Log/2024_July_15#Raisul_Islam_Ador',mode = 'url', task='stance')

OR if using text:

from wide_analysis import analyze
text_input = 'Raisul Islam Ador: None establish his Wikipedia:Notability. The first reference is almost identical in wording to his official web site.CambridgeBayWeather (solidly non-human), Uqaqtuq (talk) , Huliva 20:06, 15 July 2024 (UTC) [ reply ] Delete , if not a CSD under G11.' #sample input text
predictions = analyze(inp=text_input, mode= 'text', task='stance')

Both of which will return the following output:

[{'sentence': 'None establish his Wikipedia:Notability .  ', 'stance': 'delete', 'score': 0.9950249791145325}, 
{'sentence': 'The first reference is almost identical in wording to his official web site.  ', 'stance': 'delete', 'score': 0.7702090740203857}, 
{'sentence': 'CambridgeBayWeather (solidly non-human), Uqaqtuq (talk) , Huliva 20:06, 15 July 2024 (UTC) [ reply ] Delete , if not a CSD under G11.  ', 'stance': 'delete', 'score': 0.9993199110031128}]
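
Since stance is reported per sentence, a common next step is to summarize it for the whole discussion. The sketch below tallies the stance labels from the example output above and picks the most frequent one. Treating the majority label as the overall stance is an illustrative choice, not something the package does.

```python
from collections import Counter

# Example output copied from the stance detection result above.
stance_output = [
    {'sentence': 'None establish his Wikipedia:Notability .  ',
     'stance': 'delete', 'score': 0.9950249791145325},
    {'sentence': 'The first reference is almost identical in wording to his official web site.  ',
     'stance': 'delete', 'score': 0.7702090740203857},
    {'sentence': 'CambridgeBayWeather (solidly non-human), Uqaqtuq (talk) , Huliva 20:06, 15 July 2024 (UTC) [ reply ] Delete , if not a CSD under G11.  ',
     'stance': 'delete', 'score': 0.9993199110031128},
]

# Count sentences per stance label and take the most frequent label.
stance_counts = Counter(item['stance'] for item in stance_output)
majority_stance, n_sentences = stance_counts.most_common(1)[0]
print(majority_stance, n_sentences)
```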

Policy Prediction

from wide_analysis import analyze
predictions = analyze(inp='URL/text of the article',mode='url or text', task='policy')

Example: To predict the policy that is most relevant to the comments of the participants in the deletion discussion for the article 'Raisul Islam Ador', the following command can be used:

from wide_analysis import analyze
predictions = analyze(inp='https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/Log/2024_July_15#Raisul_Islam_Ador',mode = 'url', task='policy')

OR if using text:

from wide_analysis import analyze
text_input = 'Raisul Islam Ador: None establish his Wikipedia:Notability. The first reference is almost identical in wording to his official web site.CambridgeBayWeather (solidly non-human), Uqaqtuq (talk) , Huliva 20:06, 15 July 2024 (UTC) [ reply ] Delete , if not a CSD under G11.' #sample input text
predictions = analyze(inp=text_input, mode= 'text', task='policy')

Both of which will return the following output:

[{'sentence': 'None establish his Wikipedia:Notability .  ', 'policy': 'Wikipedia:Notability', 'score': 0.8100407719612122}, 
{'sentence': 'The first reference is almost identical in wording to his official web site.  ', 'policy': 'Wikipedia:Notability', 'score': 0.6429345607757568}, 
{'sentence': 'CambridgeBayWeather (solidly non-human), Uqaqtuq (talk) , Huliva 20:06, 15 July 2024 (UTC) [ reply ] Delete , if not a CSD under G11.  ', 'policy': 'Wikipedia:Criteria for speedy deletion', 'score': 0.9400111436843872}]
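
To see which policies a discussion leans on, the per-sentence predictions can be grouped by policy. The sketch below does this for the example output above with a plain dictionary of lists; the grouping is an illustration of post-processing and is not part of the package.

```python
from collections import defaultdict

# Example output copied from the policy prediction result above.
policy_output = [
    {'sentence': 'None establish his Wikipedia:Notability .  ',
     'policy': 'Wikipedia:Notability', 'score': 0.8100407719612122},
    {'sentence': 'The first reference is almost identical in wording to his official web site.  ',
     'policy': 'Wikipedia:Notability', 'score': 0.6429345607757568},
    {'sentence': 'CambridgeBayWeather (solidly non-human), Uqaqtuq (talk) , Huliva 20:06, 15 July 2024 (UTC) [ reply ] Delete , if not a CSD under G11.  ',
     'policy': 'Wikipedia:Criteria for speedy deletion', 'score': 0.9400111436843872},
]

# Map each predicted policy to the sentences assigned to it.
sentences_by_policy = defaultdict(list)
for item in policy_output:
    sentences_by_policy[item['policy']].append(item['sentence'].strip())

for policy, sentences in sentences_by_policy.items():
    print(f"{policy}: {len(sentences)} sentence(s)")
```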

Sentiment Prediction

from wide_analysis import analyze
predictions = analyze(inp='URL/text of the article',mode='url or text', task='sentiment')

Example: To predict the sentiment of the participants in the deletion discussion for the article 'Raisul Islam Ador' with url, the following command can be used:

from wide_analysis import analyze
predictions = analyze(inp='https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/Log/2024_July_15#Raisul_Islam_Ador', mode='url', task='sentiment')

OR if using text:

from wide_analysis import analyze
text_input = 'Raisul Islam Ador: None establish his Wikipedia:Notability. The first reference is almost identical in wording to his official web site.CambridgeBayWeather (solidly non-human), Uqaqtuq (talk) , Huliva 20:06, 15 July 2024 (UTC) [ reply ] Delete , if not a CSD under G11.' #sample input text
predictions = analyze(inp=text_input, mode= 'text', task='sentiment')

Both of which will return the following output:

[{'sentence': 'None establish his Wikipedia:Notability .  ', 'sentiment': 'negative', 'score': 0.515991747379303},
 {'sentence': 'The first reference is almost identical in wording to his official web site.  ', 'sentiment': 'neutral', 'score': 0.9082792401313782}, 
 {'sentence': 'CambridgeBayWeather (solidly non-human), Uqaqtuq (talk) , Huliva 20:06, 15 July 2024 (UTC) [ reply ] Delete , if not a CSD under G11.  ', 'sentiment': 'neutral', 'score': 0.8958092927932739}]
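
The scores in this output vary considerably, so it can be useful to separate low-confidence predictions before drawing conclusions. The sketch below filters the example output above with a threshold of 0.6. The threshold is an arbitrary value chosen for illustration, and low_confidence is a hypothetical helper.

```python
# Example output copied from the sentiment prediction result above.
sentiment_output = [
    {'sentence': 'None establish his Wikipedia:Notability .  ',
     'sentiment': 'negative', 'score': 0.515991747379303},
    {'sentence': 'The first reference is almost identical in wording to his official web site.  ',
     'sentiment': 'neutral', 'score': 0.9082792401313782},
    {'sentence': 'CambridgeBayWeather (solidly non-human), Uqaqtuq (talk) , Huliva 20:06, 15 July 2024 (UTC) [ reply ] Delete , if not a CSD under G11.  ',
     'sentiment': 'neutral', 'score': 0.8958092927932739},
]


def low_confidence(results, threshold=0.6):
    """Return the predictions whose score falls below the threshold."""
    return [item for item in results if item['score'] < threshold]


uncertain = low_confidence(sentiment_output)
for item in uncertain:
    print(item['sentiment'], round(item['score'], 3), item['sentence'].strip())
```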

Offensive Language Detection

from wide_analysis import analyze
predictions = analyze(inp='URL/text of the article',mode='url or text', task='offensive')

Example: To detect offensive language in the comments of the participants in the deletion discussion for the article 'Raisul Islam Ador', the following command can be used:

from wide_analysis import analyze
predictions = analyze(inp='https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/Log/2024_July_15#Raisul_Islam_Ador',mode='url', task='offensive')

OR if using text:

from wide_analysis import analyze
text_input = 'Raisul Islam Ador: None establish his Wikipedia:Notability. The first reference is almost identical in wording to his official web site.CambridgeBayWeather (solidly non-human), Uqaqtuq (talk) , Huliva 20:06, 15 July 2024 (UTC) [ reply ] Delete , if not a CSD under G11.' #sample input text
predictions = analyze(inp=text_input, mode= 'text', task='offensive')

Both of which will return the following output:

[{'sentence': 'None establish his Wikipedia:Notability .  ', 'offensive_label': 'non-offensive', 'score': 0.8752073645591736}, 
{'sentence': 'The first reference is almost identical in wording to his official web site.  ', 'offensive_label': 'non-offensive', 'score': 0.9004920721054077},
{'sentence': 'CambridgeBayWeather (solidly non-human), Uqaqtuq (talk) , Huliva 20:06, 15 July 2024 (UTC) [ reply ] Delete , if not a CSD under G11.  ', 'offensive_label': 'non-offensive', 'score': 0.9054554104804993}]
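
In practice only the flagged sentences are usually of interest. The sketch below keeps every sentence whose label differs from 'non-offensive', which is the only label value shown in this description; the names of any other labels are unknown here, so the code avoids assuming them. The helper flag_offensive is hypothetical.

```python
# Example output copied from the offensive language detection result above.
offensive_output = [
    {'sentence': 'None establish his Wikipedia:Notability .  ',
     'offensive_label': 'non-offensive', 'score': 0.8752073645591736},
    {'sentence': 'The first reference is almost identical in wording to his official web site.  ',
     'offensive_label': 'non-offensive', 'score': 0.9004920721054077},
    {'sentence': 'CambridgeBayWeather (solidly non-human), Uqaqtuq (talk) , Huliva 20:06, 15 July 2024 (UTC) [ reply ] Delete , if not a CSD under G11.  ',
     'offensive_label': 'non-offensive', 'score': 0.9054554104804993},
]


def flag_offensive(results):
    """Return the sentences not labelled 'non-offensive'."""
    return [item['sentence'].strip() for item in results
            if item['offensive_label'] != 'non-offensive']


flagged = flag_offensive(offensive_output)
print(f"{len(flagged)} of {len(offensive_output)} sentences flagged")
```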

Release history

1.2.11

Apr 8, 2025

1.2.10

Apr 8, 2025

1.2.9

Apr 7, 2025

1.2.8

Apr 7, 2025

1.2.7

Apr 6, 2025

1.2.6

Apr 6, 2025

1.2.5

Apr 5, 2025

1.2.4

Jan 16, 2025

1.2.3

Jan 14, 2025

1.2.2

Jan 14, 2025

1.2.1

Jan 13, 2025

1.2.0

Jan 13, 2025

1.1.9

Jan 13, 2025

1.1.8

Jan 12, 2025

1.1.7

Dec 19, 2024

1.1.6

Dec 19, 2024

1.1.5

Dec 19, 2024

This version

1.1.4

Dec 18, 2024

1.1.3

Dec 18, 2024

1.1.2

Dec 18, 2024

1.1.1

Dec 16, 2024

1.1.0

Dec 16, 2024

1.0.9

Dec 16, 2024

1.0.8

Dec 16, 2024

1.0.7

Dec 15, 2024

1.0.6

Dec 15, 2024

1.0.5

Dec 15, 2024

1.0.4

Dec 14, 2024

1.0.3

Dec 14, 2024

1.0.2

Dec 14, 2024

1.0

Dec 14, 2024

0.3.3

Aug 4, 2024

0.3.2

Aug 4, 2024

0.3.1

Aug 4, 2024

0.3.0

Aug 4, 2024

0.2.9

Aug 1, 2024

0.2.8

Jul 31, 2024

0.2.7

Jul 31, 2024

0.2.6

Jul 31, 2024

0.2.5

Jul 31, 2024

0.2.4

Jul 31, 2024

0.2.3

Jul 31, 2024

0.2.2

Jul 29, 2024

0.2.1

Jul 29, 2024

0.2.0

Jul 29, 2024

0.1.2

Jul 29, 2024

0.1.1

Jul 29, 2024

0.1.0

Jul 29, 2024

Download files

Source Distribution

wide_analysis-1.1.4.tar.gz (37.7 kB)

Uploaded Dec 18, 2024 Source

Built Distribution

wide_analysis-1.1.4-py3-none-any.whl (47.6 kB)

Uploaded Dec 18, 2024 Python 3

File details

Details for the file wide_analysis-1.1.4.tar.gz.

File metadata

Download URL: wide_analysis-1.1.4.tar.gz
Upload date: Dec 18, 2024
Size: 37.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.10.12

File hashes

Hashes for wide_analysis-1.1.4.tar.gz
Algorithm	Hash digest
SHA256	`f22ccb4d5f2d521907904c6477d77bc29349781ed9e6028851450bdafdf4575d`
MD5	`90dfe07fc5a55f4773dbbe0b92ed9037`
BLAKE2b-256	`f2d3a0da6696c9930977a582b515645d6433c5dee959b65de92eb651c0885d2b`

File details

Details for the file wide_analysis-1.1.4-py3-none-any.whl.

File metadata

Download URL: wide_analysis-1.1.4-py3-none-any.whl
Upload date: Dec 18, 2024
Size: 47.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.10.12

File hashes

Hashes for wide_analysis-1.1.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a5965b9e5b0551cc69cdd5529c16edae786de03c7135cadcd16f5488a1df87db`
MD5	`7953d6e0dd249337aff0ee74b8b53908`
BLAKE2b-256	`ce4cf5d41fd343188fee7a0c5b67e6ab1184f97c110612a0ca91fdc0ffc146b4`
