Network of Wikipedia articles
Project description
WikiNet
This repository contains code for analysis used in Ju et al. (2020).
Getting started
- Download and install Anaconda.
- In the terminal,
git clone https://github.com/harangju/wikinet.git
cd wikinet
conda env create -f environment.yml
conda activate wikinet
jupyter notebook
Data
Wikipedia XML dumps are available at https://dumps.wikimedia.org/enwiki. Only two files are required for reproduction: (1) enwiki-DATE-pages-articles-multistream.xml.bz2 and (2) enwiki-DATE-pages-articles-multistream-index.txt.bz2, where DATE is the date of the dump. Both are multistream versions of the bzip2-compressed dump, which let the user access an article without decompressing the whole file. In this study, we used the archived dump from August 1, 2019, which is available here.
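To illustrate how the multistream layout permits access without unpacking the whole file, here is a minimal sketch using only the Python standard library. It assumes each line of the index file has the form offset:page_id:title, where offset is the byte position in the dump of the bzip2 stream that contains the page. The function names (parse_index_line, find_stream, read_stream) are hypothetical and are not part of this repository's code.

```python
import bz2


def parse_index_line(line):
    """Split one index line, assumed to look like 'offset:page_id:title'."""
    # Titles may contain colons, so split on the first two only.
    offset, page_id, title = line.rstrip("\n").split(":", 2)
    return int(offset), int(page_id), title


def find_stream(index_lines, title):
    """Return (start, end) byte offsets of the stream that holds `title`.

    `end` is None when no later stream is listed in the index.
    """
    start = None
    for line in index_lines:
        offset, _, page_title = parse_index_line(line)
        if start is None:
            if page_title == title:
                start = offset
        elif offset > start:
            # First entry belonging to the next stream marks the end.
            return start, offset
    if start is None:
        raise KeyError(title)
    return start, None


def read_stream(dump_path, start, end=None):
    """Decompress only the bzip2 stream that begins at byte `start`."""
    with open(dump_path, "rb") as f:
        f.seek(start)
        data = f.read() if end is None else f.read(end - start)
    # BZ2Decompressor stops at the end of the first stream it sees.
    return bz2.BZ2Decompressor().decompress(data).decode("utf-8")
```

With real files, the index could be iterated with bz2.open(index_path, "rt", encoding="utf-8") and passed to find_stream. The returned text is an XML fragment holding a block of pages, so the requested article still has to be located within it by its title.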
Other options
gensim added a WikiCorpus class that parses through Wikipedia dumps.
Download files
Source Distribution: wikinet-0.0.7.tar.gz (13.8 kB)
Built Distribution: wikinet-0.0.7-py3-none-any.whl (12.8 kB)