Network of Wikipedia articles

WikiNet

This repository contains the code for the analysis used in Ju et al. (2020).

Getting started

  1. In the terminal, git clone https://github.com/harangju/wikinet.git
  2. cd wikinet
  3. conda env create -f environment.yml
  4. conda activate wikinet
  5. jupyter notebook

Data

Wikipedia XML dumps are available at https://dumps.wikimedia.org/enwiki. Only two files are required for reproduction: (1) enwiki-DATE-pages-articles-multistream.xml.bz2 and (2) enwiki-DATE-pages-articles-multistream-index.txt.bz2, where DATE is the date of the dump. Both are multistream versions of the bz2-compressed dump files, which let the user access an individual article without decompressing the whole file. In this study, we used the archived dump from August 1, 2019, which is available here.
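The random access described above can be sketched with Python's standard bz2 module. This is a minimal illustration and not the code used in this repository. It assumes that each line of the index file has the form offset:page_id:title and that the offset marks the start of an independent bz2 stream holding a batch of pages. The function names parse_index_line, find_offset, and read_stream are hypothetical.

```python
import bz2


def parse_index_line(line):
    """Split one index line 'offset:page_id:title' into its parts.

    Titles may contain colons, so split at most twice.
    """
    offset, page_id, title = line.rstrip("\n").split(":", 2)
    return int(offset), int(page_id), title


def find_offset(index_path, title):
    """Scan the bz2-compressed index for a title.

    Returns the byte offset of the stream that holds the page, or None.
    """
    with bz2.open(index_path, "rt", encoding="utf-8") as f:
        for line in f:
            offset, _, found = parse_index_line(line)
            if found == title:
                return offset
    return None


def read_stream(dump_path, offset, chunk_size=1 << 16):
    """Decompress only the single bz2 stream that starts at offset."""
    decompressor = bz2.BZ2Decompressor()
    parts = []
    with open(dump_path, "rb") as f:
        f.seek(offset)
        # Stop as soon as this stream ends; later streams are left untouched.
        while not decompressor.eof:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            parts.append(decompressor.decompress(chunk))
    return b"".join(parts).decode("utf-8")
```

With the two dump files at hand, read_stream(dump_path, find_offset(index_path, title)) would return the XML of the batch of pages that contains the requested article. That batch still has to be parsed to pick out the individual page.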

Other options

  • gensim provides a WikiCorpus class that parses Wikipedia dumps.

Download files

Source Distribution

wikinet-0.0.7.tar.gz (13.8 kB)

Built Distribution

wikinet-0.0.7-py3-none-any.whl (12.8 kB, Python 3)
