Utility that converts Wikipedia pages into GitHub-flavored Markdown.

GoodWiki

GoodWiki is a Python package that carefully converts Wikipedia pages into GitHub-flavored Markdown. Converted pages preserve layout features like lists, code blocks, math, and block quotes.

This package is used to generate the GoodWiki Dataset.

Installation

This package supports Python 3.11+.

  1. Install via pip.
pip install goodwiki
  2. Install pandoc v2.19.2 by following pandoc's installation instructions.
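
Since the installation names a specific pandoc version, it can help to confirm which version is on your PATH before converting pages. The following is a minimal sketch, not part of GoodWiki: the helper names `parse_pandoc_version` and `installed_pandoc_version` are hypothetical, and it assumes the first line of `pandoc --version` has the form `pandoc 2.19.2`.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

REQUIRED_PANDOC = (2, 19, 2)  # version named in the installation steps


def parse_pandoc_version(version_output: str) -> Tuple[int, ...]:
    """Extract the version tuple from the first line of `pandoc --version`."""
    lines = version_output.splitlines()
    first_line = lines[0] if lines else ""
    match = re.search(r"(\d+(?:\.\d+)+)", first_line)
    if match is None:
        raise ValueError(f"could not find a version number in: {first_line!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def installed_pandoc_version() -> Optional[Tuple[int, ...]]:
    """Return the installed pandoc version, or None if pandoc is not on PATH."""
    pandoc_path = shutil.which("pandoc")
    if pandoc_path is None:
        return None
    result = subprocess.run(
        [pandoc_path, "--version"], capture_output=True, text=True, check=True
    )
    return parse_pandoc_version(result.stdout)


if __name__ == "__main__":
    version = installed_pandoc_version()
    if version is None:
        print("pandoc was not found on PATH")
    elif version != REQUIRED_PANDOC:
        print(f"found pandoc {version}, expected {REQUIRED_PANDOC}")
    else:
        print("pandoc 2.19.2 is installed")
```

Whether other pandoc versions also work is not stated here, so the sketch only reports a mismatch rather than refusing to continue.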

Usage

Initializing Client

import asyncio
from goodwiki import GoodwikiClient

client = GoodwikiClient()

You can optionally provide your own user agent (the default is goodwiki/1.0 (https://euirim.org)):

client = GoodwikiClient("goodwiki/1.0 (bob@gmail.com)")

Getting Single Page

page = asyncio.run(client.get_page("Usain Bolt"))

You can optionally include styling syntax, such as bolding, in the final Markdown:

page = asyncio.run(client.get_page("Usain Bolt", with_styling=True))
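
Because `get_page` is run through `asyncio.run` above, it should be possible to request several pages concurrently. The sketch below is one way to do that under stated assumptions: `fetch_pages` is a hypothetical helper, not part of GoodWiki, and it accepts any coroutine function that takes a title. Whether a single `GoodwikiClient` is safe to share across concurrent requests is an assumption you should check against the package itself.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Iterable


async def fetch_pages(
    get_page: Callable[[str], Awaitable[Any]],
    titles: Iterable[str],
    max_concurrency: int = 5,
) -> Dict[str, Any]:
    """Fetch several pages concurrently, at most `max_concurrency` at a time."""
    # The semaphore caps the number of requests in flight at once.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(title: str) -> Any:
        async with semaphore:
            return await get_page(title)

    title_list = list(titles)
    # gather returns results in the same order as the titles were given.
    results = await asyncio.gather(*(fetch_one(t) for t in title_list))
    return dict(zip(title_list, results))
```

With the client from the previous section, a call might look like `pages = asyncio.run(fetch_pages(client.get_page, ["Usain Bolt", "Marie Curie"]))`. A small `max_concurrency` keeps the request rate against Wikipedia modest.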

You can access the resulting data via properties. For example:

print(page.markdown)
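
A common next step is writing the Markdown to disk. The following sketch assumes only the `markdown` property shown above. The helpers `markdown_filename` and `save_markdown`, and the default `pages` output directory, are hypothetical names chosen for illustration.

```python
import re
from pathlib import Path


def markdown_filename(title: str) -> str:
    """Turn a page title into a conservative file name ending in .md."""
    # Collapse each run of characters other than word characters, dots,
    # and hyphens into a single underscore.
    safe = re.sub(r"[^\w.\-]+", "_", title).strip("_")
    return f"{safe or 'untitled'}.md"


def save_markdown(title: str, markdown: str, out_dir: str = "pages") -> Path:
    """Write the Markdown text to out_dir and return the resulting path."""
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / markdown_filename(title)
    path.write_text(markdown, encoding="utf-8")
    return path
```

For the page fetched earlier, this could be called as `save_markdown("Usain Bolt", page.markdown)`.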

Getting Category Pages

To get a list of page titles associated with a Wikipedia category, run the following:

client.get_category_pages("Category:Good_articles")
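
The example above passes the full category title, with the `Category:` prefix and underscores instead of spaces. If you start from a plain category name, a small helper can build a title in that same shape. This is a sketch only: `category_title` is a hypothetical function, and whether `get_category_pages` also accepts other spellings is not confirmed here.

```python
def category_title(name: str) -> str:
    """Build a title like 'Category:Good_articles' from a category name."""
    name = name.strip()
    # Add the namespace prefix if the caller left it out.
    if not name.lower().startswith("category:"):
        name = f"Category:{name}"
    _, _, rest = name.partition(":")
    # Join whitespace-separated words with underscores, as in the example above.
    return "Category:" + "_".join(rest.split())
```

For instance, `client.get_category_pages(category_title("Good articles"))` would pass the same string as the call shown above.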

Converting Existing Raw Wikitext

If you've already downloaded raw wikitext from Wikipedia, you can convert it to Markdown by running:

client.get_page_from_wikitext(
	raw_wikitext="RAW_WIKITEXT",
	# The rest of the fields are meant for populating the final WikiPage object
	title="Usain Bolt",
	pageid=123,
	revid=123,
)

Methodology

Full details are available in this package's GitHub repo README.
