Utility that converts Wikipedia pages into GitHub-flavored Markdown.
GoodWiki
GoodWiki is a Python package that carefully converts Wikipedia pages into GitHub-flavored Markdown. Converted pages preserve layout features like lists, code blocks, math, and block quotes.
This package is used to generate the GoodWiki Dataset.
Installation
This package supports Python 3.11+.
- Install via pip.
pip install goodwiki
- Install pandoc v2.19.2, following pandoc's official installation instructions.
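Before first use, it can help to confirm both prerequisites. The following is a minimal sketch using only the standard library. The helper names `parse_pandoc_version` and `check_environment` are illustrative and not part of goodwiki, and the sketch assumes that the first line of `pandoc --version` contains a dotted version number such as `pandoc 2.19.2`.

```python
import re
import shutil
import subprocess
import sys


def parse_pandoc_version(version_output: str) -> tuple[int, ...]:
    """Extract the version tuple from the first line of `pandoc --version`."""
    lines = version_output.splitlines()
    first_line = lines[0] if lines else ""
    match = re.search(r"(\d+(?:\.\d+)+)", first_line)
    if match is None:
        raise ValueError(f"unrecognized pandoc version output: {first_line!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def check_environment() -> list[str]:
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    # The package states that it supports Python 3.11+.
    if sys.version_info < (3, 11):
        problems.append("Python 3.11+ is required")
    pandoc = shutil.which("pandoc")
    if pandoc is None:
        problems.append("pandoc was not found on PATH")
    else:
        output = subprocess.run(
            [pandoc, "--version"], capture_output=True, text=True, check=True
        ).stdout
        # The installation notes name v2.19.2 specifically.
        if parse_pandoc_version(output) != (2, 19, 2):
            problems.append("pandoc version differs from v2.19.2")
    return problems
```

Calling `check_environment()` returns an empty list when both prerequisites appear to be satisfied.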
Usage
Initializing Client
import asyncio
from goodwiki import GoodwikiClient
client = GoodwikiClient()
You can optionally provide your own user agent (the default is goodwiki/1.0 (https://euirim.org)):
client = GoodwikiClient("goodwiki/1.0 (bob@gmail.com)")
Getting Single Page
page = asyncio.run(client.get_page("Usain Bolt"))
You can optionally include styling syntax, such as bold text, in the final Markdown:
page = asyncio.run(client.get_page("Usain Bolt", with_styling=True))
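Since `get_page` is run through `asyncio.run`, several pages can be requested inside one event loop rather than starting a new loop per title. The following is a minimal sketch and not part of goodwiki: `fetch_pages` and `max_concurrency` are illustrative names, and it assumes a single client can be shared across concurrent calls, which the source does not confirm. A semaphore caps the number of requests in flight.

```python
import asyncio


async def fetch_pages(client, titles, with_styling=False, max_concurrency=4):
    """Fetch several pages with one client, at most max_concurrency at a time.

    `client` is any object with an async get_page(title, with_styling=...)
    method, e.g. a GoodwikiClient (assumed here to be safe to share).
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(title):
        # Wait for a free slot before issuing the request.
        async with semaphore:
            return await client.get_page(title, with_styling=with_styling)

    # gather returns results in the same order as `titles`.
    return await asyncio.gather(*(fetch_one(title) for title in titles))
```

With a real client, this could be called as `pages = asyncio.run(fetch_pages(client, ["Usain Bolt", "Ada Lovelace"]))`.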
You can access the resulting data via properties. For example:
print(page.markdown)
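As one way to use the result, the Markdown can be written to disk. This is a minimal sketch: `save_markdown` is a hypothetical helper, and it assumes the page object exposes a `title` property alongside `markdown` (the source only shows `markdown`).

```python
import re
from pathlib import Path


def save_markdown(page, directory="."):
    """Write page.markdown to <directory>/<sanitized title>.md; return the path."""
    # Replace whitespace and characters that are unsafe in file names.
    # `page.title` is an assumed property, not confirmed by the source.
    safe_name = re.sub(r'[\\/:*?"<>|\s]+', "_", page.title).strip("_") or "page"
    path = Path(directory) / f"{safe_name}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(page.markdown, encoding="utf-8")
    return path
```

Writing with an explicit UTF-8 encoding avoids platform-dependent defaults, which matters for articles containing non-ASCII text.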
Getting Category Pages
To get a list of page titles associated with a Wikipedia category, run the following:
client.get_category_pages("Category:Good_articles")
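The call above is shown without `asyncio.run`, unlike `get_page`, and the source does not say whether `get_category_pages` returns the list directly or an awaitable. A small helper can handle both cases. This is a sketch under that uncertainty; `resolve` is a hypothetical name and not part of goodwiki.

```python
import asyncio
import inspect


def resolve(value):
    """Return value, first running it to completion if it is awaitable."""
    if inspect.isawaitable(value):

        async def _wait():
            return await value

        # asyncio.run expects a coroutine, so wrap generic awaitables.
        return asyncio.run(_wait())
    return value
```

It could then be used as `titles = resolve(client.get_category_pages("Category:Good_articles"))`.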
Converting Existing Raw Wikitext
If you've already downloaded raw wikitext from Wikipedia, you can convert it to Markdown by running:
client.get_page_from_wikitext(
    raw_wikitext="RAW_WIKITEXT",
    # The rest of the fields are meant for populating the final WikiPage object
    title="Usain Bolt",
    pageid=123,
    revid=123,
)
Methodology
Full details are available in this package's GitHub repo README.
File details
Details for the file goodwiki-1.0.1.tar.gz.
File metadata
- Download URL: goodwiki-1.0.1.tar.gz
- Upload date:
- Size: 31.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.3.2 CPython/3.10.6 Linux/5.15.0-83-generic
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ba44a79803dfab5e37e2cded4c649e6bf9c3466114b77650853c038507fb295a |
| MD5 | 081d8c1a5ff71c1adc17f977ebacab89 |
| BLAKE2b-256 | e57f001314e0ecb375c9f493d379e181767ed4339e16237126b8b945d0f733c1 |
File details
Details for the file goodwiki-1.0.1-py3-none-any.whl.
File metadata
- Download URL: goodwiki-1.0.1-py3-none-any.whl
- Upload date:
- Size: 15.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.3.2 CPython/3.10.6 Linux/5.15.0-83-generic
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 41d4152361bb7a652ab46b421605c324d8239c241f4893a302f958154bbcefd3 |
| MD5 | 8efdc6277f15fbaf1683e545486e3154 |
| BLAKE2b-256 | 5a3896ba0f3f2f9c062e2bae674f250eb511a4e8c936bf3716d59097440a615c |