A Python library for processing wiki XML dumps.
wikixml
Install
pip install wikixml --upgrade
Usage
Run example:
python example.py
See: example.py
from pathlib import Path

from wikixml import ZhWikiBz2Parser

if __name__ == "__main__":
    # The dump file is expected in a "data" folder next to this script.
    wiki_xml_bz2 = "zhwiki-20241101-pages-articles.xml.bz2"
    file_path = Path(__file__).parent / "data" / wiki_xml_bz2
    parser = ZhWikiBz2Parser(file_path)
    # parser.preview_lines(5000)
    parser.preview_pages(max_pages=500)
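For readers curious what streaming a compressed dump involves, here is a minimal standard-library sketch. It is not wikixml's implementation: `iter_pages` and `local_name` are hypothetical helpers written for illustration, and the sketch assumes the usual MediaWiki export layout of `<page>` elements containing `<title>` and `<revision><text>`.

```python
import bz2
import xml.etree.ElementTree as ET
from typing import Iterator, Optional, Tuple


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(path, max_pages: Optional[int] = None) -> Iterator[Tuple[str, str]]:
    """Yield (title, wikitext) pairs from a .xml.bz2 dump, one page at a time.

    Hypothetical helper, not part of wikixml.
    """
    count = 0
    # bz2.open decompresses on the fly, so the dump is never fully unpacked.
    with bz2.open(path, "rb") as stream:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            if local_name(elem.tag) != "page":
                continue
            title, text = "", ""
            for child in elem.iter():
                name = local_name(child.tag)
                if name == "title":
                    title = child.text or ""
                elif name == "text":
                    text = child.text or ""
            yield title, text
            elem.clear()  # drop the page's children once it has been consumed
            count += 1
            if max_pages is not None and count >= max_pages:
                break
```

Calling `iter_pages(file_path, max_pages=500)` would then walk the first 500 pages, roughly mirroring the `preview_pages(max_pages=500)` call in the example.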
Download files
Source Distribution
wikixml-0.1.tar.gz (3.2 kB)
Built Distribution
wikixml-0.1-py3-none-any.whl (3.8 kB)
File details
Details for the file wikixml-0.1.tar.gz.
File metadata
- Download URL: wikixml-0.1.tar.gz
- Upload date:
- Size: 3.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.20
File hashes
Algorithm | Hash digest
---|---
SHA256 | 687d3291910ed0a184c9bbf63be09d4fe43d1599cd5ffe748f38973233f26792
MD5 | 1cc43326540aadc1a6445e64689b1f53
BLAKE2b-256 | d478202059506ef476a02fd6d07ce3eeaefbda5c0735ab184313e2e3510d000f
File details
Details for the file wikixml-0.1-py3-none-any.whl.
File metadata
- Download URL: wikixml-0.1-py3-none-any.whl
- Upload date:
- Size: 3.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.20
File hashes
Algorithm | Hash digest
---|---
SHA256 | 17636d6b00bccbc505a3d6e1ece09d492f2e121fc75659ea2ebc277711ba7f3d
MD5 | 79fbc90ba8f319a990cc53889c536c96
BLAKE2b-256 | 91d68155ec8d0676518dcbef969308ff9e1eb3dc22fd83b7eec0bfcabec33776
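
A downloaded file can be checked against the SHA256 digests listed above. The following is a small sketch using only `hashlib`; the helper names `sha256_of` and `matches_published` are made up for illustration, and the local path passed in is whatever location the file was saved to.

```python
import hashlib

# SHA256 digests copied from the file details above.
PUBLISHED_SHA256 = {
    "wikixml-0.1.tar.gz": "687d3291910ed0a184c9bbf63be09d4fe43d1599cd5ffe748f38973233f26792",
    "wikixml-0.1-py3-none-any.whl": "17636d6b00bccbc505a3d6e1ece09d492f2e121fc75659ea2ebc277711ba7f3d",
}


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, filename: str) -> bool:
    """Compare a local file's digest with the published one for `filename`."""
    return sha256_of(path) == PUBLISHED_SHA256[filename]
```

For example, `matches_published("wikixml-0.1.tar.gz", "wikixml-0.1.tar.gz")` should return `True` for an intact copy of the source distribution saved in the current directory.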