
A Python library for processing wiki XML dumps.

Project description

wikixml


Install

pip install wikixml --upgrade

Usage

Run the example:

python example.py

The example script, example.py:

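from pathlib import Path  # needed for Path(__file__) below

from wikixml import ZhWikiBz2Parser

if __name__ == "__main__":
    # Chinese Wikipedia dump, expected in a "data" folder next to this script
    wiki_xml_bz2 = "zhwiki-20241101-pages-meta-current.xml.bz2"
    file_path = Path(__file__).parent / "data" / wiki_xml_bz2
    parser = ZhWikiBz2Parser(file_path)
    # Preview the first 100 lines of the dump
    parser.preview_lines(100)
    # Alternatively, preview parsed pages (up to max_pages):
    # parser.preview_pages(max_pages=10000)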
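How wikixml works internally is not described here. As a rough illustration of what processing a wiki dump involves, the following is a minimal stdlib-only sketch, assuming a MediaWiki XML export compressed with bz2 (the format of dumps such as the zhwiki file above, where each article is a page element holding a title and a revision with its wikitext). The helpers preview_lines and iter_pages are hypothetical: they only loosely mirror the method names in the example and are not wikixml's API or implementation.

```python
import bz2
import itertools
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Iterator, List, Optional, Tuple


def _local(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def preview_lines(path: Path, n: int) -> List[str]:
    """Hypothetical helper: first n lines of the decompressed dump, read lazily."""
    with bz2.open(path, "rt", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in itertools.islice(f, n)]


def iter_pages(path: Path, max_pages: Optional[int] = None) -> Iterator[Tuple[str, str]]:
    """Hypothetical helper: stream (title, wikitext) pairs from a .xml.bz2 export.

    Matching on local tag names means the export schema version
    (the xmlns on <mediawiki>) does not need to be known in advance.
    """
    with bz2.open(path, "rb") as f:
        context = ET.iterparse(f, events=("start", "end"))
        _, root = next(context)  # first event: start of the <mediawiki> root
        count = 0
        for event, elem in context:
            if event != "end" or _local(elem.tag) != "page":
                continue
            title, text = "", ""
            for child in elem.iter():
                name = _local(child.tag)
                if name == "title":
                    title = child.text or ""
                elif name == "text":
                    # With several revisions per page, the last one wins;
                    # a pages-meta-current dump has one revision per page.
                    text = child.text or ""
            yield title, text
            count += 1
            root.clear()  # drop finished pages so memory use stays flat
            if max_pages is not None and count >= max_pages:
                return
```

For example, iter_pages(file_path, max_pages=10) yields the first ten (title, wikitext) pairs. Streaming with iterparse and clearing the root after each page avoids building the whole tree, which matters because full dumps are far too large to load into memory at once.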


Download files


Source Distribution

wikixml-0.0.1.tar.gz (2.9 kB)

Built Distribution

wikixml-0.0.1-py3-none-any.whl (3.3 kB)

File details

Details for the file wikixml-0.0.1.tar.gz.

File metadata

  • Download URL: wikixml-0.0.1.tar.gz
  • Upload date:
  • Size: 2.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.20

File hashes

Hashes for wikixml-0.0.1.tar.gz:

  • SHA256: faa06b6783a89a84132655c6bc1a5f7a863a7da43ce38a3adab3622f10962080
  • MD5: 4953c26c5214a6e5525ee49e329e8150
  • BLAKE2b-256: 26fbe3c2080bbf27c9ca2ad78535c59e346d946a1c6289f60eb6c36e1ce11223


File details

Details for the file wikixml-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: wikixml-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 3.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.20

File hashes

Hashes for wikixml-0.0.1-py3-none-any.whl:

  • SHA256: ef906b1ce4cc7fbfacad1e7d06f7f42935f839e4c5a57429608af5f25d773190
  • MD5: 5042470aae4879038f4156fdd89599b7
  • BLAKE2b-256: ab298fd06d6da527e7d07d15286f8b97e3d35a0e8938949b31b8c6a48524e039
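To check a downloaded file against the SHA256 digests listed for the two wikixml 0.0.1 files, here is a small sketch using Python's hashlib. The helper names sha256_of and verify are hypothetical, and verify assumes the file still has its original name.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 digest of a file, read in chunks so size does not matter."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# SHA256 digests published for wikixml 0.0.1, keyed by file name
EXPECTED_SHA256 = {
    "wikixml-0.0.1.tar.gz": "faa06b6783a89a84132655c6bc1a5f7a863a7da43ce38a3adab3622f10962080",
    "wikixml-0.0.1-py3-none-any.whl": "ef906b1ce4cc7fbfacad1e7d06f7f42935f839e4c5a57429608af5f25d773190",
}


def verify(path: Path) -> bool:
    """True only if the file's name is known and its SHA256 matches."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

pip can also enforce these digests at install time through its hash-checking mode (a --hash=sha256:... option on a requirements line, together with --require-hashes).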

