
A Python library for processing wiki XML dumps.

Project description

wikixml


Install

pip install wikixml --upgrade

Usage

WikiXmlParser

Run example:

python example.py

See: example.py

from pathlib import Path

from wikixml import WikiXmlParser

if __name__ == "__main__":
    # The dump is expected in a "data" directory next to this script.
    wiki_xml_bz2 = "zhwiki-20241101-pages-meta-current.xml.bz2"
    file_path = Path(__file__).parent / "data" / wiki_xml_bz2
    parser = WikiXmlParser(file_path)
    # parser.preview_lines(5000)
    parser.preview_pages(max_pages=100)
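
To make the idea of previewing pages from a compressed dump concrete, here is a minimal standard-library sketch of streaming `<page>` elements out of a `.xml.bz2` file. It is not how `WikiXmlParser` is implemented (the source does not show that); `iter_pages` and `local_name` are hypothetical helper names, and the sketch assumes the usual MediaWiki export layout of `<page>` elements containing `<title>` and `<revision><text>`.

```python
import bz2
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict, Iterator, Optional, Union


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(
    path: Union[str, Path], max_pages: Optional[int] = None
) -> Iterator[Dict[str, str]]:
    """Yield {'title', 'text'} per <page>, decompressing and parsing as a stream.

    Hypothetical helper for illustration; not part of wikixml.
    """
    count = 0
    with bz2.open(path, "rb") as stream:
        # "end" events fire once an element and all its children are parsed.
        for _event, elem in ET.iterparse(stream, events=("end",)):
            if local_name(elem.tag) != "page":
                continue
            title, text = "", ""
            for child in elem.iter():
                name = local_name(child.tag)
                if name == "title":
                    title = child.text or ""
                elif name == "text":
                    text = child.text or ""
            yield {"title": title, "text": text}
            elem.clear()  # drop the page's children once it has been consumed
            count += 1
            if max_pages is not None and count >= max_pages:
                break
```

Called as `iter_pages(file_path, max_pages=100)`, this reads only as much of the archive as it needs, which matters because full dumps are far too large to decompress into memory.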

WikiPagesMongoWriter

Extract wiki pages from the XML dump and write them to MongoDB:

python -m wikixml.mongo -d zhwiki -f "../data/zhwiki-latest-pages-meta-current.xml.bz2"
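
For readers who want to see what writing pages to MongoDB can look like in code, the following is a rough sketch of batched inserts. It is an assumption-laden illustration, not the `wikixml.mongo` implementation: `write_pages` and `batched` are hypothetical names, and `collection` is any object exposing an `insert_many(documents, ordered=...)` method, as PyMongo collections do.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List


def batched(
    items: Iterable[Dict[str, Any]], size: int
) -> Iterator[List[Dict[str, Any]]]:
    """Group an iterable of documents into lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def write_pages(
    pages: Iterable[Dict[str, Any]], collection: Any, batch_size: int = 1000
) -> int:
    """Insert page documents in batches and return how many were sent.

    Hypothetical helper for illustration; `collection` is assumed to
    behave like a PyMongo collection.
    """
    written = 0
    for batch in batched(pages, batch_size):
        # ordered=False lets the server continue past an individual failure.
        collection.insert_many(batch, ordered=False)
        written += len(batch)
    return written
```

With PyMongo installed, `collection` could be something like `MongoClient()["zhwiki"]["pages"]`; both names are illustrative only. Batching keeps the number of round trips small without holding the whole dump in memory.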



Download files


Source Distribution

wikixml-0.2.tar.gz (5.0 kB)

Uploaded Source

Built Distribution

wikixml-0.2-py3-none-any.whl (6.0 kB)

Uploaded Python 3

File details

Details for the file wikixml-0.2.tar.gz.

File metadata

  • Download URL: wikixml-0.2.tar.gz
  • Upload date:
  • Size: 5.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.20

File hashes

Hashes for wikixml-0.2.tar.gz
Algorithm    Hash digest
SHA256       732321f38e9442da1ab6bec2e13b5d6de414d665e2387ed736a14da7d54c987e
MD5          6fb0a4bc24e7c76451e5ab331baa0165
BLAKE2b-256  d77c53800cdcf1973922f751f5b0366bd8b732fd7e9354580756844f9e60e7c2


File details

Details for the file wikixml-0.2-py3-none-any.whl.

File metadata

  • Download URL: wikixml-0.2-py3-none-any.whl
  • Upload date:
  • Size: 6.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.20

File hashes

Hashes for wikixml-0.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       cd821690ab716d847f44d0ef6baffac007fdc7801069d78423884a1327a3fb86
MD5          893f8d433a31e136163ceb4043eb24b5
BLAKE2b-256  2df71049d3f558a1eaf478b0f9a1497831847673a6643bfcdfc9c5695b1b2726
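
The SHA256 digests listed for each file can be checked against a downloaded copy. Below is a small sketch using only `hashlib`; `sha256_of` and `matches_digest` are hypothetical helper names, and the file path in the usage note assumes the archive sits in the current directory.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of(path: Union[str, Path], chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: Union[str, Path], expected_hex: str) -> bool:
    """Compare a file's SHA256 with a published digest, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_digest("wikixml-0.2.tar.gz", "732321f38e9442da1ab6bec2e13b5d6de414d665e2387ed736a14da7d54c987e")` should return `True` for an unmodified copy of the source distribution.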
