Custom wikitext parser that produces HTML, plain-text fields and relevant links from Wikipedia page source code.

Project description

Wikitext Asymptote

Get started

To install the wikitext_asymptote package, run:

pip install wikitext_asymptote

You can then use the package as follows:

import wikitext_asymptote as wa

# Raw wikitext goes here
page = """..."""

# Parse it
parsed_page = wa.parse_page(page)

# parsed_page is a dict with fields for the HTML, the plain text and the links
print(parsed_page)

# parsed_page = {
#     'html': '...',
#     'text': '...',
#     'opening_text': '...',
#     'auxiliary_text': [...],
#     'heading': [...],
#     'links': [...]
# }
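If you want static type checking on the result, the documented field layout can be written down as a `TypedDict`. This is a sketch based only on the example output above: `ParsedPage` and `missing_fields` are hypothetical names that are not part of wikitext_asymptote, the split between string and list fields is read off that example, and the element types of the three list fields are not documented, so they are left as `Any`.

```python
from typing import Any, List, Set, TypedDict  # TypedDict requires Python 3.8+


# Hypothetical type mirroring the documented output of wa.parse_page;
# not provided by wikitext_asymptote itself.
class ParsedPage(TypedDict):
    html: str
    text: str
    opening_text: str
    auxiliary_text: List[Any]  # element type not documented
    heading: List[Any]         # element type not documented
    links: List[Any]           # element type not documented


# Field names declared on the TypedDict above
REQUIRED_FIELDS: Set[str] = set(ParsedPage.__annotations__)


def missing_fields(parsed_page: dict) -> Set[str]:
    """Return the documented fields that are absent from a parse result."""
    return REQUIRED_FIELDS.difference(parsed_page)
```

For example, `missing_fields(wa.parse_page(page))` should return an empty set if the output matches the layout shown above.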

About

This package was created to meet the specific needs for parsing wikitext in the EPFL Graph project, which are somewhat more involved than what mwparserfromhell offers by default.

However, parsing wikitext while taking all of its syntax into account is a gigantic task. Even deciding how to parse each of the myriad available templates, with all their variations, let alone implementing it, is virtually infeasible.

We have therefore committed to an approach of approximation (hence the name Asymptote). We try to parse a number of templates, tags and other entities as correctly as possible, so that most cases are covered. For everything else, we fall back on defaults, which may or may not be adequate in a given case. The parsed output is therefore not perfect, and that is by design. From there, we work on a case-by-case basis to add parsing for new templates, tags or other entities as the need arises.
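The approximation strategy above (dedicated handling for a growing set of templates, plus a catch-all default) can be illustrated with a handler registry and a fallback. This is a minimal, hypothetical sketch in plain Python, not the package's actual implementation: `register`, `HANDLERS`, `default_handler` and `expand_templates` are invented names, the regular expression only understands positional arguments (no named parameters, no pipes inside links), and the two handlers are simplified approximations of the real Wikipedia templates. The package itself builds on mwparserfromhell rather than on regular expressions.

```python
import re
from typing import Callable, Dict, List

# Hypothetical sketch, not wikitext_asymptote's API.
# A handler maps a template's positional arguments to replacement text.
Handler = Callable[[List[str]], str]

HANDLERS: Dict[str, Handler] = {}

# Matches an innermost template, i.e. one that contains no nested braces
INNER_TEMPLATE = re.compile(r"\{\{([^{}]*)\}\}")


def register(*names: str) -> Callable[[Handler], Handler]:
    """Register a handler under one or more case-insensitive template names."""
    def decorator(fn: Handler) -> Handler:
        for name in names:
            HANDLERS[name.strip().lower()] = fn
        return fn
    return decorator


@register("lang")
def handle_lang(args: List[str]) -> str:
    # {{lang|fr|texte}} -> "texte": keep the text, drop the language code
    return args[1] if len(args) > 1 else ""


@register("nowrap")
def handle_nowrap(args: List[str]) -> str:
    # {{nowrap|text}} -> "text"
    return args[0] if args else ""


def default_handler(args: List[str]) -> str:
    # Fallback for templates without a dedicated handler: drop them.
    # Adequate for e.g. {{citation needed}}, lossy for many others.
    return ""


def expand_templates(wikitext: str) -> str:
    """Replace templates innermost-first until no more substitutions happen."""
    def replace(match: re.Match) -> str:
        name, *args = match.group(1).split("|")
        handler = HANDLERS.get(name.strip().lower(), default_handler)
        return handler([arg.strip() for arg in args])

    previous = None
    while previous != wikitext:
        previous = wikitext
        wikitext = INNER_TEMPLATE.sub(replace, wikitext)
    return wikitext
```

With this design, supporting a new template means registering one more handler, which mirrors the case-by-case workflow described above; until then, the default silently drops the template. For instance, `expand_templates("Paris is the {{lang|fr|capitale}} of {{nowrap|France}}.{{citation needed}}")` yields `"Paris is the capitale of France."`.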

Acknowledgements

Wikitext Asymptote is built on top of the mwparserfromhell package, and we gratefully acknowledge the work of its creators.


Download files

Download the file for your platform.

Source Distribution

wikitext_asymptote-0.0.4.tar.gz (10.6 kB)

Built Distribution

wikitext_asymptote-0.0.4-py3-none-any.whl (10.0 kB, Python 3)

File details

Details for the file wikitext_asymptote-0.0.4.tar.gz.

File metadata

  • Download URL: wikitext_asymptote-0.0.4.tar.gz
  • Upload date:
  • Size: 10.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.16

File hashes

Hashes for wikitext_asymptote-0.0.4.tar.gz
Algorithm     Hash digest
SHA256        faa8ef902a37b7ea0eb0349367d372949080d95e4f7bcde5bebf879e7e882cd1
MD5           e77ff0108b8beed88c8e618bad9cf2b2
BLAKE2b-256   e99d31d5d7a04670f4cab3dc6b0cc7be39d69959a49fa89aa9ad27d87eab38df
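To check a downloaded archive against the digests listed above, you can recompute its SHA256 with the standard library's `hashlib`. This is a minimal sketch: `sha256_of` and `verify` are illustrative names, the default expected digest is the one published for the source distribution (substitute the wheel's digest when checking the wheel), and the usage line assumes the archive sits in the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for wikitext_asymptote-0.0.4.tar.gz
EXPECTED_SHA256 = "faa8ef902a37b7ea0eb0349367d372949080d95e4f7bcde5bebf879e7e882cd1"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Compute a file's SHA256 hex digest, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.strip().lower()
```

For example, `verify(Path("wikitext_asymptote-0.0.4.tar.gz"))` should return `True` for an untampered download. pip can also enforce digests at install time through `--hash=sha256:...` entries in a requirements file (hash-checking mode).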

File details

Details for the file wikitext_asymptote-0.0.4-py3-none-any.whl.

File hashes

Hashes for wikitext_asymptote-0.0.4-py3-none-any.whl
Algorithm     Hash digest
SHA256        3c44be141ebd556a507827f139d478528d22d2206f15989e9492825e356b4f30
MD5           17bc3c6f3007f11916e4121af7d2fa8a
BLAKE2b-256   1515ab00f448a90f3d69c8d74a009d4b56db0a3f0f35c6360a8c3910d1ed1b9c
