A simple, pure-Python WikiText parsing tool.

Project description

The project is still in the early stages of development, and I’m not sure it will ever succeed. It certainly can’t parse a page the way MediaWiki does (for example, it works completely offline, so it can’t expand templates, and it doesn’t implement many details of the MediaWiki parser), but my guess is that it will be enough for most common uses.

Installation

Use pip install wikitextparser

Usage

Here is a short demo of some of its features:

>>> import wikitextparser as wtp
>>> # wikitextparser can detect sections, parserfunctions, templates,
>>> # wikilinks, external links, arguments, and HTML comments in
>>> # your wikitext:
>>> wt = wtp.parse("""
== h2 ==
t2

=== h3 ===
t3

== h22 ==
t22

{{text|value1{{text|value2}}}}

[[A|B]]""")
>>>
>>> wt.templates
[Template('{{text|value2}}'), Template('{{text|value1{{text|value2}}}}')]
>>> wt.templates[1].arguments
[Argument("|value1{{text|value2}}")]
>>> wt.templates[1].arguments[0].value = 'value3'
>>> print(wt)

== h2 ==
t2

=== h3 ===
t3

== h22 ==
t22

{{text|value3}}

[[A|B]]
>>> # It provides easy-to-use properties so you can get or set the
>>> # name or value of templates, arguments, wikilinks, etc.
>>> wt.wikilinks
[WikiLink('[[A|B]]')]
>>> wt.wikilinks[0].target = 'Z'
>>> wt.wikilinks[0].text = 'X'
>>> wt.wikilinks[0]
WikiLink('[[Z|X]]')
>>>
>>> from pprint import pprint
>>> pprint(wt.sections)
[Section('\n'),
 Section('== h2 ==\nt2\n\n=== h3 ===\nt3\n\n'),
 Section('=== h3 ===\nt3\n\n'),
 Section('== h22 ==\nt22\n\n{{text|value3}}\n\n[[Z|X]]')]
>>>
>>> wt.sections[1].title = 'newtitle'
>>> print(wt)

==newtitle==
t2

=== h3 ===
t3

== h22 ==
t22

{{text|value3}}

[[Z|X]]
>>> # There is a pprint function that you might find useful:
>>> p = wtp.parse('{{t1 |b=b|c=c| d={{t2|e=e|f=f}} }}')
>>> t2, t1 = p.templates
>>> print(t2.pprint())
{{t2
    |e=e
    |f=f
}}
>>> print(t1.pprint())
{{t1
    |b=b
    |c=c
    |d={{t2
        |e=e
        |f=f
    }}
}}
>>> # If you are dealing with
>>> # [[Category:Pages using duplicate arguments in template calls]],
>>> # there are two functions that may be helpful:
>>> t = wtp.Template('{{t|a=a|a=b|a=a}}')
>>> t.rm_dup_args_safe()
>>> t
Template('{{t|a=b|a=a}}')
>>> t = wtp.Template('{{t|a=a|a=b|a=a}}')
>>> t.rm_first_of_dup_args()
>>> t
Template('{{t|a=a}}')
>>> # Have a look at the test.py module for more details and possible pitfalls.
>>>
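The two duplicate-argument helpers in the demo behave differently, and the difference is easy to miss. The following is a minimal pure-Python sketch of the behaviour shown above, not the library's implementation: arguments are modelled as plain (name, value) tuples, and the function names `drop_exact_duplicates` and `keep_last_per_name` are hypothetical stand-ins for `rm_dup_args_safe` and `rm_first_of_dup_args`. The real methods may handle further cases (whitespace, positional arguments) that this model ignores.

```python
from typing import List, Tuple

# (name, value): a simplified stand-in for a parsed template argument.
Arg = Tuple[str, str]


def drop_exact_duplicates(args: List[Arg]) -> List[Arg]:
    """Drop an argument only if an identical (name, value) pair appears later.

    Models the result shown for rm_dup_args_safe in the demo.
    """
    return [arg for i, arg in enumerate(args) if arg not in args[i + 1:]]


def keep_last_per_name(args: List[Arg]) -> List[Arg]:
    """Keep only the last argument for each name, whatever the earlier values.

    Models the result shown for rm_first_of_dup_args in the demo.
    """
    kept = []
    for i, (name, value) in enumerate(args):
        later_names = {later_name for later_name, _ in args[i + 1:]}
        if name not in later_names:
            kept.append((name, value))
    return kept


# Models the arguments of {{t|a=a|a=b|a=a}} from the demo.
args = [("a", "a"), ("a", "b"), ("a", "a")]
print(drop_exact_duplicates(args))  # [('a', 'b'), ('a', 'a')]
print(keep_last_per_name(args))     # [('a', 'a')]
```

In this model the first function only removes redundant copies, so no distinct value is ever lost, while the second discards every earlier value for a repeated name, including ones that differ from the last.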

Download files

Source Distribution

wikitextparser-0.5.3.zip (23.1 kB)

Uploaded Source

Built Distribution

wikitextparser-0.5.3.win32.exe (215.8 kB)

Uploaded Source

File details

Details for the file wikitextparser-0.5.3.zip.

File metadata

  • Download URL: wikitextparser-0.5.3.zip
  • Upload date:
  • Size: 23.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for wikitextparser-0.5.3.zip
Algorithm    Hash digest
SHA256       fe271bfafd6f4c9893168e0c7d0a10bfe560c11100187d3d67999f9939af1b76
MD5          d4f440060f1bc5c3e8aa864c204ca25d
BLAKE2b-256  262471b3df47c633d4ad4a8de204c5da586a95c412576b051f17a8c35a6cafa7

File details

Details for the file wikitextparser-0.5.3.win32.exe.

File metadata

File hashes

Hashes for wikitextparser-0.5.3.win32.exe
Algorithm    Hash digest
SHA256       5540b4d34e0ea4570bef2f66c826101161da9e4d7a260c30fa769d4547f5d193
MD5          b2c49f5fcd79d88ca6b162cd7cff668f
BLAKE2b-256  938e3f9d8a90599778495211e4b92759de04a9d816ea1313d1f89d0de598cc83
