A simple, purely python, WikiText parsing tool.

Be warned that the project is in its early development stage. The API may change drastically and there may be some bugs. It cannot parse a page exactly the way MediaWiki does: it works completely offline, so it cannot expand templates, and it does not yet implement many details of the MediaWiki parser. My guess is that the current capabilities will suffice for most common cases.

Installation

Install it with pip: pip install wikitextparser

Usage

Here is a short demo of some of its features:

>>> import wikitextparser as wtp
>>> # wikitextparser can detect sections, parser functions, templates,
>>> # wikilinks, external links, arguments, and HTML comments in
>>> # your wikitext:
>>> wt = wtp.parse("""
== h2 ==
t2

=== h3 ===
t3

== h22 ==
t22

{{text|value1{{text|value2}}}}

[[A|B]]""")
>>>
>>> wt.templates
[Template('{{text|value2}}'), Template('{{text|value1{{text|value2}}}}')]
>>> wt.templates[1].arguments
[Argument('|value1{{text|value2}}')]
>>> wt.templates[1].arguments[0].value = 'value3'
>>> print(wt)

== h2 ==
t2

=== h3 ===
t3

== h22 ==
t22

{{text|value3}}

[[A|B]]
>>> # It provides easy-to-use properties so you can get or set the
>>> # name or value of templates, arguments, wikilinks, etc.
>>> wt.wikilinks
[WikiLink('[[A|B]]')]
>>> wt.wikilinks[0].target = 'Z'
>>> wt.wikilinks[0].text = 'X'
>>> wt.wikilinks[0]
WikiLink('[[Z|X]]')
>>>
>>> from pprint import pprint
>>> pprint(wt.sections)
[Section('\n'),
 Section('== h2 ==\nt2\n\n=== h3 ===\nt3\n\n'),
 Section('=== h3 ===\nt3\n\n'),
 Section('== h22 ==\nt22\n\n{{text|value3}}\n\n[[Z|X]]')]
>>>
>>> wt.sections[1].title = 'newtitle'
>>> print(wt)

==newtitle==
t2

=== h3 ===
t3

== h22 ==
t22

{{text|value3}}

[[Z|X]]
>>> # There is a pprint function that you might find useful:
>>> p = wtp.parse('{{t1 |b=b|c=c| d={{t2|e=e|f=f}} }}')
>>> t2, t1 = p.templates
>>> print(t2.pprint())
{{t2
    |e=e
    |f=f
}}
>>> print(t1.pprint())
{{t1
    |b=b
    |c=c
    |d={{t2
        |e=e
        |f=f
    }}
}}
>>> # If you are dealing with
>>> # [[Category:Pages using duplicate arguments in template calls]],
>>> # there are two functions that may be helpful:
>>> t = wtp.Template('{{t|a=a|a=b|a=a}}')
>>> t.rm_dup_args_safe()
>>> t
Template('{{t|a=b|a=a}}')
>>> t = wtp.Template('{{t|a=a|a=b|a=a}}')
>>> t.rm_first_of_dup_args()
>>> t
Template('{{t|a=a}}')
>>> # Have a look at the test.py module for more details and possible pitfalls.
>>>
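
The two duplicate-argument helpers at the end of the demo produce different results for the same input, and the difference is easy to miss. The following is a rough pure-Python sketch of the two strategies, inferred only from the outputs shown above. It is not wikitextparser's implementation: the function names `drop_exact_duplicates`, `keep_last_of_each_name`, and `render` are hypothetical, and arguments are modelled as plain `(name, value)` tuples instead of `Argument` objects.

```python
# Illustrative sketch only; NOT how wikitextparser implements these methods.
# Arguments are modelled as (name, value) pairs in source order.

def drop_exact_duplicates(args):
    """Drop an argument only if a later one has the same name AND value.

    Mirrors the result shown for rm_dup_args_safe() in the demo.
    """
    seen = set()
    kept = []
    # Walk from the end so that the last occurrence is the one that survives.
    for pair in reversed(args):
        if pair in seen:
            continue
        seen.add(pair)
        kept.append(pair)
    kept.reverse()
    return kept


def keep_last_of_each_name(args):
    """Keep only the last argument for each name, whatever its value.

    Mirrors the result shown for rm_first_of_dup_args() in the demo.
    """
    seen_names = set()
    kept = []
    for name, value in reversed(args):
        if name in seen_names:
            continue
        seen_names.add(name)
        kept.append((name, value))
    kept.reverse()
    return kept


def render(template_name, args):
    """Format the pairs back into {{name|k=v|...}} form."""
    parts = [template_name] + [f"{k}={v}" for k, v in args]
    return "{{" + "|".join(parts) + "}}"


args = [("a", "a"), ("a", "b"), ("a", "a")]
print(render("t", drop_exact_duplicates(args)))   # {{t|a=b|a=a}}
print(render("t", keep_last_of_each_name(args)))  # {{t|a=a}}
```

The first strategy only discards arguments that are exact repeats, so no distinct value is lost. The second discards every earlier argument that shares a name with a later one, which here removes `a=b` even though its value differs from the one that is kept.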
