Skip to main content

A simple, purely python, WikiText parsing tool.

Project description

A simple, purely python, WikiText parsing tool.

The purpose is to allow users easily extract and/or manipulate templates, template parameters, parser functions, tables, external links, wikilinks, etc. in wikitexts.

Installation

Use pip install wikitextparser

Usage

Here is a short demo of some of the functionalities:

>>> import wikitextparser as wtp
>>> # wikitextparser can detect sections, parserfunctions, templates,
>>> # wikilinks, external links, arguments, and HTML comments in
>>> # your wikitext:
>>> wt = wtp.parse("""
== h2 ==
t2

=== h3 ===
t3

== h22 ==
t22

{{text|value1{{text|value2}}}}

[[A|B]]""")
>>>
>>> wt.templates
[Template('{{text|value2}}'), Template('{{text|value1{{text|value2}}}}')]
>>> wt.templates[1].arguments
[Argument("|value1{{text|value2}}")]
>>> wt.templates[1].arguments[0].value = 'value3'
>>> print(wt)

== h2 ==
t2

=== h3 ===
t3

== h22 ==
t22

{{text|value3}}

[[A|B]]
>>> # It provides easy-to-use properties so you can get or set
>>> # name or value of templates, arguments, wikilinks, etc.
>>> wt.wikilinks
[WikiLink("[[A|B]]")]
>>> wt.wikilinks[0].target = 'Z'
>>> wt.wikilinks[0].text = 'X'
>>> wt.wikilinks[0]
WikiLink('[[Z|X]]')
>>>
>>> from pprint import pprint
>>> pprint(wt.sections)
[Section('\n'),
 Section('== h2 ==\nt2\n\n=== h3 ===\nt3\n\n'),
 Section('=== h3 ===\nt3\n\n'),
 Section('== h22 ==\nt22\n\n{{text|value3}}\n\n[[Z|X]]')]
>>>
>>> wt.sections[1].title = 'newtitle'
>>> print(wt)

==newtitle==
t2

=== h3 ===
t3

== h22 ==
t22

{{text|value3}}

[[Z|X]]
>>> # There is a pprint function that pretty-prints templates.
>>> p = wtp.parse('{{t1 |b=b|c=c| d={{t2|e=e|f=f}} }}')
>>> t2, t1 = p.templates
>>> print(t2.pprint())
{{t2
    |e=e
    |f=f
}}
>>> print(t1.pprint())
{{t1
    |b=b
    |c=c
    |d={{t2
        |e=e
        |f=f
    }}
}}
>>> # If you are dealing with
>>> # [[Category:Pages using duplicate arguments in template calls]],
>>> # there are two functions that may be helpful:
>>> t = wtp.Template('{{t|a=a|a=b|a=a}}')
>>> t.rm_dup_args_safe()
>>> t
Template('{{t|a=b|a=a}}')
>>> t = wtp.Template('{{t|a=a|a=b|a=a}}')
>>> t.rm_first_of_dup_args()
>>> t
Template('{{t|a=a}}')
>>> # Extract cell values of a table
>>> p = wtp.parse("""{|
|  Orange    ||   Apple   ||   more
|-
|   Bread    ||   Pie     ||   more
|-
|   Butter   || Ice cream ||  and more
|}""")
>>> pprint(p.tables[0].rows)
[['Orange', 'Apple', 'more'],
 ['Bread', 'Pie', 'more'],
 ['Butter', 'Ice cream', 'and more']]
>>> # Have a look at test modules for more details and probable pitfalls.
>>>

See also:

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

wikitextparser-0.5.6.dev0.zip (36.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

wikitextparser-0.5.6.dev0.win32.exe (172.9 kB view details)

Uploaded Source

File details

Details for the file wikitextparser-0.5.6.dev0.zip.

File metadata

File hashes

Hashes for wikitextparser-0.5.6.dev0.zip
Algorithm Hash digest
SHA256 1042cd29cd8a4ab729f2d99b32984f50afcced4a1e5132eadb919ed6d4872a36
MD5 d4fb2486a048f7f50041b3d43ae93ed4
BLAKE2b-256 6b00d1455dc49f83ada8dc3deb7845b649461c603b99fa2a1e98811901ad21e4

See more details on using hashes here.

File details

Details for the file wikitextparser-0.5.6.dev0.win32.exe.

File metadata

File hashes

Hashes for wikitextparser-0.5.6.dev0.win32.exe
Algorithm Hash digest
SHA256 ae145412e67ccd0f40bac2eaacdd32fdda9b3271686c4f1094b46ba64ac21949
MD5 40ee12042ebaee08b9ee07bb5e4b0370
BLAKE2b-256 8bf01240359b4d6ff5a244383cba7f16e7928ffd8fa4a03126ee26aaa5285d5c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page