wikitextparser

A simple, pure-Python WikiText parsing tool.

Its purpose is to let users easily extract and/or manipulate templates, template parameters, parser functions, tables, external links, wikilinks, etc. in wikitext.

Installation

Use pip install wikitextparser

Usage

Here is a short demo of some of the functionality:

>>> import wikitextparser as wtp

WikiTextParser can detect sections, parser functions, templates, wikilinks, external links, arguments, tables, and HTML comments in your wikitext:

>>> wt = wtp.parse("""
== h2 ==
t2

=== h3 ===
t3

== h22 ==
t22

{{text|value1{{text|value2}}}}

[[A|B]]""")
>>>
>>> wt.templates
[Template('{{text|value2}}'), Template('{{text|value1{{text|value2}}}}')]
>>> wt.templates[1].arguments
[Argument('|value1{{text|value2}}')]
>>> wt.templates[1].arguments[0].value = 'value3'
>>> print(wt)

== h2 ==
t2

=== h3 ===
t3

== h22 ==
t22

{{text|value3}}

[[A|B]]
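
To illustrate why the inner template is listed before the outer one, here is a minimal standard-library sketch of nested template detection using a stack. It is not the library's implementation: find_templates is a hypothetical helper, and it ignores triple-brace parameters, comments, and nowiki sections, all of which a real parser has to handle.

```python
def find_templates(text):
    """Return every balanced {{...}} span, innermost spans first.

    Hypothetical helper for illustration only; not part of wikitextparser.
    """
    stack = []   # start offsets of '{{' that are still open
    found = []   # completed spans, in the order they were closed
    i = 0
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == '{{':
            stack.append(i)
            i += 2
        elif pair == '}}' and stack:
            start = stack.pop()
            found.append(text[start:i + 2])
            i += 2
        else:
            i += 1
    return found


spans = find_templates('t22\n\n{{text|value1{{text|value2}}}}\n\n[[A|B]]')
print(spans)
```

Because a span is recorded when its closing braces are reached, the nested template is always closed, and therefore reported, before the template that contains it.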

It provides easy-to-use properties so you can get or set names or values of templates, arguments, wikilinks, etc.:

>>> wt.wikilinks
[WikiLink('[[A|B]]')]
>>> wt.wikilinks[0].target = 'Z'
>>> wt.wikilinks[0].text = 'X'
>>> wt.wikilinks[0]
WikiLink('[[Z|X]]')
>>>
>>> from pprint import pprint
>>> pprint(wt.sections)
[Section('\n'),
 Section('== h2 ==\nt2\n\n=== h3 ===\nt3\n\n'),
 Section('=== h3 ===\nt3\n\n'),
 Section('== h22 ==\nt22\n\n{{text|value3}}\n\n[[Z|X]]')]
>>>
>>> wt.sections[1].title = 'newtitle'
>>> print(wt)

==newtitle==
t2

=== h3 ===
t3

== h22 ==
t22

{{text|value3}}

[[Z|X]]
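
The section list shown above contains a lead section, and each heading's section runs until the next heading of the same or a higher level, so a subsection appears both on its own and inside its parent. A rough sketch of that splitting rule, assuming a simplified heading pattern and a hypothetical split_sections helper (this is not the library's code):

```python
import re

# Simplified heading pattern: the same number of '=' on both sides of the line.
HEADING = re.compile(r'^(={1,6})[^\n]*?\1[ \t]*$', re.MULTILINE)


def split_sections(text):
    """Return the lead section followed by one string per heading.

    Hypothetical helper for illustration only; not part of wikitextparser.
    """
    heads = [(m.start(), len(m.group(1))) for m in HEADING.finditer(text)]
    if not heads:
        return [text]
    sections = [text[:heads[0][0]]]  # lead section before the first heading
    for i, (start, level) in enumerate(heads):
        end = len(text)
        # A section ends at the next heading of the same or a higher level.
        for next_start, next_level in heads[i + 1:]:
            if next_level <= level:
                end = next_start
                break
        sections.append(text[start:end])
    return sections


sample = '\n== h2 ==\nt2\n\n=== h3 ===\nt3\n\n== h22 ==\nt22\n'
for section in split_sections(sample):
    print(repr(section))
```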

Templates have a pprint method that returns a pretty-printed string:

>>> p = wtp.parse('{{t1 |b=b|c=c| d={{t2|e=e|f=f}} }}')
>>> t2, t1 = p.templates
>>> print(t2.pprint())
{{t2
    | e = e
    | f = f
}}
>>> print(t1.pprint())
{{t1
    | b = b
    | c = c
    | d = {{t2
        | e = e
        | f = f
    }}
}}
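
The layout above can be reproduced with a short recursive formatter. The sketch below is only an illustration of the indentation scheme, not the library's pprint: format_template is a hypothetical function, and templates are modelled as plain (name, arguments) tuples instead of parsed objects.

```python
def format_template(name, args, indent='    ', level=1):
    """Format a (name, [(key, value), ...]) structure, one argument per line.

    A value may itself be a (name, args) tuple, which is formatted one
    level deeper. Hypothetical helper; not part of wikitextparser.
    """
    lines = ['{{' + name]
    for key, value in args:
        if isinstance(value, tuple):
            value = format_template(*value, indent=indent, level=level + 1)
        lines.append(f'{indent * level}| {key} = {value}')
    # The closing braces align with the line that opened this template.
    lines.append(indent * (level - 1) + '}}')
    return '\n'.join(lines)


t2 = ('t2', [('e', 'e'), ('f', 'f')])
t1 = ('t1', [('b', 'b'), ('c', 'c'), ('d', t2)])
print(format_template(*t1))
```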

If you are dealing with [[Category:Pages using duplicate arguments in template calls]], there are two methods that may be helpful:

>>> t = wtp.Template('{{t|a=a|a=b|a=a}}')
>>> t.rm_dup_args_safe()
>>> t
Template('{{t|a=b|a=a}}')
>>> t = wtp.Template('{{t|a=a|a=b|a=a}}')
>>> t.rm_first_of_dup_args()
>>> t
Template('{{t|a=a}}')
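
The difference between the two strategies can be sketched on plain (name, value) pairs. The rules below are inferred only from the example output above and are an assumption; the real methods may apply further conditions. Both function names are hypothetical.

```python
def remove_safe_duplicates(args):
    """Drop an argument only if a later one has the same name AND value."""
    kept = []
    for i, pair in enumerate(args):
        if pair in args[i + 1:]:
            continue  # an identical argument follows, so nothing is lost
        kept.append(pair)
    return kept


def keep_last_of_duplicates(args):
    """Keep only the last argument for each name, whatever its value."""
    last_index = {name: i for i, (name, _) in enumerate(args)}
    return [pair for i, pair in enumerate(args) if last_index[pair[0]] == i]


args = [('a', 'a'), ('a', 'b'), ('a', 'a')]
print(remove_safe_duplicates(args))
print(keep_last_of_duplicates(args))
```

The first strategy never changes which value wins, because it only removes exact repeats. The second can discard a differing value, so it is the more aggressive of the two.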

Extracting cell values of a table is easy:

>>> p = wtp.parse("""{|
|  Orange    ||   Apple   ||   more
|-
|   Bread    ||   Pie     ||   more
|-
|   Butter   || Ice cream ||  and more
|}""")
>>> pprint(p.tables[0].getdata())
[['Orange', 'Apple', 'more'],
 ['Bread', 'Pie', 'more'],
 ['Butter', 'Ice cream', 'and more']]

By default, values are repeated across the cells they span, according to the colspan and rowspan attributes:

>>> t = wtp.Table("""{| class="wikitable sortable"
|-
! a !! b !! c
|-
!colspan = "2" | d || e
|-
|}""")
>>> t.getdata(span=True)
[['a', 'b', 'c'], ['d', 'd', 'e']]
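
One way to picture this rearrangement is to place each cell on a grid and copy its value into every position it spans. The sketch below assumes cells are already parsed into (value, colspan, rowspan) tuples; expand_spans is a hypothetical function and not the library's algorithm.

```python
def expand_spans(rows):
    """Expand (value, colspan, rowspan) cells into a rectangular grid.

    Hypothetical helper for illustration only; not part of wikitextparser.
    Positions that no cell covers are filled with None.
    """
    grid = {}  # (row, column) -> value
    for r, row in enumerate(rows):
        c = 0
        for value, colspan, rowspan in row:
            while (r, c) in grid:  # skip slots taken by an earlier rowspan
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = value
            c += colspan
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c)) for c in range(n_cols)] for r in range(n_rows)]


rows = [
    [('a', 1, 1), ('b', 1, 1), ('c', 1, 1)],
    [('d', 2, 1), ('e', 1, 1)],  # 'd' has colspan="2"
]
print(expand_spans(rows))
```

Repeating the value in every spanned position keeps each row the same length, so columns can still be addressed by index.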

Have a look at the test modules for more details and possible pitfalls.
