Wiki markup translator.
MediaWiki Markup Translator
This package provides a Python framework for translating WikiMedia articles to various formats. The present version supports conversion to plain text, HTML, and Texinfo.
A command line converter utility is included.
Classes
class WikiMarkup
A base class for all translator classes. Unless you plan to extend wikitrans, you will never need to create objects of this class directly. Instead, use one of its derived classes.
Constructor arguments common for all derived classes:
- filename = name
The file named name is opened and used for input.
- file = fd
An already opened file fd is used for input.
- text = string
Input is taken from string, line by line.
- lang = code
Specifies the language version. Default is en. This value can be referred to as %(lang)s in the keyword arguments below.
- html_base = url
Base URL for cross-references. Default is http://%(lang)s.wikipedia.org/wiki/.
- image_base = url
Base URL for images. Default is http://upload.wikimedia.org/wikipedia/commons/thumb/a/bf
- media_base = url
Base URL for media files. Default is http://www.mediawiki.org/xml/export-0.3
- debug_level = int
Debug verbosity level (0 - no debug info, 100 - excessively copious debug messages). Default is 0.
- strict = bool
Strict parsing mode. Throw exceptions on syntax errors. Default is False.
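The %(lang)s placeholder has the same form as Python's mapping-based string formatting. The following is a minimal sketch of how such a template expands, assuming the translator applies the % operator with a mapping that contains lang. The helper name expand_base is hypothetical and not part of the package.

```python
# Hypothetical helper, not part of the package: it only illustrates how a
# template containing %(lang)s expands under Python's %-formatting.
DEFAULT_HTML_BASE = 'http://%(lang)s.wikipedia.org/wiki/'


def expand_base(template, lang='en'):
    """Substitute the language code into a base URL template."""
    return template % {'lang': lang}


print(expand_base(DEFAULT_HTML_BASE))
print(expand_base(DEFAULT_HTML_BASE, lang='de'))
```

A template without the placeholder passes through unchanged, so a fixed URL can be supplied for html_base as well.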
class TextWikiMarkup
Translates material in Wiki markup language to plain text. Usage:
    from WikiTrans.wiki2text import TextWikiMarkup
    markup = TextWikiMarkup(filename='input.txt')
    markup.parse()
    print(str(markup))
Specific constructor arguments:
- width = N
Limit output width to N columns. Default is 78.
- show_urls = bool
Whether or not to show the URLs links refer to. If bool is True (the default), a URL will be displayed in parentheses next to the link text. If False, only the link text will be displayed.
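The effect of the two arguments can be pictured with the standard textwrap module. This is only an illustration of the described behaviour, not the package's own code: render_link and wrap are hypothetical helpers, and the real translator may lay out text differently.

```python
import textwrap


def render_link(text, url, show_urls=True):
    """Return the link text, followed by the URL in parentheses if requested."""
    return '%s (%s)' % (text, url) if show_urls else text


def wrap(paragraph, width=78):
    """Fill a paragraph so that no output line exceeds width columns."""
    return textwrap.fill(paragraph, width=width)


link = render_link('door', 'http://en.wiktionary.org/wiki/door')
print(wrap('See the entry for ' + link + ' for details.', width=40))
```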
class TextWiktionaryMarkup
Translates material from Wiktionary to plain text. This class is meant to provide a Wiktionary-specific form of TextWikiMarkup. Currently, it differs from TextWikiMarkup only in that the default value for html_base is http://%(lang)s.wiktionary.org/wiki/.
class TexiWikiMarkup
Translate Wiki markup to Texinfo source. Usage:
    from WikiTrans.wiki2texi import TexiWikiMarkup
    markup = TexiWikiMarkup(filename='input.txt')
    markup.parse()
    print(str(markup))
Two markup-specific keywords control the sectioning model used.
- sectioning_model = model
Selects the Texinfo sectioning model for the output document. Possible values are:
- numbered
Top of document is marked with @top. Headings (=, ==, ===, etc) produce @chapter, @section, @subsection, etc.
- unnumbered
Unnumbered sectioning: @top, @unnumbered, @unnumberedsec, @unnumberedsubsec.
- appendix
Sectioning suitable for appendix entries: @top, @appendix, @appendixsec, @appendixsubsec, etc.
- heading
Use heading directives to reflect sectioning: @majorheading, @chapheading, @heading, @subheading, etc.
- sectioning_start = n
Shift the resulting heading level by n positions. For example, with sectioning_model=numbered, == A == produces @section A on output. If sectioning_start=1 is also given, the same heading produces @subsection A instead.
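The interaction between the two keywords can be sketched as a table lookup. The code below is an illustration under stated assumptions, not the package's implementation: the function names are hypothetical, the fourth directive in each table is a standard Texinfo command that the lists above only imply with "etc.", the unnumbered and appendix tables are assumed to align with heading levels the same way the numbered one does, and clamping at the deepest level is a guess. The heading model is left out because the description does not say which directive corresponds to a single =.

```python
# Directive tables based on the sectioning models described above.
# Index 0 corresponds to a level-1 heading ("= Title =").
SECTIONING = {
    'numbered': ['@chapter', '@section', '@subsection', '@subsubsection'],
    'unnumbered': ['@unnumbered', '@unnumberedsec',
                   '@unnumberedsubsec', '@unnumberedsubsubsec'],
    'appendix': ['@appendix', '@appendixsec',
                 '@appendixsubsec', '@appendixsubsubsec'],
}


def texinfo_command(level, model='numbered', start=0):
    """Map a wiki heading level (1 for "=", 2 for "==") to a directive."""
    commands = SECTIONING[model]
    index = level - 1 + start
    # Assumption: levels outside the table are clamped to its ends.
    index = max(0, min(index, len(commands) - 1))
    return commands[index]


def translate_heading(line, model='numbered', start=0):
    """Turn a heading line such as "== A ==" into a Texinfo directive line."""
    stripped = line.strip()
    level = len(stripped) - len(stripped.lstrip('='))
    title = stripped.strip('=').strip()
    return '%s %s' % (texinfo_command(level, model, start), title)


print(translate_heading('== A =='))
print(translate_heading('== A ==', start=1))
```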
class HtmlWikiMarkup
Translates Wiki markup to HTML. Usage:
    from WikiTrans.wiki2html import HtmlWikiMarkup
    markup = HtmlWikiMarkup(filename='input.txt')
    markup.parse()
    print(str(markup))
The supported keywords are the same as for the WikiMarkup class.
class HtmlWiktionaryMarkup
Translates material from Wiktionary to HTML. This class is meant to provide a Wiktionary-specific form of HtmlWikiMarkup. Currently both classes are equivalent, except that the default value for html_base in HtmlWiktionaryMarkup is http://%(lang)s.wiktionary.org/wiki/.
The wikitrans utility
This command line utility converts the supplied text to the selected output format. The usage syntax is:
wikitrans [OPTIONS] ARG
If ARG looks like a URL, the wiki text to be converted will be downloaded from that URL.
Otherwise, if the --base-url=URL option is given, ARG is treated as the name of the page to get from the MediaWiki installation at URL.
Otherwise, ARG is treated as the name of the file to read wiki material from.
Examples:
    wikitrans text.wiki
    wikitrans --base-url http://en.wiktionary.org door
    wikitrans https://en.wiktionary.org/wiki/Special:Export/door
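The three-way decision on ARG can be sketched as follows. This is an illustration, not the utility's source: classify_arg is a hypothetical name, and the test for what "looks like a URL" (an http or https scheme plus a host) is an assumption, since the exact rule is not stated here.

```python
from urllib.parse import urlparse


def classify_arg(arg, base_url=None):
    """Return 'url', 'page' or 'file' following the order described above."""
    parsed = urlparse(arg)
    # Assumption: "looks like a URL" means an http(s) scheme and a host.
    if parsed.scheme in ('http', 'https') and parsed.netloc:
        return 'url'
    # A base URL turns ARG into a page name on that installation.
    if base_url is not None:
        return 'page'
    # Otherwise ARG names a local file.
    return 'file'


print(classify_arg('text.wiki'))
print(classify_arg('door', base_url='http://en.wiktionary.org'))
print(classify_arg('https://en.wiktionary.org/wiki/Special:Export/door'))
```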
Options are:
- --version
Show program’s version number and exit.
- -h, --help
Show a short usage summary and exit.
- -v, --verbose
Verbose operation.
- -I ITYPE, --input-type=ITYPE
Set input document type. ITYPE is one of: default or wiktionary.
- -t OTYPE, --to=OTYPE, --type=OTYPE
Set output document type. OTYPE is one of: html (the default), texi, text, or dump.
- -l LANG, --lang=LANG
Set input document language.
- -o KW=VAL, --option=KW=VAL
Pass the keyword argument KW=VAL to the parser class constructor.
- -d DEBUG, --debug=DEBUG
Set debug level (0..100).
- -D, --dump
Dump parse tree and exit; same as --type=dump.
- -b URL, --base-url=URL
Set base URL.
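As an illustration of the -o option, repeated KW=VAL pairs can be collected into a dictionary of keyword arguments. The sketch below is hypothetical: parse_options is not a function of the package, and it keeps every value as a string because this description does not say how the utility converts values such as width=60 or show_urls=False.

```python
def parse_options(pairs):
    """Collect KW=VAL strings into a dict of keyword arguments."""
    kwargs = {}
    for pair in pairs:
        # Split on the first "=" only, so values may contain "=" themselves.
        key, sep, value = pair.partition('=')
        if not sep or not key:
            raise ValueError('expected KW=VAL, got %r' % pair)
        kwargs[key] = value  # Assumption: values stay unconverted strings.
    return kwargs


print(parse_options(['width=60', 'show_urls=False']))
```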
Note: when using --base-url or passing a URL as the argument (the second and third examples above), if the URL is in the ‘wikipedia.org’ or ‘wiktionary.org’ domain, the options --input-type and --lang are set automatically.
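One way such automatic detection could work is to inspect the host name of the URL. The sketch below rests on two assumptions that this description does not confirm: that the language code is the leftmost label of the host name, and that wikipedia.org corresponds to the default input type. The function name infer_settings is hypothetical.

```python
from urllib.parse import urlparse


def infer_settings(url):
    """Return (input_type, lang) for known domains, or None otherwise."""
    host = urlparse(url).hostname or ''
    parts = host.split('.')
    # Expect hosts of the form LANG.wiktionary.org or LANG.wikipedia.org.
    if len(parts) == 3 and parts[1:] == ['wiktionary', 'org']:
        return ('wiktionary', parts[0])
    if len(parts) == 3 and parts[1:] == ['wikipedia', 'org']:
        return ('default', parts[0])
    return None


print(infer_settings('https://en.wiktionary.org/wiki/Special:Export/door'))
print(infer_settings('http://example.org/page'))
```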