Wiki markup translator.

MediaWiki Markup Translator

This package provides a Python framework for translating articles in MediaWiki markup to various formats. The present version supports conversion to plain text, HTML, and Texinfo.

A command line converter utility is included.

Classes

class WikiMarkup

A base class for all translator classes. Unless you plan to extend wikitrans, you will never need to create objects of this class; instead, you will use one of its derived classes.

Constructor arguments common to all derived classes:

filename = name
The file with the given name is opened and used for input.
file = fd
An already opened file fd is used for input.
text = string
Input is taken from string, line by line.
lang = code
Specifies the language edition of the wiki. Default is en. The value can be referred to as %(lang)s in the keyword arguments below.
html_base = url
Base URL for cross-references. Default is http://%(lang)s.wikipedia.org/wiki/.
image_base = url
Base URL for images. Default is http://upload.wikimedia.org/wikipedia/commons/thumb/a/bf
media_base = url
Base URL for media files. Default is http://www.mediawiki.org/xml/export-0.3
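The %(lang)s placeholder is Python's mapping-based % string formatting. As a minimal sketch, assuming wikitrans expands the base URLs with ordinary % formatting (the source does not say how), the documented defaults would resolve as shown below; expand_bases is a hypothetical helper, not part of the wikitrans API.

```python
# Documented defaults; only html_base contains the %(lang)s placeholder.
DEFAULTS = {
    "html_base": "http://%(lang)s.wikipedia.org/wiki/",
    "image_base": "http://upload.wikimedia.org/wikipedia/commons/thumb/a/bf",
    "media_base": "http://www.mediawiki.org/xml/export-0.3",
}


def expand_bases(lang="en", **overrides):
    """Hypothetical helper: merge keyword overrides into the defaults and
    substitute lang wherever %(lang)s appears (assumed behavior)."""
    values = dict(DEFAULTS, **overrides)
    # With a mapping on the right of %, strings without placeholders
    # are returned unchanged instead of raising an error.
    return {key: url % {"lang": lang} for key, url in values.items()}
```

For example, expand_bases(lang="de")["html_base"] gives http://de.wikipedia.org/wiki/, while image_base and media_base pass through unchanged. An override may use the placeholder too, e.g. html_base="http://%(lang)s.wiktionary.org/wiki/".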

class TextWikiMarkup

Translates material in Wiki markup language to plain text. Usage:

from WikiTrans.wiki2text import TextWikiMarkup

markup = TextWikiMarkup(filename='input.txt')
markup.parse()
print(str(markup))

Specific constructor arguments:

width = N
Limit output width to N columns. Default is 78.
show_urls = bool
Whether to show the URLs that links point to. If bool is True (the default), the URL is displayed in parentheses next to the link text. If False, only the link text is displayed.
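The effect of width and show_urls can be illustrated with the standard textwrap module. This is a sketch of the documented output format only, not TextWikiMarkup's implementation; render_link and render_paragraph are hypothetical names, and keeping a long URL on one line rather than splitting it is an assumed policy.

```python
import textwrap


def render_link(text, url, show_urls=True):
    # Documented behavior: "text (url)" when show_urls is True,
    # only the link text otherwise.
    return f"{text} ({url})" if show_urls else text


def render_paragraph(items, width=78, show_urls=True):
    """Join plain strings and (text, url) link pairs into one paragraph,
    then wrap it to `width` columns (default 78, as documented)."""
    parts = [
        render_link(*item, show_urls=show_urls) if isinstance(item, tuple)
        else item
        for item in items
    ]
    # break_long_words=False keeps a long URL intact on its own line,
    # even if that line then exceeds `width` (an assumed policy).
    return textwrap.fill(" ".join(parts), width=width,
                         break_long_words=False)
```

For example, render_paragraph(["See", ("Door", "http://en.wikipedia.org/wiki/Door"), "for details."], show_urls=False) returns "See Door for details.".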

class TextWiktionaryMarkup

Translates material from Wiktionary to plain text. It is intended as a Wiktionary-specific form of TextWikiMarkup. Currently, the only difference from TextWikiMarkup is the default value of html_base, which is http://%(lang)s.wiktionary.org/wiki/.

class TexiWikiMarkup

Translate Wiki markup to Texinfo source. Usage:

from WikiTrans.wiki2texi import TexiWikiMarkup

markup = TexiWikiMarkup(filename='input.txt')
markup.parse()
print(str(markup))

Two class-specific keywords control the sectioning model:

sectioning_model = model

Selects the Texinfo sectioning model for the output document. Possible values are:

numbered
Top of document is marked with @top. Headings (=, ==, ===, etc.) produce @chapter, @section, @subsection, etc.
unnumbered
Unnumbered sectioning: @top, @unnumbered, @unnumberedsec, @unnumberedsubsec.
appendix
Sectioning suitable for appendix entries: @top, @appendix, @appendixsec, @appendixsubsec, etc.
heading
Use heading directives to reflect sectioning: @majorheading, @chapheading, @heading, @subheading, etc.
sectioning_start = n
Shift the resulting heading level down by n positions. For example, with sectioning_model=numbered, == A == produces @section A on output. If sectioning_start=1 is also given, the same heading produces @subsection A instead.
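The mapping from heading level to Texinfo command can be summarized as a lookup table. The sketch below assumes each model's commands line up by position with the lists above (index 0 being the top-of-document command), which reproduces the == A == example; the fourth-level commands extrapolate the documented "etc.", and texi_heading is a hypothetical helper rather than part of TexiWikiMarkup.

```python
# Command tables transcribed from the sectioning models described above.
# Index 0 is the top-of-document command; index N serves a level-N heading
# (N equals signs). Fourth-level entries are extrapolated (an assumption).
MODELS = {
    "numbered": ["@top", "@chapter", "@section", "@subsection",
                 "@subsubsection"],
    "unnumbered": ["@top", "@unnumbered", "@unnumberedsec",
                   "@unnumberedsubsec", "@unnumberedsubsubsec"],
    "appendix": ["@top", "@appendix", "@appendixsec",
                 "@appendixsubsec", "@appendixsubsubsec"],
    "heading": ["@majorheading", "@chapheading", "@heading",
                "@subheading", "@subsubheading"],
}


def texi_heading(level, title, model="numbered", start=0):
    """Hypothetical helper: return the Texinfo line for a wiki heading
    of the given level (1 for '= T =', 2 for '== T ==', ...)."""
    commands = MODELS[model]
    index = level + start  # sectioning_start shifts the heading down
    if index >= len(commands):
        # How wikitrans handles overly deep headings is not documented here.
        raise ValueError(f"heading level {index} too deep for {model!r}")
    return f"{commands[index]} {title}"
```

With these tables, texi_heading(2, "A") yields "@section A" and texi_heading(2, "A", start=1) yields "@subsection A", matching the example above.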

class HtmlWikiMarkup

Translates Wiki markup to HTML. Usage:

from WikiTrans.wiki2html import HtmlWikiMarkup

markup = HtmlWikiMarkup(filename='input.txt')
markup.parse()
print(str(markup))

Supported keywords are the same as for the WikiMarkup class.

class HtmlWiktionaryMarkup

Translates material from Wiktionary to HTML. It is intended as a Wiktionary-specific form of HtmlWikiMarkup. Currently the two classes are equivalent, except that the default value of html_base in HtmlWiktionaryMarkup is http://%(lang)s.wiktionary.org/wiki/.

The wikitrans utility

This command line utility converts the supplied text to the selected output format. The usage syntax is:

wikitrans [OPTIONS] ARG

If ARG looks like a URL, the wiki text to be converted will be downloaded from that URL.

Otherwise, if the --base-url=URL option is given, ARG is treated as the name of the page to get from the MediaWiki installation at URL.

Otherwise, ARG is treated as the name of the file to read wiki material from.

Examples:

wikitrans text.wiki

wikitrans --base-url http://en.wiktionary.org door

wikitrans https://en.wiktionary.org/wiki/Special:Export/door
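The three-way interpretation of ARG described above can be sketched as follows. The helper name classify_arg is hypothetical, and treating only http/https strings with a host name as "looking like a URL" is an assumption; the real utility may apply a different test.

```python
from urllib.parse import urlparse


def classify_arg(arg, base_url=None):
    """Hypothetical sketch of how ARG is interpreted.

    Returns ("url", arg), ("page", base_url, arg) or ("file", arg).
    """
    parsed = urlparse(arg)
    # Assumed URL test: an http(s) scheme plus a network location.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return ("url", arg)
    # A page name fetched from the installation given by --base-url.
    if base_url is not None:
        return ("page", base_url, arg)
    # Anything else is read as a local file.
    return ("file", arg)
```

Applied to the three examples, this returns "file", "page" and "url" respectively.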

Options are:

--version
Show program’s version number and exit.
-h, --help
Show a short usage summary and exit.
-v, --verbose
Verbose operation.
-I ITYPE, --input-type=ITYPE
Set input document type. ITYPE is either default or wiktionary.
-t OTYPE, --to=OTYPE, --type=OTYPE
Set output document type: html (the default), texi, text, or dump.
-l LANG, --lang=LANG
Set input document language.
-o KW=VAL, --option=KW=VAL
Pass the keyword argument KW=VAL to the parser class constructor.
-d DEBUG, --debug=DEBUG
Set debug level (0..100).
-D, --dump
Dump parse tree and exit; same as --type=dump.
-b URL, --base-url=URL
Set the base URL of the MediaWiki installation to fetch pages from.
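Repeated -o options accumulate into keyword arguments for the parser class constructor. The following argparse sketch reproduces only -o and -t as documented above; parse_kw and build_parser are hypothetical names, and values are kept as strings because the documentation does not say whether wikitrans converts them (for instance, width=60 to an integer).

```python
import argparse


def parse_kw(value):
    """Split one KW=VAL string at the first '='; argparse turns the
    ArgumentTypeError into a usage error if the '=' is missing."""
    key, sep, val = value.partition("=")
    if not sep or not key:
        raise argparse.ArgumentTypeError(f"expected KW=VAL, got {value!r}")
    return key, val


def build_parser():
    # Only two of the documented options are reproduced here.
    parser = argparse.ArgumentParser(prog="wikitrans")
    parser.add_argument("-o", "--option", dest="options", action="append",
                        type=parse_kw, metavar="KW=VAL")
    parser.add_argument("-t", "--to", "--type", dest="otype",
                        default="html",
                        choices=["html", "texi", "text", "dump"])
    parser.add_argument("arg")
    return parser
```

After args = build_parser().parse_args(["-o", "width=60", "-t", "text", "input.txt"]), the expression dict(args.options or []) yields {"width": "60"}, which could be passed as **kwargs to a constructor after any needed type conversion.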

Note: when using --base-url or passing a URL as the argument (the second and third examples above), if the URL is in the ‘wikipedia.org’ or ‘wiktionary.org’ domain, the --input-type and --lang options are set automatically.
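The domain check in the note above might look like the following sketch. It assumes the language code is the leftmost label of the host name (as in en.wiktionary.org) and that Wikipedia pages map to the default input type; infer_options is a hypothetical helper, and a real implementation would also need to reject hosts such as www.wikipedia.org.

```python
from urllib.parse import urlparse


def infer_options(url):
    """Hypothetical sketch: guess (input_type, lang) from a URL's host.

    Returns None when the host is outside wiktionary.org and wikipedia.org.
    """
    # urlparse().hostname is already lowercased; it is None for bare paths.
    host = urlparse(url).hostname or ""
    if host.endswith(".wiktionary.org"):
        itype = "wiktionary"
    elif host.endswith(".wikipedia.org"):
        itype = "default"
    else:
        return None
    # Assumed convention: the first label is the language code.
    return itype, host.split(".")[0]
```

For the third example URL this returns ("wiktionary", "en").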

Download files

Files for wikitrans, version 1.0: wikitrans-1.0.tar.gz (67.9 kB), source distribution.