Wiki markup translator.

Project description

MediaWiki Markup Translator

This package provides a Python framework for translating MediaWiki articles to various formats. The present version supports conversion to plain text, HTML, and Texinfo.

A command line converter utility is included.

Classes

class WikiMarkup

A base class for all translator classes. Unless you plan to extend wikitrans, you will never have to create objects of this class. Instead, you will use one of its derived classes.

Constructor arguments common for all derived classes:

filename = name

The file named name is opened and used for input.

file = fd

An already opened file fd is used for input.

text = string

Input is taken from string, line by line.

lang = code

Specifies language version. Default is en. This variable can be referred to as %(lang)s in the keyword arguments below.

html_base = url

Base URL for cross-references. Default is http://%(lang)s.wikipedia.org/wiki/.

image_base = url

Base URL for images. Default is http://upload.wikimedia.org/wikipedia/commons/thumb/a/bf

media_base = url

Base URL for media files. Default is http://www.mediawiki.org/xml/export-0.3

debug_level = int

Debug verbosity level (0 - no debug info, 100 - excessively copious debug messages). Default is 0.

strict = bool

Strict parsing mode. Raise exceptions on syntax errors. Default is False.
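
The %(lang)s placeholder used in the URL defaults above has the form of a Python %-style mapping key. The following is a minimal sketch of how such a template can be expanded, assuming plain %-formatting is used; expand_base is a hypothetical helper written for illustration and is not part of wikitrans.

```python
# Sketch only: expands a %(lang)s placeholder with Python's %-formatting.
# expand_base() is a hypothetical helper, not a wikitrans function.
DEFAULT_HTML_BASE = "http://%(lang)s.wikipedia.org/wiki/"


def expand_base(url_template, lang="en"):
    """Substitute the language code into a base-URL template."""
    return url_template % {"lang": lang}


print(expand_base(DEFAULT_HTML_BASE))             # default language (en)
print(expand_base(DEFAULT_HTML_BASE, lang="de"))  # German-language wiki
```

A template without the placeholder passes through this helper unchanged.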

class TextWikiMarkup

Translates material in Wiki markup language to plain text. Usage:

from WikiTrans.wiki2text import TextWikiMarkup

markup = TextWikiMarkup(filename='input.txt')
markup.parse()
print(str(markup))

Specific constructor arguments:

width = N

Limit output width to N columns. Default is 78.

show_urls = bool

Whether or not to show the URLs links refer to. If bool is True (the default), a URL will be displayed in parentheses next to the link text. If False, only the link text will be displayed.
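
To make the effect of width and show_urls concrete, here is a small standalone sketch built on the standard textwrap module. It only imitates the documented behaviour; render_link and render_paragraph are hypothetical names, and the actual wikitrans implementation may differ.

```python
import textwrap


def render_link(text, url, show_urls=True):
    """Return the link text, followed by the URL in parentheses if requested."""
    if show_urls:
        return "%s (%s)" % (text, url)
    return text


def render_paragraph(text, width=78):
    """Wrap a paragraph so that no output line exceeds `width` columns."""
    return textwrap.fill(text, width=width)


link = render_link("door", "http://en.wiktionary.org/wiki/door")
print(render_paragraph("See the entry for " + link + " for details.", width=40))
```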

class TextWiktionaryMarkup

Translate material from wiktionary to plain text form. This is supposed to provide a wiktionary-specific form of TextWikiMarkup. Currently, this class differs from TextWikiMarkup only in that the default value for html_base is http://%(lang)s.wiktionary.org/wiki/.

class TexiWikiMarkup

Translate Wiki markup to Texinfo source. Usage:

from WikiTrans.wiki2texi import TexiWikiMarkup

markup = TexiWikiMarkup(filename='input.txt')
markup.parse()
print(str(markup))

Two markup-specific keywords control the sectioning model used.

sectioning_model = model

Selects the Texinfo sectioning model for the output document. Possible values are:

numbered

Top of document is marked with @top. Headings (=, ==, ===, etc) produce @chapter, @section, @subsection, etc.

unnumbered

Unnumbered sectioning: @top, @unnumbered, @unnumberedsec, @unnumberedsubsec.

appendix

Sectioning suitable for appendix entries: @top, @appendix, @appendixsec, @appendixsubsec, etc.

heading

Use heading directives to reflect sectioning: @majorheading, @chapheading, @heading, @subheading, etc.

sectioning_start = n

Shift resulting heading level by n positions. For example, supposing sectioning_model=numbered, == A == will produce @section A on output. If sectioning_start=1 is also given, this directive will produce @subsection A instead.
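
The interaction between sectioning_model and sectioning_start can be sketched as a table lookup. This is an illustration under stated assumptions, not the library's code: the lists contain only the commands named above, the first entry of each list is assumed to mark the top of the document (so that = maps to the second entry), and levels deeper than the listed commands are clamped to the last one. texinfo_command is a hypothetical helper.

```python
# Commands per model, as listed in this document. Index 0 is assumed to be
# the document top, so heading level 1 ("=") maps to index 1.
SECTIONING = {
    "numbered":   ["@top", "@chapter", "@section", "@subsection"],
    "unnumbered": ["@top", "@unnumbered", "@unnumberedsec", "@unnumberedsubsec"],
    "appendix":   ["@top", "@appendix", "@appendixsec", "@appendixsubsec"],
    "heading":    ["@majorheading", "@chapheading", "@heading", "@subheading"],
}


def texinfo_command(level, sectioning_model="numbered", sectioning_start=0):
    """Map a wiki heading level (number of '=' signs) to a Texinfo command."""
    commands = SECTIONING[sectioning_model]
    index = level + sectioning_start
    # Assumption: clamp out-of-range levels instead of raising an error.
    index = max(0, min(index, len(commands) - 1))
    return commands[index]


print(texinfo_command(2))                      # "== A ==" with defaults
print(texinfo_command(2, sectioning_start=1))  # same heading, shifted by one
```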

class HtmlWikiMarkup

Translates Wiki markup to HTML. Usage:

from WikiTrans.wiki2html import HtmlWikiMarkup

markup = HtmlWikiMarkup(filename='input.txt')
markup.parse()
print(str(markup))

The supported keywords are the same as for the WikiMarkup class.

class HtmlWiktionaryMarkup

Translate material from wiktionary to HTML form. This is supposed to provide a wiktionary-specific form of HtmlWikiMarkup. Currently both classes are equivalent, except that the default value for html_base in HtmlWiktionaryMarkup is http://%(lang)s.wiktionary.org/wiki/.

The wikitrans utility

This command line utility converts the supplied text to the selected output format. The usage syntax is:

wikitrans [OPTIONS] ARG

If ARG looks like a URL, the wiki text to be converted will be downloaded from that URL.

Otherwise, if the --base-url=URL option is given, ARG is treated as the name of the page to get from the MediaWiki installation at URL.

Otherwise, ARG is treated as the name of the file to read wiki material from.

Examples:

wikitrans text.wiki

wikitrans --base-url http://en.wiktionary.org door

wikitrans https://en.wiktionary.org/wiki/Special:Export/door
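
The three-way interpretation of ARG described above can be sketched as follows. The test for "looks like a URL" is an assumption (an http or https scheme); the utility's real heuristic is not documented here, and classify_arg is a hypothetical helper.

```python
from urllib.parse import urlparse


def classify_arg(arg, base_url=None):
    """Decide how ARG would be interpreted: 'url', 'page', or 'file'."""
    # Assumption: "looks like a URL" means an http(s) scheme is present.
    if urlparse(arg).scheme in ("http", "https"):
        return "url"
    # A base URL turns ARG into a page name on that wiki.
    if base_url is not None:
        return "page"
    # Otherwise ARG names a local file.
    return "file"


print(classify_arg("text.wiki"))
print(classify_arg("door", base_url="http://en.wiktionary.org"))
print(classify_arg("https://en.wiktionary.org/wiki/Special:Export/door"))
```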

Options are:

--version

Show program’s version number and exit.

-h, --help

Show a short usage summary and exit.

-v, --verbose

Verbose operation.

-I ITYPE, --input-type=ITYPE

Set input document type. ITYPE is one of: default or wiktionary.

-t OTYPE, --to=OTYPE, --type=OTYPE

Set output document type (html (the default), texi, text, or dump).

-l LANG, --lang=LANG

Set input document language.

-o KW=VAL, --option=KW=VAL

Pass the keyword argument KW=VAL to the parser class constructor.

-d DEBUG, --debug=DEBUG

Set debug level (0..100).

-D, --dump

Dump parse tree and exit; same as --type=dump.

-b URL, --base-url=URL

Set the base URL.
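
As an illustration of the -o KW=VAL option, repeated KW=VAL strings can be collected into a dictionary of keyword arguments as sketched below. parse_options is a hypothetical helper; values are kept as strings because this document does not say how wikitrans converts them.

```python
def parse_options(pairs):
    """Turn a list of 'KW=VAL' strings into a keyword-argument dictionary."""
    kwargs = {}
    for pair in pairs:
        # Split on the first '=' only, so values may contain '=' themselves.
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError("expected KW=VAL, got %r" % pair)
        kwargs[key] = value
    return kwargs


print(parse_options(["width=60", "show_urls=False"]))
```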

Note: when using --base-url or passing a URL as the argument (the 2nd and 3rd examples above), if the URL is in the ‘wikipedia.org’ or ‘wiktionary.org’ domain, the options --input-type and --lang are set automatically.
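
The automatic detection mentioned in the note can be pictured with the sketch below. It assumes that the language code is the leftmost label of the host name and that wikipedia.org corresponds to the default input type; both are assumptions, and guess_settings is a hypothetical helper rather than part of wikitrans.

```python
from urllib.parse import urlparse


def guess_settings(url):
    """Guess input type and language from a Wikipedia or Wiktionary URL."""
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    # Expect at least LANG.SITE.org; anything else is left undetected.
    if len(labels) < 3 or labels[-1] != "org":
        return None
    if labels[-2] == "wiktionary":
        input_type = "wiktionary"
    elif labels[-2] == "wikipedia":
        input_type = "default"
    else:
        return None
    # Assumption: the leftmost host label is the language code.
    return {"input_type": input_type, "lang": labels[0]}


print(guess_settings("http://en.wiktionary.org"))
print(guess_settings("http://example.com"))
```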

Source Distribution

wikitrans-1.4.tar.gz (67.7 kB)
