Simple, systematic way to quote, join, and textually wrap Python data

Usage

from quoter import *

print single('this')       # 'this'
print double('that')       # "that"
print backticks('ls -l')   # `ls -l`
print braces('curlycue')   # {curlycue}
print braces('curlysue', padding=1)
                           # { curlysue }

And for a taste of some more advanced functionality, quoting HTML content:

print html.p("A para", ".focus")
print html.br()
print html.comment("content ends here")

Yields:

<p class='focus'>A para</p>
<br>
<!-- content ends here -->

This clearly goes beyond “simply wrapping some text with other text.” The output format varies widely and intelligently based on context, including modification with CSS Selector controls, appropriately void/self-closing elements, and specialized markup.

Finally, quoter provides drop-dead simple, yet highly functional, ways to join sequence items together. For example:

mylist = list("ABCD")
print join(mylist)
print join(mylist, sep=" | ", endcaps=braces)
print and_join(mylist)
print and_join(mylist, quoter=double, lastsep=" and ")

Yields:

A, B, C, D
{A | B | C | D}
A, B, C, and D
"A", "B", "C" and "D"

This shows a range of separators, separation styles (both Oxford and non-Oxford commas), endcaps, and individual item quoting.
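To make the separator logic concrete, here is a minimal standalone sketch of English-style joining. It is not quoter's implementation; the names and_join_sketch and double_sketch, and the twosep parameter, are assumptions made only for this illustration.

```python
def and_join_sketch(items, sep=", ", lastsep=", and ", twosep=" and ", quoter=str):
    """Join items in English list style; NOT quoter's actual implementation."""
    # Stringify (and optionally quote) each item first.
    parts = [quoter(item) for item in items]
    if len(parts) < 2:
        return "".join(parts)          # empty or single-item input
    if len(parts) == 2:
        return twosep.join(parts)      # "A and B": no comma for a pair
    # Three or more: ordinary separators, then a distinct final separator.
    return sep.join(parts[:-1]) + lastsep + parts[-1]


def double_sketch(value):
    """Wrap a value in straight double quotes, stringifying it first."""
    return '"' + str(value) + '"'


letters = list("ABCD")
oxford = and_join_sketch(letters)
plain = and_join_sketch(letters, quoter=double_sketch, lastsep=" and ")
print(oxford)   # A, B, C, and D
print(plain)    # "A", "B", "C" and "D"
```

The Oxford comma lives entirely in the default lastsep; swapping it for " and " gives the non-Oxford form.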

Discussion

Programs stringify and quote values all the time. They wrap both native strings and the string representation of other values in all manner of surrounding text. Single quotes. Double quotes. Curly quotes. Backticks. Separating whitespace. Unicode symbols. HTML or XML markup. Et cetera.

There are a lot of ways to do this text formatting and wrapping. For example:

value = 'something'
print '{x}'.replace('x', value)             # {something}
print "'{0}'".format(value)                 # 'something'
print "'" + value + "'"                     # 'something'
print "{0}{1}{2}".format('"', value, '"')   # "something"
print ''.join(['"', value, '"'])            # "something"

But for such a simple, common task as wrapping values in surrounding text, these look pretty ugly, low-level, and dense. Writing them out, it’s easy to mistype a character here or there, or to forget some of the gotchas. Say you’re formatting values, some of which are strings, but others are integers or other primitive types. Instant TypeError! Only strings can be directly concatenated with strings in Python.
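The TypeError mentioned above is easy to reproduce with plain Python, no quoter required. This small sketch uses made-up variable names (count, label, fixed) and shows both the failure and the manual str() workaround:

```python
count = 3

# Direct concatenation fails as soon as a non-string value appears.
try:
    label = "'" + count + "'"
except TypeError as err:
    label = None
    failure = type(err).__name__

# The manual fix is an explicit str() call at every such site.
fixed = "'" + str(count) + "'"
print(failure)  # TypeError
print(fixed)    # '3'
```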

The repetitive, ad hoc nature of textual quoting and wrapping is tiresome and error-prone. It’s never more so than when constructing multi-level quoted strings, such as Unix command line arguments, SQL commands, or HTML attributes.

quoter provides a clean, consistent, higher-level alternative. It also provides a mechanism to pre-define your own quoting styles that can then be easily reused.

Doing Better

Unlike native Python concatenation operators, quoter isn’t flustered if you give it non-string data. It knows you want a string output, so it auto-stringifies non-string values:

assert brackets(12) == '[12]'
assert braces(4.4) == '{4.4}'
assert double(None) == '"None"'
assert single(False) == "'False'"

The module pre-defines callable Quoters for a handful of the most common quoting styles:

  • braces {example}

  • brackets [example]

  • angles <example>

  • parens (example)

  • double "example"

  • single 'example'

  • backticks `example`

  • anglequote «example»

  • curlysingle ‘example’

  • curlydouble “example”

But there are a huge number of ways you might want to wrap or quote text. Even considering just “quotation marks,” there are well over a dozen. There are also numerous bracketing symbols in common use. That’s to say nothing of the constructs seen in markup, programming, and templating languages. So quoter couldn’t possibly provide a default option for every possible quoting style. Instead, it provides a general-purpose mechanism for defining your own:

from quoter import Quoter

bars = Quoter('|')
print bars('x')                    # |x|

plus = Quoter('+','')
print plus('x')                    # +x

para = Quoter('<p>', '</p>')
print para('this is a paragraph')  # <p>this is a paragraph</p>
                                   # NB simple text quoting - see below
                                   # for higher-end HTML handling

variable = Quoter('${', '}')
print variable('x')                # ${x}

Note that bars is specified with just one symbol. If only one is given, the prefix and suffix are considered to be identical. If you really only want a prefix or a suffix, and not both, then instantiate the Quoter with two, one of which is an empty string, as in plus above.
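The prefix/suffix defaulting rule can be pictured with a few lines of plain Python. This is only a sketch of the idea, assuming a hypothetical helper named make_wrapper; it is not how Quoter is implemented.

```python
def make_wrapper(prefix, suffix=None):
    """Hypothetical helper, not quoter's code: build a wrapping function."""
    if suffix is None:
        suffix = prefix            # one symbol given: use it on both sides

    def wrap(value):
        # str() lets non-string values (ints, None, ...) be wrapped too.
        return prefix + str(value) + suffix

    return wrap


bars_sketch = make_wrapper('|')
plus_sketch = make_wrapper('+', '')
variable_sketch = make_wrapper('${', '}')

print(bars_sketch('x'))       # |x|
print(plus_sketch('x'))       # +x
print(variable_sketch(12))    # ${12}
```

Using None as the "not given" marker is what lets an explicit empty string mean "no suffix at all".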

In most cases, it’s cleaner and more efficient to define a style, but there’s nothing preventing you from an on-the-fly usage:

print Quoter('+[ ', ' ]+')('castle')   # +[ castle ]+

Cloning and Setting

Quoter parameters can be changed (set) at any time:

bars = Quoter('|')
print bars('x')                    # |x|
bars.set(prefix='||', suffix='||')
print bars('x')                    # ||x||
bars.set(padding=1)
print bars('x')                    # || x ||

And Quoter instances you like can be cloned, optionally with several options changed in the clone:

bart = bars.clone(prefix=']', suffix='[')
assert bart('x') == '] x ['
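For readers curious how set and clone might relate, here is a self-contained sketch. SketchQuoter is a hypothetical stand-in written for this illustration (the real Quoter handles its options differently and supports many more of them); the point is only that clone copies first and then applies overrides, leaving the original untouched.

```python
import copy


class SketchQuoter(object):
    """Illustrative stand-in for Quoter; the real class is more capable."""

    def __init__(self, prefix, suffix=None, padding=0):
        self.prefix = prefix
        self.suffix = prefix if suffix is None else suffix
        self.padding = padding

    def set(self, **options):
        # Change settings on this instance in place.
        for name, value in options.items():
            if name not in ('prefix', 'suffix', 'padding'):
                raise TypeError('unknown option: ' + name)
            setattr(self, name, value)

    def clone(self, **options):
        # Copy first, then override, so the original is left untouched.
        duplicate = copy.copy(self)
        duplicate.set(**options)
        return duplicate

    def __call__(self, value):
        pad = ' ' * self.padding
        return self.prefix + pad + str(value) + pad + self.suffix


bars_sketch = SketchQuoter('|')
bars_sketch.set(prefix='||', suffix='||', padding=1)
bart_sketch = bars_sketch.clone(prefix=']', suffix='[')

print(bars_sketch('x'))   # || x ||
print(bart_sketch('x'))   # ] x [
```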

Formatting and Encoding

The Devil, as they say, is in the details. We often don’t just want quote marks wrapped around values. We also want those values set apart from the rest of the text. quoter supports this with padding and margin settings patterned on the CSS box model. In CSS, moving out from content one finds padding, a border, and then a margin. Padding can be thought of as an internal margin, and the prefix and suffix strings like the border. With that in mind:

print braces('this')                      # '{this}'
print braces('this', padding=1)           # '{ this }'
print braces('this', margin=1)            # ' {this} '
print braces('this', padding=1, margin=1) # ' { this } '

If desired, the padding and margin can be given explicitly, as strings. If given as integers, they are interpreted as a number of spaces.
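The box-model layering can be written out as a single concatenation. The helper below, box_sketch, is a hypothetical illustration rather than quoter's code; it assumes the same margin and padding are applied on both sides.

```python
def box_sketch(value, prefix, suffix, padding=0, margin=0):
    """Assemble margin + prefix + padding + value + padding + suffix + margin.

    A hypothetical helper for illustration, not quoter's implementation.
    """
    def spacing(amount):
        # Integers mean "this many spaces"; strings are used verbatim.
        return ' ' * amount if isinstance(amount, int) else amount

    pad, gap = spacing(padding), spacing(margin)
    return gap + prefix + pad + str(value) + pad + suffix + gap


print(repr(box_sketch('this', '{', '}', padding=1, margin=1)))   # ' { this } '
print(repr(box_sketch('this', '{', '}', padding='_')))           # '{_this_}'
```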

One can also define the encoding used for each call, per instance, or globally. If some of your quote symbols use Unicode characters, yet your output medium doesn’t support them directly, this is an easy fix. E.g.:

Quoter.options.encoding = 'utf-8'
print curlydouble('something something')

Now curlydouble will output UTF-8 bytes. But in general, this is not a great idea; you should work in Unicode strings in Python, encoding or decoding only at the time of input and output, not as each piece of content is constructed.
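The "encode only at the boundary" advice looks like this in plain Python. The sketch does not use quoter at all; io.BytesIO merely stands in for whatever file or socket eventually receives the bytes.

```python
import io

# Build the text as Unicode; \u201c and \u201d are curly double quotes.
text = u'\u201csomething something\u201d'

# Encode once, at the output boundary; io.BytesIO stands in for a file or socket.
sink = io.BytesIO()
sink.write(text.encode('utf-8'))
raw = sink.getvalue()

print(len(text))   # 21 characters
print(len(raw))    # 25 bytes: each curly quote takes 3 bytes in UTF-8
```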

Shortcuts

One often sees very long function calls and expressions as text parts are being assembled. In order to reduce this problem, quoter defines aliases for single, double, and triple quoting, as well as backticks, and double backticks:

from quoter import qs, qd, qt, qb, qdb

print qs('one'), qd('two'), qt('three'), qb('and'), qdb('four')
# 'one' "two" """three""" `and` ``four``

You can, of course, define your own aliases as well, and/or redefine existing styles. If, for example, you like braces but wish it added a padding space by default, it’s simple to redefine:

braces = Quoter('{', '}', padding=1, name='braces')
print braces('braces plus spaces!')  # '{ braces plus spaces! }'

You could still get the no-padding variation with:

print braces('no space braces', padding=0) # '{no space braces}'

Clean Imports

As an organizational assist, quoters are available as named attributes of a pre-defined quote object. For those who like strict, minimalist imports, this permits from quoter import quote without loss of generality. For example:

from quoter import quote

quote.double('test')    # "test"
quote.braces('test')    # {test}
# ...and so on...

Each of these can also serve like an instance of an enumerated type, specifying for a later time what kind of quoting you’d like. Then, at the time that quoter is needed, it can simply be called. E.g.:

preferred_quoting = quote.brackets

...

print preferred_quoting(data)

Or you could use something very short, like q.

HTML

Quoting does not need to be a simple matter of string concatenation. It can involve sophisticated on-the-fly decisions based on content and context.

For example, there is an extended quoting mode designed for XML and HTML construction. Instead of prefix and suffix strings, XMLQuoter and HTMLQuoter classes build valid HTML out of tag names and “CSS selector” style specifications (similar to those used by jQuery). This is a considerable help in Python, which defines and/or reserves some of the attribute names most used in HTML (e.g. class and id). Using the CSS selector style neatly gets around this annoyance, and it is more compact and more consistent with modern web development idioms to boot:

from quoter import *

print html.p('this is great!', {'class':'emphatic'})
print html.p('this is great!', '.spastic')
print html.p('First para!', '#first')

Yields:

<p class='emphatic'>this is great!</p>
<p class='spastic'>this is great!</p>
<p id='first'>First para!</p>

Note that the order in which attributes appear is not guaranteed. They’re stored in dict objects, which have different orderings on different versions of Python. This generally isn’t a problem, in that ordering isn’t significant in HTML. It can, however, make string-based testing more annoying.
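One way to take the sting out of order-dependent string tests is to compare parsed attributes rather than raw markup. The sketch below uses Python 3's standard html.parser module (on Python 2 the module is named HTMLParser); TagCollector and start_tags are names invented for this example.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record each start tag with its attributes as an order-free dict."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


def start_tags(markup):
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


first = "<img class='big' src='afile'>"
second = "<img src='afile' class='big'>"

assert first != second                            # raw strings differ
assert start_tags(first) == start_tags(second)    # parsed attributes match
```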

HTML quoting also understands that some elements are “void” or “self-closing,” meaning they do not need closing tags (and in some cases, not even content). So for example:

>>> print html.br()
<br>

>>> print html.img('.big', src='afile')
<img class='big' src='afile'>
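A rough picture of the two decisions involved (selector to attributes, and void versus regular elements) can be sketched in standalone Python. Everything here is an assumption for illustration: VOID_SKETCH is only a subset of the HTML void elements, selector_attrs understands just a single '.class' or '#id', and tag_sketch takes its selector as a keyword rather than mirroring HTMLQuoter's call signature.

```python
# Illustrative subset of HTML void elements; the real list is longer.
VOID_SKETCH = {'br', 'hr', 'img', 'input', 'meta', 'link'}


def selector_attrs(selector):
    """Turn a simple '.class' or '#id' selector into an attribute dict."""
    if selector.startswith('.'):
        return {'class': selector[1:]}
    if selector.startswith('#'):
        return {'id': selector[1:]}
    return {}


def tag_sketch(name, content='', selector='', **attrs):
    """Render one element; hypothetical helper, not HTMLQuoter itself."""
    merged = selector_attrs(selector)
    merged.update(attrs)
    # Sort attributes so the output is deterministic.
    atts = ''.join(" {0}='{1}'".format(k, merged[k]) for k in sorted(merged))
    if name in VOID_SKETCH:
        return '<{0}{1}>'.format(name, atts)        # no content, no closing tag
    return '<{0}{1}>{2}</{0}>'.format(name, atts, content)


print(tag_sketch('p', 'A para', '.focus'))               # <p class='focus'>A para</p>
print(tag_sketch('br'))                                  # <br>
print(tag_sketch('img', selector='.big', src='afile'))   # <img class='big' src='afile'>
```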

The html object for HTMLQuoter (or corresponding xml for XMLQuoter) is a convenient front-end that can be immediately used to provide simple markup language construction.

You can also access the underlying classes directly, and/or define your own customized quoters. Your own quoters can be called as a function would be. Or, if you give them a name, they can be called through the html front-end, just like the pre-defined tags. For instance:

para_e = HTMLQuoter('p.emphatic', name='para_e')
print para_e('this is great!')
print html.para_e('this is great?', '.question')
print html.img(src='somefile')
print html.br()

Yields:

<p class='emphatic'>this is great!</p>
<p class='question'>this is great?</p>
<img src='somefile'>
<br>

HTMLQuoter quotes attributes by default with single quotes. If you prefer double quotes, you may set them when the element is defined:

div = HTMLQuoter('div', attquote=double)

XML

XMLQuoter with its xml front-end is a similar quoter with markup intelligence. It offers one additional attribute beyond HTMLQuoter: ns for namespaces. Thus:

item = XMLQuoter(tag='item', ns='inv', name='item inv_item')
print item('an item')
print xml.item('another')
print xml.inv_item('yet another')
print xml.thing('something')
print xml.special('else entirely', '#unique')

yields:

<inv:item>an item</inv:item>
<inv:item>another</inv:item>
<inv:item>yet another</inv:item>
<thing>something</thing>
<special id='unique'>else entirely</special>

Note: item was given two names. Multiple aliases are supported. While the item object carries its namespace specification through its different invocations, the calls to non-item quoters have no persistent namespace. Finally, note that the CSS specification language heavily used in HTML is present and available for XML, though its use may be less common.

In general, xml.tagname auto-generates quoters just like html.tagname does on first use. There are also pre-defined utility methods such as html.comment() and xml.comment() for commenting purposes.

Named Styles

Quoting via the functional API or the attribute-accessed front-ends (quote, lambdaq, html, and xml) is probably the easiest way to go. But there’s one more way. If you provide the name of a defined style via the style attribute, that’s the style you get. So while quote('something') gives you single quotes by default ('something'), if you invoke it as quote('something', style='double'), you get double quoting as though you had used quote.double(...), double(...), or qd(...). This even works through named front-ends; quote.braces('something', style='double') still gets you "something". If you don’t want to be confused by such double-bucky forms, don’t use them. The best use-case for named styles is probably when you don’t know how something will be quoted (or what tag it will use, in the HTML or XML case), but that decision is made dynamically. Then style=desired_style makes good sense.

Style names are stored in the class of the quoter. So all Quoter instances share the same named styles, as do HTMLQuoter, XMLQuoter, and LambdaQuoter.
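The dynamic use case for named styles can be pictured with a plain dictionary lookup. This sketch is not how quoter stores its styles; STYLES_SKETCH, quote_sketch, and pick_style are hypothetical names used only to show a style being chosen by name at run time.

```python
# Hypothetical style table for illustration; quoter keeps its own registry.
STYLES_SKETCH = {
    'single': ("'", "'"),
    'double': ('"', '"'),
    'braces': ('{', '}'),
    'brackets': ('[', ']'),
}


def quote_sketch(value, style='single'):
    """Wrap value using a style chosen by name at call time."""
    prefix, suffix = STYLES_SKETCH[style]   # KeyError for an unknown style
    return prefix + str(value) + suffix


def pick_style(text):
    # Decide at run time: avoid double quotes if the text already contains one.
    return 'single' if '"' in text else 'double'


for text in ['plain', 'say "hi"']:
    print(quote_sketch(text, style=pick_style(text)))
# "plain"
# 'say "hi"'
```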

Dynamic Quoters

XMLQuoter and HTMLQuoter show that it’s straightforward to define Quoters that don’t just concatenate text, but that examine it and provide dynamic rewriting on the fly.

LambdaQuoter is a further generalization of this idea. It allows generic formatting to be done by a user-provided function. For example, in finance, one often wants to present numbers with a special formatting:

from quoter import LambdaQuoter

f = lambda v: ('(', abs(v), ')') if v < 0 else ('', v, '')
financial = LambdaQuoter(f)
print financial(-3)            # (3)
print financial(45)            # 45

password = LambdaQuoter(lambda v: ('', 'x' * len(v), ''))
print password('secret!')      # xxxxxxx

wf = lambda v:  ('**', v, '**') if v < 0 else ('', v, '')
warning = LambdaQuoter(wf, name='warning')
print warning(12)              # 12
print warning(-99)             # **-99**

The trick is instantiating LambdaQuoter with a callable (e.g. lambda expression or even a full function) that accepts one value and returns a tuple of three values: the quote prefix, the value (possibly rewritten), and the suffix. The rewriting mechanism can be entirely general, doing truncation, column padding, content obscuring, hashing, or…just anything.
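The three-part return convention is simple enough to mimic in a few lines. The sketch below is an assumption-laden illustration, not LambdaQuoter itself: lambda_quote_sketch and truncate are invented names, and margin and padding handling is left out.

```python
def lambda_quote_sketch(func, value):
    """Call func(value) and join the (prefix, value, suffix) it returns."""
    prefix, rewritten, suffix = func(value)
    # The rewritten value may be a non-string (e.g. abs(v)), so stringify it.
    return prefix + str(rewritten) + suffix


def truncate(text, limit=8):
    # Keep short text as-is; cut long text and mark the cut with '...'.
    if len(text) <= limit:
        return ('', text, '')
    return ('', text[:limit], '...')


financial_rule = lambda v: ('(', abs(v), ')') if v < 0 else ('', v, '')

print(lambda_quote_sketch(financial_rule, -3))              # (3)
print(lambda_quote_sketch(truncate, 'short'))               # short
print(lambda_quote_sketch(truncate, 'a much longer one'))   # a much l...
```

Because the callable returns the rewritten value alongside the prefix and suffix, truncation and decoration are decided in one place.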

LambdaQuoter named instances are accessed through the lambdaq front-end (because lambda is a reserved word). Given the code above, lambdaq.warning is active, for example.

LambdaQuoter is an edge case, arcing over towards being a general formatting function. That has the virtue of providing a consistent mechanism for tactical output transformation with built-in margin and padding support. It’s also able to encapsulate complex quoting / representation decisions that would otherwise muck up “business logic,” making representation code much more unit-testable. But one might argue that such full transformations are “a bridge too far” for a quoting module. So use the dynamic component of quoter, or not, as you see fit.

Notes

  • Version 1.3 ships the first release of integrated sequence joining. join, word_join, and_join, or_join, joinlines, and items are functional and tested, but still less mature than the rest of the codebase.

  • Version 1.2 institutes full named styles within each quoting class. Tests and docs tweaked.

  • Version 1.1 cleans up HTML quoting, esp. re void / self-closing elements. Added new double-backtick functions. Changed to Apache License 2.0. Updated docs and testing matrix.

  • See CHANGES.rst for more complete change log.

  • quoter provides simple transformations that could be alternatively implemented as a series of small functions. The problem is that such “little functions” tend to be constantly re-implemented, in different ways, and spread through many programs. That need to constantly re-implement such common and straightforward text formatting has led me to re-think how software should format text. quoter is one facet of a project to systematize higher-level formatting operations. See say and show for other parts of the larger effort.

  • quoter is also a test case for options, a module that supports flexible option handling. In fact, it is one of options’ most extensive test cases, in terms of subclassing and dealing with named styles.

  • In the future, additional quoting styles such as ones for Markdown or RST format styles might appear. It’s not hard to subclass Quoter for new languages.

  • Automated multi-version testing managed with pytest and tox. Packaging linting with pyroma.

    Successfully packaged for, and tested against, all late-model versions of Python: 2.6, 2.7, 3.2, 3.3, 3.4, and 3.5 pre-release (3.5.0b3) as well as PyPy 2.6.0 (based on 2.7.9) and PyPy3 2.4.0 (based on 3.2.5).

  • The author, Jonathan Eunice (@jeunice on Twitter), welcomes your comments and suggestions.

Installation

To install or upgrade to the latest version:

pip install -U quoter

To easy_install under a specific Python version (3.3 in this example):

python3.3 -m easy_install --upgrade quoter

(You may need to prefix these with sudo to authorize installation. In environments without super-user privileges, you may want to use pip’s --user option, to install only for a single user, rather than system-wide.)
