Easily get clean data, direct from text or Python source

Project Description

One often needs to state data in program source. Python, however, needs its lines indented just so. Multi-line strings therefore often have extra spaces and newline characters you didn’t really want. Many developers “fix” this by using Python list literals, but that’s tedious, verbose, and often less legible.
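To make the problem concrete, here is a small sketch in plain Python, with no textdata involved (the function name `shoe_rhyme` is purely illustrative), that shows what an indented triple-quoted string actually contains:

```python
def shoe_rhyme():
    # The literal is indented to match the surrounding code,
    # so that indentation becomes part of the string itself.
    return """
        There was an old woman
        who lived in a shoe.
    """

raw = shoe_rhyme()
print(repr(raw))
```

The string starts with a newline, each text line carries eight leading spaces, and it ends with a whitespace-only line: none of which was part of the intended data.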

The textdata package makes it easy to have clean, nicely-whitespaced data specified in your program, but to get the data without extra whitespace cluttering things up. It’s permissive of the layouts needed to make Python code look and work right, without reflecting those requirements in the resulting data. For example:

from textdata import lines

data = lines("""
    There was an old woman who lived in a shoe.
    She had so many children, she didn't know what to do;
    She gave them some broth without any bread;
    Then whipped them all soundly and put them to bed.
""")

will result in:

['There was an old woman who lived in a shoe.',
 "She had so many children, she didn't know what to do;",
 'She gave them some broth without any bread;',
 'Then whipped them all soundly and put them to bed.']

Note that the “extra” newlines and leading spaces have been taken care of and discarded. Or do you want that as just one string? Okay:

data = text("""
    There was an old woman...
                                     ...put them to bed.
""")

This does the same stripping of pointless whitespace at the beginning and end, and returns the data as a clean, convenient string. If you don’t want most of the line endings, try textline on the same input to get a single line with no breaks.
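For readers curious about what this kind of cleanup involves, the following is a rough standard-library approximation, not textdata's actual implementation. The names `approx_lines`, `approx_text`, and `approx_textline` are hypothetical, and the assumption that the single-line form joins every line with one space may not match textline exactly.

```python
import textwrap

def approx_lines(s):
    """Dedent s and return its lines, dropping blank lines at both ends."""
    rows = [row.rstrip() for row in textwrap.dedent(s).splitlines()]
    while rows and not rows[0]:
        rows.pop(0)
    while rows and not rows[-1]:
        rows.pop()
    return rows

def approx_text(s):
    """Same cleanup, returned as one newline-joined string."""
    return "\n".join(approx_lines(s))

def approx_textline(s):
    """Assumption: collapse everything onto one line, joined by single spaces."""
    return " ".join(row.strip() for row in approx_lines(s) if row.strip())

rhyme = """
    There was an old woman who lived in a shoe.
    She had so many children, she didn't know what to do;
"""
print(approx_lines(rhyme))
print(approx_text(rhyme))
print(approx_textline(rhyme))
```

Because `textwrap.dedent` removes only the indentation common to all lines, any deliberate extra indentation on a particular line survives in `approx_lines` and `approx_text`.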

Other times, the data you need is almost, but not quite, a series of words. Think of a list of names or a list of color names: values that are mostly single words but sometimes contain embedded spaces. textdata has you covered:

>>> words(' Billy Bobby "Mr. Smith" "Mrs. Jones"  ')
['Billy', 'Bobby', 'Mr. Smith', 'Mrs. Jones']

Embedded quotes (either single or double) can be used to construct “words” (or phrases) containing whitespace (including tabs and newlines).
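For comparison, the standard library's `shlex.split` gives similar results on simple input like this. It is shown only as a point of reference: it follows POSIX shell quoting rules, which are not necessarily the rules words uses.

```python
import shlex

# Quoted phrases stay together; surrounding whitespace is ignored.
names = shlex.split(' Billy Bobby "Mr. Smith" "Mrs. Jones"  ')
print(names)

# Single quotes work the same way under shell rules.
colors = shlex.split("red 'sky blue' green")
print(colors)
```

One practical difference is that `shlex.split` raises `ValueError` on an unbalanced quote, such as the apostrophe in a bare `didn't`, so it is a poor fit for ordinary prose.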

words, like the other textdata facilities, lets you comment individual lines. In a plain string literal, such comments would end up in the data:

exclude = words("""
    __pycache__ *.pyc *.pyo     # compilation artifacts
    .hg* .git*                  # repository artifacts
    .coverage                   # code tool artifacts
    .DS_Store                   # platform artifacts
""")

Yields:

['__pycache__', '*.pyc', '*.pyo', '.hg*', '.git*',
 '.coverage', '.DS_Store']
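The idea of comment stripping can be sketched in a few lines of plain Python. This is a deliberately naive illustration with a hypothetical helper name, `approx_words_with_comments`: it treats every `#` as the start of a comment, even inside quotes, and does no quote handling at all, so textdata's real behavior may differ.

```python
def approx_words_with_comments(s):
    """Split s into whitespace-separated words, ignoring '#' comments."""
    found = []
    for row in s.splitlines():
        # Keep only the part of the line before the first '#'.
        code, _, _comment = row.partition("#")
        found.extend(code.split())
    return found

exclude = approx_words_with_comments("""
    __pycache__ *.pyc *.pyo     # compilation artifacts
    .hg* .git*                  # repository artifacts
    .coverage                   # code tool artifacts
    .DS_Store                   # platform artifacts
""")
print(exclude)
```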

Finally, you might want to collect “paragraphs”: contiguous runs of text lines delineated by blank lines. The Markdown and RST document formats, for example, use this convention. textdata makes it easy:

>>> rhyme = """
    Hey diddle diddle,

    The cat and the fiddle,
    The cow jumped over the moon.
    The little dog laughed,
    To see such sport,

    And the dish ran away with the spoon.
"""
>>> paras(rhyme)
[['Hey diddle diddle,'],
 ['The cat and the fiddle,',
  'The cow jumped over the moon.',
  'The little dog laughed,',
  'To see such sport,'],
 ['And the dish ran away with the spoon.']]

Or, if you’d like each paragraph as a single string:

>>> paras(rhyme, join="\n")
['Hey diddle diddle,',
 'The cat and the fiddle,\nThe cow jumped over the moon.\nThe little dog laughed,\nTo see such sport,',
 'And the dish ran away with the spoon.']
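The grouping step behind this can be sketched with `itertools.groupby`. The function below, `approx_paras`, is a hypothetical stand-in and not the library's code. It assumes a paragraph is any run of lines that are non-empty after stripping, and it mimics the `join` argument shown above.

```python
from itertools import groupby

def approx_paras(s, join=None):
    """Group non-blank lines of s into paragraphs separated by blank lines."""
    stripped = [row.strip() for row in s.splitlines()]
    # groupby yields alternating runs of blank and non-blank lines;
    # keep only the non-blank runs.
    groups = [list(run) for nonblank, run in groupby(stripped, key=bool) if nonblank]
    if join is None:
        return groups
    return [join.join(group) for group in groups]

rhyme = """
    Hey diddle diddle,

    The cat and the fiddle,
    The cow jumped over the moon.

    And the dish ran away with the spoon.
"""
print(approx_paras(rhyme))
print(approx_paras(rhyme, join="\n"))
```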

Or maybe you want a dict:

>>> attrs("a=1 b=2 c='something more'")
{'a': 1, 'b': 2, 'c': 'something more'}
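As a rough illustration of how such `key=value` text can be turned into a dict, here is a standard-library sketch. The helpers `approx_attrs` and `_convert` are hypothetical, and the conversion rule (try `int`, then `float`, else keep the string) is an assumption rather than a description of what attrs does internally.

```python
import shlex

def _convert(value):
    """Assumed rule: use a number if the text parses as one, else the string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value

def approx_attrs(s):
    """Parse whitespace-separated key=value pairs into a dict."""
    result = {}
    # shlex.split keeps quoted values together and removes the quotes.
    for token in shlex.split(s):
        key, sep, value = token.partition("=")
        if not sep:
            continue  # no '=' in this token; a stricter parser might raise
        result[key] = _convert(value)
    return result

settings = approx_attrs("a=1 b=2 c='something more'")
print(settings)
```

A limitation of this sketch is that quoting is lost before conversion, so a quoted number such as `n='5'` still comes back as an integer.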

textdata is all about conveniently grabbing the data you want from text files and program source, and doing it in a highly functional, well-tested way. Take it for a spin today!

See the full documentation at Read the Docs.

Release History

1.7.3 (this version), 1.7.2, 1.7.1, 1.7.0, 1.6.2, 1.6.1, 1.6.0, 1.5.1, 1.5.0, 1.4.5, 1.4.4, 1.4.3, 1.4.2, 1.4.1, 1.4.0, 1.3.0, 1.2.3, 1.2.2, 1.2.1, 1.2.0, 1.1.5, 1.1.3, 1.1.2, 1.1.1, 1.1.0, 1.0.8, 1.0.7, 1.0.6, 1.0.5, 1.0.4, 1.0.3, 1.0.2, 1.0.1, 1.0

Download Files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

File Name                              Size      Version  File Type  Upload Date
textdata-1.7.3-py2.py3-none-any.whl    10.5 kB   3.6      Wheel      Oct 13, 2017
textdata-1.7.3.zip                     19.5 kB   -        Source     Oct 13, 2017
