
Work with unicode/non-unicode data from files or strings uniformly.

Project description

data is a small Python module that lets you treat input uniformly and leaves it up to the caller to supply a byte string, a unicode object, a file-like object or a filename.

>>> open('helloworld.txt', 'w').write('hello, world from a file')

>>> from data import Data as I
>>> a = I(u'hello, world')
>>> b = I(file='helloworld.txt')
>>> c = I(open('helloworld.txt'))

>>> print unicode(a)
hello, world
>>> print unicode(b)
hello, world from a file
>>> print unicode(c)
hello, world from a file
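The normalization idea behind `Data` can be sketched with the standard library alone. The helper below is hypothetical (the name `read_text` and its signature are not part of data's API) and is written for Python 3. It shows one way to reduce the four accepted input kinds to text.

```python
import io


def read_text(source=None, file=None, encoding='utf8'):
    """Hypothetical helper: return *source*, or the file named by *file*, as text.

    Accepts bytes, str, a file-like object, or (via file=) a filename.
    """
    if file is not None:
        # A filename: open in binary mode and decode with the given encoding.
        with open(file, 'rb') as fp:
            return fp.read().decode(encoding)
    if isinstance(source, bytes):
        return source.decode(encoding)
    if isinstance(source, str):
        return source
    if hasattr(source, 'read'):
        # A file-like object may yield either bytes or text.
        content = source.read()
        return content.decode(encoding) if isinstance(content, bytes) else content
    raise TypeError('unsupported input type: %r' % type(source).__name__)


print(read_text(b'hello, world'))                  # hello, world
print(read_text(io.BytesIO(b'hello from a stream')))  # hello from a stream
```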

This can be made even more convenient using the data decorator:

>>> from data.decorators import data

>>> @data('buf')
... def parse_buffer(buf, magic_mode=False):
...   return 'buf passed in as ' + repr(buf)

>>> parse_buffer('hello')
"buf passed in as Data(data='hello', encoding='utf8')"

>>> rv = parse_buffer(open('helloworld.txt'))
>>> assert 'file=' in rv
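A decorator with this behaviour can be built on `inspect.signature` and `functools.wraps`. The sketch below is hypothetical: `coerce_text` is not data's decorator, and unlike the real one (whose output shows a `Data` object being passed in) it simply decodes the named argument to a Python 3 `str`.

```python
import functools
import inspect


def coerce_text(argname, encoding='utf8'):
    """Hypothetical decorator: turn the named argument into str before the
    wrapped function sees it (reading file-likes and decoding bytes)."""
    def decorator(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Map positional/keyword arguments onto parameter names.
            bound = sig.bind(*args, **kwargs)
            value = bound.arguments[argname]
            if hasattr(value, 'read'):
                value = value.read()
            if isinstance(value, bytes):
                value = value.decode(encoding)
            # Changes to bound.arguments are reflected in bound.args/kwargs.
            bound.arguments[argname] = value
            return func(*bound.args, **bound.kwargs)
        return wrapper
    return decorator


@coerce_text('buf')
def parse_buffer(buf, magic_mode=False):
    return 'buf passed in as ' + repr(buf)


print(parse_buffer(b'hello'))  # buf passed in as 'hello'
```

Binding through the signature means the argument is found whether the caller passes it positionally or by keyword.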

Fitting in

All instances support methods like read or __str__ that make it easy to fit them into existing APIs:

>>> d = I('some data')
>>> d.read(4)
u'some'
>>> d.read(4)
u' dat'
>>> e = I(u'more data')
>>> str(e)
'more data'
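A file-like `read` that advances a cursor, alongside a `__str__` that returns the whole content, can be sketched with `io.StringIO`. The class below is a hypothetical Python 3 stand-in, not data's implementation; in particular, whether data's `str()` ignores the read position is an assumption made here.

```python
import io


class TextSource(object):
    """Hypothetical Data-like object: decodes once, then serves
    read() from an in-memory text buffer."""

    def __init__(self, data, encoding='utf8'):
        if isinstance(data, bytes):
            data = data.decode(encoding)
        self._text = data
        self._buf = io.StringIO(data)

    def read(self, size=-1):
        # Each call advances a cursor, like a regular file object.
        return self._buf.read(size)

    def __str__(self):
        # In this sketch str() returns the full content regardless of the cursor.
        return self._text


d = TextSource(b'some data')
print(repr(d.read(4)))  # 'some'
print(repr(d.read(4)))  # ' dat'
print(str(d))           # some data
```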

Note that read returns unicode. To get a byte string instead, use readb:

>>> f = I(u'I am \xdcnicode.')
>>> f.readb()
'I am \xc3\x9cnicode.'

Every Data object has an encoding attribute that is used when converting to and from unicode. As the reprs above show, it defaults to utf8.

>>> g = I(u'I am \xdcnicode.', encoding='latin1')
>>> g.readb()
'I am \xdcnicode.'
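The two readb results above are what plain `str.encode` produces for the same text under each codec. The snippet (Python 3) reproduces them with the standard library only; that readb amounts to encoding with the object's `encoding` attribute is an assumption based on the outputs shown.

```python
text = u'I am \xdcnicode.'

# The same text yields different bytes depending on the codec.
utf8_bytes = text.encode('utf8')      # b'I am \xc3\x9cnicode.'
latin1_bytes = text.encode('latin1')  # b'I am \xdcnicode.'

# Decoding with the matching codec round-trips to the original text.
assert utf8_bytes.decode('utf8') == text
assert latin1_bytes.decode('latin1') == text

# Decoding with the wrong codec fails: 0xDC followed by 'n' is not valid UTF-8.
try:
    latin1_bytes.decode('utf8')
except UnicodeDecodeError:
    print('latin1 bytes are not valid UTF-8')
```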

Iteration and line reading are also supported:

>>> h = I('I am\nof many\nlines')
>>> h.readline()
u'I am\n'
>>> h.readlines()
[u'of many\n', u'lines']

>>> i = I('line one\nline two\n')
>>> list(iter(i))
[u'line one\n', u'line two\n']
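These methods follow Python's standard file-object line protocol, which `io.StringIO` also implements. The Python 3 snippet below reproduces the outputs above using only the standard library; it makes no claim that data uses `StringIO` internally.

```python
import io

h = io.StringIO(u'I am\nof many\nlines')
first = h.readline()  # 'I am\n' (the newline is kept)
rest = h.readlines()  # ['of many\n', 'lines'] (no trailing newline on the last line)

# Iterating a file-like object yields one line per step.
i = io.StringIO(u'line one\nline two\n')
lines = list(i)       # ['line one\n', 'line two\n']
```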



Some useful convenience methods are available:

>>> j = I('example')
>>> j.save_to('example.txt')

save_to picks the most efficient available way to write the data to a file (copyfileobj or write()). Instead of a filename, it also accepts a file-like object:

>>> k = I('example2')
>>> with open('example2.txt', 'wb') as out:
...     k.save_to(out)
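The copyfileobj-or-write choice can be sketched as follows. `save_to` here is a hypothetical Python 3 function, not data's method: it streams file-like sources with `shutil.copyfileobj` and writes in-memory content directly, and it assumes both source and destination file-likes are binary.

```python
import io
import shutil


def save_to(source, dest, encoding='utf8'):
    """Hypothetical helper: write *source* (bytes, str or binary file-like)
    to *dest*, which may be a filename or a binary file-like object."""
    if isinstance(dest, str):
        # A filename: open it, delegate, and make sure it gets closed.
        with open(dest, 'wb') as out:
            save_to(source, out, encoding)
        return
    if hasattr(source, 'read'):
        # Copy in chunks instead of loading everything into memory.
        shutil.copyfileobj(source, dest)
    else:
        if isinstance(source, str):
            source = source.encode(encoding)
        dest.write(source)


buf = io.BytesIO()
save_to(u'example2', buf)
print(buf.getvalue())  # b'example2'
```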


If you need the output inside a secure temporary file, temp_saved is available:

>>> l = I('goes into tmp')
>>> with l.temp_saved() as tmp:
...     assert tmp.name.startswith('/tmp/tmp')
...     print(open(tmp.name).read())
goes into tmp

temp_saved works almost exactly like tempfile.NamedTemporaryFile, with one difference: there is no delete argument, and the file is removed only when the context manager exits.
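One way to get this "delete only on exit" behaviour is to create the file with `delete=False` and unlink it in a `finally` clause. The context manager below is a hypothetical Python 3 sketch (the name and signature are assumptions, not data's implementation), and reopening `tmp.name` while the file is still open is assumed to happen on a POSIX system.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def temp_saved(content, encoding='utf8'):
    """Hypothetical helper: write *content* to a named temporary file and
    remove the file only when the with-block exits."""
    if isinstance(content, str):
        content = content.encode(encoding)
    # delete=False so closing never removes the file; we remove it ourselves.
    tmp = tempfile.NamedTemporaryFile(delete=False)
    try:
        tmp.write(content)
        tmp.flush()  # make the data visible to other readers of tmp.name
        yield tmp
    finally:
        tmp.close()
        os.unlink(tmp.name)


with temp_saved(u'goes into tmp') as tmp:
    with open(tmp.name) as fp:
        print(fp.read())  # goes into tmp
```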

Where it is useful

data can be used on both sides of an API, either while passing values in:

>>> import json
>>> from data import Data as I

>>> m = I('{"this": "json"}')
>>> json.load(m)
{u'this': u'json'}
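This works because `json.load` only calls `.read()` on its argument, so any object with a `read` method can stand in for a file. The class below is a hypothetical minimal file-like written for Python 3.

```python
import json


class ReadOnlyText(object):
    """Hypothetical minimal file-like: json.load() only needs .read()."""

    def __init__(self, text):
        self._text = text

    def read(self):
        return self._text


parsed = json.load(ReadOnlyText('{"this": "json"}'))
print(parsed)  # {'this': 'json'}
```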

or when receiving values (see the data decorator example above). If necessary, you can also support APIs that let users pass in filenames:

>>> class Parser(object):
...   @data('input')
...   def parse(self, input, parser_opt=False):
...     return input
...   def parse_file(self, input_file, *args, **kwargs):
...     return self.parse(I(file=input_file), *args, **kwargs)

>>> p = Parser()
>>> p.parse_file('/dev/urandom')
Data(file='/dev/urandom', encoding='utf8')

See the project documentation for an API reference.

Python 2 and 3

data works the same on Python 2 and 3 thanks to six, a few compatibility functions and a test suite.

Python 3 is supported from 3.3 onwards, Python 2 from 2.6.
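A typical compatibility function of this kind picks the right text and byte types per interpreter. The shim below is a hypothetical sketch in the spirit of `six.text_type`, `six.binary_type` and `six.ensure_text`; it is not taken from data's source.

```python
import sys

PY2 = sys.version_info[0] == 2

if PY2:
    text_type = unicode  # noqa: F821 (name only exists on Python 2)
    binary_type = str
else:
    text_type = str
    binary_type = bytes


def ensure_text(value, encoding='utf8'):
    """Return *value* as text on both Python 2 and 3."""
    if isinstance(value, binary_type):
        return value.decode(encoding)
    return value


print(ensure_text(b'hello'))  # hello
```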

Download files

Source Distribution

data-0.4.tar.gz (7.0 kB)
