Tools for data

Project description

This is a collection of well-tested, simple modules and functions that I use frequently.

Files

If you have a “proper” CSV file with quoting and such, use Python’s csv module.
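
For comparison, here is a minimal sketch of the csv-module route using only the standard library. The sample data and column names are made up for illustration; the point is that quoted commas and doubled quotes are handled for you.

```python
import csv
import io

# Illustrative data: the first row has a comma inside quotes and an
# escaped (doubled) quote, which a naive split(',') would mangle.
sample = io.StringIO('name,note\n"Smith, J.","said ""hi"""\nLee,plain\n')

rows = list(csv.DictReader(sample))
for row in rows:
    print(row["name"], "|", row["note"])
```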

If all you have is a file with a header and you want to get a dictionary for each row:

>>> from toolshed import reader, header, nopen
>>> for d in reader('toolshed/tests/data/file_data.txt'):
...    print(d['a'], d['b'], d['c'])
1 2 3
11 12 13
21 22 23

Or as a namedtuple:

>>> from collections import namedtuple
>>> for d in reader('toolshed/tests/data/file_data.txt', header=namedtuple):
...    print(d.a, d.b, d.c)
1 2 3
11 12 13
21 22 23

This works the same for gzipped, bzipped, and .xls files, for stdin (via “-”), and for files over http/ftp:

>>> for drow in (d for d in reader('toolshed/tests/data/file_data.txt.gz') if int(d['a']) > 10):
...    print(drow['a'], drow['b'], drow['c'])
11 12 13
21 22 23

You can specify the header for a file that lacks one using the header= kwarg. If header is “ordered”, then an OrderedDict will be used so that drow.keys() and drow.values() return the values in the order they appeared in the file.

If header is a callable (a function or class), it will be called once per row with a single argument: the list of columns. In the future, it may be called as callable(*row) instead of callable(row). Comments welcome.
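
To make the callable(row) convention concrete, here is a rough sketch that mimics it with plain Python. The helper rows_via and the classes Interval and Point are hypothetical names for illustration and are not part of toolshed; only the "one list of columns per call" contract comes from the text above.

```python
from collections import namedtuple

def rows_via(factory, lines, sep="\t"):
    """Yield factory(columns) per line (hypothetical helper, not toolshed's reader)."""
    for line in lines:
        yield factory(line.rstrip("\r\n").split(sep))

class Interval:
    """Example row class that takes a single list of column strings."""
    def __init__(self, cols):
        self.chrom = cols[0]
        self.start = int(cols[1])
        self.end = int(cols[2])

    def __len__(self):
        return self.end - self.start

lines = ["chr1\t100\t250\n", "chr2\t5\t15\n"]
intervals = list(rows_via(Interval, lines))
print([(iv.chrom, len(iv)) for iv in intervals])

# A namedtuple class expects exploded arguments (callable(*row)), so its
# _make classmethod is the single-argument form that fits callable(row).
Point = namedtuple("Point", ["x", "y"])
points = list(rows_via(Point._make, ["1\t2\n"]))
print(points)
```

Values stay strings unless the callable converts them, as Interval does for start and end.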

toolshed.nopen can open a file over http, https, or ftp, a gzipped file, a bzipped file, or a subprocess, all with the same syntax:

>>> nopen('toolshed/tests/data/file_data.txt.gz') # doctest: +ELLIPSIS
<gzip open file ...>
>>> nopen('|ls') # doctest: +ELLIPSIS
<generator object process_iter at ...>
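
The idea of one opener for several kinds of input can be sketched with the standard library. This is not how nopen is implemented; open_any is a hypothetical name, and the sketch covers only plain, gzip, and bzip2 files plus stdin, leaving out URLs and subprocesses.

```python
import bz2
import gzip
import sys

def open_any(path, mode="rt"):
    """Open plain, gzip, or bzip2 files, or stdin for "-" (illustrative only)."""
    if path == "-":
        return sys.stdin
    if path.endswith(".gz"):
        return gzip.open(path, mode)
    if path.endswith(".bz2"):
        return bz2.open(path, mode)
    return open(path, mode)
```

Dispatching on the file extension keeps the caller's loop identical regardless of compression, which is the convenience the examples above rely on.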

# you may need to send stdin to a proc:

# NOTE mode is None
>>> proc = nopen("|awk '(NR % 2 == 1)'", mode=None)

# write some stuff to STDIN
>>> proc.stdin.write("number\n")
>>> for i in range(5):
...    proc.stdin.write("%i\n" % i)

# IMPORTANT! close stdin
>>> proc.stdin.close()

# then read stdout
>>> for d in reader(proc.stdout, header=True):
...    print(d)
{'number': '1'}
{'number': '3'}

In addition, you can skip the first lines of a file with a function like:

skipper = lambda toks: toks[0].startswith('#')
for d in reader('file-with-extra-header.txt', skip_while=skipper):
    do_stuff(d)
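
The behavior can be pictured with itertools.dropwhile. This sketch assumes skip_while only consumes leading rows, as its name suggests; skip_leading is a hypothetical helper, not toolshed code.

```python
from itertools import dropwhile

def skip_leading(rows, skip_while):
    """Drop rows from the front while skip_while(row) is true (hypothetical helper)."""
    return dropwhile(skip_while, rows)

rows = [["#comment"], ["#another"], ["a", "b"], ["1", "2"], ["#kept", "x"]]
skipper = lambda toks: toks[0].startswith("#")

# Only the leading '#' rows are dropped; a later '#' row passes through.
kept = list(skip_leading(rows, skipper))
print(kept)
```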

Pools

Ctrl+C on a long-running multiprocessing pool is often non-responsive. Using toolshed.pool() fixes that (using signal).
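
A common way to get this behavior with the signal module is to have the workers ignore SIGINT so that only the parent sees the KeyboardInterrupt. The sketch below shows that general technique; it is an assumption about the approach, not a copy of toolshed.pool(), and ignore_sigint and map_interruptibly are hypothetical names.

```python
import multiprocessing
import signal

def ignore_sigint():
    """Worker initializer: leave Ctrl+C handling to the parent process."""
    signal.signal(signal.SIGINT, signal.SIG_IGN)

def map_interruptibly(fn, items, processes=2):
    """Map fn over items in a pool that shuts down cleanly on Ctrl+C (illustrative)."""
    pool = multiprocessing.Pool(processes, initializer=ignore_sigint)
    try:
        # Workers ignore SIGINT, so only this (parent) process
        # receives the KeyboardInterrupt.
        result = pool.map(fn, items)
    except BaseException:
        # Covers KeyboardInterrupt as well as errors raised by fn.
        pool.terminate()
        pool.join()
        raise
    pool.close()
    pool.join()
    return result
```

Call map_interruptibly from under an `if __name__ == "__main__":` guard, with fn defined at module level so it can be pickled.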

This module also provides pmap, which wraps multiprocessing.Pool.map() to expand args, so we can do:

>>> def fn(a, b):  return a + b

>>> from toolshed import pmap
>>> list(pmap(fn, [(1, 1), (2, 3)]))
[2, 5]

fn is then mapped in parallel, and we did not need a wrapper function for fn like:

def wrapper(args):
    return fn(*args)

as we would normally.

Note that this is like:

>>> from itertools import starmap
>>> list(starmap(fn, [(1, 1), (2, 3)]))
[2, 5]

However, the parallel equivalent, Pool.starmap, is not available until Python 3.3.

Expanding args can cause problems when your fn expects a single sequence argument (args) instead of the exploded arguments. In the future, pmap may introspect fn, but that is not implemented for now.
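
The pitfall is easy to reproduce with itertools.starmap, which expands arguments the same way. The sketch below uses starmap rather than pmap itself; total is a made-up function, and wrapping each item in a one-element tuple is one possible workaround.

```python
from itertools import starmap

def total(values):
    """Expects ONE argument: a sequence of numbers."""
    return sum(values)

items = [(1, 1), (2, 3)]

# starmap calls total(1, 1), which fails because total takes one argument.
try:
    list(starmap(total, items))
    exploded_ok = True
except TypeError:
    exploded_ok = False

# Workaround: wrap each item in a one-element tuple so it survives expansion.
wrapped = list(starmap(total, [(item,) for item in items]))
print(exploded_ok, wrapped)
```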
