Tools for data

Toolshed: Less Boiler-Plate

This is a collection of well-tested, simple modules and functions that I use frequently.

Files

If you have a “proper” CSV file with quoting and such, use Python’s csv module.

If all you have is a file with a header and you want to get a dictionary for each row:

>>> from toolshed import reader, header, nopen
>>> for d in reader('src/toolshed/tests/data/file_data.txt'):
...    print d['a'], d['b'], d['c']
1 2 3
11 12 13
21 22 23

This works the same for gzipped and bzipped files, for stdin (via "-"), and for files over http/ftp:

>>> for drow in (d for d in reader('src/toolshed/tests/data/file_data.txt.gz') if int(d['a']) > 10):
...    print drow['a'], drow['b'], drow['c']
11 12 13
21 22 23

One can specify the header for a file that lacks one using the header= kwarg. If header is "ordered" then an ordered dictionary will be used, so that drow.keys() and drow.values() return their items in the order they appeared in the file.
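To make the header= behaviour concrete, here is a minimal stdlib-only sketch of the idea. It is not toolshed's implementation: dict_rows is a hypothetical stand-in for reader, the tab separator is an assumption, and it is written for Python 3 rather than the Python 2 used in the examples above.

```python
import io


def dict_rows(lines, header=None, sep="\t"):
    """Yield one dict per row (hypothetical stand-in, not toolshed.reader)."""
    rows = (line.rstrip("\r\n").split(sep) for line in lines)
    if header is None:
        # No header supplied: take the column names from the first line.
        header = next(rows)
    for row in rows:
        yield dict(zip(header, row))


# A file-like object with no header line; the names come from header=.
headerless = io.StringIO("1\t2\t3\n11\t12\t13\n")
rows = list(dict_rows(headerless, header=["a", "b", "c"]))
```

On Python 3.7 and later a plain dict already preserves insertion order, so the keys of each row in this sketch come back in column order.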

If header is a callable (a function or class) then that callable will be called once for each row with a single argument, which is the list of columns. In the future, it may be called as callable(*row) instead of callable(row). Comments welcome.
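The callable form can be pictured with the following stdlib-only sketch. The names rows_as, Point and make_point are hypothetical, the tab separator is an assumption, and the sample data has no header line because the text above does not say how toolshed treats the first line in this mode.

```python
import io
from collections import namedtuple

# Hypothetical row type, used only for illustration.
Point = namedtuple("Point", ["a", "b", "c"])


def make_point(cols):
    """Receive the list of column strings for one row, as callable(row)."""
    return Point(*(int(c) for c in cols))


def rows_as(lines, factory, sep="\t"):
    """Call factory once per row with the list of columns (a sketch)."""
    for line in lines:
        yield factory(line.rstrip("\r\n").split(sep))


data = io.StringIO("1\t2\t3\n11\t12\t13\n")
points = list(rows_as(data, make_point))
```

Passing a function that does the type conversion keeps the parsing in one place, so the loop body can work with attributes such as points[1].b directly.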

Sometimes you just want the header:

>>> header('src/toolshed/tests/data/file_data.txt')
['a', 'b', 'c']

toolshed.nopen can open a file over http, https, or ftp, a gzipped file, a bzipped file, or a subprocess, all with the same syntax:

>>> nopen('src/toolshed/tests/data/file_data.txt.gz')
<gzip open file ... >
>>> nopen('|ls')
<open file '<fdopen>'...>
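The dispatch that nopen performs could be approximated as below. This is a rough Python 3 sketch built only on the standard library, not toolshed's actual code: open_any is a hypothetical name, and choosing the opener from the path's prefix or extension is an assumption.

```python
import bz2
import gzip
import subprocess
import sys
import urllib.request


def open_any(path):
    """Return a readable file-like object for path (hypothetical sketch)."""
    if path == "-":
        return sys.stdin
    if path.startswith("|"):
        # Run the rest of the string as a shell command and read its stdout.
        # The process is not waited on here; a fuller version would do that.
        proc = subprocess.Popen(path[1:], shell=True,
                                stdout=subprocess.PIPE,
                                universal_newlines=True)
        return proc.stdout
    if path.startswith(("http://", "https://", "ftp://")):
        # Note: the response object yields bytes, not str.
        return urllib.request.urlopen(path)
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path)
```

Every branch returns something that can be iterated line by line, which is what lets the calling code stay the same regardless of the source.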

Shedskinner

Shedskin is a program that takes Python scripts, infers the types based on example input, and generates fast C++ code that compiles to a Python extension module. Shedskinner is a decorator that automates this for a single function. Usage looks like:

from toolshed import shedskinner

@shedskinner((2, 12), long=True, fast_random=True)
def adder(a, b):
    return a + b

Here, we have decorated the adder function to make it a compiled, fast version that accepts and returns integers. The (2, 12) are example arguments to the function so that shedskin can infer types. The keyword arguments are sent to the compiler; see https://gist.github.com/1036972 for more examples.

News

0.2.4

  • if header is a callable, it’s called for each row (instead of returning dict).

0.2.3

  • reader can accept the generator returned from reader()

0.2.2

  • if an integer is sent to nopen, then nopen(sys.argv[arg]) is returned.

0.2.1

  • fix handling when there’s an exception in the loop that calls a process

0.2.0

  • better error message from Popen when using nopen("| something")

0.1.9

  • if the header argument to reader is “ordered” then an ordered dictionary is used.

0.1.8

  • Add is_newer_b(apath, bpaths) to check that all b files are newer than apath.

0.1.3

  • July 26 2011

  • Allow ftp/http(s) paths as arguments to reader

0.1.1

  • use itertools.izip for speed improvement

0.1

Release date: 15-Mar-2010

  • Initial project structure.
