A library of convenient utility functions and pure Python data structures.

ConvUtils provides a small library of convenience functions for a variety of tasks, such as creating CSV readers and writers, along with convenient data structures, such as a two-way dictionary.

This package provides two modules: utils and structs. Typically, the user will want to import one or the other, e.g.

from convutils import utils

utils

utils provides the following classes:

  • SimpleTsvDialect is similar to the csv.excel_tab dialect, but uses the newline character ('\n') as the line separator, and does no special quoting, giving a more Unix-friendly TSV (tab-separated values) format. (New in v2.0: formerly ExcelTabNewlineDialect.)
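For readers unfamiliar with csv dialects, the sketch below shows roughly what a dialect like SimpleTsvDialect involves: a csv.Dialect subclass with a tab delimiter, a '\n' line terminator, and csv.QUOTE_NONE. The class name SimpleTsvSketch is hypothetical, and the attribute values (including the backslash escapechar) are assumptions based on the description above, not copied from ConvUtils.

```python
import csv
import io


class SimpleTsvSketch(csv.Dialect):
    """Hypothetical stand-in for SimpleTsvDialect: tab-separated, '\n'-terminated, unquoted."""

    delimiter = "\t"
    quotechar = '"'           # required attribute; never used to wrap fields here
    doublequote = False
    escapechar = "\\"         # used only if a field contains a tab, newline, or quote
    skipinitialspace = False
    lineterminator = "\n"     # Unix line endings instead of excel_tab's '\r\n'
    quoting = csv.QUOTE_NONE  # no quoting at all


# Usage: write a header plus one row to an in-memory buffer.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["gene", "score"], dialect=SimpleTsvSketch)
writer.writeheader()
writer.writerow({"gene": "BRCA1", "score": 0.5})
print(repr(buffer.getvalue()))  # 'gene\tscore\nBRCA1\t0.5\n'
```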

utils also provides the following functions:

  • make_csv_reader creates a csv.DictReader instance or a csv.reader object, sparing the user from having to specify the CSV dialect explicitly.

  • make_simple_tsv_reader is similar to make_csv_reader, but always uses SimpleTsvDialect. (New in v2.0.)

  • make_csv_dict_writer creates a csv.DictWriter instance, sparing you from writing the header row manually; it uses csv.excel as the dialect by default.

  • make_simple_tsv_dict_writer is similar to make_csv_dict_writer, but uses the SimpleTsvDialect instead. (New in v2.0.)

  • append_to_file_base_name returns a modified file name, given an original one and a string to insert between the base name and the extension (e.g., append_to_file_base_name('myfile.txt', '-2') returns 'myfile-2.txt').

  • count_lines counts the number of lines in a file.

  • split_file_by_parts takes one large file and splits it into new files, up to a maximum number of files given by the user.

  • split_file_by_num_lines takes one large file and splits it into new files, each containing at most a user-defined number of lines.

  • column_args_to_indices takes a string representing desired columns (e.g., '1-4,6,8') and converts it into the corresponding indices and slices for an indexable Python sequence.

  • cumsum produces the cumulative sum of any iterable whose elements support the add operator. (New in v1.1.)
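As an illustration only, here are minimal stand-ins for three of these helpers. They are not ConvUtils' code: append_to_file_base_name follows the example given above, cumsum is written as a generator (the real function may return a list instead), and column_args_to_indices assumes 1-based, inclusive column numbers, a convention the description does not specify.

```python
import os


def append_to_file_base_name(file_name, addition):
    """Insert `addition` between the base name and the extension."""
    base, extension = os.path.splitext(file_name)
    return base + addition + extension


def cumsum(iterable):
    """Yield running totals, relying only on the + operator."""
    iterator = iter(iterable)
    try:
        total = next(iterator)
    except StopIteration:
        return  # empty input: nothing to yield
    yield total
    for element in iterator:
        total = total + element
        yield total


def column_args_to_indices(spec):
    """Turn a spec such as '1-4,6,8' into indices and slices.

    ASSUMPTION: columns are 1-based and ranges are inclusive (as with `cut -f`);
    ConvUtils itself may use different conventions.
    """
    indices = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = part.split("-")
            # 1-based inclusive 'a-b' becomes the 0-based half-open slice [a-1, b)
            indices.append(slice(int(start) - 1, int(end)))
        else:
            indices.append(int(part) - 1)
    return indices


# Usage
print(append_to_file_base_name("myfile.txt", "-2"))  # myfile-2.txt
print(list(cumsum([1, 2, 3, 4])))                    # [1, 3, 6, 10]

row = ["a", "b", "c", "d", "e", "f", "g", "h"]
selected = []
for index in column_args_to_indices("1-4,6,8"):
    # Slices contribute several columns; plain integers contribute one.
    if isinstance(index, slice):
        selected.extend(row[index])
    else:
        selected.append(row[index])
print(selected)  # ['a', 'b', 'c', 'd', 'f', 'h']
```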

structs

structs provides two convenient data structures, both specialized dictionary-like mappings (as of v2.0 they subclass collections.MutableMapping rather than Python’s dict directly).

  • SortedTupleKeysDict is a dictionary which expects 2-tuples as keys, and will always sort the tuples, either when setting or retrieving values.

  • TwoWaySetDict is a dictionary whose values are assumed to be sets; it also maintains a reverse lookup dictionary that maps each item found in any of the value sets to the keys whose sets contain that item.
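A rough sketch of how the two structures could be built on collections.abc.MutableMapping (the changelog's collections.MutableMapping, which since Python 3.10 is only importable from collections.abc). The class names SortedTupleKeysSketch and TwoWaySetSketch and the keys_for_item method are hypothetical; the real classes' method names and their handling of in-place set mutation may differ.

```python
from collections.abc import MutableMapping


class SortedTupleKeysSketch(MutableMapping):
    """Dict-like mapping whose 2-tuple keys are always stored sorted."""

    def __init__(self, *args, **kwargs):
        self._data = {}
        self.update(*args, **kwargs)  # MutableMapping.update routes through __setitem__

    @staticmethod
    def _normalize(key):
        # ('b', 'a') and ('a', 'b') both become ('a', 'b')
        return tuple(sorted(key))

    def __getitem__(self, key):
        return self._data[self._normalize(key)]

    def __setitem__(self, key, value):
        self._data[self._normalize(key)] = value

    def __delitem__(self, key):
        del self._data[self._normalize(key)]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


class TwoWaySetSketch(MutableMapping):
    """Dict-like mapping of key -> set, plus a reverse index of item -> keys.

    Values are stored as frozensets so they cannot be mutated in place behind
    the reverse index's back (a design choice of this sketch).
    """

    def __init__(self, *args, **kwargs):
        self._data = {}
        self._reverse = {}
        self.update(*args, **kwargs)

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        if key in self._data:
            del self[key]  # clear the old set's reverse entries first
        items = frozenset(value)
        self._data[key] = items
        for item in items:
            self._reverse.setdefault(item, set()).add(key)

    def __delitem__(self, key):
        for item in self._data.pop(key):
            keys = self._reverse[item]
            keys.discard(key)
            if not keys:
                del self._reverse[item]  # no key holds this item any more

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def keys_for_item(self, item):
        """Hypothetical helper: the keys whose sets contain `item`."""
        return set(self._reverse.get(item, ()))


# Usage
pairs = SortedTupleKeysSketch()
pairs[("b", "a")] = 1
print(pairs[("a", "b")])  # 1

groups = TwoWaySetSketch({"g1": {"x", "y"}, "g2": {"y"}})
print(sorted(groups.keys_for_item("y")))  # ['g1', 'g2']
```

Subclassing MutableMapping means only the five abstract methods need to be written; get, update, setdefault, pop, and the `in` test are inherited and automatically go through the key normalization or reverse-index bookkeeping.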

structs also provides two functions for sampling Python dictionaries whose values are lists:

  • sample_list_dict is like random.sample but for dictionaries whose values are lists or other enumerable, iterable container types. (New in v1.1; relocated to structs in v2.0.)

  • sample_list_dict_low_mem is similar to sample_list_dict but uses less memory on larger dictionaries. (New in v1.1; relocated to structs in v2.0.)
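The following sketch assumes that sampling means drawing k items uniformly without replacement from all list elements combined, and returning a dictionary that maps each key to the items drawn from its list; the _sketch function names are hypothetical. The low-memory variant shows one plausible technique (sampling positions from a range and mapping them back through running offsets), not necessarily the one ConvUtils uses.

```python
import random


def sample_list_dict_sketch(list_dict, k):
    """Sample k (key, item) pairs without replacement; return {key: [items]}."""
    # Flattens every value into one list of pairs: simple, but O(total items) memory.
    pairs = [(key, item) for key, values in list_dict.items() for item in values]
    sample = {}
    for key, item in random.sample(pairs, k):
        sample.setdefault(key, []).append(item)
    return sample


def sample_list_dict_low_mem_sketch(list_dict, k):
    """Same result shape, but samples positions instead of building a pair list."""
    total = sum(len(values) for values in list_dict.values())
    # When k is much smaller than total, random.sample picks from the range
    # without building a list of every position.
    chosen = sorted(random.sample(range(total), k))
    sample = {}
    offset = 0  # global position of the first item in the current value list
    pos = 0     # next entry of `chosen` to map back to a key
    for key, values in list_dict.items():
        end = offset + len(values)
        while pos < len(chosen) and chosen[pos] < end:
            sample.setdefault(key, []).append(values[chosen[pos] - offset])
            pos += 1
        if pos == len(chosen):
            break  # every sampled position has been placed
        offset = end
    return sample


# Usage (random: output varies between runs)
data = {"a": [1, 2, 3], "b": [4, 5], "c": []}
print(sample_list_dict_sketch(data, 2))
print(sample_list_dict_low_mem_sketch(data, 2))
```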

Installation

Installation is easy with pip; simply run

pip install ConvUtils

Availability

CHANGELOG

v2.0

  • Refactored code for Python 3 compatibility via 2to3 during installation.

  • Dropped support for Python 2 versions earlier than 2.7.

  • Added dependency on mock for unit tests. This dependency is satisfied by the standard library for Python 3.3 and newer.

  • Renamed convutils.convutils to convutils.utils; renamed convutils.convstructs to convutils.structs.

  • Added unit tests for convutils.utils.

  • Renamed ExcelTabNewlineDialect to SimpleTsvDialect, and changed its quoting style to no quoting.

  • Refactored make_csv_reader and make_csv_dict_writer to use the csv.excel dialect by default, to be more in line with the standard library. Added new functions make_simple_tsv_reader and make_simple_tsv_dict_writer for the previous functionality.

  • Renamed the headers parameter to header for make_csv_reader and make_simple_tsv_reader.

  • Changed the signatures of split_file_by_num_lines and split_file_by_parts. The functions now accept a file handle instead of a file name, and the parameter has_header has been renamed header. Added two new parameters, pad_file_names and num_lines_total. If pad_file_names is True, the numerical portion of the output file names is zero-padded. If num_lines_total is provided in addition to pad_file_names, split_file_by_num_lines and split_file_by_parts skip counting the lines in the file themselves, which can save time.

  • SortedTupleKeysDict and TwoWaySetDict now subclass collections.MutableMapping instead of dict directly, following best-practice suggestions from Stack Overflow: http://stackoverflow.com/questions/3387691/python-how-to-perfectly-override-a-dict

  • Relocated sample_list_dict and sample_list_dict_low_mem to convutils.structs.

  • Added Sphinx-based documentation.
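To make the split_file_by_num_lines changes above more concrete, here is a hypothetical sketch of the header, pad_file_names, and num_lines_total behavior. The function name, the '<base>-<n><ext>' output naming, and the assumption that num_lines_total counts every line including the header are guesses for illustration; it also assumes the handle was opened from a path, so it has a .name attribute.

```python
import itertools
import math
import os


def split_file_by_num_lines_sketch(handle, num_lines, header=False,
                                   pad_file_names=False, num_lines_total=None):
    """Split an open text file into files of at most `num_lines` lines each.

    Hypothetical stand-in: output files are named '<base>-<n><ext>' beside the
    input (an assumption), and with header=True the first line is repeated at
    the top of every output file. Returns the list of file names written.
    """
    width = 0  # zfill(0) leaves numbers unpadded
    if pad_file_names:
        if num_lines_total is None:
            # Extra pass over the whole file; supplying num_lines_total skips it.
            num_lines_total = sum(1 for _ in handle)
            handle.seek(0)
        data_lines = num_lines_total - (1 if header else 0)
        num_files = max(1, math.ceil(data_lines / num_lines))
        width = len(str(num_files))  # e.g. 10 files -> 2 digits -> '-01' ... '-10'

    lines = iter(handle)
    header_line = next(lines, None) if header else None
    base, extension = os.path.splitext(handle.name)

    written = []
    while True:
        chunk = list(itertools.islice(lines, num_lines))
        if not chunk:
            break
        number = str(len(written) + 1).zfill(width)
        name = f"{base}-{number}{extension}"
        with open(name, "w") as out:
            if header_line is not None:
                out.write(header_line)
            out.writelines(chunk)
        written.append(name)
    return written
```

Without num_lines_total, padding requires knowing in advance how many files will be produced, hence the extra counting pass; passing the total avoids reading the file twice.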

v1.1 2012-03-23

  • Changed docstrings to use Sphinx info field lists.

  • Added cumsum.

  • Added sample_list_dict and sample_list_dict_low_mem.

v1.0.1 2011-01-18

  • Added imports of modules into package __init__.py.

v1.0 2011-01-10

  • Initial release.


Download files

Source Distributions

ConvUtils-2.0.zip (19.7 kB)

ConvUtils-2.0.tar.gz (14.5 kB)
