Kids data manipulation helpers.

Project Description

kids.data is a Python library providing helpers to manage data.

It is part of the ‘Kids’ (Keep It Dead Simple) library.

Maturity

This code is in alpha stage. It has not been tested on Windows, and the API may change. Consider it a draft for an ongoing reflection.

It is probably not ready to show yet. That said, many of these functions are used every day in my projects, and I got sick of rewriting them for each one.

Features

Using kids.data, you get:

  • a matching library to fuzzy match elements
  • a formatter concept to help you format any type of data to another type
  • a way to display tables of records on the command line
  • some everyday functions that are otherwise missing for manipulating sets of elements

Installation

You don’t need to download the Git version of the code, as kids.data is available on PyPI. You should be able to run:

pip install kids.data

If you have downloaded the Git sources, you can install the current version the traditional way:

python setup.py install

And if you don’t have the Git sources but would like to get the latest master or a branch from GitHub, you can also run:

pip install git+https://github.com/0k/kids.data

Or even select a specific revision (branch/tag/commit):

pip install git+https://github.com/0k/kids.data@master

Usage

mdict

An mdict gives access to nested dicts in one go by interpreting the key as a path. Check this:

>>> from pprint import pprint as pp
>>> from kids.data.mdict import mdict

>>> d = mdict({'a': {'b': {'y': 0}}, 'x': 1})
>>> d['a.b.y']
0
>>> d.get('a.b.z', 3)
3
>>> d['a.b']
m{'y': 0}

You can configure your mdict to use ‘/’ as the separator instead, and if you want more, you can build your own key tokenizer to break your string into tokens:

>>> from kids.data.mdict import CharTokenizer

>>> d = mdict({'a': {'b': {'y': 0}}, 'x': 1}, CharTokenizer('/'))
>>> d['a/b/y']
0

Of course, setting an item works the same way:

>>> d['a/b/z'] = 2
>>> d
m{'a': {'b': {'y': 0, 'z': 2}}, 'x': 1}

And deleting items:

>>> del d['a/b']
>>> d
m{'a': {}, 'x': 1}

Please note that the tokenizer is quite robust, even with backslash-escaped separators or empty keys:

>>> d[r'a/b\/c//d'] = 9
>>> d
m{'a': {'b/c': {'': {'d': 9}}}, 'x': 1}
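The splitting behaviour shown above can be approximated in a few lines. This is only a sketch of the idea, not the code behind CharTokenizer: the function name tokenize and its sep and escape parameters are hypothetical, and it assumes a backslash makes the next character literal.

```python
def tokenize(key, sep="/", escape="\\"):
    """Split ``key`` on unescaped ``sep`` (hypothetical helper, not kids.data API)."""
    tokens, current = [], []
    chars = iter(key)
    for ch in chars:
        if ch == escape:
            # Take the character after the escape literally;
            # a trailing lone escape is kept as-is.
            current.append(next(chars, escape))
        elif ch == sep:
            # An unescaped separator closes the current token (possibly empty).
            tokens.append("".join(current))
            current = []
        else:
            current.append(ch)
    tokens.append("".join(current))
    return tokens


print(tokenize(r"a/b\/c//d"))  # ['a', 'b/c', '', 'd']
```

The four tokens line up with the four nesting levels visible in the mdict above, including the empty key produced by the doubled separator.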

Flattening back the keys and values is done through the flat property:

>>> pp(d.flat)
{'a/b\\/c//d': 9, 'x': 1}
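For readers curious about what flattening involves, here is a minimal sketch that rebuilds the same flat key from a plain nested dict. The flatten helper is hypothetical and is not how the flat property is implemented; it assumes string keys, and keeping empty dicts as plain values is a choice made for this sketch only.

```python
def flatten(dct, sep="/", escape="\\"):
    """Return a one-level dict keyed by escaped, joined paths (hypothetical helper)."""
    flat = {}
    for key, value in dct.items():
        # Escape the escape character first, then the separator, so that
        # the flat key can be split back without ambiguity.
        quoted = key.replace(escape, escape * 2).replace(sep, escape + sep)
        if isinstance(value, dict) and value:
            for sub_key, sub_value in flatten(value, sep, escape).items():
                flat[quoted + sep + sub_key] = sub_value
        else:
            # Leaves (and, in this sketch, empty dicts) are stored as-is.
            flat[quoted] = value
    return flat


nested = {"a": {"b/c": {"": {"d": 9}}}, "x": 1}
print(flatten(nested))  # {'a/b\\/c//d': 9, 'x': 1}
```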

If you just want to use it once on a nested dict, all the functions are ready for use:

>>> from kids.data.mdict import mset, mget, mdel

>>> dct = {'a': {'b': {'y': 0}}, 'x': 1}
>>> mget(dct, 'a.b.y')
0
>>> mset(dct, 'a.b.z', 2)
>>> pp(dct)
{'a': {'b': {'y': 0, 'z': 2}}, 'x': 1}

>>> mdel(dct, 'a.b')
>>> pp(dct)
{'a': {}, 'x': 1}

graph

graph provides a bunch of functions to work with graphs in an agnostic way: you can store your graph in whatever form you want. All you need to do is provide a function that returns the related nodes of a given node.

Example with the cycle_exists function:

>>> from kids.data.graph import cycle_exists

>>> graph = {1: [2, 3], 2: [1]}
>>> get_children = lambda n: graph.get(n, [])

>>> cycle_exists(1, get_children)
True

>>> cycle_exists(3, get_children)
False

As node 3 is a leaf, there is no cycle starting from it.
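The callback-based approach can be illustrated with a small depth-first search. This is a sketch under assumptions, not the library's cycle_exists: the name has_cycle is hypothetical, and it reports any cycle reachable from the starting node by checking whether a node reappears on the current path. It does no memoization, so it is only suited to small graphs.

```python
def has_cycle(node, get_children, path=()):
    """Return True if following children from ``node`` revisits a node on the path."""
    if node in path:
        return True
    return any(
        has_cycle(child, get_children, path + (node,))
        for child in get_children(node)
    )


links = {1: [2, 3], 2: [1]}          # same shape as the example graph
children_of = lambda n: links.get(n, [])

print(has_cycle(1, children_of))  # True
print(has_cycle(3, children_of))  # False
```

Because the graph is only ever read through the callback, the same function works whether the links live in a dict, in objects, or in a database.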

You can get the leafage of a set of elements (a leaf is a final node without children). The leafage is the set of all leaves that can be reached from the given elements:

>>> from kids.data.graph import leafage

>>> list(leafage([1, 4], get_children))
[3, 4]
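One possible way to collect reachable leaves is a depth-first walk that remembers visited nodes so that cycles do not loop forever. The generator below is a hypothetical sketch (named leaves here), not the library's leafage, and the order in which it yields leaves is simply the order of the walk.

```python
def leaves(nodes, get_children, _seen=None):
    """Yield each leaf reachable from ``nodes`` once (hypothetical helper)."""
    seen = set() if _seen is None else _seen
    for node in nodes:
        if node in seen:
            continue  # already explored, also protects against cycles
        seen.add(node)
        children = list(get_children(node))
        if not children:
            yield node  # no children: this node is a leaf
        else:
            yield from leaves(children, get_children, seen)


links = {1: [2, 3], 2: [1]}
children_of = lambda n: links.get(n, [])

print(list(leaves([1, 4], children_of)))  # [3, 4]
```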

The nice one is reorder, which tries to make the minimum change to a given list but swaps elements to guarantee there are no dependency issues: children will appear before their parents. This is very handy when loading modules that depend on other modules:

>>> from kids.data.graph import reorder

>>> graph = {2: [1], 3: [2]}
>>> reorder([1, 3, 2], get_children)
[1, 2, 3]
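A dependency-respecting ordering of this kind can be sketched with a depth-first, post-order walk over the list. The function below, reorder_sketch, is a hypothetical illustration and not the library's algorithm: it emits each element only after its children, keeps the input order when nothing forces a move, and makes no claim to the same minimal-change guarantee. Raising ValueError on a cycle is also a choice made for this sketch.

```python
def reorder_sketch(items, get_children):
    """Return ``items`` ordered so that children come before their parents."""
    items = list(items)
    members = set(items)
    ordered, done = [], set()

    def visit(node, path):
        if node in done:
            return
        if node in path:
            raise ValueError("dependency cycle through %r" % (node,))
        for child in get_children(node):
            visit(child, path | {node})
        done.add(node)
        if node in members:
            # Only elements of the input list end up in the result.
            ordered.append(node)

    for item in items:
        visit(item, frozenset())
    return ordered


deps = {2: [1], 3: [2]}
children_of = lambda n: deps.get(n, [])

print(reorder_sketch([1, 3, 2], children_of))  # [1, 2, 3]
```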

dct

Merging dicts is something that should be in base Python, and many people miss it (see this Stack Overflow question about merging dicts non-inplace).

You can use merge to merge several dicts into one:

>>> from pprint import pprint as pp
>>> from kids.data.dct import merge

>>> pp(merge({'a': 1}, {'a': 2, 'b': 1}, {'c': 3}))
{'a': 2, 'b': 1, 'c': 3}
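For comparison, a non-inplace merge with the same result on this example can be written with the standard dict.update. This is a shallow sketch under a hypothetical name, merge_sketch; whether the library's merge also handles nested dicts is not covered here.

```python
def merge_sketch(*dicts):
    """Return a new dict; later arguments win on conflicting keys."""
    result = {}
    for dct in dicts:
        result.update(dct)  # the inputs themselves are left untouched
    return result


first = {"a": 1}
print(merge_sketch(first, {"a": 2, "b": 1}, {"c": 3}))  # {'a': 2, 'b': 1, 'c': 3}
print(first)  # {'a': 1}
```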

Contributing

Any suggestion or issue is welcome. Pull requests are very welcome; please check out the guidelines.

Pull Request Guidelines

You can send any code. I’ll look at it, integrate it myself into the code base, and leave you as the author. This process can take time, and it will take less time if you follow these guidelines:

  • check your code with PEP8 or pylint. Try to stick to 80 columns wide.
  • separate your commits per smallest concern.
  • each commit should pass the tests (to allow easy bisect)
  • each functionality/bugfix commit should contain the code, tests, and doc.
  • prior minor commits with typographic or cosmetic code changes are very welcome. These should be tagged in their commit summary with !minor.
  • the commit message should follow gitchangelog rules (check the git log to get examples)
  • if the commit fixes an issue or finishes the implementation of a feature, please mention it in the summary.

If you have a question about the guidelines that is not answered here, please check the current git log; you might find a previous commit that shows you how to deal with your issue.

License

Copyright (c) 2015 Valentin Lab.

Licensed under the BSD License.

Changelog

0.0.5 (2015-03-04)

New

  • Added MultiDictReader class to allow reading from several dicts. [Valentin Lab]

    This provides an interesting, lazily evaluated way to merge dicts. Additionally, multi-depth dicts are conveniently merged.

  • Added AttrDictAbstract to help creating attr-dict patterns from a small method subset. [Valentin Lab]

  • Introduce DictLikeAbstract to write quickly full dict like API from a small subset. [Valentin Lab]

  • Added untokenize notion, it’ll undo tokenize job. [Valentin Lab]

  • .items() is not flattening anymore, use .flat for that. [Valentin Lab]

    The flattening previously done by .items() was removed in favor of the .flat property.

    In the process, the .keys() was added.

  • Dict is now passed by reference and mdict is offering an extended API to it. [Valentin Lab]

  • [mdict] cleaned code and gave a more coherent API. [Valentin Lab]

Fix

  • When iterating through keys of mdict, those weren’t appropriately quoted. [Valentin Lab]

0.0.4 (2015-02-06)

New

  • [dct] added deep_copy shortcut. [Valentin Lab]

    This is to get all useful dict-related stuff without having to know every required package, and to follow the PEP8 convention on variable/function names (aka: deep_copy instead of deepcopy).

  • [dct] added merge to merge dicts. [Valentin Lab]

0.0.3 (2015-02-05)

New

  • [graph] added graph functions. [Valentin Lab]
  • [mdict] added mdict pattern. [Valentin Lab]
  • [lib] half_split_on_predicate added. [Valentin Lab]
  • [lib] added default arguments to first. [Valentin Lab]

0.0.2 (2015-01-20)

New

  • Python3 support and added tests for better coverage. [Valentin Lab]
  • [match] added matching and fuzzy matching library. [Valentin Lab]
  • [fmt] remove all trailing whitespace on record line display. [Valentin Lab]

0.0.1 (2014-05-23)

  • First import. [Valentin Lab]