Cleaning your messy data.
Getting started
Consider cleaning up some messy data. Here is a deeply nested dictionary with a lot of unnecessary nesting and a tuple.
    some_messy_data = {
        "body": {
            "article": {
                "articlesbody": {
                    "articlesmeta": {
                        "articles_meta_3": "Monty Python",
                    }
                }
            },
        },
        "published": {
            "datetime": ("2014-11-05", "23:00:00"),
        }
    }
The values you want are 'Monty Python' and '2014-11-05', and they should be named 'title' and 'published_date'.
Now let the hack begin with dripper.
- Define a declaration dictionary
- Create a dripper object with dripper.dripper_factory
- Drip the essential data
    # Define
    declaration = {
        "title": ("body", "article", "articlesbody", "articlesmeta", "articles_meta_3"),
        "published_date": ("published", "datetime", 0)
    }

    # Create
    import dripper
    d = dripper.dripper_factory(declaration)

    # And drip
    dripped = d(some_messy_data)

    assert dripped == {
        "title": "Monty Python",
        "published_date": "2014-11-05",
    }
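To make the idea of a declaration concrete, here is a minimal plain-Python sketch of the lookup that each declaration entry describes: following a sequence of keys and indexes into nested data. The helper name `dig` is hypothetical and is not part of dripper; this is an illustration, not dripper's implementation.

```python
# Hypothetical helper for illustration only; not part of dripper.
def dig(data, path):
    """Follow a sequence of dict keys / sequence indexes into nested data."""
    current = data
    for step in path:
        current = current[step]
    return current


some_messy_data = {
    "body": {"article": {"articlesbody": {"articlesmeta": {
        "articles_meta_3": "Monty Python",
    }}}},
    "published": {"datetime": ("2014-11-05", "23:00:00")},
}

declaration = {
    "title": ("body", "article", "articlesbody", "articlesmeta", "articles_meta_3"),
    "published_date": ("published", "datetime", 0),
}

# Build the cleaned dictionary by resolving every declared path.
cleaned = {name: dig(some_messy_data, path) for name, path in declaration.items()}
```

Writing this loop by hand for every data source is the repetitive part that a declaration-driven tool takes over.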
Installation
Just use pip to install:
    pip install dripper
Requirements
dripper does not require any external libraries. The supported Python versions are:
- Python 2.7
- Python 3.3
- Python 3.4
- Python 3.5
Basics
The example above does not show all of dripper's features. dripper is designed to clean up many kinds of data.
As value
    from dripper import dripper_factory

    declaration = {
        "title": ("meta", "meta1")
    }
    d = dripper_factory(declaration)

    assert d({"meta": {"meta1": "Monty Python"}}) == {"title": "Monty Python"}
You can also specify a string or an integer directly. It is the same as a one-element tuple.
    from dripper import dripper_factory

    declaration = {
        "title": "meta"
    }
    d = dripper_factory(declaration)

    assert d({"meta": "Monty Python"}) == {"title": "Monty Python"}
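The equivalence between a bare string or integer and a one-element tuple can be pictured as a small normalization step. The function `as_path` below is a hypothetical name used only for this sketch; how dripper handles this internally is not shown in this document.

```python
# Hypothetical normalization helper; an assumption for illustration.
def as_path(source):
    """Wrap a bare string or integer so it can be treated as a one-step path."""
    if isinstance(source, (str, int)):
        return (source,)
    return tuple(source)


single_key = as_path("meta")             # a bare string becomes a one-element tuple
single_index = as_path(0)                # so does a bare integer
full_path = as_path(["meta", "meta1"])   # lists and tuples are kept as paths
```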
As dict
dripper can produce nested dictionaries. Just pass a nested dictionary to dripper_factory.
    from dripper import dripper_factory

    declaration = {
        "article": {
            "title": ["meta", "meta1"],
        }
    }
    d = dripper_factory(declaration)

    assert d({
        "meta": {
            "meta1": "Monty Python",
        },
    }) == {
        "article": {
            "title": "Monty Python",
        }
    }
Use '__source_root__' to set the root path for dripping. Paths inside that dictionary are then resolved relative to the root.
    from dripper import dripper_factory

    declaration = {
        "article": {
            "__source_root__": ("body", "meta"),
            "title": "meta1",
            "author": ("meta2", "meta22"),
        }
    }
    d = dripper_factory(declaration)

    assert d({
        "body": {
            "meta": {
                "meta1": "Monty Python",
                "meta2": {"meta22": "John Doe"}
            }
        }
    }) == {
        "article": {
            "title": "Monty Python",
            "author": "John Doe",
        }
    }
Technically, the outermost dictionary of a declaration is treated the same as the inner dictionaries, so you can specify '__source_root__' on it as well.
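One way to think about a source root is as a shared prefix: first move to the root, then resolve each field relative to it. The sketch below uses plain Python and a hypothetical `dig` helper; it is not dripper's implementation.

```python
# Hypothetical helper for illustration only; not part of dripper.
def dig(data, path):
    """Follow a sequence of keys / indexes into nested data."""
    current = data
    for step in path:
        current = current[step]
    return current


source = {
    "body": {
        "meta": {
            "meta1": "Monty Python",
            "meta2": {"meta22": "John Doe"},
        }
    }
}

root = ("body", "meta")  # plays the role of '__source_root__'
fields = {"title": ("meta1",), "author": ("meta2", "meta22")}

# Resolve the root once, then resolve every field relative to it.
scoped = dig(source, root)
article = {name: dig(scoped, path) for name, path in fields.items()}
```

Resolving the shared prefix once keeps each field path short and makes it easy to relocate the whole block if the source layout changes.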
As list
dripper can produce a list of dictionaries. To do so, set '__type__': 'list' in the declaration.
    from dripper import dripper_factory

    declaration = {
        "articles": {
            "__type__": "list",
            "__source_root__": "articles",
            "title": "meta1",
            "author": ["meta2", "meta22"],
        }
    }
    d = dripper_factory(declaration)

    assert d({
        "articles": [
            {"meta1": "Monty Python", "meta2": {"meta22": "John Doe"}},
            {"meta1": "Flying Circus", "meta2": {"meta22": "Jane Doe"}},
        ]
    }) == {
        "articles": [
            {"title": "Monty Python", "author": "John Doe"},
            {"title": "Flying Circus", "author": "Jane Doe"},
        ]
    }
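Conceptually, a list declaration applies the same field paths to every item found at the source root. The following plain-Python sketch shows that shape; `dig` and `drip_list` are hypothetical names chosen for this illustration and do not come from dripper.

```python
# Hypothetical helpers for illustration only; not part of dripper.
def dig(data, path):
    """Follow a sequence of keys / indexes into nested data."""
    current = data
    for step in path:
        current = current[step]
    return current


def drip_list(source, root, fields):
    """Apply the same field paths to each item of the list found at `root`."""
    items = dig(source, root)
    return [
        {name: dig(item, path) for name, path in fields.items()}
        for item in items
    ]


source = {
    "articles": [
        {"meta1": "Monty Python", "meta2": {"meta22": "John Doe"}},
        {"meta1": "Flying Circus", "meta2": {"meta22": "Jane Doe"}},
    ]
}
fields = {"title": ("meta1",), "author": ("meta2", "meta22")}

articles = drip_list(source, ("articles",), fields)
```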
Advanced
Converting
Use dripper.ValueDripper to pass a converter function.
    import dripper

    declaration = {
        "title": dripper.ValueDripper(["title"], converter=lambda s: s.lower())
    }
    d = dripper.dripper_factory(declaration)

    assert d({"title": "TITLE"}) == {"title": "title"}
Technically, each leaf of a declaration (a list) is replaced by an instance of dripper.ValueDripper.
Default value
Pass the default keyword argument to change the default value. If it is omitted, None is used as the default.
    import dripper

    declaration = {
        "title": dripper.ValueDripper(["title"], default="default")
    }
    d = dripper.dripper_factory(declaration)

    assert d({}) == {"title": "default"}
As with converters, each leaf of a declaration (a list) is replaced by an instance of dripper.ValueDripper.
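The converter and default behaviors described above can be pictured as one lookup function with two optional hooks. This is a plain-Python sketch under stated assumptions: the name `extract` is hypothetical, and the choice to return the default without running the converter on it is an assumption of this sketch, not something this document states about dripper.

```python
# Hypothetical helper for illustration only; not part of dripper.
# Errors that indicate a path could not be followed in nested dicts/sequences.
_LOOKUP_ERRORS = (KeyError, IndexError, TypeError)


def extract(data, path, default=None, converter=None):
    """Follow `path` into `data`; fall back to `default` if the path is missing."""
    current = data
    try:
        for step in path:
            current = current[step]
    except _LOOKUP_ERRORS:
        # Assumption of this sketch: the default is returned unconverted.
        return default
    if converter is not None:
        return converter(current)
    return current


lowered = extract({"title": "TITLE"}, ["title"], converter=str.lower)
fallback = extract({}, ["title"], default="default")
missing = extract({}, ["title"])
```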
Combining
Combining dripper.ValueDripper instances with + combines the resulting values for that key.
    import dripper

    declaration = {
        "fullname": (dripper.ValueDripper(["firstname"]) +
                     dripper.ValueDripper(["lastname"]))
    }
    d = dripper.dripper_factory(declaration)

    assert d({"firstname": "Hiroki", "lastname": "Kiyohara"}) == {"fullname": "HirokiKiyohara"}
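Combining extractors with + can be modeled with Python's `__add__` hook: adding two extractors yields a new callable that runs both and adds their results. The classes below (`Getter`, `PathGetter`, `Combined`) are hypothetical names for this sketch and are not dripper's classes.

```python
# Hypothetical classes for illustration only; not part of dripper.
class Getter(object):
    """Base class: any two getters can be combined with +."""

    def __add__(self, other):
        return Combined(self, other)


class PathGetter(Getter):
    """Extract the value found by following `path` into the data."""

    def __init__(self, path):
        self.path = tuple(path)

    def __call__(self, data):
        current = data
        for step in self.path:
            current = current[step]
        return current


class Combined(Getter):
    """Run two getters on the same data and add their results."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def __call__(self, data):
        return self.left(data) + self.right(data)


fullname = PathGetter(["firstname"]) + PathGetter(["lastname"])
result = fullname({"firstname": "Hiroki", "lastname": "Kiyohara"})
```

Because the combined object is itself a getter, further + operations chain naturally, and the result type follows whatever + means for the extracted values (string concatenation here).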
CHANGES
1.2
- Avoid deepcopy to improve speed
- https://github.com/hirokiky/dripper/pull/6
- Thanks @afiram
1.1
- None is now the default value of ValueDripper
- Before this change, ValueDripper without the default keyword argument raised DrippingError
- To support this behavior, DictDripper returns an empty dict when its inner value drippers cannot dig out any values
- Thanks to @bungoume for suggesting this behavior
1.0
- Officially supported Python 3.5
0.3.1
- ValueDripper now accepts default argument.
0.3
- Fixed to accept string or integer directly as source_root.
0.2
- Improved error handling.
- Added MixDripper.
0.1
- Initial version
Download files
Filename | Size | File type | Python version | Upload date |
---|---|---|---|---|
dripper-1.2-py3-none-any.whl | 7.1 kB | Wheel | 3.5 | Sep 28, 2017 |
dripper-1.2.tar.gz | 5.5 kB | Source | None | Sep 28, 2017 |