Datapad is a library of lazy data transformations for sequences, similar to Spark and LINQ.
Datapad: A Fluent API for Exploratory Data Analysis
Datapad is a Python library for processing sequence and stream data using a fluent-style API. Data scientists and researchers use it as a lightweight toolset to efficiently explore datasets and to massage data for modeling tasks.
It can be viewed as a combination of syntactic sugar for the Python itertools module and supercharged tooling for working with structured sequence data.
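To make the "syntactic sugar for itertools" idea concrete, here is a minimal sketch of how a fluent, lazy wrapper over an iterator could be built using only the standard library. The `MiniSeq` class and its method names are hypothetical and exist only for illustration; this is not Datapad's implementation, and its `Sequence` class may work differently.

```python
import itertools


class MiniSeq:
    """Hypothetical fluent wrapper around an iterable (not Datapad's code)."""

    def __init__(self, iterable):
        self._it = iter(iterable)

    def map(self, fn):
        # The built-in map() is lazy: nothing runs until the result is consumed.
        return MiniSeq(map(fn, self._it))

    def filter(self, pred):
        # filter() is lazy as well, so chaining stays cheap.
        return MiniSeq(filter(pred, self._it))

    def take(self, n):
        # islice stops after n items, which makes infinite inputs safe.
        return MiniSeq(itertools.islice(self._it, n))

    def collect(self):
        # Consuming the iterator is what finally runs the whole pipeline.
        return list(self._it)


result = (
    MiniSeq(itertools.count(1))       # 1, 2, 3, ... (infinite)
    .map(lambda x: x * x)             # square each number
    .filter(lambda x: x % 2 == 1)     # keep odd squares
    .take(3)                          # stop after three results
    .collect()
)
print(result)  # [1, 9, 25]
```

Each method returns a new wrapper, which is what allows the calls to be chained, and no element is computed until `collect()` is called.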
Install
pip install datapad
Exploratory data analysis with Datapad
See what you can do with Datapad in the examples below.
Count all unique items in a sequence:
>>> import datapad as dp
>>> data = ['a', 'b', 'b', 'c', 'c', 'c']
>>> seq = dp.Sequence(data)
>>> seq.count(distinct=True) \
... .collect()
[('a', 1),
('b', 2),
('c', 3)]
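
For comparison, the same tally can be produced with the standard library alone. This sketch uses `collections.Counter` and does not depend on Datapad; it is shown only to clarify what the result above represents.

```python
from collections import Counter

data = ['a', 'b', 'b', 'c', 'c', 'c']

# Counter tallies each distinct item; items() yields (item, count) pairs
# in the order the items were first seen.
counts = list(Counter(data).items())
print(counts)  # [('a', 1), ('b', 2), ('c', 3)]
```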
Transform individual fields in a sequence:
>>> import datapad as dp
>>> F = dp.fields
>>> data = [
... {'a': 1, 'b': 2},
... {'a': 4, 'b': 4},
... {'a': 5, 'b': 7}
... ]
>>> seq = dp.Sequence(data)
>>> seq.map(F.apply('a', lambda x: x*2)) \
... .map(F.apply('b', lambda x: x*3)) \
... .collect()
[{'a': 2, 'b': 6},
{'a': 8, 'b': 12},
{'a': 10, 'b': 21}]
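
The field transform above can be approximated in plain Python as follows. The helper `apply_field` is a hypothetical stand-in written for this sketch; it assumes, based only on the output shown above, that `F.apply(key, fn)` returns a function that replaces one field of each dictionary with `fn` applied to its value. Whether Datapad copies the row or modifies it in place is not stated in this document; the sketch copies.

```python
def apply_field(key, fn):
    """Hypothetical stand-in for F.apply: transform one field of a dict."""
    def transform(row):
        # Build a new dict so the input rows are left unmodified.
        return {**row, key: fn(row[key])}
    return transform


data = [
    {'a': 1, 'b': 2},
    {'a': 4, 'b': 4},
    {'a': 5, 'b': 7},
]

# The built-in map() is lazy, so both transforms run in a single pass
# when list() consumes the pipeline.
rows = map(apply_field('a', lambda x: x * 2), data)
rows = map(apply_field('b', lambda x: x * 3), rows)
result = list(rows)
print(result)
# [{'a': 2, 'b': 6}, {'a': 8, 'b': 12}, {'a': 10, 'b': 21}]
```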
Chain together multiple transforms for the elements of a sequence:
>>> import datapad as dp
>>> data = ['a', 'b', 'b', 'c', 'c', 'c']
>>> seq = dp.Sequence(data)
>>> seq.distinct() \
... .map(lambda x: x+'z') \
... .map(lambda x: (x, len(x))) \
... .collect()
[('az', 2),
('bz', 2),
('cz', 2)]
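
As a rough illustration of what a lazy `distinct` step involves, the sketch below reproduces the chain above with a generator. The `distinct` function here is hypothetical and assumes first-seen ordering, which matches the output shown above but is not confirmed as Datapad's documented behavior.

```python
def distinct(iterable):
    """Yield each item the first time it appears (hypothetical helper)."""
    seen = set()
    for item in iterable:
        if item not in seen:
            seen.add(item)
            yield item


data = ['a', 'b', 'b', 'c', 'c', 'c']

# Generator expressions keep every stage lazy until list() is called.
with_suffix = (x + 'z' for x in distinct(data))
with_length = ((x, len(x)) for x in with_suffix)
result = list(with_length)
print(result)  # [('az', 2), ('bz', 2), ('cz', 2)]
```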
Check out our documentation below to see what else is possible with Datapad:
Development
Run the tests from the root of the repo using:
pip install pytest
sh test.sh
Built Distribution
File details
Details for the file datapad-0.7.6-py3-none-any.whl.
File metadata
- Download URL: datapad-0.7.6-py3-none-any.whl
- Upload date:
- Size: 21.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.8.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | d017db085cba8d52b83fbb9ba4ddab796bf952f21fdc126a398bcf50a9e7107c
MD5 | e4fb65a4aa540e94bb37247776d3bba7
BLAKE2b-256 | 247f7ad4a8d1a95c4b716a2bc2e37d85965894b3555cc017d4c01d66d6d4236b