
convtools lets you define and reuse conversions for processing collections and CSV tables, complex aggregations and joins.


convtools is a Python library for declaratively defining data transforms:

  • convtools.contrib.models - data validation based on typing - experimental

  • convtools.contrib.tables - stream processing of table-like data (e.g. CSV)

  • convtools.conversion - pipelines for processing collections, doing complex aggregations and joins.



Why would you need this?

  • you prefer declarative approach

  • you love functional programming

  • you believe that Python is high-level enough not to make you write aggregations and joins by hand

  • you need to serialize/validate objects

  • you need to dynamically define transforms (including at runtime)

  • you like the idea of having something write ad hoc code for you

Installation:

pip install convtools
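
A quick taste before the detailed walkthrough below (a minimal sketch; the input dict is made up):

from convtools import conversion as c

# compile an ad hoc function that extracts "value" and casts it to int
converter = c.item("value").as_type(int).gen_converter()

assert converter({"value": "7"}) == 7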

What’s the workflow?

Contrib / Model - data validation (experimental)

import typing as t
from enum import Enum

from convtools.contrib.models import DictModel, build, cast, json_dumps

T = t.TypeVar("T")

class Countries(Enum):
    MX = "MX"
    BR = "BR"


class AddressModel(DictModel):
    country: Countries = cast()  # explicit casting to output type
    state: str                   # validation only
    city: t.Optional[str]
    street: t.Optional[str] = None

    # # in case of a custom path like: address["apt"]["number"]
    # apt: int = field("apt", "number").cast()


class UserModel(DictModel):
    name: str
    age: int = cast()
    addresses: t.List[AddressModel]


class ResponseModel(DictModel, t.Generic[T]):
    data: T


input_data = {
    "data": [
        {
            "name": "John",
            "age": "21",
            "addresses": [{"country": "BR", "state": "SP", "city": "São Paulo"}],
        }
    ]
}
obj, errors = build(ResponseModel[t.List[UserModel]], input_data)

In [4]: obj
Out[4]: ResponseModel(data=[
            UserModel(name='John', age=21, addresses=[
                AddressModel(country=<Countries.BR: 'BR'>, state='SP', city='São Paulo', street=None)])])

In [5]: obj.data[0].addresses[0].country
Out[5]: <Countries.BR: 'BR'>

In [6]: obj.to_dict()
Out[6]:
{'data': [{'name': 'John',
   'age': 21,
   'addresses': [{'country': <Countries.BR: 'BR'>,
     'state': 'SP',
     'city': 'São Paulo',
     'street': None}]}]}

In [7]: json_dumps(obj)
Out[7]: '{"data": [{"name": "John", "age": 21, "addresses": [{"country": "BR", "state": "SP", "city": "S\\u00e3o Paulo", "street": null}]}]}'
# LET'S BREAK THE DATA AND VALIDATE AGAIN:
input_data["data"][0]["age"] = 21.1
obj, errors = build(ResponseModel[t.List[UserModel]], input_data)

In [5]: errors
Out[5]: {'data': {0: {'age': {'__ERRORS': {'int_caster': 'losing fractional part: 21.1; if desired, use casters.IntLossy'}}}}}
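
When validation fails, no model instance is produced. A minimal guard sketch, assuming build() leaves obj as None whenever errors is set (as in the session above):

obj, errors = build(ResponseModel[t.List[UserModel]], input_data)
if errors is not None:
    raise ValueError(f"validation failed: {errors}")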

Contrib / Table - stream processing of table-like data

The Table helper lets you massage CSVs and table-like data:
  • join / zip / chain tables

  • take / drop / rename columns

  • filter rows

  • update / update_all values

from convtools.contrib.tables import Table
from convtools import conversion as c

# reads Iterable of rows
Table.from_rows(
    [(0, -1), (1, 2)],
    header=["a", "b"]
).join(
    Table
    # reads tab-separated CSV file
    .from_csv("tests/csvs/ac.csv", header=True, dialect=Table.csv_dialect(delimiter="\t"))
    # casts all column values to int
    .update_all(int)
    # filter rows by condition (convtools conversion)
    .filter(c.col("c") >= 0),
    # joins on column "a" values
    on=["a"],
    how="inner",
).into_iter_rows(dict)  # this is a generator to consume (tuple and list are supported too)
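
A hedged sketch of the column operations from the feature list above; the method names come from that list, while rename's dict form and the output path are assumptions:

from convtools.contrib.tables import Table
from convtools import conversion as c

Table.from_csv("tests/csvs/ac.csv", header=True).take(
    "a", "c"  # keep only these columns
).rename(
    {"c": "c_int"}  # assumed dict form: old name -> new name
).update(
    c_int=c.col("c_int").as_type(int)  # cast a single column
).into_csv("result.csv")  # "result.csv" is just an example path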

Conversions - data transforms, complex aggregations, joins

# pip install convtools

from convtools import conversion as c

input_data = [{"StoreID": " 123", "Quantity": "123"}]

# define a conversion (sometimes you may want to do this dynamically):
#  takes an iterable and returns an iterable of dicts, stopping before the
#  first one with quantity >= 1000, and splitting into chunks whenever "id"
#  changes or a chunk reaches 1000 items
conversion = (
    c.iter(
        {
            "id": c.item("StoreID").call_method("strip"),
            "quantity": c.item("Quantity").as_type(int),
        }
    )
    .take_while(c.item("quantity") < 1000)
    .pipe(
        c.chunk_by(c.item("id"), size=1000)
    )
    .as_type(list)
)

# compile the conversion into an ad hoc function and run it
converter = conversion.gen_converter(debug=True)
converter(input_data)

# OR in case of a one-shot use
conversion.execute(input_data)

The example below (lifted from the project's doc tests) walks through group by, aggregate and join:

from convtools import conversion as c


def test_doc__index_intro():

    # ======== #
    # GROUP BY #
    # ======== #
    input_data = [
        {"a": 5, "b": "foo"},
        {"a": 10, "b": "foo"},
        {"a": 10, "b": "bar"},
        {"a": 10, "b": "bar"},
        {"a": 20, "b": "bar"},
    ]

    conv = (
        c.group_by(c.item("b"))
        .aggregate(
            {
                "b": c.item("b"),
                "a_first": c.ReduceFuncs.First(c.item("a")),
                "a_max": c.ReduceFuncs.Max(c.item("a")),
            }
        )
        .gen_converter(debug=True)
    )

    assert conv(input_data) == [
        {"b": "foo", "a_first": 5, "a_max": 10},
        {"b": "bar", "a_first": 10, "a_max": 20},
    ]

    # ========= #
    # AGGREGATE #
    # ========= #
    conv = c.aggregate(
        {
            # list of "a" values where "b" equals to "bar"
            "a": c.ReduceFuncs.Array(c.item("a"), where=c.item("b") == "bar"),
            # "b" value of a row where "a" has Max value
            "b": c.ReduceFuncs.MaxRow(
                c.item("a"),
            ).item("b", default=None),
        }
    ).gen_converter(debug=True)

    assert conv(input_data) == {"a": [10, 10, 20], "b": "bar"}

    # ==== #
    # JOIN #
    # ==== #
    collection_1 = [
        {"id": 1, "name": "Nick"},
        {"id": 2, "name": "Joash"},
        {"id": 3, "name": "Bob"},
    ]
    collection_2 = [
        {"ID": "3", "age": 17, "country": "GB"},
        {"ID": "2", "age": 21, "country": "US"},
        {"ID": "1", "age": 18, "country": "CA"},
    ]
    input_data = (collection_1, collection_2)

    conv = (
        c.join(
            c.item(0),
            c.item(1),
            c.and_(
                c.LEFT.item("id") == c.RIGHT.item("ID").as_type(int),
                c.RIGHT.item("age") >= 18,
            ),
            how="left",
        )
        .pipe(
            c.list_comp(
                {
                    "id": c.item(0, "id"),
                    "name": c.item(0, "name"),
                    "age": c.item(1, "age", default=None),
                    "country": c.item(1, "country", default=None),
                }
            )
        )
        .gen_converter(debug=True)
    )

    assert conv(input_data) == [
        {"id": 1, "name": "Nick", "age": 18, "country": "CA"},
        {"id": 2, "name": "Joash", "age": 21, "country": "US"},
        {"id": 3, "name": "Bob", "age": None, "country": None},
    ]

What reducers are supported by aggregations?

Any reduce function of two arguments passed into c.reduce, or one of the following, exposed like c.ReduceFuncs.Sum (a combined sketch follows the list):

  1. Sum

  2. SumOrNone

  3. Max

  4. MaxRow

  5. Min

  6. MinRow

  7. Count

  8. CountDistinct

  9. First

  10. Last

  11. Average

  12. Median

  13. Percentile - c.ReduceFuncs.Percentile(95.0, c.item("x"))

  14. Mode

  15. TopK - c.ReduceFuncs.TopK(3, c.item("x"))

  16. Array

  17. ArrayDistinct

  18. ArraySorted - c.ReduceFuncs.ArraySorted(c.item("x"), key=lambda v: v, reverse=True)

  19. Dict - c.ReduceFuncs.Dict(c.item("key"), c.item("x"))

  20. DictArray

  21. DictSum

  22. DictSumOrNone

  23. DictMax

  24. DictMin

  25. DictCount

  26. DictCountDistinct

  27. DictFirst

  28. DictLast
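
A hedged sketch combining a custom two-argument reducer passed to c.reduce with one of the named reducers above (DictSum's signature is assumed to mirror the Dict example):

from convtools import conversion as c

input_data = [
    {"a": 5, "b": "foo"},
    {"a": 10, "b": "bar"},
    {"a": 20, "b": "bar"},
]

converter = c.aggregate(
    {
        # custom reducer: any function of two arguments
        "total": c.reduce(lambda x, y: x + y, c.item("a"), initial=0),
        # named reducer: per-"b" sums of "a" values
        "by_b": c.ReduceFuncs.DictSum(c.item("b"), c.item("a")),
    }
).gen_converter()

assert converter(input_data) == {"total": 35, "by_b": {"foo": 5, "bar": 30}}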

Is it any different from tools like Pandas / Polars?

  • convtools doesn’t wrap data in any container; it just writes and runs the code which performs the conversion you defined

  • convtools is a lightweight library with no dependencies (though installing the optional black is highly recommended for pretty-printing generated code when debugging)

  • convtools is about defining and reusing conversions (a declarative approach), while wrapping data in high-performance containers leans imperative

  • convtools supports nested aggregations

Is this thing debuggable?

Despite being compiled at runtime, it is, with both pdb and pydevd; see the sketch below.
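
A minimal sketch, reusing the conversion and input_data defined earlier; pdb.runcall is standard library, and debug=True prints the generated source:

import pdb

converter = conversion.gen_converter(debug=True)

# the generated function is ordinary Python, so the debugger can step into it
pdb.runcall(converter, input_data)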
