Relation tools for Python.

Relation tools for Python. This library relates two data sets, each sorted by certain keys, in the manner of an SQL join.

Inspired by itertools.groupby: as long as the input data are sorted, almost all processing is evaluated lazily, which reduces memory usage. This makes it possible to join large data sets without an SQL engine.
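
As a minimal illustration of the idea that reltools builds on, the following sketch uses only the standard library's itertools.groupby. It is not part of reltools; the sample rows are made up for the example. It shows why sorted input matters: groupby only merges adjacent items with equal keys.

```python
import itertools
import operator

first_item = operator.itemgetter(0)

# Rows sorted by their first item: each key forms exactly one group.
sorted_rows = [(1, 'a'), (1, 'b'), (2, 'c')]
grouped = [
    (key, list(items))
    for key, items in itertools.groupby(sorted_rows, key=first_item)
]

# Rows that are NOT sorted: key 1 shows up in two separate groups,
# because groupby never looks back at earlier items.
unsorted_rows = [(1, 'a'), (2, 'c'), (1, 'b')]
unsorted_keys = [
    key for key, _ in itertools.groupby(unsorted_rows, key=first_item)
]
```

Here grouped is [(1, [(1, 'a'), (1, 'b')]), (2, [(2, 'c')])], while unsorted_keys is [1, 2, 1].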

Installation

Install with pip.

pip install reltools

Features

One-To-Many

One-to-many relation is provided by relate_one_to_many.

Here, the left-hand-side (lhs) and right-hand-side (rhs) inputs are sorted by their 1st (and then 2nd) items.

>>> lhs = [
...     (1, 'a', 's'),
...     (2, 'a', 't'),
...     (3, 'b', 'u'),
... ]
>>> rhs = [
...     (1, 'a', 'v'),
...     (1, 'b', 'w'),
...     (3, 'b', 'x'),
... ]

relate_one_to_many relates rhs items to each lhs item, using the first item of each tuple as the key by default.

>>> from reltools import relate_one_to_many
>>> one_to_many_related = relate_one_to_many(lhs, rhs)
>>> for left, right in one_to_many_related:
...     left, list(right)
((1, 'a', 's'), [(1, 'a', 'v'), (1, 'b', 'w')])
((2, 'a', 't'), [])
((3, 'b', 'u'), [(3, 'b', 'x')])
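
To make the behaviour above concrete, here is a rough sketch of how a one-to-many relation over sorted inputs could be written with the standard library. The function relate_one_to_many_sketch is hypothetical and is not the reltools implementation: unlike reltools, it materializes each right-hand group as a list, and it assumes that lhs keys are unique and that both inputs are sorted by key.

```python
import itertools
import operator

first_item = operator.itemgetter(0)


def relate_one_to_many_sketch(lhs, rhs, lhs_key=first_item, rhs_key=first_item):
    """Yield (left, matching_rights) pairs; hypothetical, not the reltools code."""
    rhs_groups = itertools.groupby(rhs, key=rhs_key)
    current = next(rhs_groups, None)  # (key, group iterator) or None
    for left in lhs:
        key = lhs_key(left)
        # Skip rhs groups whose key is smaller than the current lhs key.
        while current is not None and current[0] < key:
            current = next(rhs_groups, None)
        if current is not None and current[0] == key:
            # Materialize the group before advancing the groupby iterator.
            yield left, list(current[1])
            current = next(rhs_groups, None)
        else:
            yield left, []


lhs = [(1, 'a', 's'), (2, 'a', 't'), (3, 'b', 'u')]
rhs = [(1, 'a', 'v'), (1, 'b', 'w'), (3, 'b', 'x')]
related = list(relate_one_to_many_sketch(lhs, rhs))
```

Both inputs are walked once, front to back, so the sketch never needs random access to either side.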

Custom key functions can be passed to relate_one_to_many, and to the other API functions as well.

>>> import operator
>>> custom_key = operator.itemgetter(0, 1)
>>> one_to_many_related = relate_one_to_many(
...     lhs, rhs, lhs_key=custom_key, rhs_key=custom_key)
>>> for left, right in one_to_many_related:
...     left, list(right)
((1, 'a', 's'), [(1, 'a', 'v')])
((2, 'a', 't'), [])
((3, 'b', 'u'), [(3, 'b', 'x')])
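
The following standard-library snippet may help explain the result above: it only inspects the keys that operator.itemgetter(0, 1) produces for the sample data, and does not call reltools.

```python
import operator

custom_key = operator.itemgetter(0, 1)

lhs = [(1, 'a', 's'), (2, 'a', 't'), (3, 'b', 'u')]
rhs = [(1, 'a', 'v'), (1, 'b', 'w'), (3, 'b', 'x')]

# itemgetter(0, 1) returns a tuple of the first two items of each row.
lhs_keys = [custom_key(row) for row in lhs]
rhs_keys = [custom_key(row) for row in rhs]

# rhs keys without a counterpart in lhs; these rows relate to nothing.
unmatched = [key for key in rhs_keys if key not in lhs_keys]
```

The key (1, 'b') exists only on the right-hand side, which is why (1, 'b', 'w') no longer appears next to (1, 'a', 's').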

OneToManyChainer helps to relate several rhs iterables to a single lhs.

>>> another_rhs = [
...     ('s', 'f'),
...     ('t', 'g'),
...     ('t', 'h'),
... ]
>>> from reltools import OneToManyChainer
>>> chainer = OneToManyChainer(lhs)
>>> chainer.append(rhs)
>>> chainer.append(
...     another_rhs,
...     lhs_key=operator.itemgetter(2),
...     rhs_key=operator.itemgetter(0),
... )
>>> for left, right, another_right in chainer.chain():
...     left, list(right), list(another_right)
((1, 'a', 's'), [(1, 'a', 'v'), (1, 'b', 'w')], [('s', 'f')])
((2, 'a', 't'), [], [('t', 'g'), ('t', 'h')])
((3, 'b', 'u'), [(3, 'b', 'x')], [])

Left Outer Join

Left outer joining is provided by left_join. While an SQL left outer join returns every combination of matching rows, left_join returns one pair of item groups per key. Note that the right group can be empty, as in an SQL left join.

>>> from reltools import left_join
>>> lhs = [(1, 'a'), (1, 'b'), (2, 'c'), (4, 'd')]
>>> rhs = [(1, 's'), (1, 't'), (3, 'u'), (4, 'v')]
>>> relations = left_join(lhs, rhs)
>>> for left, right in relations:
...     list(left), list(right)
([(1, 'a'), (1, 'b')], [(1, 's'), (1, 't')])
([(2, 'c')], [])
([(4, 'd')], [(4, 'v')])

Right Outer Join

Right outer joining is not supported, because it is simply a left join with the two sides swapped. Use left_join(rhs, lhs, rhs_key, lhs_key) instead.
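
The swap can be sketched without reltools as follows. Both left_join_sketch and right_join_sketch are hypothetical helpers written for this illustration (they are not the library's code, and they build lists where reltools stays lazy); they assume both inputs are sorted by key.

```python
import itertools
import operator

first_item = operator.itemgetter(0)


def left_join_sketch(lhs, rhs, lhs_key=first_item, rhs_key=first_item):
    """Grouped left join over sorted inputs; hypothetical stand-in."""
    rhs_groups = itertools.groupby(rhs, key=rhs_key)
    current = next(rhs_groups, None)  # (key, group iterator) or None
    for key, left_items in itertools.groupby(lhs, key=lhs_key):
        # Skip rhs groups whose key is smaller than the current lhs key.
        while current is not None and current[0] < key:
            current = next(rhs_groups, None)
        if current is not None and current[0] == key:
            yield list(left_items), list(current[1])
        else:
            yield list(left_items), []


def right_join_sketch(lhs, rhs, lhs_key=first_item, rhs_key=first_item):
    """Right join expressed as a left join with the sides swapped."""
    for right_items, left_items in left_join_sketch(rhs, lhs, rhs_key, lhs_key):
        # Swap back so callers still receive (left, right) pairs.
        yield left_items, right_items


lhs = [(1, 'a'), (1, 'b'), (2, 'c'), (4, 'd')]
rhs = [(1, 's'), (1, 't'), (3, 'u'), (4, 'v')]
right_joined = list(right_join_sketch(lhs, rhs))
```

With this sample data, key 2 (only in lhs) is dropped and key 3 (only in rhs) is kept with an empty left group.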

Full Outer Join

Full outer joining, which is an original feature of reltools, is provided by outer_join. In contrast to left_join, a full outer join preserves keys that appear only in rhs.

>>> from reltools import outer_join
>>> lhs = [(1, 'a'), (1, 'b'), (2, 'c'), (4, 'd')]
>>> rhs = [(1, 's'), (1, 't'), (3, 'u'), (4, 'v')]
>>> relations = outer_join(lhs, rhs)
>>> for left, right in relations:
...     list(left), list(right)
([(1, 'a'), (1, 'b')], [(1, 's'), (1, 't')])
([(2, 'c')], [])
([], [(3, 'u')])
([(4, 'd')], [(4, 'v')])
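
For readers curious how such a join can work on sorted data, below is a hedged sketch of a grouped full outer merge join built on itertools.groupby. The name outer_join_sketch is hypothetical; this is one possible approach rather than the reltools implementation, and it materializes each group as a list.

```python
import itertools
import operator

first_item = operator.itemgetter(0)


def outer_join_sketch(lhs, rhs, lhs_key=first_item, rhs_key=first_item):
    """Grouped full outer join over sorted inputs; hypothetical stand-in."""
    lhs_groups = itertools.groupby(lhs, key=lhs_key)
    rhs_groups = itertools.groupby(rhs, key=rhs_key)
    left = next(lhs_groups, None)    # (key, group iterator) or None
    right = next(rhs_groups, None)
    while left is not None or right is not None:
        if right is None or (left is not None and left[0] < right[0]):
            # Key exists only on the left-hand side.
            yield list(left[1]), []
            left = next(lhs_groups, None)
        elif left is None or right[0] < left[0]:
            # Key exists only on the right-hand side.
            yield [], list(right[1])
            right = next(rhs_groups, None)
        else:
            # Same key on both sides.
            yield list(left[1]), list(right[1])
            left = next(lhs_groups, None)
            right = next(rhs_groups, None)


lhs = [(1, 'a'), (1, 'b'), (2, 'c'), (4, 'd')]
rhs = [(1, 's'), (1, 't'), (3, 'u'), (4, 'v')]
outer = list(outer_join_sketch(lhs, rhs))

# Dropping pairs with an empty left group gives the left-join result.
left_only = [(l, r) for l, r in outer if l]
```

The loop always advances the side with the smaller key, which is what lets it run in a single pass over both inputs.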

Inner Join

Inner joining is provided by inner_join. In contrast to left_join, the right group is never empty, as in an SQL inner join.

>>> from reltools import inner_join
>>> relations = inner_join(lhs, rhs)
>>> for left, right in relations:
...     list(left), list(right)
([(1, 'a'), (1, 'b')], [(1, 's'), (1, 't')])
([(4, 'd')], [(4, 'v')])

Many-To-Many

SQL-like many-to-many relation through an intermediate (junction) table is not supported, because reltools handles only sorted data and avoids random access. To achieve a many-to-many relation, denormalize the data during preprocessing and then use outer joining or inner joining.
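
One possible reading of this advice is sketched below with made-up data. The tables authors, books and authorship are hypothetical, and the preprocessing step uses a plain dict lookup, which is exactly the kind of random access that is left outside reltools. The result is a flat sequence sorted by author id, which could then be joined against the authors.

```python
import itertools
import operator

# Hypothetical many-to-many data: authors, books, and a link table.
authors = [(1, 'Ann'), (2, 'Bob')]
books = [(10, 'X'), (20, 'Y')]
authorship = [(1, 10), (1, 20), (2, 20)]  # (author_id, book_id)

# Preprocessing: copy the book title onto every link row, then sort
# by author_id so the rows can be joined against authors.
title_by_book_id = dict(books)
denormalized = sorted(
    (author_id, book_id, title_by_book_id[book_id])
    for author_id, book_id in authorship
)

# The sorted rows now group cleanly by author_id.
titles_by_author = [
    (author_id, [title for _, _, title in rows])
    for author_id, rows in itertools.groupby(
        denormalized, key=operator.itemgetter(0))
]
```

After this step, denormalized is sorted by its first item, as the join functions expect of their inputs.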

Memory Efficiency

Almost all processing is evaluated lazily, which reduces memory usage. You can check this yourself with a command such as RELTOOLS_TRY_COUNT=10000000 python3 -m doctest README.rst.

>>> import os
>>> n = int(os.environ.get('RELTOOLS_TRY_COUNT', 1000))
>>> lhs = ((i, 'left') for i in range(n))
>>> rhs = ((i, 'right') for i in range(n))
>>> for left, right in relate_one_to_many(lhs, rhs):
...     assert len(list(right)) == 1
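
Lazy evaluation can also be observed directly. The sketch below does not use reltools; it records which rows a generator has produced while itertools.groupby (the inspiration named above) reads the first group. The names pulled and source are hypothetical helpers for the illustration.

```python
import itertools
import operator

pulled = []  # records every row index the generator has produced so far


def source(n):
    """Generate n rows lazily, logging each row as it is produced."""
    for i in range(n):
        pulled.append(i)
        yield (i, 'left')


groups = itertools.groupby(source(1000), key=operator.itemgetter(0))
key, items = next(groups)
first_group = list(items)

# Snapshot of how many rows were read to deliver the first group.
rows_read = len(pulled)
```

Only the rows needed to finish the first group (its single row plus one look-ahead row) have been read at this point, so rows_read is 2 rather than 1000.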

Development

This project’s structure is based on Poetry. All tests are written with doctest and run with pytest.

poetry install
poetry run pytest

For stability, additional checks are also run when testing.

License

MIT License.

Copyright (c) 2018 Yu MOCHIZUKI
