Lazy iterables for full outer join, inner join, and left and right join
Project description
full_outer_join
Lazy iterator implementations of a full outer join, inner join, and left and right joins of Python iterables.
This implements the sort-merge join, better known as the merge join, to join iterables in O(n) time with respect to the length of the longest iterable.
Note that the algorithm requires input to be sorted by the join key.
Example
(whitespace to make things explicit)
>>> list(full_outer_join.full_outer_join(
[{"id": 1, "val": "foo"} ],
[{"id": 1, "val": "bar"}, {"id": 2, "val": "baz"}],
key=lambda x: x["id"]
))
[
(1, ([{'id': 1, 'val': 'foo'}], [{'id': 1, 'val': 'bar'}])),
(2, ([ ], [{'id': 2, 'val': 'baz'}]))
]
To consume the output, your business logic might look like:
for group_key, key_batches in full_outer_join.full_outer_join(left, right):
left_rows, right_rows = key_batches
if left_rows and right_rows:
# This is the inner join case.
pass
elif left_rows and not right_rows:
# This is the left join case (no matching right rows)
pass
elif not left_rows and right_rows:
# This is the right join case (no matching left rows)
pass
elif not left_rows and not right_rows:
raise Exception("Unreachable")
Functions
name | description |
---|---|
full_outer_join(*iterables, key=lambda x: x) |
Do a full outer join on any number of iterables, returning (key, (list[row], ...)) for each key across all iterables. |
inner_join(*iterables, key=lambda x: x) |
Do an inner join across all iterables, returning (key, (list[row], ...)) for keys only in all iterables |
left_join(left_iterable, right_iterable, key=lambda x: x) |
Do a left join on both iterables, returning keys for each unique key in left_iterable |
right_join(left_iterable, right_iterable, key=lambda x: x) |
Do a right join on both iterables, returning keys for each unique key in right_iterable |
cross_join(join_output, null=None) |
Do the cross (Cartesian) join on the output of full_outer_join or inner_join , yielding (key, (iter1_row, ...)) for each row. This is implemented for completeness and is probably not useful. Iterables lacking any rows for key are replaced with null in the output. |
Why?
- Your input is already sorted and you don't want to consume your input iterators.
- Your business logic that consumes the joined output benefits from explicitly handling the match and no-match cases from each input iterable.
- You're insane. Your brain is irreparably broken by the relational model.
More examples
See test_insanity.py for a silly example of a SQL query hand-compiled into iterators.
Thanks
This was originally a PR to the more_itertools project who gave some excellent feedback on the design but ultimately did not want to merge it in.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file full_outer_join-1.0.0.tar.gz
.
File metadata
- Download URL: full_outer_join-1.0.0.tar.gz
- Upload date:
- Size: 7.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.5
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | f8a22b73e6d30070f48522d2a01d56f64c6c0411d4468cc88854ce969c43e8e1 |
|
MD5 | 48417c518a19b649f12a3d55545cf21d |
|
BLAKE2b-256 | fe02ff8350797577527ed7868070f8a689d7a3ad7a259f7752820f72d85edba9 |
File details
Details for the file full_outer_join-1.0.0-py3-none-any.whl
.
File metadata
- Download URL: full_outer_join-1.0.0-py3-none-any.whl
- Upload date:
- Size: 5.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.5
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | f5c02dc0b11f4380f6c5645c2737190c49e4e6e5b9fafec2ea01c72d367b92d8 |
|
MD5 | c0463c5591fb84dd900484c35c2e766c |
|
BLAKE2b-256 | a70bf71655666bd21eec2256a3a9c265e5491d43f6697caf313b9795301256ec |