
itertools with function chaining



excitertools

itertools in the form of function call chaining

API Documentation

Several emoji are used to indicate things about parts of the API:

  • 🎤 This API method is a source, meaning that it produces data that will be processed in an iterator chain.

  • 🎧 This API method is a sink, meaning that it consumes data that was processed in an iterator chain.

  • ⚠ Warning - pay attention

  • 🛠 This API is still in flux, and might be changed or removed in the future

  • ✨ Noteworthy; could be especially useful in many situations.

The API is arranged roughly with the module-level functions first, and thereafter the Iter class itself. It is the Iter class that does the work to allow these iterators to be chained together. However, the module-level functions are more likely to be used directly and that’s why they’re presented first.

The API includes wrappers for the stdlib itertools module, including the “recipes” given in the itertools docs, as well as wrappers for the iterators from the more-itertools 3rd-party package.


The following module-level functions, like range, zip and so on, are intended to be used as replacements for their homonymous builtins. The only difference between these and the builtin versions is that these return instances of the Iter class. Note that because Iter is itself iterable, it means that the functions here can be used as drop-in replacements.
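To make the drop-in idea concrete, here is a minimal sketch of how such a wrapper could work. MiniIter and mini_range are hypothetical names invented for this illustration; this is not the excitertools implementation, only the general pattern of wrapping a builtin and returning a chainable, iterable object.

```python
import builtins


class MiniIter:
    """Hypothetical, stripped-down stand-in for Iter (illustration only)."""

    def __init__(self, iterable):
        self._it = iter(iterable)

    def __iter__(self):
        # Being iterable is what makes the wrapper a drop-in replacement.
        return self._it

    def map(self, func):
        # Wrap the lazy builtin map so the chain can continue.
        return MiniIter(builtins.map(func, self._it))

    def filter(self, func=None):
        return MiniIter(builtins.filter(func, self._it))

    def collect(self, container=list):
        return container(self._it)


def mini_range(*args):
    # Same arguments as the builtin, but the result supports chaining.
    return MiniIter(builtins.range(*args))


chained = mini_range(10).filter(lambda x: x % 2 == 0).map(lambda x: x * x).collect()
drop_in = list(mini_range(3))  # still usable wherever an iterable is expected
print(chained)  # [0, 4, 16, 36, 64]
print(drop_in)  # [0, 1, 2]
```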

🎤 range(*args) -> "Iter[int]"

Replacement for the builtin range function. This version returns an instance of Iter to allow further iterable chaining.

All the same calling variations work because this function merely wraps the original function.

>>> range(3).collect()
[0, 1, 2]
>>> range(1, 4).collect()
[1, 2, 3]
>>> range(1, 6, 2).collect()
[1, 3, 5]
>>> range(1, 101, 3).filter(lambda x: x % 7 == 0).collect()
[7, 28, 49, 70, 91]

This example multiplies, element by element, the series [0:5] with the series [1:6]. Two things to note: Firstly, Iter.zip is used to emit the tuples from each series. Secondly, Iter.starmap is used to receive those tuples into separate arguments in the lambda.

>>> range(5).zip(range(1, 6)).starmap(lambda x, y: x * y).collect()
[0, 2, 6, 12, 20]

When written in a single line as above, it can get difficult to follow the chain of logic if there are many processing steps. Parentheses in Python allow grouping such that expressions can be spread over multiple lines.

This is the same example as the prior one, but formatted to be spread over several lines. This is much clearer:

>>> # Written out differently
>>> (
...     range(5)
...         .zip(range(1, 6))
...         .starmap(lambda x, y: x * y)
...         .collect()
... )
[0, 2, 6, 12, 20]

If you wanted the sum instead, it isn’t necessary to do the collection at all:

>>> (
...     range(5)
...         .zip(range(1, 6))
...         .starmap(lambda x, y: x * y)
...         .sum()
... )
40

zip(*iterables: Any) -> "Iter[Tuple[T, ...]]"

Replacement for the builtin zip function. This version returns an instance of Iter to allow further iterable chaining.

enumerate(iterable) -> "Iter[Tuple[int, T]]"

Replacement for the builtin enumerate function. This version returns an instance of Iter to allow further iterable chaining.

>>> enumerate(string.ascii_lowercase).take(3).collect()
[(0, 'a'), (1, 'b'), (2, 'c')]

map(func: Union[Callable[..., C], str], iterable) -> "Iter[C]"

Replacement for the builtin map function. This version returns an instance of Iter to allow further iterable chaining.

>>> result = map(lambda x: (x, ord(x)), 'caleb').dict()
>>> assert result == {'a': 97, 'b': 98, 'c': 99, 'e': 101, 'l': 108}

>>> result = map('x, ord(x)', 'caleb').dict()
>>> assert result == {'a': 97, 'b': 98, 'c': 99, 'e': 101, 'l': 108}

filter(function: "Callable[[Any], bool]", iterable: "Iterable[T]") -> "Iter[T]"

Replacement for the builtin filter function. This version returns an instance of Iter to allow further iterable chaining.

>>> filter(lambda x: x % 3 == 0, range(10)).collect()
[0, 3, 6, 9]

🎤 count(start=0, step: int = 1) -> "Iter[int]"

Replacement for the itertools count function. This version returns an instance of Iter to allow further iterable chaining.

>>> count().take(5).collect()
[0, 1, 2, 3, 4]
>>> count(0).take(0).collect()
[]
>>> count(10).take(0).collect()
[]
>>> count(10).take(5).collect()
[10, 11, 12, 13, 14]
>>> count(1).filter(lambda x: x > 10).take(5).collect()
[11, 12, 13, 14, 15]

cycle(iterable) -> "Iter[T]"

Replacement for the itertools cycle function. This version returns an instance of Iter to allow further iterable chaining.

>>> cycle(range(3)).take(6).collect()
[0, 1, 2, 0, 1, 2]
>>> cycle([]).take(6).collect()
[]
>>> cycle(range(3)).take(0).collect()
[]

🎤 repeat(object: C, times=None) -> "Iter[C]"

Replacement for the itertools repeat function. This version returns an instance of Iter to allow further iterable chaining.

>>> repeat('a').take(3).collect()
['a', 'a', 'a']
>>> repeat([1, 2]).take(3).collect()
[[1, 2], [1, 2], [1, 2]]
>>> repeat([1, 2]).take(3).collapse().collect()
[1, 2, 1, 2, 1, 2]
>>> repeat([1, 2]).collapse().take(3).collect()
[1, 2, 1]
>>> repeat('a', times=3).collect()
['a', 'a', 'a']

The functions in this next set return iterators that terminate on the shortest input sequence.

accumulate(iterable, func=None, *, initial=None)

Replacement for the itertools accumulate function. This version returns an instance of Iter to allow further iterable chaining.

>>> accumulate([1, 2, 3, 4, 5]).collect()
[1, 3, 6, 10, 15]
>>> if sys.version_info >= (3, 8):
...     output = accumulate([1, 2, 3, 4, 5], initial=100).collect()
...     assert output == [100, 101, 103, 106, 110, 115]
>>> accumulate([1, 2, 3, 4, 5], operator.mul).collect()
[1, 2, 6, 24, 120]
>>> accumulate([]).collect()
[]
>>> accumulate('abc').collect()
['a', 'ab', 'abc']
>>> accumulate(b'abc').collect()
[97, 195, 294]
>>> accumulate(bytearray(b'abc')).collect()
[97, 195, 294]

chain(*iterables: Iterable[T]) -> "Iter[T]"

Replacement for the itertools chain function. This version returns an instance of Iter to allow further iterable chaining.

>>> chain('ABC', 'DEF').collect()
['A', 'B', 'C', 'D', 'E', 'F']
>>> chain().collect()
[]

chain_from_iterable(iterable) -> "Iter[T]"

Replacement for the itertools chain.from_iterable method. This version returns an instance of Iter to allow further iterable chaining.

>>> chain_from_iterable(['ABC', 'DEF']).collect()
['A', 'B', 'C', 'D', 'E', 'F']
>>> chain_from_iterable([]).collect()
[]

compress(data, selectors)

Replacement for the itertools compress function. This version returns an instance of Iter to allow further iterable chaining.

>>> compress('ABCDEF', [1, 0, 1, 0, 1, 1]).collect()
['A', 'C', 'E', 'F']

dropwhile(pred, iterable)

Replacement for the itertools dropwhile function. This version returns an instance of Iter to allow further iterable chaining.

>>> dropwhile(lambda x: x < 4, range(6)).collect()
[4, 5]

filterfalse(pred, iterable)

Replacement for the itertools filterfalse function. This version returns an instance of Iter to allow further iterable chaining.

>>> filterfalse(None, [2, 0, 3, None, 4, 0]).collect()
[0, None, 0]

groupby(iterable, key=None)

Replacement for the itertools groupby function. This version returns an instance of Iter to allow further iterable chaining.

groupby returns an iterator of pairs, each consisting of a key and a “grouper” iterable. In the example below, we use Iter.starmap to collect each grouper iterable into a list, as this makes it neater for display here in the docstring.

>>> (
...     groupby(['john', 'jill', 'anne', 'jack'], key=lambda x: x[0])
...         .starmap(lambda k, g: (k, list(g)))
...         .collect()
... )
[('j', ['john', 'jill']), ('a', ['anne']), ('j', ['jack'])]
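Note that 'j' appears twice in the output above. The standard itertools.groupby only groups consecutive items with equal keys, and the output above is consistent with the wrapper behaving the same way. The sketch below uses only the stdlib to show the usual remedy, which is to sort by the same key first; first_letter is a helper name chosen for this illustration.

```python
from itertools import groupby

names = ['john', 'jill', 'anne', 'jack']


def first_letter(name):
    return name[0]


# Sorting by the same key first makes equal keys adjacent.
grouped = [
    (key, list(group))
    for key, group in groupby(sorted(names, key=first_letter), key=first_letter)
]
print(grouped)  # [('a', ['anne']), ('j', ['john', 'jill', 'jack'])]
```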

islice(iterable, *args) -> "Iter"

Replacement for the itertools islice function. This version returns an instance of Iter to allow further iterable chaining.

>>> islice('ABCDEFG', 2).collect()
['A', 'B']
>>> islice('ABCDEFG', 2, 4).collect()
['C', 'D']
>>> islice('ABCDEFG', 2, None).collect()
['C', 'D', 'E', 'F', 'G']
>>> islice('ABCDEFG', 0, None, 2).collect()
['A', 'C', 'E', 'G']

starmap(func, iterable)

Replacement for the itertools starmap function. This version returns an instance of Iter to allow further iterable chaining.

>>> starmap(pow, [(2, 5), (3, 2), (10, 3)]).collect()
[32, 9, 1000]

takewhile(pred, iterable)

Replacement for the itertools takewhile function. This version returns an instance of Iter to allow further iterable chaining.

>>> takewhile(lambda x: x < 5, [1, 4, 6, 4, 1]).collect()
[1, 4]

tee(iterable, n=2)

Replacement for the itertools tee function. This version returns an instance of Iter to allow further iterable chaining.

>>> a, b = tee(range(5))
>>> a.collect()
[0, 1, 2, 3, 4]
>>> b.sum()
10

It is also possible to operate on the returned iterators in the chain but it gets quite difficult to understand:

>>> tee(range(5)).map(lambda it: it.sum()).collect()
[10, 10]

In the example above we passed in range, but with excitertools it’s usually more natural to push data sources further left:

>>> range(5).tee().map(lambda it: it.sum()).collect()
[10, 10]

Pay close attention to the above. The map is acting on each of the copied iterators.
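If the chained form is hard to follow, it may help to see the same computation with the stdlib only. This is a plain itertools sketch, not excitertools code: tee hands back two independent iterators, and each one is summed separately.

```python
from itertools import tee

a, b = tee(range(5))  # two independent iterators over the same data
sums = [sum(it) for it in (a, b)]
print(sums)  # [10, 10]
```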

zip_longest(*iterables, fillvalue=None)

Replacement for the itertools zip_longest function. This version returns an instance of Iter to allow further iterable chaining.

>>> zip_longest('ABCD', 'xy', fillvalue='-').collect()
[('A', 'x'), ('B', 'y'), ('C', '-'), ('D', '-')]
>>> (
...     zip_longest('ABCD', 'xy', fillvalue='-')
...         .map(lambda tup: concat(tup, ''))
...         .collect()
... )
['Ax', 'By', 'C-', 'D-']
>>> (
...     zip_longest('ABCD', 'xy', fillvalue='-')
...         .starmap(operator.add)
...         .collect()
... )
['Ax', 'By', 'C-', 'D-']

finditer_regex(pat: "re.Pattern[AnyStr]", s: AnyStr, flags: Union[int, re.RegexFlag] = 0) -> "Iter[AnyStr]"

Wrapper for re.finditer. Returns an instance of Iter to allow chaining.

>>> pat = r"\w+"
>>> text = "Well hello there! How ya doin!"
>>> finditer_regex(pat, text).map(str.lower).filter(lambda w: 'o' in w).collect()
['hello', 'how', 'doin']
>>> finditer_regex(r"[A-Za-z']+", "A programmer's RegEx test.").collect()
['A', "programmer's", 'RegEx', 'test']
>>> finditer_regex(r"[A-Za-z']+", "").collect()
[]
>>> finditer_regex("", "").collect()
['']
>>> finditer_regex("", "").filter(None).collect()
[]

splititer_regex(pat: "re.Pattern[AnyStr]", s: AnyStr, flags: Union[int, re.RegexFlag] = 0) -> "Iter[AnyStr]"

Lazy string splitting using regular expressions.

Most of the time you want str.split. Really! That will almost always be fastest. You might think that str.split is inefficient because it always has to build a list, but it can do this very, very quickly.

The lazy splitting shown here is more about supporting a particular kind of programming model, rather than performance.

See more discussion here.
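As a rough illustration of what “lazy” means here, the following sketch builds a splitter on top of re.finditer. The name lazy_split is hypothetical and this is not necessarily how splititer_regex is implemented; it only shows that pieces can be produced one at a time without building the whole list first.

```python
import re
from typing import Iterator


def lazy_split(pattern: str, text: str, flags: int = 0) -> Iterator[str]:
    """Yield the pieces of ``text`` between matches, one at a time."""
    last = 0
    for match in re.finditer(pattern, text, flags):
        yield text[last:match.start()]
        last = match.end()
    yield text[last:]  # whatever follows the final match


pieces = lazy_split(r"\s+", "aaa     bbb  \n  ccc\nddd\teee")
first = next(pieces)  # nothing beyond the first separator has been split yet
rest = list(pieces)
print(first, rest)  # aaa ['bbb', 'ccc', 'ddd', 'eee']
```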

>>> splititer_regex(r"\s", "A programmer's RegEx test.").collect()
['A', "programmer's", 'RegEx', 'test.']

Note that splitting at a single whitespace character will return blanks for each found. This is different to how str.split() works.

>>> splititer_regex(r"\s", "aaa     bbb  \n  ccc\nddd\teee").collect()
['aaa', '', '', '', '', 'bbb', '', '', '', '', 'ccc', 'ddd', 'eee']

To match str.split(), specify a sequence of whitespace as the regex pattern.

>>> splititer_regex(r"\s+", "aaa     bbb  \n  ccc\nddd\teee").collect()
['aaa', 'bbb', 'ccc', 'ddd', 'eee']

Counting the whitespace

>>> splititer_regex(r"\s", "aaa     bbb  \n  ccc\nddd\teee").collect(Counter)
Counter({'': 8, 'aaa': 1, 'bbb': 1, 'ccc': 1, 'ddd': 1, 'eee': 1})

Lazy splitting at newlines

>>> splititer_regex(r"\n", "aaa     bbb  \n  ccc\nddd\teee").collect()
['aaa     bbb  ', '  ccc', 'ddd\teee']
>>> splititer_regex(r"", "aaa").collect()
['', 'a', 'a', 'a', '']
>>> splititer_regex(r"", "").collect()
['', '']
>>> splititer_regex(r"\s", "").collect()
['']
>>> splititer_regex(r"a", "").collect()
['']
>>> splititer_regex(r"\s", "aaa").collect()
['aaa']

class Iter(Generic[T])

This class is what allows chaining. Many of the methods in this class return an instance of Iter, which allows further chaining. There are two exceptions to this: sources and sinks.

A “source” is usually a classmethod which can be used as an initializer to produce data via an iterable. For example, the Iter.range classmethod can be used to get a sequence of numbers:

>>> Iter.range(1_000_000).take(3).collect()
[0, 1, 2]

Even though our range was a million elements, the iterator chaining took only 3 of those elements before collecting.
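The same laziness can be observed with the stdlib alone. In this sketch (plain itertools, with a hypothetical numbers generator that records what it produces), only three values are ever generated even though the source could produce a million.

```python
from itertools import islice

produced = []


def numbers(limit):
    # Record every value that is actually generated.
    for i in range(limit):
        produced.append(i)
        yield i


first_three = list(islice(numbers(1_000_000), 3))
print(first_three)    # [0, 1, 2]
print(len(produced))  # 3
```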

A “sink” is a method that is usually the last component of a processing chain and often (but not always!) consumes the entire iterator. In the example above, the call to Iter.collect was a sink. Note that we still call it a sink even though it did not consume the entire iterator.

We’re using the term “source” to refer to a classmethod of Iter that produces data; but, the most typical source is going to be data that you provide. Iter can be called with anything that is iterable, including sequences, iterators, mappings, sets, generators and so on.

Examples:

List
>>> Iter([1, 2, 3]).map(lambda x: x * 2).sum()
12

Generator
>>> Iter((1, 2, 3)).map(lambda x: x * 2).sum()
12
>>> def g():
...     for i in [1, 2, 3]:
...         yield i
>>> Iter(g()).map(lambda x: x * 2).sum()
12

Iterator
>>> Iter(iter([1, 2, 3])).map(lambda x: x * 2).sum()
12

Dict
>>> Iter(dict(a=1, b=2)).map(lambda x: x.upper()).collect()
['A', 'B']
>>> d = dict(a=1, b=2, c=3)
>>> Iter(d.items()).starmap(lambda k, v: v).map(lambda x: x * 2).sum()
12

A common error with generators is forgetting to actually evaluate, i.e., call a generator function. If you do this there’s a friendly error pointing out the mistake:

>>> def mygen(): yield 123
>>> Iter(mygen).collect()
Traceback (most recent call last):
    ...
TypeError: It seems you passed a generator function, but you
probably intended to pass a generator. Remember to evaluate the
function to obtain a generator instance:
<BLANKLINE>
def mygen():
    yield 123
<BLANKLINE>
Iter(mygen)    # ERROR - a generator function object is not iterable
Iter(mygen())  # CORRECT - a generator instance is iterable.
>>> Iter(mygen()).collect()
[123]
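A check like this can be written with the stdlib inspect module. The sketch below is only a guess at the general approach, with a hypothetical helper name and a shortened error message; the library's actual check may differ.

```python
import inspect


def check_iterable_source(obj):
    # Hypothetical guard: reject a generator *function*, accept its result.
    if inspect.isgeneratorfunction(obj):
        raise TypeError(
            f"{obj.__name__} is a generator function; call it to get a generator"
        )
    return iter(obj)


def mygen():
    yield 123


try:
    check_iterable_source(mygen)
except TypeError as exc:
    message = str(exc)

values = list(check_iterable_source(mygen()))
print(message)
print(values)  # [123]
```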

Instances of Iter are resumable. Once an instance is created, it can be partially iterated in successive calls, as the following example shows:

>>> it = Iter.range(1_000_000)
>>> it.take(3).collect()
[0, 1, 2]
>>> it.take(4).collect()
[3, 4, 5, 6]
>>> # Consume most of the stream, collect the last few
>>> it.consume(999_990).collect()
[999997, 999998, 999999]
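This is the same behaviour you get from any plain Python iterator. A stdlib-only sketch of the idea, using iter and itertools.islice rather than Iter:

```python
from itertools import islice

it = iter(range(10))
first = list(islice(it, 3))   # consumes 0, 1, 2
second = list(islice(it, 4))  # continues from where the first call stopped
remaining = list(it)
print(first, second, remaining)  # [0, 1, 2] [3, 4, 5, 6] [7, 8, 9]
```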

This class implements the chaining. However, the module-level functions in excitertools, such as range, zip and so on, also return instances of Iter, so they allow the chaining to continue. These are equivalent:

>>> Iter.range(10).filter(lambda x: x > 7).collect()
[8, 9]
>>> range(10).filter(lambda x: x > 7).collect()
[8, 9]

It is intended that the module-level functions can act as drop-in replacements for the builtins they wrap:

>>> import builtins
>>> list(builtins.range(3))
[0, 1, 2]
>>> list(range(3))  # This is excitertools.range!
[0, 1, 2]
>>> list(Iter.range(3))
[0, 1, 2]

In your own code where you might like to use the excitertools version of range and the other functions, you can just import it and use it to access all the other cool stuff:

# mymodule.py
from excitertools import (
    range,
    map,
    filter,
    reduce,
    repeat,
    count,
    enumerate,
    zip,
    ...
)

def func(inputs):
    data = (
        map(lambda x: x + 2, inputs)
            .enumerate()
            .filter(lambda x: x[1] > 10)
            ...
            .collect()
    )

Alternatively, if you don’t want to hide the builtins you can do just fine with importing this class only, or even importing the module only:

# mymodule.py - same example as before
import excitertools

def func(inputs):
    data = (
        excitertools.Iter(inputs)
            .map(lambda x: x + 2)
            .enumerate()
            .filter(lambda x: x[1] > 10)
            ...
            .collect()
    )

    # Do something with data

There are several valuable additions to the standard itertools and more-itertools functions. These usually involve sources and sinks, which are ways of getting data into an iterator pipeline, and then getting results out again. In the majority of documentation examples shown here, the Iter.collect method is used to collect all the remaining data on a stream into a list; but in practice this is often a poor fit for large streams, because the entire list has to be held in memory.

In practice it is more useful to send iterator data to one of these common sinks:

  • files

  • sockets

  • queues

  • HTTP APIs

  • Cloud storage buckets

  • (Ideas for more to add here?)

Iter has support for these use-cases, both for reading and for writing.
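The Iter methods for these sinks are not shown in this section, so the sketch below uses only the stdlib to illustrate the idea of a file sink: the data flows from a generator straight into the file without an intermediate list. The squares generator and the file name are made up for the example.

```python
import os
import tempfile


def squares(n):
    for i in range(n):
        yield f"{i * i}\n"


with tempfile.TemporaryDirectory() as td:
    path = os.path.join(td, "squares.txt")
    with open(path, "w") as f:
        # writelines accepts any iterable and pulls items one at a time.
        f.writelines(squares(5))
    with open(path) as f:
        written = f.read()

print(repr(written))  # '0\n1\n4\n9\n16\n'
```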

🎧 Iter.collect(self, container=list) -> "List[T]"

This is the most common way of “realizing” an iterable chain into a concrete data structure. It should be the case that this is where most of the memory allocation occurs.

The default container is a list and you’ll see throughout this documentation that most examples produce lists. However, any container, and indeed any function, can be used as the sink.

The basic example:

>>> Iter(range(3)).collect()
[0, 1, 2]
>>> Iter(range(3)).collect(tuple)
(0, 1, 2)

You must pay attention to some things. For example, if your iterable is a string, the characters of the string are what get iterated over, and when you collect you’ll get a collection of those atoms. You can however use str as your “container function” and that will give you back a string. It’s like a join with blank joiner.

>>> Iter('abc').collect()
['a', 'b', 'c']
>>> Iter('abc').collect(str)
'abc'

With some types, things get a little more tricky. Take bytes for example:

>>> Iter(b'abc').collect()
[97, 98, 99]

You probably didn’t expect to get the integers back, right? Anyhow, you can use bytes as the “collection container”, just like we did with strings and that will work:

>>> Iter(b'abc').collect(bytes)
b'abc'
>>> Iter(b'abc').collect(bytearray)
bytearray(b'abc')
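One way to picture the behaviour shown above: calling str on an iterator normally gives its repr, not the joined characters, so a collect-style function needs to treat str specially, whereas bytes and bytearray accept an iterable of integers directly. The helper below is a hypothetical sketch of that idea, and the special case is an assumption about the approach rather than the library's code.

```python
def collect(iterable, container=list):
    # Hypothetical helper; the special case for str is an assumption.
    if container is str:
        return "".join(iterable)
    return container(iterable)


as_list = collect("abc")
as_str = collect(iter("abc"), str)
as_bytes = collect(iter(b"abc"), bytes)
print(as_list, as_str, as_bytes)  # ['a', 'b', 'c'] abc b'abc'
print(str(iter("abc")) == "abc")  # False
```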

The other standard collections also work, here’s a set for completeness.

>>> Iter('abcaaaabbbbccc').collect(set) == {'a', 'b', 'c'}
True

✨ 🎤 @classmethod Iter.open(cls, file, mode="r", buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None, ) -> "Iter"

Wrap the open() builtin precisely, but return an Iter instance to allow function chaining on the result.

I know you’re thinking that we should always use a context manager for files. Don’t worry, there is one being used internally. When the iterator chain is terminated the underlying file will be closed.

>>> import tempfile
>>> with tempfile.TemporaryDirectory() as td:
...     # Put some random text into a temporary file
...     with open(td + '/text.txt', 'w') as f:
...         f.writelines(['abc\n', 'def\n', 'ghi\n'])
...
...     # Open the file, filter some lines, collect the result
...     Iter.open(td + '/text.txt').filter(lambda line: 'def' in line).collect()
['def\n']

Note that this is a convenience method for reading from a file, not for writing. The function signature includes the mode parameter for parity with the builtin open() function, but only reading is supported.
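The “context manager used internally” idea can be sketched with a generator function. This is an illustration of the pattern under the assumption that something similar happens inside Iter.open; iter_lines is a hypothetical name.

```python
import os
import tempfile


def iter_lines(path, encoding=None):
    # The with-block stays open while lines are being pulled and
    # closes the file once the generator is exhausted (or closed).
    with open(path, encoding=encoding) as f:
        yield from f


with tempfile.TemporaryDirectory() as td:
    path = os.path.join(td, "text.txt")
    with open(path, "w") as f:
        f.writelines(["abc\n", "def\n", "ghi\n"])
    matching = [line for line in iter_lines(path) if "def" in line]

print(matching)  # ['def\n']
```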

🎤 @classmethod Iter.range(cls, *args) -> "Iter[int]"

The range function you all know and love.

>>> Iter.range(3).collect()
[0, 1, 2]
>>> Iter.range(0).collect()
[]

Iter.zip(self, *iterables: Any) -> "Iter[Tuple[T, ...]]"

The zip function you all know and love. The only thing to note here is that the first iterable is really what the Iter instance is wrapping. The Iter.zip invocation brings in the other iterables.

Make an Iter instance, then call zip on that.

>>> Iter('caleb').zip(range(10)).collect()
[('c', 0), ('a', 1), ('l', 2), ('e', 3), ('b', 4)]

Use a classmethod to get an infinite stream using Iter.count and zip against that with more finite iterators.

>>> Iter.count().zip(range(5), range(3, 100, 2)).collect()
[(0, 0, 3), (1, 1, 5), (2, 2, 7), (3, 3, 9), (4, 4, 11)]

It takes a few minutes to get used to that but feels comfortable pretty quickly.

Iter.take can be used to stop infinite zip sequences:

>>> Iter('caleb').cycle().enumerate().take(8).collect()
[(0, 'c'), (1, 'a'), (2, 'l'), (3, 'e'), (4, 'b'), (5, 'c'), (6, 'a'), (7, 'l')]

While we’re here (assuming you worked through the previous example), note the difference if you switch the order of the Iter.cycle and Iter.enumerate calls:

>>> Iter('caleb').enumerate().cycle().take(8).collect()
[(0, 'c'), (1, 'a'), (2, 'l'), (3, 'e'), (4, 'b'), (0, 'c'), (1, 'a'), (2, 'l')]

If you understand how this works, everything else in excitertools will be intuitive to use.

🎧 Iter.any(self) -> "bool"

>>> Iter([0, 0, 0]).any()
False
>>> Iter([0, 0, 1]).any()
True
>>> Iter([]).any()
False

🎧 Iter.all(self) -> "bool"

>>> Iter([0, 0, 0]).all()
False
>>> Iter([0, 0, 1]).all()
False
>>> Iter([1, 1, 1]).all()
True

Now pay attention:

>>> Iter([]).all()
True

This behaviour has some controversy around it, but that’s how the all() builtin works so that’s what we do too. The way to think about what all() does is this: it returns False if there is at least one element that is falsy. Thus, if there are no elements it follows that there are no elements that are falsy and that’s why all([]) == True.
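The reasoning above can be checked directly with the builtins. In this small sketch, all_via_any is a helper written for the illustration; it expresses “all are truthy” as “there is no falsy element”.

```python
def all_via_any(items):
    # "No element is falsy" is the same as "not any element is falsy".
    return not any(not item for item in items)


results = [(all(items), all_via_any(items)) for items in ([], [0, 1], [1, 1])]
print(results)           # [(True, True), (False, False), (True, True)]
print(all([]), any([]))  # True False
```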

Iter.enumerate(self) -> "Iter[Tuple[int, T]]"

Yup, that enumerate.

>>> Iter('abc').enumerate().collect()
[(0, 'a'), (1, 'b'), (2, 'c')]
>>> Iter([]).enumerate().collect()
[]

Iter.dict(self) -> "Dict"

In Python a dict can be constructed through an iterable of tuples:

>>> dict([('a', 0), ('b', 1)])  # doctest: +SKIP
{'a': 0, 'b': 1}

In excitertools we prefer chaining so this method is a shortcut for that:

>>> d = Iter('abc').zip(count()).dict()
>>> assert d == {'a': 0, 'b': 1, 'c': 2}

Iter.map(self, func: Union[Callable[..., C], str]) -> "Iter[C]"

The map function you all know and love.

>>> Iter('abc').map(str.upper).collect()
['A', 'B', 'C']
>>> Iter(['abc', 'def']).map(str.upper).collect()
['ABC', 'DEF']

Using lambdas might seem convenient but in practice it turns out that they make code difficult to read:

>>> result = Iter('caleb').map(lambda x: (x, ord(x))).dict()
>>> assert result == {'a': 97, 'b': 98, 'c': 99, 'e': 101, 'l': 108}

It’s recommended that you make a separate function instead:

>>> def f(x):
...     return x, ord(x)
>>> result = Iter('caleb').map(f).dict()
>>> assert result == {'a': 97, 'b': 98, 'c': 99, 'e': 101, 'l': 108}

I know many people prefer anonymous functions (often on philosophical grounds) but in practice it’s just easier to make a separate, named function.

I’ve experimented with passing a string into the map, and using eval() to make a lambda internally. This simplifies the code very slightly, at the cost of using strings-as-code. I’m pretty sure this feature will be removed so don’t use it.

>>> result = Iter('caleb').map('x, ord(x)').dict()
>>> assert result == {'a': 97, 'b': 98, 'c': 99, 'e': 101, 'l': 108}

Iter.filter(self, function: "Optional[Callable[[T], bool]]" = None) -> "Iter[T]"

The filter function you all know and love.

>>> Iter('caleb').filter(lambda x: x in 'aeiou').collect()
['a', 'e']

There is a slight difference between this method signature and the builtin filter: how the identity function is handled. This is a consequence of chaining. In the method signature above we can give the function parameter a default value of None because it is the last parameter in the list. The builtin filter signature doesn’t allow for this because the predicate parameter appears first.

This is a long way of saying: if you just want to filter out falsy values, no parameter is needed:

>>> Iter([0, 1, 0, 0, 0, 1, 1, 1, 0, 0]).filter().collect()
[1, 1, 1, 1]

Using the builtin, you’d have to do filter(None, iterable).

You’ll find that Iter.map and Iter.filter (and Iter.reduce, up next) work together very nicely:

>>> def not_eve(x):
...    return x != 'eve'
>>> Iter(['bob', 'eve', 'alice']).filter(not_eve).map(str.upper).collect()
['BOB', 'ALICE']

The long chains get unwieldy so let’s rewrite that:

>>> (
...     Iter(['bob', 'eve', 'alice'])
...         .filter(not_eve)
...         .map(str.upper)
...         .collect()
... )
['BOB', 'ALICE']

Iter.starfilter(self, function: "Optional[Callable[[T, ...], bool]]" = None) -> "Iter[T]"

Like Iter.filter, but arg unpacking in lambdas will work.

With the normal filter, this fails:

>>> Iter('caleb').enumerate().filter(lambda i, x: i > 2).collect()
Traceback (most recent call last):
    ...
TypeError: <lambda>() missing 1 required positional argument: 'x'

This is a real buzzkill. starfilter is very similar to starmap in that tuples are unpacked when calling the function:

>>> Iter('caleb').enumerate().starfilter(lambda i, x: i > 2).collect()
[(3, 'e'), (4, 'b')]
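For comparison, here is roughly what the unpacking amounts to in plain Python. This standalone starfilter function is a sketch written for this illustration, not the method's actual source, and it ignores the default-None case.

```python
def starfilter(func, iterable):
    # Keep each tuple for which func(*tuple) is truthy.
    return (item for item in iterable if func(*item))


kept = list(starfilter(lambda i, x: i > 2, enumerate('caleb')))
print(kept)  # [(3, 'e'), (4, 'b')]
```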

🎧 Iter.reduce(self, func: Callable[..., T], *args) -> "T"

The reduce function you all know and…hang on, actually reduce is rather unloved. In the past I've found it very complex to reason about, when looking at a bunch of nested function calls in typical itertools code. Hopefully iterable chaining makes it easier to read code that uses reduce?

Let’s check, does this make sense?

>>> payments = [
...     ('bob', 100),
...     ('alice', 50),
...     ('eve', -100),
...     ('bob', 19.95),
...     ('bob', -5.50),
...     ('eve', 11.95),
...     ('eve', 200),
...     ('alice', -45),
...     ('alice', -67),
...     ('bob', 1.99),
...     ('alice', 89),
... ]
>>> (
...     Iter(payments)
...         .filter(lambda entry: entry[0] == 'bob')
...         .map(lambda entry: entry[1])
...         .reduce(lambda total, value: total + value, 0)
... )
116.44

I intentionally omitted comments above so that you can try the “readability experiment”, but in practice you would definitely want to add some comments on these chains:

>>> (
...     # Iterate over all payments
...     Iter(payments)
...         # Only look at bob's payments
...         .filter(lambda entry: entry[0] == 'bob')
...         # Extract the value of the payment
...         .map(lambda entry: entry[1])
...         # Add all those payments together
...         .reduce(lambda total, value: total + value, 0)
... )
116.44

reduce is a rather crude, low-level tool. In many cases you’ll find that there are other functions and methods better suited to the situations you’ll encounter most often. For example, it’s much easier to use Iter.groupby for grouping than to try to make that work with Iter.reduce: you can make it work, but Iter.groupby will be simpler.
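To see the difference, here is a stdlib-only sketch that totals payments per person both ways. The payments list is a shortened, integer-only version invented for the illustration, and the code uses functools.reduce and itertools.groupby rather than the Iter methods.

```python
from functools import reduce
from itertools import groupby
from operator import itemgetter

payments = [('bob', 100), ('alice', 50), ('eve', -100), ('bob', 20), ('alice', -45)]


# With reduce, the accumulator has to be a dict that is updated by hand.
def add_payment(totals, entry):
    name, value = entry
    totals[name] = totals.get(name, 0) + value
    return totals


by_reduce = reduce(add_payment, payments, {})

# With groupby, sort by name first, then sum each group.
by_groupby = {
    name: sum(value for _, value in group)
    for name, group in groupby(sorted(payments, key=itemgetter(0)), key=itemgetter(0))
}

print(by_reduce == by_groupby)  # True
print(by_groupby)  # {'alice': 5, 'bob': 120, 'eve': -100}
```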

🎧 Iter.sum(self)

Exactly what you expect:

>>> Iter(range(10)).sum()
45

🎧 Iter.concat(self, glue: AnyStr) -> "AnyStr"

Joining strings.

>>> Iter(['hello', 'there']).concat(' ')
'hello there'
>>> Iter(['hello', 'there']).concat(',')
'hello,there'
>>> Iter([b'hello', b'there']).concat(b',')
b'hello,there'

Iter.insert(self, glue: C) -> "Iter[Union[C, T]]"

Docstring TBD

🎤 @classmethod Iter.count(cls, *args) -> "Iter[int]"

>>> Iter.count().take(3).collect()
[0, 1, 2]
>>> Iter.count(100).take(3).collect()
[100, 101, 102]
>>> Iter.count(100, 2).take(3).collect()
[100, 102, 104]

Iter.cycle(self) -> "Iter[T]"

>>> Iter('abc').cycle().take(8).collect()
['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b']
>>> Iter('abc').cycle().take(8).concat('')
'abcabcab'

🎤 ♾ @classmethod Iter.repeat(cls, elem: C, times=None) -> "Iter[C]"

Docstring TBD

Iter.accumulate(self, func=None, *, initial=None)

Docstring TBD

>>> Iter([1, 2, 3, 4, 5]).accumulate().collect()
[1, 3, 6, 10, 15]
>>> if sys.version_info >= (3, 8):
...     out = Iter([1, 2, 3, 4, 5]).accumulate(initial=100).collect()
...     assert out == [100, 101, 103, 106, 110, 115]
>>> Iter([1, 2, 3, 4, 5]).accumulate(operator.mul).collect()
[1, 2, 6, 24, 120]

Iter.chain(self, *iterables: Iterable[T]) -> "Iter[T]"

Docstring TBD

>>> Iter('ABC').chain('DEF').collect()
['A', 'B', 'C', 'D', 'E', 'F']
>>> Iter('ABC').chain().collect()
['A', 'B', 'C']

Iter.chain_from_iterable(self) -> "Iter[T]"

Docstring TBD

>>> Iter(['ABC', 'DEF']).chain_from_iterable().collect()
['A', 'B', 'C', 'D', 'E', 'F']

Iter.compress(self, selectors)

Replacement for the itertools compress function. This version returns an instance of Iter to allow further iterable chaining.

>>> Iter('ABCDEF').compress([1, 0, 1, 0, 1, 1]).collect()
['A', 'C', 'E', 'F']

Iter.dropwhile(self, pred)

Docstring TBD

Iter.filterfalse(self, pred)

Docstring TBD

Iter.groupby(self, key=None)

Docstring TBD

Iter.islice(self, *args) -> "Iter"

Docstring TBD

Iter.starmap(self, func)

Docstring TBD

Iter.takewhile(self, pred)

Docstring TBD
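Until the docstrings for Iter.dropwhile and Iter.takewhile are filled in, the stdlib functions of the same names give a feel for how the two relate. This sketch uses plain itertools, on the assumption that the methods follow the same semantics as the module-level wrappers documented earlier.

```python
from itertools import dropwhile, takewhile

data = [1, 4, 6, 4, 1]
head = list(takewhile(lambda x: x < 5, data))  # stops at the first failure
tail = list(dropwhile(lambda x: x < 5, data))  # starts at the first failure
print(head, tail)  # [1, 4] [6, 4, 1]
```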

Iter.tee(self, n=2)

Docstring TBD

Iter.zip_longest(self, *iterables, fillvalue=None)

Docstring TBD

Iter.chunked(self, n: int) -> "Iter"

Docstring TBD
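Assuming Iter.chunked follows more_itertools.chunked (the introduction says the more-itertools iterators are wrapped), it would break the stream into lists of length n, with a shorter final list if the items run out. A stdlib-only sketch of that behaviour, written as a standalone function for illustration:

```python
from itertools import islice


def chunked(iterable, n):
    # Yield successive lists of up to n items; the last one may be shorter.
    it = iter(iterable)
    while True:
        chunk = list(islice(it, n))
        if not chunk:
            return
        yield chunk


chunks = list(chunked(range(8), 3))
print(chunks)  # [[0, 1, 2], [3, 4, 5], [6, 7]]
```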

Iter.ichunked(self, n: int) -> "Iter"

Docstring TBD

@classmethod Iter.sliced(cls, seq: Sequence, n: int) -> "Iter"

Docstring TBD

Iter.distribute(self, n: int) -> "Iter"

Docstring TBD

Iter.divide(self, n: int) -> "Iter"

Docstring TBD

Iter.split_at(self, pred)

Docstring TBD

Iter.split_before(self, pred)

Docstring TBD

Iter.split_after(self, pred)

Docstring TBD

Iter.split_into(self, sizes)

Docstring TBD

Iter.split_when(self, pred)

Docstring TBD

Iter.bucket(self, key, validator=None)

Docstring TBD

Iter.unzip(self)

Docstring TBD

Iter.grouper(self, n: int, fillvalue=None) -> "Iter"

Docstring TBD

Iter.partition(self, pred) -> "Iter"

Docstring TBD

Iter.spy(self, n=1) -> "Tuple[Iter, Iter]"

Docstring TBD

Iter.peekable(self) -> "more_itertools.peekable"

Docstring TBD

>>> p = Iter(['a', 'b']).peekable()
>>> p.peek()
'a'
>>> next(p)
'a'

The peekable can be used to inspect what will be coming up. But if you then want to resume iterator chaining, pass the peekable back into an Iter instance.

>>> p = Iter(range(10)).peekable()
>>> p.peek()
0
>>> Iter(p).take(3).collect()
[0, 1, 2]

A peekable is not an Iter instance so it doesn’t provide the iterator chaining methods. But if you want to get into chaining, use the iter() method.

>>> p = Iter(range(5)).peekable()
>>> p.peek()
0
>>> p[1]
1
>>> p.iter().take(3).collect()
[0, 1, 2]

Peekables can be prepended. But then you usually want to go right back to iterator chaining. Thus, the prepend method (on the returned peekable instance) returns an Iter instance.

>>> p = Iter(range(3)).peekable()
>>> p.peek()
0
>>> p.prepend('a', 'b').take(4).collect()
['a', 'b', 0, 1]

Iter.seekable(self) -> "more_itertools.seekable"

Docstring TBD

Iter.windowed(self, n, fillvalue=None, step=1) -> "Iter"

Docstring TBD

Iter.substrings(self)

Docstring TBD

Iter.substrings_indexes(self, reverse=False)

Docstring TBD

Iter.stagger(self, offsets=(-1, 0, 1), longest=False, fillvalue=None)

>>> Iter([0, 1, 2, 3]).stagger().collect()
[(None, 0, 1), (0, 1, 2), (1, 2, 3)]
>>> Iter(range(8)).stagger(offsets=(0, 2, 4)).collect()
[(0, 2, 4), (1, 3, 5), (2, 4, 6), (3, 5, 7)]
>>> Iter([0, 1, 2, 3]).stagger(longest=True).collect()
[(None, 0, 1), (0, 1, 2), (1, 2, 3), (2, 3, None), (3, None, None)]

Iter.pairwise(self)

Reference: more_itertools.pairwise

>>> Iter.count().pairwise().take(4).collect()
[(0, 1), (1, 2), (2, 3), (3, 4)]

Iter.count_cycle(self, n=None) -> "Iter"

Reference: more_itertools.count_cycle

>>> Iter('AB').count_cycle(3).collect()
[(0, 'A'), (0, 'B'), (1, 'A'), (1, 'B'), (2, 'A'), (2, 'B')]

Iter.intersperse(self, e, n=1) -> "Iter"

Reference: more_itertools.intersperse

>>> Iter([1, 2, 3, 4, 5]).intersperse('!').collect()
[1, '!', 2, '!', 3, '!', 4, '!', 5]

>>> Iter([1, 2, 3, 4, 5]).intersperse(None, n=2).collect()
[1, 2, None, 3, 4, None, 5]

Iter.padded(self, fillvalue: Optional[C] = None, n: Optional[int] = None, next_multiple: bool = False) -> "Iter[Union[T, C]]"

Reference: more_itertools.padded

>>> Iter([1, 2, 3]).padded('?', 5).collect()
[1, 2, 3, '?', '?']

>>> Iter([1, 2, 3, 4]).padded(n=3, next_multiple=True).collect()
[1, 2, 3, 4, None, None]

Iter.repeat_last(self, default=None) -> "Iter[T]"

Reference: more_itertools.repeat_last

>>> Iter(range(3)).repeat_last().islice(5).collect()
[0, 1, 2, 2, 2]

>>> Iter(range(0)).repeat_last(42).islice(5).collect()
[42, 42, 42, 42, 42]

Iter.adjacent(self, pred, distance=1) -> "Iter[Tuple[bool, T]]"

Reference: more_itertools.adjacent

>>> Iter(range(6)).adjacent(lambda x: x == 3).collect()
[(False, 0), (False, 1), (True, 2), (True, 3), (True, 4), (False, 5)]

>>> Iter(range(6)).adjacent(lambda x: x == 3, distance=2).collect()
[(False, 0), (True, 1), (True, 2), (True, 3), (True, 4), (True, 5)]

Iter.groupby_transform(self, keyfunc: Optional[Callable[..., K]] = None, valuefunc: Optional[Callable[..., V]] = None) -> "Iter[Tuple[K, Iterable[V]]]"

Reference: more_itertools.groupby_transform

This example has been modified somewhat from the original. We’re using starmap here to “unzip” the tuples produced by the group transform.

>>> iterable = 'AaaABbBCcA'
>>> keyfunc = lambda x: x.upper()
>>> valuefunc = lambda x: x.lower()
>>> (
...    Iter(iterable)
...        .groupby_transform(keyfunc, valuefunc)
...        .starmap(lambda k, g: (k, ''.join(g)))
...        .collect()
... )
[('A', 'aaaa'), ('B', 'bbb'), ('C', 'cc'), ('A', 'a')]

>>> from operator import itemgetter
>>> keys = [0, 0, 1, 1, 1, 2, 2, 2, 3]
>>> values = 'abcdefghi'
>>> iterable = zip(keys, values)
>>> (
...     Iter(iterable)
...        .groupby_transform(itemgetter(0), itemgetter(1))
...        .starmap(lambda k, g: (k, ''.join(g)))
...        .collect()
... )
[(0, 'ab'), (1, 'cde'), (2, 'fgh'), (3, 'i')]
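
For readers who want to see what the grouping step amounts to, here is a small sketch built on itertools.groupby. It is an approximation written for illustration (the function name groupby_transform_sketch is made up) and it makes no claim about how more-itertools or excitertools implement it.

```python
from itertools import groupby


def groupby_transform_sketch(iterable, keyfunc=None, valuefunc=None):
    """Group consecutive items by keyfunc and map valuefunc over each group."""
    for key, group in groupby(iterable, keyfunc):
        # Each group is a lazy iterator; it must be used before moving on.
        yield key, (map(valuefunc, group) if valuefunc is not None else group)


pairs = [
    (key, ''.join(group))
    for key, group in groupby_transform_sketch('AaaABbBCcA', str.upper, str.lower)
]
```

The list comprehension plays the same role as the starmap step in the examples above: it turns each lazy group into a concrete string.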

Iter.padnone(self) -> "Iter[Union[T, None]]"

Reference: more_itertools.padnone

>>> Iter(range(3)).padnone().take(5).collect()
[0, 1, 2, None, None]

Iter.ncycles(self, n) -> "Iter[T]"

Reference: more_itertools.ncycles

>>> Iter(['a', 'b']).ncycles(3).collect()
['a', 'b', 'a', 'b', 'a', 'b']

Iter.collapse(self, base_type=None, levels=None) -> "Iter"

Reference: more_itertools.collapse

>>> iterable = [(1, 2), ([3, 4], [[5], [6]])]
>>> Iter(iterable).collapse().collect()
[1, 2, 3, 4, 5, 6]

>>> iterable = ['ab', ('cd', 'ef'), ['gh', 'ij']]
>>> Iter(iterable).collapse(base_type=tuple).collect()
['ab', ('cd', 'ef'), 'gh', 'ij']

>>> iterable = [('a', ['b']), ('c', ['d'])]
>>> Iter(iterable).collapse().collect() # Fully flattened
['a', 'b', 'c', 'd']
>>> Iter(iterable).collapse(levels=1).collect() # Only one level flattened
['a', ['b'], 'c', ['d']]

@class_or_instancemethod Iter.sort_together(self_or_cls, iterables, key_list=(0,), reverse=False)

Reference: more_itertools.sort_together

This can be called either as an instance method or a class method. The classmethod form is more convenient if all the iterables are already available. The instancemethod form is more convenient if one of the iterables already goes through some transformation.

Here are examples from the classmethod form, which mirror the examples in the more-itertools documentation:

>>> iterables = [(4, 3, 2, 1), ('a', 'b', 'c', 'd')]
>>> Iter.sort_together(iterables).collect()
[(1, 2, 3, 4), ('d', 'c', 'b', 'a')]

>>> iterables = [(3, 1, 2), (0, 1, 0), ('c', 'b', 'a')]
>>> Iter.sort_together(iterables, key_list=(1, 2)).collect()
[(2, 3, 1), (0, 0, 1), ('a', 'c', 'b')]

>>> Iter.sort_together([(1, 2, 3), ('c', 'b', 'a')], reverse=True).collect()
[(3, 2, 1), ('a', 'b', 'c')]

Here is an example using the instancemethod form:

>>> iterables = [('a', 'b', 'c', 'd')]
>>> Iter([4, 3, 2, 1]).sort_together(iterables).collect()
[(1, 2, 3, 4), ('d', 'c', 'b', 'a')]
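
One way a method can serve as both a classmethod and an instancemethod is with a small descriptor. The sketch below is an assumption about the general technique, not a copy of the decorator used in excitertools; the Numbers class and the simplified sort_together body are hypothetical.

```python
import functools


class class_or_instancemethod:
    """Hypothetical descriptor: bind to the instance if there is one, else the class."""

    def __init__(self, func):
        self.func = func

    def __get__(self, instance, owner):
        target = instance if instance is not None else owner
        return functools.partial(self.func, target)


class Numbers:
    def __init__(self, items):
        self.items = list(items)

    @class_or_instancemethod
    def sort_together(self_or_cls, iterables, reverse=False):
        columns = [list(it) for it in iterables]
        if not isinstance(self_or_cls, type):
            # Called on an instance: its own data becomes the first column.
            columns.insert(0, self_or_cls.items)
        rows = sorted(zip(*columns), key=lambda row: row[0], reverse=reverse)
        return [tuple(col) for col in zip(*rows)]


from_class = Numbers.sort_together([(4, 3, 2, 1), ('a', 'b', 'c', 'd')])
from_instance = Numbers([4, 3, 2, 1]).sort_together([('a', 'b', 'c', 'd')])
```

Both calls produce the same result; the only difference is where the first iterable comes from.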

@class_or_instancemethod Iter.interleave(self_or_cls, *iterables) -> "Iter"

Reference: more_itertools.interleave

Classmethod form:

>>> Iter.interleave([1, 2, 3], [4, 5], [6, 7, 8]).collect()
[1, 4, 6, 2, 5, 7]

Instancemethod form:

>>> Iter([1, 2, 3]).interleave([4, 5], [6, 7, 8]).collect()
[1, 4, 6, 2, 5, 7]

@class_or_instancemethod Iter.interleave_longest(self_or_cls, *iterables) -> "Iter"

Reference: more_itertools.interleave_longest

Classmethod form:

>>> Iter.interleave_longest([1, 2, 3], [4, 5], [6, 7, 8]).collect()
[1, 4, 6, 2, 5, 7, 3, 8]

Instancemethod form:

>>> Iter([1, 2, 3]).interleave_longest([4, 5], [6, 7, 8]).collect()
[1, 4, 6, 2, 5, 7, 3, 8]

@classmethod Iter.zip_offset(cls, *iterables, offsets, longest=False, fillvalue=None) -> "Iter"

Reference: more_itertools.zip_offset

>>> Iter.zip_offset('0123', 'abcdef', offsets=(0, 1)).collect()
[('0', 'b'), ('1', 'c'), ('2', 'd'), ('3', 'e')]

>>> Iter.zip_offset('0123', 'abcdef', offsets=(0, 1), longest=True).collect()
[('0', 'b'), ('1', 'c'), ('2', 'd'), ('3', 'e'), (None, 'f')]

Iter.dotproduct(self, vec2: Iterable)

Reference: more_itertools.dotproduct

>>> Iter([10, 10]).dotproduct([20, 20])
400

Iter.flatten(self) -> "Iter[T]"

Reference: more_itertools.flatten

>>> Iter([[0, 1], [2, 3]]).flatten().collect()
[0, 1, 2, 3]

@class_or_instancemethod Iter.roundrobin(self_or_cls: Union[Type[T], T], *iterables: C) -> "Iter[Union[T, C]]"

Reference: more_itertools.roundrobin

Classmethod form:

>>> Iter.roundrobin('ABC', 'D', 'EF').collect()
['A', 'D', 'E', 'B', 'F', 'C']

Instancemethod form:

>>> Iter('ABC').roundrobin('D', 'EF').collect()
['A', 'D', 'E', 'B', 'F', 'C']

Iter.prepend(self, value: C) -> "Iter[Union[T, C]]"

Reference: more_itertools.prepend

>>> value = '0'
>>> iterator = ['1', '2', '3']
>>> Iter(iterator).prepend(value).collect()
['0', '1', '2', '3']

🎧 Iter.ilen(self) -> "int"

Reference: more_itertools.ilen

>>> Iter(x for x in range(1000000) if x % 3 == 0).ilen()
333334

Iter.unique_to_each(self) -> "Iter[T]"

Reference: more_itertools.unique_to_each

>>> Iter([{'A', 'B'}, {'B', 'C'}, {'B', 'D'}]).unique_to_each().collect()
[['A'], ['C'], ['D']]

>>> Iter(["mississippi", "missouri"]).unique_to_each().collect()
[['p', 'p'], ['o', 'u', 'r']]

Iter.sample(self, k=1, weights=None) -> "Iter"

Reference: more_itertools.sample

>>> iterable = range(100)
>>> Iter(iterable).sample(5).collect()  # doctest: +SKIP
[81, 60, 96, 16, 4]

>>> iterable = range(100)
>>> weights = (i * i + 1 for i in range(100))
>>> Iter(iterable).sample(5, weights=weights).collect()  # doctest: +SKIP
[79, 67, 74, 66, 78]

>>> data = "abcdefgh"
>>> weights = range(1, len(data) + 1)
>>> Iter(data).sample(k=len(data), weights=weights).collect()  # doctest: +SKIP
['c', 'a', 'b', 'e', 'g', 'd', 'h', 'f']


>>> # This one just to let the doctest run
>>> iterable = range(100)
>>> Iter(iterable).sample(5).map(lambda x: 0 <= x < 100).all()
True

Iter.consecutive_groups(self, ordering=lambda x: x)

Reference: more_itertools.consecutive_groups

>>> iterable = [1, 10, 11, 12, 20, 30, 31, 32, 33, 40]
>>> Iter(iterable).consecutive_groups().map(lambda g: list(g)).print('{v}').consume()
[1]
[10, 11, 12]
[20]
[30, 31, 32, 33]
[40]

Iter.run_length_encode(self) -> "Iter[Tuple[T, int]]"

Reference: more_itertools.run_length

>>> uncompressed = 'abbcccdddd'
>>> Iter(uncompressed).run_length_encode().collect()
[('a', 1), ('b', 2), ('c', 3), ('d', 4)]

Iter.run_length_decode(self) -> "Iter"

Reference: more_itertools.run_length

>>> compressed = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> Iter(compressed).run_length_decode().collect()
['a', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd', 'd']

Iter.map_reduce(self, keyfunc, valuefunc=None, reducefunc=None) -> "Dict"

Reference: more_itertools.map_reduce

This interface mirrors what more-itertools does in that it returns a dict. See map_reduce_it() for a slightly-modified interface that returns the dict items as another iterator.

>>> keyfunc = lambda x: x.upper()
>>> d = Iter('abbccc').map_reduce(keyfunc)
>>> sorted(d.items())
[('A', ['a']), ('B', ['b', 'b']), ('C', ['c', 'c', 'c'])]

>>> keyfunc = lambda x: x.upper()
>>> valuefunc = lambda x: 1
>>> d = Iter('abbccc').map_reduce(keyfunc, valuefunc)
>>> sorted(d.items())
[('A', [1]), ('B', [1, 1]), ('C', [1, 1, 1])]

>>> keyfunc = lambda x: x.upper()
>>> valuefunc = lambda x: 1
>>> reducefunc = sum
>>> d = Iter('abbccc').map_reduce(keyfunc, valuefunc, reducefunc)
>>> sorted(d.items())
[('A', 1), ('B', 2), ('C', 3)]

Note the warning given in the more-itertools docs about how lists are created before the reduce step. This means you always want to filter before applying map_reduce, not after.

>>> all_items = _range(30)
>>> keyfunc = lambda x: x % 2  # Evens map to 0; odds to 1
>>> categories = Iter(all_items).filter(lambda x: 10<=x<=20).map_reduce(keyfunc=keyfunc)
>>> sorted(categories.items())
[(0, [10, 12, 14, 16, 18, 20]), (1, [11, 13, 15, 17, 19])]
>>> summaries = Iter(all_items).filter(lambda x: 10<=x<=20).map_reduce(keyfunc=keyfunc, reducefunc=sum)
>>> sorted(summaries.items())
[(0, 90), (1, 75)]
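
The reason for filtering first is easier to see in a plain-Python sketch of the map/reduce idea. This is a simplified illustration (map_reduce_sketch is a made-up name), assuming the behaviour described above: every group is built up as a full list before reducefunc ever runs.

```python
from collections import defaultdict


def map_reduce_sketch(iterable, keyfunc, valuefunc=None, reducefunc=None):
    groups = defaultdict(list)
    for item in iterable:
        value = valuefunc(item) if valuefunc is not None else item
        groups[keyfunc(item)].append(value)
    # At this point every group exists in memory as a complete list.
    if reducefunc is not None:
        return {key: reducefunc(values) for key, values in groups.items()}
    return dict(groups)


# Filtering before grouping keeps the intermediate lists small.
wanted = (x for x in range(30) if 10 <= x <= 20)
summaries = map_reduce_sketch(wanted, keyfunc=lambda x: x % 2, reducefunc=sum)
```

Items removed by the filter never reach the intermediate lists, which is where the memory saving comes from.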

Iter.map_reduce_it(self, keyfunc: Callable[..., K], valuefunc: Optional[Callable[..., V]] = None, reducefunc: Optional[Callable[..., R]] = None) -> "Iter[Tuple[K, R]]"

Reference: more_itertools.map_reduce

>>> keyfunc = lambda x: x.upper()
>>> Iter('abbccc').map_reduce_it(keyfunc).collect()
[('A', ['a']), ('B', ['b', 'b']), ('C', ['c', 'c', 'c'])]

>>> keyfunc = lambda x: x.upper()
>>> valuefunc = lambda x: 1
>>> Iter('abbccc').map_reduce_it(keyfunc, valuefunc).collect()
[('A', [1]), ('B', [1, 1]), ('C', [1, 1, 1])]

>>> keyfunc = lambda x: x.upper()
>>> valuefunc = lambda x: 1
>>> reducefunc = sum
>>> Iter('abbccc').map_reduce_it(keyfunc, valuefunc, reducefunc).collect()
[('A', 1), ('B', 2), ('C', 3)]

🎧 Iter.exactly_n(self, n, predicate=bool) -> "bool"

Docstring TBD

>>> Iter([True, True, False]).exactly_n(2)
True

Iter.all_equal(self)

Iter.first_true(self)

Iter.quantify(self)

Iter.islice_extended(self, *args)

Reference: more_itertools.islice_extended

>>> Iter('abcdefgh').islice_extended(-4, -1).collect()
['e', 'f', 'g']
>>> Iter.count().islice_extended(110, 99, -2).collect()
[110, 108, 106, 104, 102, 100]

Iter.first(self)

Reference: more_itertools.first

Iter.last(self)

Reference: more_itertools.last

Iter.one(self)

Reference: more_itertools.one

Iter.only(self, default=None, too_long=ValueError) -> "T"

Reference: more_itertools.only

>>> Iter([]).only(default='missing')
'missing'
>>> Iter([42]).only(default='missing')
42
>>> Iter([1, 2]).only()
Traceback (most recent call last):
    ...
ValueError: ...

Iter.strip(self, pred) -> "Iter[T]"

Reference: more_itertools.strip

>>> iterable = (None, False, None, 1, 2, None, 3, False, None)
>>> pred = lambda x: x in {None, False, ''}
>>> Iter(iterable).strip(pred).collect()
[1, 2, None, 3]

Iter.lstrip(self, pred) -> "Iter[T]"

Reference: more_itertools.lstrip

>>> iterable = (None, False, None, 1, 2, None, 3, False, None)
>>> pred = lambda x: x in {None, False, ''}
>>> Iter(iterable).lstrip(pred).collect()
[1, 2, None, 3, False, None]

Iter.rstrip(self, pred) -> "Iter[T]"

Reference: more_itertools.rstrip

>>> iterable = (None, False, None, 1, 2, None, 3, False, None)
>>> pred = lambda x: x in {None, False, ''}
>>> Iter(iterable).rstrip(pred).collect()
[None, False, None, 1, 2, None, 3]

Iter.filter_except(self, validator, *exceptions) -> "Iter[T]"

Reference: more_itertools.filter_except

>>> iterable = ['1', '2', 'three', '4', None]
>>> Iter(iterable).filter_except(int, ValueError, TypeError).collect()
['1', '2', '4']

Iter.map_except(self, function, *exceptions) -> "Iter"

Reference: more_itertools.map_except

>>> iterable = ['1', '2', 'three', '4', None]
>>> Iter(iterable).map_except(int, ValueError, TypeError).collect()
[1, 2, 4]

Iter.nth_or_last(self, n, default=_marker) -> "T"

Reference: more_itertools.nth_or_last

>>> Iter([0, 1, 2, 3]).nth_or_last(2)
2
>>> Iter([0, 1]).nth_or_last(2)
1
>>> Iter([]).nth_or_last(0, 'some default')
'some default'

Iter.nth(self, n, default=None)

Reference: more_itertools.nth

Iter.take(self, n: int) -> "Iter"

Reference: more_itertools.take

Iter.tail(self, n) -> "Iter[T]"

Reference: more_itertools.tail

>>> Iter('ABCDEFG').tail(3).collect()
['E', 'F', 'G']

Iter.unique_everseen(self, key=None) -> "Iter[T]"

Reference: more_itertools.unique_everseen

>>> Iter('AAAABBBCCDAABBB').unique_everseen().collect()
['A', 'B', 'C', 'D']
>>> Iter('ABBCcAD').unique_everseen(key=str.lower).collect()
['A', 'B', 'C', 'D']

Be sure to read the more-itertools docs when using unhashable items.

>>> iterable = ([1, 2], [2, 3], [1, 2])
>>> Iter(iterable).unique_everseen().collect()  # Slow
[[1, 2], [2, 3]]
>>> Iter(iterable).unique_everseen(key=tuple).collect()  # Faster
[[1, 2], [2, 3]]

Iter.unique_justseen(self, key=None) -> "Iter[T]"

Reference: more_itertools.unique_justseen

>>> Iter('AAAABBBCCDAABBB').unique_justseen().collect()
['A', 'B', 'C', 'D', 'A', 'B']
>>> Iter('ABBCcAD').unique_justseen(key=str.lower).collect()
['A', 'B', 'C', 'A', 'D']

Iter.distinct_permutations(self)

Reference: more_itertools.distinct_permutations

>>> Iter([1, 0, 1]).distinct_permutations().sorted().collect()
[(0, 1, 1), (1, 0, 1), (1, 1, 0)]

Iter.distinct_combinations(self, r) -> "Iter[T]"

Reference: more_itertools.distinct_combinations

>>> Iter([0, 0, 1]).distinct_combinations(2).collect()
[(0, 0), (0, 1)]

Iter.circular_shifts(self) -> "Iter[T]"

Reference: more_itertools.circular_shifts

>>> Iter(range(4)).circular_shifts().collect()
[(0, 1, 2, 3), (1, 2, 3, 0), (2, 3, 0, 1), (3, 0, 1, 2)]

Iter.partitions(self) -> "Iter[T]"

Reference: more_itertools.partitions

>>> Iter('abc').partitions().collect()
[[['a', 'b', 'c']], [['a'], ['b', 'c']], [['a', 'b'], ['c']], [['a'], ['b'], ['c']]]
>>> Iter('abc').partitions().print('{v}').consume()
[['a', 'b', 'c']]
[['a'], ['b', 'c']]
[['a', 'b'], ['c']]
[['a'], ['b'], ['c']]
>>> Iter('abc').partitions().map(lambda v: [''.join(p) for p in v]).print('{v}').consume()
['abc']
['a', 'bc']
['ab', 'c']
['a', 'b', 'c']

Iter.set_partitions(self, k=None) -> "Iter[T]"

Reference: more_itertools.set_partitions

>>> Iter('abc').set_partitions(2).collect()
[[['a'], ['b', 'c']], [['a', 'b'], ['c']], [['b'], ['a', 'c']]]

Iter.powerset(self)

Reference: more_itertools.powerset

>>> Iter([1, 2, 3]).powerset().collect()
[(), (1,), (2,), (3,), (1, 2), (1, 3), (2, 3), (1, 2, 3)]

@class_or_instancemethod Iter.random_product(self_or_cls, *args, repeat=1)

Reference: more_itertools.random_product

>>> Iter('abc').random_product(range(4), 'XYZ').collect()  # doctest: +SKIP
['c', 3, 'X']
>>> Iter.random_product('abc', range(4), 'XYZ').collect()  # doctest: +SKIP
['c', 0, 'Z']
>>> Iter('abc').random_product(range(0)).collect()
Traceback (most recent call last):
    ...
IndexError: Cannot choose from an empty sequence
>>> Iter.random_product(range(0)).collect()
Traceback (most recent call last):
    ...
IndexError: Cannot choose from an empty sequence

Iter.random_permutation(self, r=None)

Reference: more_itertools.random_permutation

>>> Iter(range(5)).random_permutation().collect()  # doctest: +SKIP
[2, 0, 4, 3, 1]
>>> Iter(range(0)).random_permutation().collect()
[]

Iter.random_combination(self, r)

Reference: more_itertools.random_combination

>>> Iter(range(5)).random_combination(3).collect()  # doctest: +SKIP
[0, 1, 4]
>>> Iter(range(5)).random_combination(0).collect()
[]

Iter.random_combination_with_replacement(self, r)

Reference: more_itertools.random_combination_with_replacement

>>> Iter(range(3)).random_combination_with_replacement(5).collect()  # doctest: +SKIP
[0, 0, 1, 2, 2]
>>> Iter(range(3)).random_combination_with_replacement(0).collect()
[]

Iter.nth_combination(self, r, index)

Reference: more_itertools.nth_combination

>>> Iter(range(9)).nth_combination(3, 1).collect()
[0, 1, 3]
>>> Iter(range(9)).nth_combination(3, 2).collect()
[0, 1, 4]
>>> Iter(range(9)).nth_combination(3, 3).collect()
[0, 1, 5]
>>> Iter(range(9)).nth_combination(4, 3).collect()
[0, 1, 2, 6]
>>> Iter(range(9)).nth_combination(3, 7).collect()
[0, 2, 3]

@classmethod Iter.always_iterable(cls, obj, base_type=(str, bytes)) -> 'Iter'

Reference: more_itertools.always_iterable

>>> Iter.always_iterable([1, 2, 3]).collect()
[1, 2, 3]
>>> Iter.always_iterable(1).collect()
[1]
>>> Iter.always_iterable(None).collect()
[]
>>> Iter.always_iterable('foo').collect()
['foo']
>>> Iter.always_iterable(dict(a=1), base_type=dict).collect()
[{'a': 1}]

Iter.always_reversible(self)

Reference: more_itertools.always_reversible

This is like reversed() but it also operates on things that wouldn’t normally be reversible, like generators. It does this with internal caching, so be careful with memory use.

>>> Iter('abc').always_reversible().collect()
['c', 'b', 'a']
>>> Iter(x for x in 'abc').always_reversible().collect()
['c', 'b', 'a']
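
The caching mentioned above can be pictured roughly as follows. This is a sketch of the general approach, with a hypothetical function name, to show why memory use matters for large generators.

```python
def always_reversible_sketch(iterable):
    try:
        # Sequences such as str, list and range support reversed() directly.
        return reversed(iterable)
    except TypeError:
        # Generators do not, so every item is cached in a list first.
        return reversed(list(iterable))


from_sequence = list(always_reversible_sketch('abc'))
from_generator = list(always_reversible_sketch(x for x in 'abc'))
```

In the generator case the whole input is held in memory before the first reversed item is produced.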

@classmethod Iter.with_iter(cls, context_manager)

Reference: more_itertools.with_iter

Note: Any context manager which returns an iterable is a candidate for Iter.with_iter.

>>> import os
>>> import tempfile
>>> with tempfile.TemporaryDirectory() as td:
...     path = os.path.join(td, 'text.txt')  # keep the file inside the temp dir
...     with open(path, 'w') as f:
...         f.writelines(['abc\n', 'def\n', 'ghi\n'])
...     Iter.with_iter(open(path)).map(lambda x: x.upper()).collect()
['ABC\n', 'DEF\n', 'GHI\n']

See also: Iter.open

🛠 TODO: perhaps we should get rid of Iter.open and just use this?
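
The idea behind with_iter fits in a few lines: enter the context manager, yield its items, and let the with block clean up when iteration ends. The sketch below uses io.StringIO as a stand-in for a file so that it runs anywhere; with_iter_sketch is a hypothetical name.

```python
import io


def with_iter_sketch(context_manager):
    # The context manager stays open only while items are being produced.
    with context_manager as iterable:
        yield from iterable


buffer = io.StringIO('abc\ndef\n')
lines = [line.upper() for line in with_iter_sketch(buffer)]
closed_afterwards = buffer.closed  # the with block closed it on exhaustion
```

Note that the resource is released only once the iterator has been exhausted (or the generator is closed), so a partially consumed chain keeps it open.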

@classmethod Iter.iter_except(cls, func, exception, first=None) -> "Iter"

Reference: more_itertools.iter_except

>>> l = [0, 1, 2]
>>> Iter.iter_except(l.pop, IndexError).collect()
[2, 1, 0]

Iter.locate(self, pred=bool, window_size=None) -> "Iter"

Reference: more_itertools.locate

>>> Iter([0, 1, 1, 0, 1, 0, 0]).locate().collect()
[1, 2, 4]
>>> Iter(['a', 'b', 'c', 'b']).locate(lambda x: x == 'b').collect()
[1, 3]
>>> iterable = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]
>>> pred = lambda *args: args == (1, 2, 3)
>>> Iter(iterable).locate(pred=pred, window_size=3).collect()
[1, 5, 9]
>>> from itertools import count
>>> from more_itertools import seekable
>>> source = (3 * n + 1 if (n % 2) else n // 2 for n in count())
>>> it = Iter(source).seekable()
>>> pred = lambda x: x > 100
>>> # TODO: can we avoid making two instances?
>>> indexes = Iter(it).locate(pred=pred)
>>> i = next(indexes)
>>> it.seek(i)
>>> next(it)
106

Iter.rlocate(self, pred=bool, window_size=None) -> "Iter"

Reference: more_itertools.rlocate

>>> Iter([0, 1, 1, 0, 1, 0, 0]).rlocate().collect()  # Truthy at 1, 2, and 4
[4, 2, 1]
>>> pred = lambda x: x == 'b'
>>> Iter('abcb').rlocate(pred).collect()
[3, 1]
>>> iterable = [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3]
>>> pred = lambda *args: args == (1, 2, 3)
>>> Iter(iterable).rlocate(pred=pred, window_size=3).collect()
[9, 5, 1]

Iter.replace(self, pred, substitutes, count=None, window_size=1) -> "Iter"

Reference: more_itertools.replace

>>> iterable = [1, 1, 0, 1, 1, 0, 1, 1]
>>> pred = lambda x: x == 0
>>> substitutes = (2, 3)
>>> Iter(iterable).replace(pred, substitutes).collect()
[1, 1, 2, 3, 1, 1, 2, 3, 1, 1]
>>> iterable = [1, 1, 0, 1, 1, 0, 1, 1, 0]
>>> pred = lambda x: x == 0
>>> substitutes = [None]
>>> Iter(iterable).replace(pred, substitutes, count=2).collect()
[1, 1, None, 1, 1, None, 1, 1, 0]
>>> iterable = [0, 1, 2, 5, 0, 1, 2, 5]
>>> window_size = 3
>>> pred = lambda *args: args == (0, 1, 2)  # 3 items passed to pred
>>> substitutes = [3, 4] # Splice in these items
>>> Iter(iterable).replace(
...     pred, substitutes, window_size=window_size
... ).collect()
[3, 4, 5, 3, 4, 5]

@classmethod Iter.numeric_range(cls, *args) -> "Iter"

Reference: more_itertools.numeric_range

>>> Iter.numeric_range(3.5).collect()
[0.0, 1.0, 2.0, 3.0]
>>> from decimal import Decimal
>>> start = Decimal('2.1')
>>> stop = Decimal('5.1')
>>> Iter.numeric_range(start, stop).collect()
[Decimal('2.1'), Decimal('3.1'), Decimal('4.1')]
>>> from fractions import Fraction
>>> start = Fraction(1, 2)  # Start at 1/2
>>> stop = Fraction(5, 2)  # End at 5/2
>>> step = Fraction(1, 2)  # Count by 1/2
>>> Iter.numeric_range(start, stop, step).collect()
[Fraction(1, 2), Fraction(1, 1), Fraction(3, 2), Fraction(2, 1)]
>>> Iter.numeric_range(3, -1, -1.0).collect()
[3.0, 2.0, 1.0, 0.0]

Iter.side_effect(self, func, chunk_size=None, before=None, after=None)

Reference: more_itertools.side_effect

>>> def f(item):
...     if item == 3:
...         raise Exception('got 3')
>>> Iter.range(5).side_effect(f).consume()
Traceback (most recent call last):
    ...
Exception: got 3
>>> func = lambda item: print('Received {}'.format(item))
>>> Iter.range(2).side_effect(func).consume()
Received 0
Received 1

Iter.iterate(self)

Iter.difference(self, func=operator.sub, *, initial=None)

Reference: more_itertools.difference

>>> iterable = [0, 1, 3, 6, 10]
>>> Iter(iterable).difference().collect()
[0, 1, 2, 3, 4]
>>> iterable = [1, 2, 6, 24, 120]  # Factorial sequence
>>> func = lambda x, y: x // y
>>> Iter(iterable).difference(func).collect()
[1, 2, 3, 4, 5]

Iter.make_decorator(self)

Iter.SequenceView(self)

Iter.time_limited(self, limit_seconds) -> "Iter"

Reference: more_itertools.time_limited

>>> from time import sleep
>>> def generator():
...     yield 1
...     yield 2
...     sleep(0.2)
...     yield 3
>>> Iter(generator()).time_limited(0.1).collect()
[1, 2]

🎧 Iter.consume(self, n: Optional[int] = None) -> "Optional[Iter[T]]"

If n is not provided, the entire iterator is consumed and None is returned. Otherwise, an iterator will always be returned, even if n is greater than the number of items left in the iterator.

In this example, the source has more elements than what we consume, so there will still be data available on the chain:

>>> range(10).consume(5).collect()
[5, 6, 7, 8, 9]

The count can also be larger than the number of items in the source. An iterator is still returned in that case; it is simply empty, and Iter.collect is still needed to turn it into a list.

>>> range(10).consume(50).collect()
[]

Finally, if n is not provided, the entire stream is consumed. In this scenario, Iter.collect would fail since nothing is being returned from the consume call.

>>> assert range(10).consume() is None
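
The three cases above match the well-known consume recipe from the itertools documentation. The sketch below adapts that recipe to return the iterator when n is given; it is meant as an illustration of the semantics described here, and consume_sketch is a made-up name.

```python
from collections import deque
from itertools import islice


def consume_sketch(iterator, n=None):
    if n is None:
        # Exhaust the iterator at C speed without storing anything.
        deque(iterator, maxlen=0)
        return None
    # Advance by at most n items, then hand back what is left.
    next(islice(iterator, n, n), None)
    return iterator


remaining = list(consume_sketch(iter(range(10)), 5))
nothing_left = list(consume_sketch(iter(range(10)), 50))
fully_consumed = consume_sketch(iter(range(10)))
```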

Iter.tabulate(self)

🎤 @classmethod Iter.repeatfunc(cls, func, *args, times=None)

Docstring TBD

>>> Iter.repeatfunc(operator.add, 3, 5, times=4).collect()
[8, 8, 8, 8]

Iter.wrap(self, ends: "Sequence[T, T]" = "()")

Other examples for ends: '"' * 2, or '`' * 2, or '[]' etc.

Iter.print(self, template="{i}: {v}") -> "Iter[T]"

Printing during the execution of an iterator. Mostly useful for debugging. Returns another iterator instance through which the original data is passed unchanged. This means you can include a print() step as necessary to observe data during iteration.

>>> Iter('abc').print().collect()
0: a
1: b
2: c
['a', 'b', 'c']

>>> (
...    Iter(range(5))
...        .print('before filter {i}: {v}')
...        .filter(lambda x: x > 2)
...        .print('after filter {i}: {v}')
...        .collect()
... )
before filter 0: 0
before filter 1: 1
before filter 2: 2
before filter 3: 3
after filter 0: 3
before filter 4: 4
after filter 1: 4
[3, 4]
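
The interleaved "before" and "after" lines above follow from laziness: each item travels through the whole chain before the next one is pulled from the source. A pass-through generator is enough to reproduce this. In the sketch below, print_through and its emit parameter are hypothetical; emit defaults to print but is replaced by list.append here so the order can be inspected.

```python
def print_through(iterable, template="{i}: {v}", emit=print):
    """Report each item as it passes, then yield it unchanged."""
    for i, v in enumerate(iterable):
        emit(template.format(i=i, v=v))
        yield v


lines = []
before = print_through(range(5), 'before filter {i}: {v}', emit=lines.append)
kept = (x for x in before if x > 2)
result = list(print_through(kept, 'after filter {i}: {v}', emit=lines.append))
```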

🎤 @classmethod Iter.from_queue(cls, q: queue.Queue, timeout=None, sentinel=None)

Wrap a queue with an iterator interface. This allows it to participate in chaining operations. The iterator will block while waiting for new values to appear on the queue. This is useful: it allows you to easily and safely pass data between threads or processes, and feed the incoming data into a pipeline.

The sentinel value, default None, will terminate the iterator.

>>> q = queue.Queue()
>>> # This line puts stuff onto a queue
>>> range(10).chain([None]).map(q.put).consume()
>>> # This is where we consume data from the queue:
>>> Iter.from_queue(q).filter(lambda x: 2 < x < 9).collect()
[3, 4, 5, 6, 7, 8]

If None had not been chained onto the data, the iterator would have waited in Iter.collect forever.
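
A queue-backed iterator of this kind can be sketched as a generator that blocks on q.get() and stops at the sentinel. This is a simplified illustration using only the standard library; from_queue_sketch is a hypothetical name and the real method may handle timeouts differently (here, an expired timeout would simply let queue.Empty propagate).

```python
import queue


def from_queue_sketch(q, timeout=None, sentinel=None):
    while True:
        item = q.get(timeout=timeout)  # blocks until something arrives
        if item is sentinel:
            return  # the sentinel ends the iteration
        yield item


q = queue.Queue()
for item in list(range(10)) + [None]:
    q.put(item)

result = [x for x in from_queue_sketch(q) if 2 < x < 9]
```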

🎧 Iter.into_queue(self, q: queue.Queue)

This is a sink, like Iter.collect, that consumes data from an iterator chain and puts the data into the given queue.

>>> q = queue.Queue()
>>> # This demonstrates the queue sink
>>> range(5).into_queue(q)
>>> # Code below is only for verification
>>> out = []
>>> finished = False
>>> while not finished:
...     try:
...         out.append(q.get_nowait())
...     except queue.Empty:
...         finished = True
>>> out
[0, 1, 2, 3, 4]

🎧 Iter.send(self, collector: Generator, close_collector_when_done=False) -> "None"

See also: more_itertools.consumer

Send data into a generator. You do not have to first call next() on the generator. Iter.send will do this for you.

⚠ Look carefully at the examples below; you’ll see that the yield keyword is wrapped in a second set of parens, e.g. output.append((yield)). This is required!

Simple case:

>>> output = []
>>> def collector():
...     while True:
...         output.append((yield))
>>> Iter.range(3).send(collector())
>>> output
[0, 1, 2]

Note that the generator is not closed by default after the iterable is exhausted. To close it, pass close_collector_when_done=True. Sending into the closed generator afterwards raises StopIteration:

>>> output = []
>>> def collector():
...     while True:
...         output.append((yield))
>>> g = collector()
>>> Iter.range(3).send(g, close_collector_when_done=True)
>>> Iter.range(3).send(g)
Traceback (most recent call last):
    ...
StopIteration

The default behaviour is that the generator is left open which means you can keep using it for other iterators:

>>> output = []
>>> def collector():
...     while True:
...         output.append((yield))
>>> g = collector()
>>> Iter.range(3).send(g)
>>> Iter.range(10, 13).send(g)
>>> Iter.range(100, 103).send(g)
>>> output
[0, 1, 2, 10, 11, 12, 100, 101, 102]

If the generator is closed before the iteration is complete, you’ll get a StopIteration exception:

>>> output = []
>>> def collector():
...   for i in range(3):
...       output.append((yield))
>>> Iter.range(5).send(collector())
Traceback (most recent call last):
    ...
StopIteration

Note that Iter.send is a sink, so no further chaining is allowed.
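
The automatic next() call can be pictured with inspect.getgeneratorstate, which reports whether a generator has been started yet. The following is a sketch under that assumption; send_all is a hypothetical helper, not the library's implementation.

```python
import inspect


def send_all(iterable, collector, close_collector_when_done=False):
    # A just-created generator must be advanced to its first yield
    # before it can receive values.
    if inspect.getgeneratorstate(collector) == inspect.GEN_CREATED:
        next(collector)
    for item in iterable:
        collector.send(item)
    if close_collector_when_done:
        collector.close()


output = []


def collector():
    while True:
        output.append((yield))


g = collector()
send_all(range(3), g)  # primes g, then sends 0, 1, 2
send_all(range(10, 13), g, close_collector_when_done=True)
state = inspect.getgeneratorstate(g)
```

The second call skips the priming step because the generator is already suspended at a yield.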

Iter.send_also(self, collector: Generator) -> "Iter"

Reference: more_itertools.consumer

Some ideas around a reverse iterator as a sink. The requirement to first “next” a just-started generator before you can send values into it is irritating, but not insurmountable. This method will automatically detect the “just-started generator” situation, do the next(), and then send in the first value as necessary.

Simple case:

>>> output = []
>>> def collector():
...     while True:
...         output.append((yield))
>>> Iter.range(3).send_also(collector()).collect()
[0, 1, 2]
>>> output
[0, 1, 2]

If the generator is closed before the iteration is complete, you’ll get an exception (Python 3.7+):

>>> output = []
>>> def collector():
...   for i in builtins.range(3):
...       output.append((yield))
>>> Iter.range(50).send_also(collector()).collect()  # doctest: +SKIP
Traceback (most recent call last):
    ...
RuntimeError

Note that the above doesn’t happen in Python < 3.7 (which includes pypy 7.3.1, which matches Python 3.6.9 compatibility). Instead, you collect the items up until the point that the collector returns; in this case, you’d get [0, 1, 2]. The change in behaviour in Python 3.7 came from PEP 479.

Regardless, for any Python it’s recommended that your generator live at least as long as the iterator feeding it.
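
The RuntimeError comes from PEP 479: a StopIteration that escapes inside a generator body is converted into RuntimeError. The sketch below reproduces the effect on Python 3.7+ with a hypothetical pass-through generator; the exact point at which the real method fails, and which items it has already passed on, may differ.

```python
def send_also_sketch(iterable, collector):
    next(collector)  # prime the collector
    for item in iterable:
        # If the collector has finished, send() raises StopIteration here,
        # inside a generator, so Python turns it into RuntimeError.
        collector.send(item)
        yield item


output = []


def short_collector():
    for _ in range(3):
        output.append((yield))


passed_through = []
error = None
try:
    for item in send_also_sketch(range(50), short_collector()):
        passed_through.append(item)
except RuntimeError as exc:
    error = exc
```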

🎧 ⚠ Iter.sorted(self, key=None, reverse=False) -> "Iter[T]"

Simple wrapper for the sorted builtin.

Calling this will read the entire stream before producing results.

>>> Iter('bac').sorted().collect()
['a', 'b', 'c']
>>> Iter('bac').sorted(reverse=True).collect()
['c', 'b', 'a']
>>> Iter('bac').zip([2, 1, 0]).sorted(key=lambda tup: tup[1]).collect()
[('c', 0), ('a', 1), ('b', 2)]

🎧 ⚠ Iter.reversed(self) -> "Iter[T]"

Simple wrapper for the reversed builtin.

Calling this will read the entire stream before producing results.

>>> Iter(range(4)).reversed().collect()
[3, 2, 1, 0]

🛠 class IterDict(UserDict)

The idea here was to make a custom dict where several of the standard dict methods return Iter instances, which can then be chained. I’m not sure if this will be kept yet.

IterDict.keys(self) -> "Iter"

IterDict.values(self) -> "Iter"

IterDict.items(self) -> "Iter"

IterDict.update(self, *args, **kwargs) -> "IterDict"

insert_separator(iterable: Iterable[Any], glue: Any) -> "Iterable[Any]"

Similar functionality can be obtained with, e.g., interleave, as in

>>> result = Iter('caleb').interleave(Iter.repeat('x')).collect()
>>> result == list('cxaxlxexbx')
True

But you’ll see a trailing “x” there, which insert_separator avoids: it only adds the glue separator once another element has arrived.

It can handle strings without any special considerations, but it doesn’t do any special handling for bytes and bytearrays. For that, rather look at concat().
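
The "only add glue if another element has arrived" rule can be sketched as a generator that emits the first item on its own and then a glue/item pair for everything after it. This is an illustration of the idea with a hypothetical name, not the library's code.

```python
def insert_separator_sketch(iterable, glue):
    it = iter(iterable)
    try:
        yield next(it)  # the first item is never preceded by glue
    except StopIteration:
        return  # empty input: nothing to emit
    for item in it:
        # Glue is emitted only once the next item is known to exist.
        yield glue
        yield item


separated = list(insert_separator_sketch('caleb', 'x'))
```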

concat(iterable: Iterable[AnyStr], glue: AnyStr) -> "AnyStr"

Concatenate strings, bytes and bytearrays. It is careful to avoid the problem with single bytes becoming integers, and it looks at the value of glue to know whether to handle bytes or strings.

This function can raise ValueError if called with something other than bytes, bytearray or str.
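
The "single bytes becoming integers" problem arises because iterating over a bytes object yields ints, which bytes.join rejects. A rough sketch of the dispatch on glue might look like this; concat_sketch is a hypothetical name and the handling shown is an assumption based on the description above.

```python
def concat_sketch(iterable, glue):
    if isinstance(glue, (bytes, bytearray)):
        # Iterating over b'abc' gives 97, 98, 99; wrap each int back into bytes.
        parts = (bytes([x]) if isinstance(x, int) else x for x in iterable)
        return glue.join(parts)
    if isinstance(glue, str):
        return glue.join(iterable)
    raise ValueError("glue must be bytes, bytearray or str")


from_single_bytes = concat_sketch(b'abc', b'-')
from_chunks = concat_sketch([b'ab', b'cd'], b'')
from_text = concat_sketch('abc', '-')
```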

🎤 from_queue(q: queue.Queue, timeout=None, sentinel=None) -> "Iter"

Wrap a queue with an iterator interface. This allows it to participate in chaining operations. The iterator will block while waiting for new values to appear on the queue. This is useful: it allows you to easily and safely pass data between threads or processes, and feed the incoming data into a pipeline.

The sentinel value, default None, will terminate the iterator.

>>> q = queue.Queue()
>>> # This line puts stuff onto a queue
>>> range(10).chain([None]).map(q.put).consume()
>>> from_queue(q).filter(lambda x: 2 < x < 9).collect()
[3, 4, 5, 6, 7, 8]

Dev Instructions

Setup

$ python -m venv venv
$ source venv/bin/activate
(venv) $ pip install -e .[dev,test]

Testing

(venv) $ pytest --cov

Documentation

To regenerate the documentation file, README.rst:

(venv) $ python regenerate_readme.py -m excitertools.py > README.rst

Releasing

To do a release, we’re using bumpymcbumpface. Make sure that is set up correctly according to its own documentation. I like to use pipx to install and manage these kinds of tools.

$ bumpymcbumpface --push-git --push-pypi





Work is a necessary evil to be avoided. Mark Twain
