
🪨 carabiner


A ragtag collection of useful Python functions and classes.

Installation

The easy way

Install the latest release from PyPI:

$ pip install carabiner-tools

If you want to use the tensorflow, pandas, or matplotlib utilities, install the corresponding optional extras, separately or all together:

$ pip install carabiner-tools[deep]
# or
$ pip install carabiner-tools[pd]
# or
$ pip install carabiner-tools[mpl]
# or
$ pip install carabiner-tools[all]

From source

Clone the repository and cd into it, then run:

$ pip install -e .

Fast and flexible reading and random access of very large files

Subsets of lines from very large, optionally compressed, files can be read quickly into memory. For example, we can read the first 10,000 lines of an arbitrarily large file:

>>> from carabiner.io import get_lines

>>> get_lines("big-table.tsv.gz", lines=10_000)

Specific lines can also be accessed at random; hundreds of millions of lines can be parsed per minute.

>>> get_lines("big-table.tsv.gz", lines=[999999, 10000000, 100000001])

Combined with count_lines, this pattern allows sampling a random subset of lines:

>>> from random import sample
>>> from carabiner.io import count_lines, get_lines

>>> number_of_lines = count_lines("big-table.tsv.gz")
>>> line_sample = sample(range(number_of_lines), k=1000)
>>> get_lines("big-table.tsv.gz", lines=line_sample)
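To give a sense of how this kind of selective reading can work, here is a stdlib-only sketch (not carabiner's actual implementation) that streams a possibly gzipped file and yields only the requested lines, stopping early once all of them have been seen:

```python
import gzip


def get_lines_sketch(path, lines=None):
    """Yield selected (0-indexed) lines from a possibly gzipped text file.

    An int keeps the first `lines` lines; an iterable keeps exactly those
    line numbers. Illustrative only; the real get_lines differs.
    """
    opener = gzip.open if path.endswith(".gz") else open
    wanted = None
    if lines is not None:
        wanted = set(range(lines)) if isinstance(lines, int) else set(lines)
    with opener(path, "rt") as f:
        for i, line in enumerate(f):
            if wanted is None:
                yield line
            elif i in wanted:
                yield line
                wanted.discard(i)
                if not wanted:
                    break  # all requested lines seen; stop scanning early
```

Because the file is streamed line by line, memory use depends only on the number of lines requested, not on the size of the file.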

Reading tabular data

With this backend, we can read subsets of very large files more quickly and flexibly than with plain pandas.read_csv. Formats (delimiters), including Excel, are inferred from file extensions, but can also be overridden with the format parameter.

>>> from carabiner.pd import read_table

>>> read_table("big-table.tsv.gz", lines=10_000)

The same fast random access is available as when reading lines; hundreds of millions of records can be looped through per minute.

>>> from random import sample
>>> from carabiner.io import count_lines
>>> from carabiner.pd import read_table

>>> number_of_lines = count_lines("big-table.tsv.gz")
>>> line_sample = sample(range(number_of_lines), k=1000)
>>> read_table("big-table.tsv.gz", lines=line_sample)

Utilities to simplify building command-line apps

The standard library argparse is robust but verbose when building command-line apps with several sub-commands, each with many options. carabiner.cliutils smooths this process: apps are built by defining CLIOptions, assigning them to CLICommands (each of which points at the function to run when the command is called), and collecting the commands into a CLIApp.

First define the options:

import sys
from argparse import FileType

from carabiner.cliutils import CLIApp, CLICommand, CLIOption

inputs = CLIOption('inputs',
                    type=str,
                    default=[],
                    nargs='*',
                    help='')
output = CLIOption('--output', '-o', 
                    type=FileType('w'),
                    default=sys.stdout,
                    help='Output file. Default: STDOUT')
formatting = CLIOption('--format', '-f', 
                        type=str,
                        default='TSV',
                        choices=['TSV', 'CSV', 'tsv', 'csv'],
                        help='Format of files. Default: %(default)s')

Then the commands:

# `_main` is the function that implements the command's logic
test = CLICommand("test",
                    description="Test CLI subcommand using Carabiner utilities.",
                    options=[inputs, output, formatting],
                    main=_main)

The same options can be assigned to multiple commands if necessary.

Finally, define the app and run it:

app = CLIApp("Carabiner", 
             version=__version__,
             description="Test CLI app using Carabiner utilities.",
             commands=[test])

app.run()
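To make the pattern concrete, here is a miniature stdlib-only re-implementation of the same idea on top of argparse. The Mini* names are hypothetical stand-ins, not carabiner's API:

```python
# Hypothetical miniature version of the carabiner.cliutils pattern,
# built directly on argparse. Names and signatures are illustrative only.
import argparse


class MiniOption:
    """Store add_argument() parameters so they can be reused across commands."""
    def __init__(self, *names, **kwargs):
        self.names, self.kwargs = names, kwargs


class MiniCommand:
    """Bundle a subcommand name, its options, and the function to run."""
    def __init__(self, name, options, main):
        self.name, self.options, self.main = name, options, main


class MiniApp:
    """Assemble commands into an argparse parser with subparsers."""
    def __init__(self, name, commands):
        self.parser = argparse.ArgumentParser(prog=name)
        subparsers = self.parser.add_subparsers(dest="command", required=True)
        for command in commands:
            subparser = subparsers.add_parser(command.name)
            for option in command.options:
                subparser.add_argument(*option.names, **option.kwargs)
            subparser.set_defaults(_main=command.main)

    def run(self, argv=None):
        args = self.parser.parse_args(argv)
        return args._main(args)


inputs = MiniOption("inputs", nargs="*", default=[])
formatting = MiniOption("--format", "-f", default="TSV", choices=["TSV", "CSV"])
test_cmd = MiniCommand("test", [inputs, formatting],
                       main=lambda args: (args.inputs, args.format))
app = MiniApp("demo", [test_cmd])
```

Because each option object just stores add_argument() parameters, the same option can be attached to any number of commands, which is the main convenience of the pattern.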

Reservoir sampling

If you need to sample a random subset from an iterator of unknown length while looping through it only once, you can use this pure-Python implementation of reservoir sampling.

An important limitation is that while the population to be sampled is not necessarily in memory, the sampled population must fit in memory.

Originally written in Python Bugs, and based on a GitHub Gist.

>>> from carabiner.random import sample_iter
>>> from string import ascii_letters
>>> from itertools import chain
>>> from random import seed
>>> seed(1)
>>> sample_iter(chain.from_iterable(ascii_letters for _ in range(1000000)), 10)
['X', 'c', 'w', 'q', 'T', 'e', 'u', 'w', 'E', 'h']
>>> seed(1)
>>> sample_iter(chain.from_iterable(ascii_letters for _ in range(1000000)), 10, shuffle_output=False)
['T', 'h', 'u', 'X', 'E', 'e', 'w', 'q', 'c', 'w']
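The underlying technique, Algorithm R, can be sketched in a few lines of stdlib-only Python. This is a simplified illustration, not carabiner's implementation:

```python
import random


def reservoir_sample(iterable, k):
    """Uniformly sample up to k items in a single pass (Algorithm R)."""
    reservoir = []
    for i, item in enumerate(iterable):
        if i < k:
            reservoir.append(item)       # fill the reservoir first
        else:
            j = random.randrange(i + 1)  # replace an entry with probability k/(i+1)
            if j < k:
                reservoir[j] = item
    return reservoir
```

Only the k-item reservoir is ever held in memory, which is exactly the limitation noted above: the population can be arbitrarily large, but the sample must fit in memory.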

Multikey dictionaries

Conveniently return the values of multiple keys from a dictionary without manually looping.

>>> from carabiner.collections import MultiKeyDict
>>> d = MultiKeyDict(a=1, b=2, c=3)
>>> d
{'a': 1, 'b': 2, 'c': 3}
>>> d['c']
{'c': 3}
>>> d['a', 'b']
{'a': 1, 'b': 2} 
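The behavior shown above, where even a single key returns a sub-dictionary, can be approximated with a small dict subclass. This sketch is for illustration only and is not carabiner's class:

```python
class MultiKeyDictSketch(dict):
    """Return a sub-dict when indexed with one key or a tuple of keys."""

    def __getitem__(self, keys):
        if not isinstance(keys, tuple):
            keys = (keys,)  # normalize a single key to a 1-tuple
        return {key: dict.__getitem__(self, key) for key in keys}
```

Indexing with `d['a', 'b']` works because Python passes the comma-separated keys to `__getitem__` as a single tuple.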

Decorators

carabiner provides several decorators to facilitate functional programming.

Vectorized functions

In scientific programming frameworks like numpy, we are used to functions that take either a scalar or a vector and are applied elementwise. It is occasionally useful to convert functions from arbitrary packages to behave in this vectorized manner on Python iterables.

Scalar functions can be converted to a vectorized form easily using @vectorize.

>>> from carabiner.decorators import vectorize
>>> @vectorize
... def vector_adder(x): return x + 1
...
>>> list(vector_adder(range(3)))
[1, 2, 3]
>>> list(vector_adder((4, 5, 6)))
[5, 6, 7]
>>> vector_adder([10])
11
>>> vector_adder(10)
11
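One way such a decorator might be implemented is sketched below. It mirrors the behavior documented above, including collapsing a single-item result to a scalar; carabiner's actual code may differ:

```python
from collections.abc import Iterable
from functools import wraps


def vectorize_sketch(f):
    """Apply f elementwise to iterables and directly to scalars.

    Illustrative sketch only; collapses single-item results to a scalar,
    as in the documented examples.
    """
    @wraps(f)
    def wrapped(x):
        if isinstance(x, Iterable) and not isinstance(x, str):
            results = [f(item) for item in x]
            return results[0] if len(results) == 1 else results
        return f(x)
    return wrapped
```

Strings are excluded from the iterable branch so that a string argument is treated as a single scalar value rather than a sequence of characters.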

Return None instead of error

When it is useful for a function not to fail, but to give a testable indicator of success, you can wrap it in @return_none_on_error.

>>> from carabiner.decorators import return_none_on_error
>>> def error_maker(x): raise KeyError
... 
>>> @return_none_on_error
... def error_maker2(x): raise KeyError
... 
>>> @return_none_on_error(exception=ValueError)
... def error_maker3(x): raise KeyError
... 

>>> error_maker('a')  # Causes an error
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in error_maker
KeyError

>>> error_maker2('a')  # Wrapped returns None

>>> error_maker3('a')  # Only catches ValueError
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".../carabiner/decorators.py", line 59, in wrapped_function
    
File "<stdin>", line 2, in error_maker3
KeyError
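A sketch of how such a decorator can be written, supporting both bare use and an exception parameter (not carabiner's implementation):

```python
from functools import wraps


def return_none_on_error_sketch(f=None, *, exception=Exception):
    """Return None instead of raising `exception` (or a subclass of it).

    Works bare (@decorator) or parameterized (@decorator(exception=...)).
    """
    def decorate(func):
        @wraps(func)
        def wrapped(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except exception:
                return None  # swallow the error; None signals failure
        return wrapped
    # Bare use passes the function directly; parameterized use passes only kwargs.
    return decorate(f) if f is not None else decorate
```

Narrowing `exception` keeps unrelated errors loud, which is usually what you want when suppressing failures deliberately.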

Decorators with parameters

Sometimes a decorator has optional parameters to control its behavior. It's convenient to use it in the form @decorator when you want the default behavior, or @decorator(**kwargs) when you want to customize the behavior. Usually this requires some convoluted code, but it has been packaged up in @decorator_with_params, to decorate your decorator definitions!

>>> def decor(f, suffix="World"): 
...     return lambda x: f(x + suffix)
...
>>> @decor
... def printer(x): 
...     print(x)
... 

# doesn't work, raises an error!
>>> @decor(suffix="everyone")  
... def printer2(x): 
...     print(x)
... 
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: decor() missing 1 required positional argument: 'f'

# decorate the decorator!
>>> from carabiner.decorators import decorator_with_params
>>> @decorator_with_params
... def decor2(f, suffix="World"): 
...     return lambda x: f(x + suffix)
... 

# Now it works!
>>> @decor2(suffix="everyone")  
... def printer3(x): 
...     print(x)
... 

>>> printer("Hello ")
Hello World
>>> printer3("Hello ")
Hello everyone
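The recipe behind this kind of dual-use decorator is short; here is an illustrative stdlib-only sketch (not carabiner's implementation):

```python
from functools import wraps


def decorator_with_params_sketch(decorator):
    """Allow a decorator to be used bare or with keyword arguments."""
    @wraps(decorator)
    def dispatcher(f=None, **kwargs):
        if f is not None:
            return decorator(f, **kwargs)              # bare use: @decor
        return lambda func: decorator(func, **kwargs)  # parameterized: @decor(...)
    return dispatcher


@decorator_with_params_sketch
def suffixer(f, suffix="World"):
    """Example decorator: append a suffix to the wrapped function's argument."""
    return lambda x: f(x + suffix)
```

The trick is dispatching on whether the first positional argument is present: bare use passes the function itself, while parameterized use passes only keyword arguments, so a second call is needed to receive the function.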

Colorblind palette

Here's a qualitative palette that's colorblind friendly.

>>> from carabiner import colorblind_palette

>>> colorblind_palette()
('#EE7733', '#0077BB', '#33BBEE', '#EE3377', '#CC3311', '#009988', '#BBBBBB', '#000000')

# subsets
>>> colorblind_palette(range(2))
('#EE7733', '#0077BB')
>>> colorblind_palette(slice(3, 6))
('#EE3377', '#CC3311', '#009988')

Grids with sensible defaults in Matplotlib

While plt.subplots() is very flexible, it requires many parameters to be specified. Instead, carabiner.mpl.grid() generates the (fig, ax) tuple with sensible defaults: a 1x1 grid, a panel size of 3, and a constrained layout.

from carabiner.mpl import grid

fig, ax = grid()  # 1x1 grid
fig, ax = grid(ncol=3)  # 1x3 grid; figsize expands appropriately
fig, ax = grid(ncol=3, nrow=2, sharex=True)  # additional parameters are passed to plt.subplots()

Fast indicator matrix x dense matrix multiplication in Tensorflow

If, as part of a TensorFlow model, you need to multiply an indicator matrix (i.e. a sparse matrix of zeros and ones with the same number of non-zero entries per row, as in linear models) by a dense matrix, it is faster to convert the indicator matrix to an [n x 1] matrix holding the index of the non-zero element in each row than to use tensorflow.SparseMatrix.
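The equivalence can be illustrated in plain Python: multiplying a one-hot indicator row by a dense matrix simply selects a row of that matrix, which is what a gather-style index lookup (tf.gather in TensorFlow) does directly:

```python
def matmul(a, b):
    """Naive dense matrix product, for illustration only."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# A 3x4 indicator matrix with exactly one non-zero entry per row...
indicator = [
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
]
# ...carries the same information as this [n x 1] column of row indices:
indices = [1, 3, 1]

W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]  # dense 4x2 matrix

dense_product = matmul(indicator, W)  # full indicator @ W multiplication
gathered = [W[i] for i in indices]    # gather version: plain row lookups
assert dense_product == gathered      # identical results, far less work
```

The dense product does O(rows x cols) multiply-adds per output row, while the gather does a single lookup, which is where the speed-up comes from.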

Issues, problems, suggestions

Add to the issue tracker.

Documentation

Available at ReadTheDocs.
