Simpler parallelization.

Special thanks to Jim Fan for relinquishing the name "laminar" to me on pypi.org. Much appreciated, Jim!

laminar

Laminar seeks to take most of the hassle out of parallel processing in Python by providing user-friendly parallelization functionality.

Module Functions
results = laminar.iter_flow(my_function, my_iterable)
results = laminar.list_flow(my_function, my_list_of_data)

Class Usage
my_lam = laminar.Laminar()
my_lam.add_process("process_1", function_1, my_data)
my_lam.add_process("process_2", function_2, my_other_data)
my_lam.launch_processes()
results = my_lam.get_results()

Usage

Installation

Laminar is delivered as a package. To install, activate your preferred environment, then use:

pip install laminar

The laminar module requires only one third-party library, numpy. laminar_examples, a module with some practice functions and data objects, also requires pandas. Both libraries are installed automatically with laminar.

Importing

You can use laminar by placing from laminar import laminar or import laminar.laminar as <some_alias> at the top of your Python file. To practice with or test laminar using the built-in functions and data, also place from laminar import laminar_examples or import laminar.laminar_examples as <some_alias> at the top of the file.

If you only need the Laminar class, you can import it directly with from laminar.laminar import Laminar.

Using laminar

laminar currently consists of a class Laminar as well as two module functions that are designed to work with different data configurations, laminar.iter_flow and laminar.list_flow.

The Laminar class provides an instance that manages distinct processes and stores results. Class methods are available that allow the user to view, drop, clear, and launch processes.

To use the Laminar class, create a Laminar instance:

my_lam = Laminar()

The Laminar constructor takes one optional argument, the number of cores to use, which defaults to the number of cores on the current machine. To use only two cores, the declaration would be:

my_lam = Laminar(2)

To add a process to the object's process batch, simply use the add_process() class method, which is very similar to the module function calls listed below, except add_process() also requires the user to pass a string as the name of the process. This name can be any string.

my_lam.add_process('process_1', function_1, my_data)

If more processes are added than there are cores available, the process batch acts like a first in/first out queue: the newest process is added and the oldest process in the batch is removed.
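The first in/first out behavior can be pictured with the minimal sketch below. It assumes the batch is an ordered mapping capped at the core count; the attribute table further down lists _processes as a collections.OrderedDict(), but add_bounded and max_size are hypothetical names used only for illustration, and this is not laminar's actual code.

```python
from collections import OrderedDict


def add_bounded(batch, name, process, max_size):
    """Add a named entry; if the batch grows past max_size, evict the oldest.

    Hypothetical helper for illustration only, not part of laminar.
    """
    batch[name] = process
    while len(batch) > max_size:
        # popitem(last=False) removes the entry that was inserted first (FIFO).
        batch.popitem(last=False)


batch = OrderedDict()
for name in ["process_1", "process_2", "process_3"]:
    add_bounded(batch, name, object(), max_size=2)

print(list(batch))  # ['process_2', 'process_3']
```

With a cap of two, adding a third entry pushes out process_1, the first one added.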

Both of the module functions accept *args and **kwargs, which should be passed after data, so if function takes arg1 and arg2, like:

function(arg1, arg2)

you should call laminar like so:

laminar.iter_flow(function, data, arg1, arg2)
or
laminar.iter_flow(function, data, arg1=arg1, arg2=arg2)
or in the case of *args with **kwargs
laminar.iter_flow(function, data, arg1, arg2, kwarg=other_arg)
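The three call styles above can be pictured with the sketch below. It assumes the worker function receives its chunk of data as the first positional argument, followed by whatever extra arguments were passed after data; the README does not spell this out, so treat it as an assumption. apply_to_chunks and scaled_total are hypothetical names, and the chunks run sequentially here rather than in parallel.

```python
def scaled_total(values, factor, offset=0):
    """Hypothetical worker: sum the chunk, scale it, then add an offset."""
    return sum(values) * factor + offset


def apply_to_chunks(function, chunks, *args, **kwargs):
    """Call function(chunk, *args, **kwargs) for every chunk.

    Illustration only: runs sequentially, whereas laminar would hand
    each chunk to a separate process.
    """
    return [function(chunk, *args, **kwargs) for chunk in chunks]


chunks = [[1, 2, 3], [4, 5, 6]]

positional = apply_to_chunks(scaled_total, chunks, 10)       # factor passed positionally
keyword = apply_to_chunks(scaled_total, chunks, factor=10)   # same call, by keyword
mixed = apply_to_chunks(scaled_total, chunks, 10, offset=1)  # *args with **kwargs

print(positional)  # [60, 150]
print(mixed)       # [61, 151]
```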

laminar.iter_flow is designed to work with a single iterable, such as a pandas DataFrame or a Python list. When you pass an iterable to laminar.iter_flow, it automatically breaks your data into chunks based on how many cores your machine has. It then queues each chunk to be given to a core, which performs the work and passes its result back in a descriptive dictionary of results. For example, a list of 1,000,000 integers is broken into chunks of length 250,000 on a machine with four cores. Each chunk is summed (as an example) by a core, and the results are returned in a dict with one entry per core. You can then combine the results in whatever way fits your computation. For example, if the function passed to laminar.iter_flow computes the sum, then the values in the results dict should be summed to produce a total for the entire iterable.
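The chunk-then-combine idea can be sketched in plain Python. The code below is an illustration under stated assumptions, not laminar's implementation: chunk_bounds and chunked_apply are hypothetical helpers, the splitting rule (contiguous chunks whose sizes differ by at most one) is assumed, and the chunks are processed sequentially so that only the shape of the result is shown.

```python
def chunk_bounds(n_items, n_chunks):
    """Return (start, stop) index pairs that split n_items into n_chunks
    contiguous pieces whose sizes differ by at most one.

    Hypothetical helper; laminar's own splitting rule may differ.
    """
    base, extra = divmod(n_items, n_chunks)
    bounds = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks absorb one leftover item each.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def chunked_apply(function, data, n_chunks):
    """Apply function to each chunk and label results by inclusive index range.

    Runs sequentially for illustration; the point is the shape of the result.
    """
    return {
        f"data[{start}-{stop - 1}]": function(data[start:stop])
        for start, stop in chunk_bounds(len(data), n_chunks)
    }


data = list(range(1_000_000))
partials = chunked_apply(sum, data, n_chunks=4)

print([stop - start for start, stop in chunk_bounds(len(data), 4)])
# [250000, 250000, 250000, 250000]
print(sum(partials.values()) == sum(data))  # True
```

With 45 items and 8 chunks, chunk_bounds yields the ranges 0-5, 6-11, 12-17, 18-23, 24-29, 30-34, 35-39, and 40-44, which matches the keys shown in Example 1 below, though that agreement does not confirm how laminar splits data internally.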

Laminar Class Definition

Attribute	Description
`cores`	Number of cores available in an instance. This can be set manually in the instance declaration; it defaults to `cpu_count()`, which is the number of cores available on your machine.
`results`	Dictionary that holds the results from the `launch_processes` method. Initializes to an empty dict.
`_processes`	`collections.OrderedDict()` that holds processes added by `add_process()`.
`_queue`	`multiprocessing.Queue()` that manages parallel processes.

Method	Argument(s)	Returns	Description
`add_process()`	`name: str`, `function: Callable`, `dataset: Collection`, `*args`, `**kwargs`	`None`	Add a named process to an instance's process pool. A process must include a name, a function, and some data (in reality, the data can be anything).
`show_processes()`	`None`	`None`	Displays processes currently in instance process pool.
`drop_process()`	`name: str`	`None`	Removes process with name of `name` from instance process pool.
`clear_processes()`	`None`	`None`	Removes all processes from instance process pool.
`launch_processes()`	`None`	`str: "Processes finished."`	Run all instance processes in parallel.
`get_results()`	`None`	`self.results: dict`	Returns the instance results dictionary.
`clear_results()`	`None`	`None`	Removes all results from instance results dictionary.

Module Function Examples

To illustrate how one would use laminar in their workflow, we'll use some premade functions and data structures located in laminar_examples. To shorten the following code examples up, we'll import laminar_examples as an alias le and use this alias throughout the rest of this readme.

from laminar import laminar_examples as le

laminar_examples.single_total

le.single_total is a simple function that accepts a single iterable and returns the sum total of the values in that iterable. le.single_total([1, 2, 1]) returns 4.

laminar_examples.multi_tally

le.multi_tally is a simple function that accepts a Pandas DataFrame and returns the number of rows that sum to greater than 25. le.multi_tally(pd.DataFrame({'Col1': [12, 12], 'Col2': [12, 14]})) returns 1 because the row at index 1 sums to 12 + 14 = 26, which meets the function's criteria, but the row at index 0 sums to 12 + 12 = 24, which does not.
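For readers without pandas at hand, the behavior described for these two example functions can be mimicked in plain Python. The versions below are stand-ins written from the descriptions above, not the code shipped in laminar_examples; multi_tally_rows is a hypothetical name, and it takes a list of row tuples instead of a DataFrame.

```python
def single_total(iterable):
    """Return the sum of the values in a single iterable."""
    total = 0
    for value in iterable:
        total += value
    return total


def multi_tally_rows(rows, threshold=25):
    """Count the rows whose values sum to more than threshold.

    Stand-in for le.multi_tally that takes row tuples, not a DataFrame.
    """
    return sum(1 for row in rows if sum(row) > threshold)


print(single_total([1, 2, 1]))                 # 4
print(multi_tally_rows([(12, 12), (12, 14)]))  # 1 (only 12 + 14 = 26 exceeds 25)
```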

laminar_examples.laminar_df

le.laminar_df is a Pandas DataFrame that consists of 3 columns ['Col1', 'Col2', 'Col3'], each of which contains different integer values.

Col1	Col2	Col3
1	6	11
2	7	12
3	8	13
4	9	14
5	10	15
2	12	22
4	6	16
...	...	...

Example 1: Single iterable, single_total()

laminar.iter_flow(le.single_total, le.laminar_df['Col1']) returns

{
'data[0-5]': 17,
'data[12-17]': 60,
'data[18-23]': 86,
'data[24-29]': 115,
'data[30-34]': 105,
'data[35-39]': 120,
'data[40-44]': 135,
'data[6-11]': 37,
}

which is a dictionary describing the results for each section of your data. Each key/value pair in the returned dict corresponds to a segment of the iterable that was broken out and given to a process, and the key states which portion of the data the result belongs to. To complete your analysis, combine the values with whichever function fits the intended behavior of your analysis. In this case, since we are summing values, we can use sum().

The end result can look like one of these examples, although it doesn't have to:

result = sum(laminar.iter_flow(le.single_total, le.laminar_df['Col1']).values())

or

result = laminar.iter_flow(le.single_total, le.laminar_df['Col1'])
result = sum(result.values())

where

result = 675
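The keys in the output above are strings, and they are listed in lexicographic order, so 'data[6-11]' comes last. That makes no difference for a sum, but an order-sensitive combination (concatenating per-segment results, for instance) needs the segments back in data order. The sketch below shows one way to do that; start_index is a hypothetical helper, and the dict is copied from the Example 1 output rather than produced by running laminar.

```python
import re

# Partial results copied from the Example 1 output above.
partials = {
    'data[0-5]': 17,
    'data[12-17]': 60,
    'data[18-23]': 86,
    'data[24-29]': 115,
    'data[30-34]': 105,
    'data[35-39]': 120,
    'data[40-44]': 135,
    'data[6-11]': 37,
}


def start_index(key):
    """Extract the starting row index from a key such as 'data[12-17]'."""
    return int(re.match(r"data\[(\d+)-\d+\]", key).group(1))


# Sort keys by starting index so the segments appear in data order.
ordered = sorted(partials, key=start_index)

print(ordered[:3])             # ['data[0-5]', 'data[6-11]', 'data[12-17]']
print(sum(partials.values()))  # 675
```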

Example 2: Pandas DataFrame, multi_tally()

laminar.iter_flow(le.multi_tally, le.laminar_df) returns

{
'data[0-5]': 3,
'data[12-17]': 6,
'data[18-23]': 6,
'data[24-29]': 6,
'data[30-34]': 5,
'data[35-39]': 5,
'data[40-44]': 5,
'data[6-11]': 6,
}

which is a dict of counts. Each count is the return value for a segment of the data that was broken out and given to a process. To complete your analysis, you can use whichever function coincides with the intended behavior of your analysis. In this case, since we are counting values, it makes sense to use sum().

The end result can look like one of these examples, although it doesn't have to:

result = sum(laminar.iter_flow(le.multi_tally, le.laminar_df).values())

or

result = laminar.iter_flow(le.multi_tally, le.laminar_df)
result = sum(result.values())

where

result = 42

Example 3: List of single iterables, single_total()

laminar.list_flow(le.single_total, [le.laminar_df[col] for col in le.laminar_df.columns]) returns

{
'data_position_0': 675,
'data_position_1': 1800,
'data_position_2': 2925,
}

which is a dict of the totals for each column in le.laminar_df, keyed by each column's position in the list. With this usage, a user can pass a list of iterables to list_flow; each iterable will be passed to its own process. This is useful when a user intends to use the same function on multiple iterables, which can be columns in the same DataFrame or independent lists.

columns_list = [le.laminar_df[col] for col in le.laminar_df.columns]

result = laminar.list_flow(le.single_total, columns_list)

where

result = {'data_position_0': 675, 'data_position_1': 1800, 'data_position_2': 2925}
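The one-item-per-worker pattern can be sketched with the standard library. This is a stand-in under stated assumptions, not laminar's code: list_flow_sketch is a hypothetical name, and it uses threads from concurrent.futures to keep the example short, whereas laminar hands each item to its own process.

```python
from concurrent.futures import ThreadPoolExecutor


def list_flow_sketch(function, data_list):
    """Apply function to every item of data_list concurrently.

    Hypothetical stand-in for the list_flow pattern: results are keyed
    by each item's position in the input list. Threads are used here
    only to keep the sketch simple; laminar itself uses processes.
    """
    with ThreadPoolExecutor() as pool:
        # pool.map preserves input order, so position i maps to result i.
        results = list(pool.map(function, data_list))
    return {f"data_position_{i}": value for i, value in enumerate(results)}


columns = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
print(list_flow_sketch(sum, columns))
# {'data_position_0': 6, 'data_position_1': 60, 'data_position_2': 600}
```

ProcessPoolExecutor exposes the same map interface, so it could be swapped in for process-based parallelism, provided the function and data can be pickled and the script uses an if __name__ == "__main__": guard.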

Example 4: List of Pandas DataFrames, multi_tally()

laminar.list_flow(le.multi_tally, [le.laminar_df]*3) returns

{
'data_position_0': 42,
'data_position_1': 42,
'data_position_2': 42,
}

The result values are the same because we passed a list of 3 identical DataFrames; feel free to test this with different DataFrames of your own making.

data_frames_list = [le.laminar_df]*3

result = laminar.list_flow(le.multi_tally, data_frames_list)

where

result = {'data_position_0': 42, 'data_position_1': 42, 'data_position_2': 42}

Benchmarks

To date, laminar has been tested against traditional iterative analysis on the following functions:

String search function: count_snps()

Parameters

Files:

sample-1_S1_R1_001.fastq.gz
sample-1_S1_R2_001.fastq.gz

Total size of files:

26M

Length of Pandas DataFrame (going forward referred to as pd.DataFrame) object representation of combined files:

224706 rows

Results:

Traditional count_snps(pd.DataFrame): 42.6 seconds

Parallelized laminar.iter_flow(count_snps, pd.DataFrame): 17.49 seconds

Reduction in run time: 58.96% (roughly 2.4 times faster)
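The arithmetic behind the reported percentage can be checked from the two timings above. The sketch below assumes the figure is the fraction of run time saved, (traditional - parallel) / traditional; the variable names are illustrative only.

```python
traditional_s = 42.6   # seconds, from the results above
parallel_s = 17.49     # seconds, from the results above

# Fraction of the original run time that was saved.
percent_reduction = (traditional_s - parallel_s) / traditional_s * 100

# How many times faster the parallel run finished.
speedup_factor = traditional_s / parallel_s

print(f"{percent_reduction:.2f}% less run time")  # 58.94% less run time
print(f"{speedup_factor:.2f}x faster")            # 2.44x faster
```

The rounded timings give about 58.94%, slightly below the reported 58.96%. The small gap is consistent with the reported figure having been computed from unrounded timings, though the README does not say so.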

Final Notes

Which laminar tool to use depends on the structure of the data and the function that will be applied to it. laminar.list_flow is not confined to operating on Pandas DataFrames; any list of iterable data objects can be passed to list_flow.

A basic rule of thumb is to use laminar.iter_flow for a single data object that you want to break into pieces in order to process it faster. Use laminar.list_flow when you have multiple data objects that you want analyzed by the same function in parallel.

This version

1.1.6

Sep 15, 2022

Download files

Source Distribution

laminar-py-1.1.6.tar.gz (13.8 kB)

Uploaded Sep 15, 2022 Source

Built Distribution

laminar_py-1.1.6-py3-none-any.whl (9.4 kB)

Uploaded Sep 15, 2022 Python 3

File details

Details for the file laminar-py-1.1.6.tar.gz.

File metadata

Download URL: laminar-py-1.1.6.tar.gz
Upload date: Sep 15, 2022
Size: 13.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.8.13

File hashes

Hashes for laminar-py-1.1.6.tar.gz
Algorithm	Hash digest
SHA256	`60e3238e0ca356638ba53fe83914668b4c784e0a5e35fbeb764eecfd230656b7`
MD5	`24f501b7c019123304a3cbf3b76c98b6`
BLAKE2b-256	`acac78cda1d427e5a45099b7f09d5c6f2ee457ade7371cc78ab7af6250faf4e3`

File details

Details for the file laminar_py-1.1.6-py3-none-any.whl.

File metadata

Download URL: laminar_py-1.1.6-py3-none-any.whl
Upload date: Sep 15, 2022
Size: 9.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.8.13

File hashes

Hashes for laminar_py-1.1.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`037e9d6004051e68feea31c8fd193ed7eab721a7ccac907c283bc3c3e77ac23c`
MD5	`d4622020923375d58ad94dfeafe6d9cd`
BLAKE2b-256	`e0a05646870826d45c0352e17442e99fab780a30bc00123dcb5f63849f045b7e`
