
Streaming operations on NumPy arrays



npstreams is an open-source Python package for streaming NumPy array operations. The goal is to provide tested routines that operate on streams (or generators) of arrays instead of dense arrays.

Streaming reduction operations (sums, averages, etc.) can be implemented in constant memory, independent of the number of arrays in the stream, which in turn allows for easy parallelization.
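To illustrate why a streaming reduction needs only constant memory, here is a minimal sketch in plain NumPy (not the npstreams implementation): a running sum keeps a single accumulator the size of one array, no matter how many arrays the stream yields. Because addition is associative, independent chunks of the stream can be reduced separately and the partial results combined; the thread pool is an arbitrary choice for illustration, and the names `streaming_sum` and `parallel_sum` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def streaming_sum(arrays):
    """Sum a stream of equally-shaped arrays while holding one accumulator.

    Returns None if the stream is empty (hypothetical choice).
    """
    total = None
    for arr in arrays:
        if total is None:
            # Copy the first array so the caller's data is never modified
            total = np.array(arr, dtype=float, copy=True)
        else:
            total += arr  # in-place update: memory use stays constant
    return total


def parallel_sum(chunks, max_workers=4):
    """Reduce each non-empty chunk independently, then combine the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_sums = list(pool.map(streaming_sum, chunks))
    # Combining partial sums is itself a streaming sum
    return streaming_sum(partial_sums)


# A generator yields 1000 small arrays without ever storing them all
stream = (np.full((4, 4), i, dtype=float) for i in range(1000))
print(streaming_sum(stream)[0, 0])  # 499500.0
```

The same associativity argument applies to products, minima and maxima, which is what makes these reductions easy to split across workers.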

This approach has been a huge boon when working with lots of images; the images are read one-by-one from disk and combined/processed in a streaming fashion.

This package is developed in conjunction with other software projects in the Siwick research group.

Motivating Example

Consider the following snippet to combine 50 images from an iterable source:

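import numpy as np

# Preallocate one dense array that holds all 50 images at once
# (source is any iterable yielding 2048 x 2048 arrays)
images = np.empty(shape=(2048, 2048, 50))
for index, im in enumerate(source):
    images[:, :, index] = im

avg = np.average(images, axis=2)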
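How much memory does this dense approach need? `np.empty` allocates float64 (8-byte) elements by default, so the footprint grows linearly with the number of images. The helper `dense_stack_gib` below is purely illustrative:

```python
import numpy as np

height, width = 2048, 2048
# np.empty defaults to float64: 8 bytes per element
itemsize = np.dtype(np.float64).itemsize


def dense_stack_gib(n_images):
    """Memory (GiB) of a (height, width, n_images) float64 array."""
    return height * width * n_images * itemsize / 1024**3


print(dense_stack_gib(50))    # 1.5625
print(dense_stack_gib(1000))  # 31.25
```

A streaming average, by contrast, roughly needs an accumulator the size of a single image (32 MiB here) plus the image currently being processed.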

If the source iterable provided 1000 images, the dense array above would no longer fit in memory on most machines. Moreover, what if we want to transform the images one by one before averaging them, or watch the average while it is being computed? Let’s look at an example:

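# scipy.misc.imread was removed in SciPy 1.2; imageio is used here as a replacement reader
import imageio.v3 as iio
from npstreams import iaverage

# list_of_filenames: any sequence of image file paths
stream = map(iio.imread, list_of_filenames)
averaged = iaverage(stream)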

At this point, the lazy iterators map and iaverage are ‘wired’ together but compute nothing until values are requested. We can watch the average evolve:

import matplotlib.pyplot as plt

# Each iteration consumes one more image from the stream and yields the updated average
for avg in averaged:
    plt.imshow(avg)
    plt.show()
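The lazy ‘wiring’ of map and iaverage is ordinary Python iterator behaviour. The sketch below uses a hypothetical `running_average` generator (a simplified, unweighted stand-in for `iaverage`, not its actual implementation) and a hypothetical `fake_imread` loader that records every call, to show that no file is read until the output is iterated.

```python
import numpy as np

calls = []  # records every "file" that is actually loaded


def fake_imread(name):
    """Hypothetical stand-in for an image reader."""
    calls.append(name)
    return np.full((2, 2), float(len(calls)))


def running_average(arrays):
    """Yield the mean of all arrays seen so far (hypothetical sketch)."""
    total, count = None, 0
    for arr in arrays:
        total = arr.astype(float) if total is None else total + arr
        count += 1
        yield total / count


stream = map(fake_imread, ["a.tif", "b.tif", "c.tif"])
averaged = running_average(stream)
print(len(calls))       # 0: nothing has been loaded yet

first = next(averaged)  # pulling one value loads exactly one file
print(len(calls))       # 1
```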

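We can also use last to get at the final average. Note that an iterator can only be consumed once: if you already looped over averaged above, build stream and averaged again first.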

from npstreams import last

total = last(averaged) # average of the entire stream
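Consuming an iterator while keeping only its final element can be done in constant memory with `collections.deque(maxlen=1)`. The function `last_item` below is a hypothetical sketch of that idea, not the npstreams source; in particular, how npstreams handles an empty stream may differ from the choice made here.

```python
from collections import deque


def last_item(iterable):
    """Return the final element of an iterable, consuming it entirely.

    A deque with maxlen=1 discards each element as soon as the next
    one arrives, so at most one element is held in memory at a time.
    """
    tail = deque(iterable, maxlen=1)
    if not tail:
        # Hypothetical choice: raise on an empty stream
        raise ValueError("last_item() received an empty iterable")
    return tail[0]


squares = (n * n for n in range(5))
print(last_item(squares))  # 16
```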

Streaming Functions

npstreams comes with some streaming functions built-in. Some examples:

  • Numerics: isum, iprod, isub, etc.

  • Statistics: iaverage (weighted mean), ivar (single-pass variance), etc.
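A single-pass variance such as ivar can be computed without storing the stream. One standard technique is Welford's online algorithm, sketched below element-wise for arrays. npstreams may well use a different formulation, and the name `streaming_variance` is hypothetical.

```python
import numpy as np


def streaming_variance(arrays, ddof=0):
    """Element-wise variance of a stream of arrays in a single pass (Welford)."""
    count = 0
    mean = m2 = None
    for arr in arrays:
        arr = np.asarray(arr, dtype=float)
        count += 1
        if mean is None:
            # First array: mean is the array itself, no spread yet
            mean = arr.copy()
            m2 = np.zeros_like(arr)
            continue
        delta = arr - mean          # deviation from the old mean
        mean += delta / count       # update the running mean
        m2 += delta * (arr - mean)  # uses both old and new deviations
    if count - ddof <= 0:
        raise ValueError("not enough arrays for the requested ddof")
    return m2 / (count - ddof)


# Compare against NumPy's dense computation
rng = np.random.default_rng(0)
frames = [rng.normal(size=(3, 3)) for _ in range(50)]
print(np.allclose(streaming_variance(frames), np.var(frames, axis=0)))  # True
```

Only the running mean and the running sum of squared deviations (`m2`) are kept, so memory use matches that of a single array.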

More importantly, npstreams gives you all the tools required to build your own streaming functions. All routines are documented in the API Reference on readthedocs.io.
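npstreams ships its own building blocks for this, documented in the API Reference. Independently of those, the general pattern is easy to see in plain NumPy: a binary ufunc such as `np.add`, `np.multiply` or `np.maximum` can be turned into a streaming reduction by folding it over the stream with its `out=` argument. The generator `ireduce_with` below is a hypothetical illustration, not part of npstreams.

```python
import numpy as np


def ireduce_with(ufunc, arrays):
    """Yield the running reduction of a stream of arrays by a binary ufunc.

    Hypothetical helper: the accumulator is updated in place via out=,
    so its memory use does not grow with the length of the stream.
    """
    acc = None
    for arr in arrays:
        if acc is None:
            acc = np.array(arr, dtype=float, copy=True)
        else:
            ufunc(acc, arr, out=acc)
        # Yield a copy so later in-place updates don't alter values already handed out
        yield acc.copy()


# A streaming maximum, built from np.maximum
stream = (np.array([i, -i], dtype=float) for i in range(4))
for running_max in ireduce_with(np.maximum, stream):
    print(running_max)
```

Swapping `np.maximum` for `np.add` or `np.multiply` gives a running sum or product with no other change.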

Benchmarking

npstreams provides a function for benchmarking common use cases.

To run the benchmark with default parameters, from the interpreter:

from npstreams import benchmark
benchmark()

From a command-line terminal:

python -c 'import npstreams; npstreams.benchmark()'

The results will be printed to the screen.

Future Work

Some of the features I want to implement in this package in the near future:

  • Optimize the CUDA-enabled routines

  • More functions: more streaming functions borrowed from NumPy and SciPy.

API Reference

The API Reference on readthedocs.io provides API-level documentation, as well as tutorials.

Installation

The only requirement is NumPy. To have access to CUDA-enabled routines, PyCUDA must also be installed. npstreams is available on PyPI; it can be installed with pip:

python -m pip install npstreams

npstreams can also be installed with the conda package manager, from the conda-forge channel:

conda config --add channels conda-forge
conda install npstreams

To install the latest development version from GitHub:

python -m pip install git+https://github.com/LaurentRDC/npstreams.git

Each version is tested against Python 3.6+. If you are using a different version, tests can be run using the standard library’s unittest module.

Citations

If you find this software useful, please consider citing the following publication:

Support / Report Issues

All support requests and issue reports should be filed as issues on GitHub.

License

npstreams is made available under the BSD License, same as NumPy. For more details, see LICENSE.txt.


Download files


Source Distribution

npstreams-1.6.tar.gz (61.7 kB)

Uploaded Source

Built Distribution


npstreams-1.6-py3-none-any.whl (39.2 kB)

Uploaded Python 3

File details

Details for the file npstreams-1.6.tar.gz.

File metadata

  • Download URL: npstreams-1.6.tar.gz
  • Upload date:
  • Size: 61.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.18.4 setuptools/39.0.1.post20180504 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.6.5

File hashes

Hashes for npstreams-1.6.tar.gz
Algorithm Hash digest
SHA256 9a2fab6bb7f454db9c695385cab922bf0b60f032e8d9771373300ad11a090861
MD5 74a04b47021bb25108d65c2a442f9544
BLAKE2b-256 de9816d776cf23866f827e0596316678ba4d4ab9d78dddb5711ca92c345e1ec9


File details

Details for the file npstreams-1.6-py3-none-any.whl.

File metadata

  • Download URL: npstreams-1.6-py3-none-any.whl
  • Upload date:
  • Size: 39.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.18.4 setuptools/39.0.1.post20180504 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.6.5

File hashes

Hashes for npstreams-1.6-py3-none-any.whl
Algorithm Hash digest
SHA256 35fa67df27f3a2f7dde7bb3fdbeea392792418b77c571f6eb4e10e3019afffac
MD5 0338bd522a46983b97af116497099cec
BLAKE2b-256 3a6e5ddcd0d391b428a2820060eecbce54ad66f8372e2283ce89e781838ef52d

