
Sparse Tools for Analysis


spartans - SPARse Tools for ANalysiS


When working with sparse matrices, it is often desirable to treat them as if they were regular numpy arrays. Yet many popular array methods don’t exist for sparse matrices. spartans aims to help by providing many operations for working with them.

Full example notebook

Features

Mathematical Operations

A rich set of operations that sparse matrices do not support out of the box, such as variance, cov (covariance matrix) and corrcoef (correlation matrix).
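To illustrate why such operations need special handling, here is a minimal sketch of a column-wise variance that never converts the matrix to dense form. It uses only NumPy and SciPy calls; the helper name sparse_column_variance is hypothetical and is not part of the spartans API, and the approach (mean of squares minus squared mean) is one possible technique, not necessarily the one spartans uses.

```python
import numpy as np
from scipy.sparse import csr_matrix


def sparse_column_variance(x):
    """Population variance of each column, computed without densifying x.

    Hypothetical helper for illustration; not part of spartans.
    """
    n_rows = x.shape[0]
    # Column means: the implicit zeros contribute nothing to the sum.
    mean = np.asarray(x.sum(axis=0)).ravel() / n_rows
    # Mean of squares: element-wise square keeps the sparsity pattern.
    mean_sq = np.asarray(x.multiply(x).sum(axis=0)).ravel() / n_rows
    return mean_sq - mean ** 2


dense = np.array([[1, -2, 0, 50],
                  [0, 0, 0, 100],
                  [1, 0, 0, 80]], dtype=float)
variance = sparse_column_variance(csr_matrix(dense))
```

The result should agree with dense.var(axis=0) up to floating-point error. This formula can lose precision when the mean is large relative to the spread, so treat it as a sketch rather than a numerically robust implementation.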

Easy Indexing

Convenient methods to index for “extra” sparse features by variance or by quantity.

Masking

Depending on the use case, many algorithms treat the zeros in a sparse matrix as missing data, while others treat missing data as zeros. spartans
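The difference between the two interpretations can be sketched with a column mean. The example below uses only NumPy and SciPy; the helper name column_mean_ignoring_zeros is hypothetical and does not come from spartans. It assumes the matrix stores no explicit zeros, which holds for a csr_matrix built from a dense array.

```python
import numpy as np
from scipy.sparse import csr_matrix


def column_mean_ignoring_zeros(x):
    """Column means over stored (non-zero) entries only.

    Hypothetical helper for illustration; zeros are treated as missing data.
    Columns with no stored entries get nan.
    """
    sums = np.asarray(x.sum(axis=0)).ravel().astype(float)
    counts = x.getnnz(axis=0)  # number of stored entries per column
    means = np.full(x.shape[1], np.nan)
    np.divide(sums, counts, out=means, where=counts > 0)
    return means


dense = np.array([[1, -2, 0, 50],
                  [0, 0, 0, 100],
                  [1, 0, 0, 80]], dtype=float)
c = csr_matrix(dense)

zeros_as_values = np.asarray(c.mean(axis=0)).ravel()  # zeros count as data
zeros_as_missing = column_mean_ignoring_zeros(c)      # zeros are skipped
```

For the first column the two readings give 2/3 and 1.0 respectively, and the all-zero third column has no defined mean when zeros are treated as missing.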

FeatureMatrix

FeatureMatrix is a first-class citizen in spartans. It is a wrapper around scipy.sparse.csr_matrix built with data analysis and data science in mind.

Examples


>>> import spartans as st
>>> from scipy.sparse import csr_matrix
>>> import numpy as np
>>> m = np.array([[1, -2, 0, 50],
                  [0, 0, 0, 100],
                  [1, 0, 0, 80],
                  [1, 4, 0, 0],
                  [0, 0, 0, 0],
                  [0, 4, 0, 0],
                  [0, 0, 0, -50]])
>>> c = csr_matrix(m)

We can get the correlation matrix of m using numpy.

>>> np.corrcoef(m, rowvar=False)
Out[]: array([[ 1.  , -0.08,   nan,  0.31],
              [-0.08,  1.  ,   nan, -0.35],
              [  nan,   nan,   nan,   nan],
              [ 0.31, -0.35,   nan,  1.  ]])

This won’t work with the sparse matrix c.

>>> np.corrcoef(c, rowvar=False)
AttributeError: 'float' object has no attribute 'shape'

But with spartans this can be done.

>>> st.corr(c)
Out[]: array([[ 1.  , -0.08,   nan,  0.31],
              [-0.08,  1.  ,   nan, -0.35],
              [  nan,   nan,   nan,   nan],
              [ 0.31, -0.35,   nan,  1.  ]])
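For readers curious how a correlation matrix can be obtained from a sparse matrix at all, the following is a minimal sketch using only NumPy and SciPy. The function sparse_corr is hypothetical and is not the spartans implementation; it assumes the number of columns is small enough that a dense features-by-features result fits in memory.

```python
import numpy as np
from scipy.sparse import csr_matrix


def sparse_corr(x):
    """Column-wise Pearson correlation of a sparse matrix.

    Hypothetical sketch; only the small (n_features x n_features)
    Gram matrix is ever made dense, never x itself.
    """
    n_rows = x.shape[0]
    mean = np.asarray(x.mean(axis=0)).ravel()
    gram = (x.T @ x).toarray()                   # sum of products per column pair
    cov = gram / n_rows - np.outer(mean, mean)   # population covariance
    std = np.sqrt(np.diag(cov))
    # Zero-variance columns divide 0 by 0 and yield nan, as in numpy.
    with np.errstate(divide="ignore", invalid="ignore"):
        return cov / np.outer(std, std)


m = np.array([[1, -2, 0, 50],
              [0, 0, 0, 100],
              [1, 0, 0, 80],
              [1, 4, 0, 0],
              [0, 0, 0, 0],
              [0, 4, 0, 0],
              [0, 0, 0, -50]], dtype=float)
corr = sparse_corr(csr_matrix(m))
```

Because correlation is a ratio, the choice between population and sample covariance cancels out, so the result should match np.corrcoef(m, rowvar=False) up to floating-point error, including the nan row and column.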

The row and column of nan appear because the original matrix has a column (feature) that is zero in every row. spartans can handle that using st.non_zero_index(c, axis=0, as_bool=False), which returns array([0, 1, 3]). Much more functionality is shown in the notebook.
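The same indices can be reproduced with plain NumPy and SciPy, which may help clarify what the call above reports. This is a sketch under the assumption that the matrix stores no explicit zeros; it does not show how spartans computes the result internally.

```python
import numpy as np
from scipy.sparse import csr_matrix

m = np.array([[1, -2, 0, 50],
              [0, 0, 0, 100],
              [1, 0, 0, 80],
              [1, 4, 0, 0],
              [0, 0, 0, 0],
              [0, 4, 0, 0],
              [0, 0, 0, -50]])
c = csr_matrix(m)

# Columns that store at least one entry, i.e. features that are not all zero.
keep = np.flatnonzero(c.getnnz(axis=0))

# Dropping the all-zero column removes the nan row and column.
reduced = c[:, keep]
corr = np.corrcoef(reduced.toarray(), rowvar=False)
```

Here keep holds the indices 0, 1 and 3, and corr is a 3 x 3 matrix without nan entries. The reduced matrix is densified only to reuse np.corrcoef in this small example.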

Credits

History

0.1.0 (2020-02-20)

  • First release on PyPI.
