Efficient matrix representations for working with tabular data.

Project description

Installation

Simply install via conda-forge!

conda install -c conda-forge tabmat

Getting Started

The easiest way to start with tabmat is to use the convenience constructor tabmat.from_pandas.

import tabmat as tm
import numpy as np

dense_array = np.random.normal(size=(100, 1))

Use case

TL;DR: We provide matrix classes for efficiently building statistical algorithms with data that is partially dense, partially sparse and partially categorical.

Data used in economics, actuarial science, and many other fields is often tabular, containing rows and columns. Such data often shares further properties:

  • It is often very sparse.
  • It often contains a mix of dense and sparse columns.
  • It often contains categorical data, processed into many columns of indicator values created by "one-hot encoding."

High-performance statistical applications often require fast computation of certain operations, such as

  • Computing sandwich products of the data, transpose(X) @ diag(d) @ X. A sandwich product shows up in the solution to weighted least squares, as well as in the Hessian of the likelihood in generalized linear models such as Poisson regression.
  • Matrix-vector products, possibly on only a subset of the rows or columns. For example, when limiting computation to an "active set" in an L1-penalized coordinate descent implementation, we may only need to compute a matrix-vector product on a small subset of the columns.
  • Computing all operations on standardized predictors which have mean zero and standard deviation one. This helps with numerical stability and optimizer efficiency in a wide range of machine learning algorithms.
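As a rough illustration of the first two operations, here is a minimal NumPy sketch. It does not use tabmat itself; the helper names sandwich and matvec_active are hypothetical and chosen only for this example. It shows that the sandwich product can be computed by scaling rows instead of building diag(d), and that a matrix-vector product restricted to an active set only touches the selected columns.

```python
import numpy as np


def sandwich(X, d, cols=None):
    """Return X[:, cols].T @ diag(d) @ X[:, cols] without building diag(d).

    Hypothetical helper for illustration; not part of tabmat's API.
    """
    X = np.asarray(X, dtype=float)
    d = np.asarray(d, dtype=float)
    if cols is not None:
        X = X[:, cols]
    # Scaling row i of X by d[i] is equivalent to diag(d) @ X.
    return X.T @ (X * d[:, None])


def matvec_active(X, v, active):
    """Compute X[:, active] @ v[active], skipping all inactive columns."""
    X = np.asarray(X, dtype=float)
    v = np.asarray(v, dtype=float)
    return X[:, active] @ v[active]


rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
d = rng.uniform(0.5, 2.0, size=6)

# Coefficient vector with zeros outside the active set.
v = np.array([0.0, 1.5, 0.0, -2.0])
active = np.flatnonzero(v)

S = sandwich(X, d)
y = matvec_active(X, v, active)
```

Because the inactive entries of v are zero, the restricted product equals the full product X @ v while reading only two of the four columns.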

This library and its design

We designed this library with the above use cases in mind. We built it first for estimating generalized linear models, but expect it to be useful in a variety of econometric and statistical use cases. The library was born out of our need for speed, and its unified API is motivated by the desire to work with a single matrix interface inside our statistical algorithms.

Design principles:

  • Speed and memory efficiency are paramount.
  • You don't need to sacrifice functionality by using this library: DenseMatrix and SparseMatrix subclass np.ndarray and scipy.sparse.csc_matrix respectively, and inherit behavior from those classes wherever it is not improved on.
  • As much as possible, syntax follows NumPy syntax, and dimension-reducing operations (like sum) return NumPy arrays, following NumPy conventions about the dimensions of results. The aim is to make these classes as close as possible to being drop-in replacements for numpy.ndarray. This is not always possible, however, due to the differing APIs of numpy.ndarray and scipy.sparse.
  • Other operations, such as toarray, mimic SciPy sparse syntax.
  • All matrix classes support matrix-vector products, sandwich products, and getcol.

Individual subclasses may support significantly more operations.

Matrix types

  • DenseMatrix represents dense matrices, subclassing numpy.ndarray. It additionally supports methods getcol, toarray, sandwich, standardize, and unstandardize.
  • SparseMatrix represents column-major sparse data, subclassing scipy.sparse.csc_matrix. It additionally supports methods sandwich and standardize.
  • CategoricalMatrix represents one-hot encoded categorical matrices. Because all the non-zeros in these matrices are ones and because each row has only one non-zero, the data can be represented and multiplied much more efficiently than a generic sparse matrix.
  • SplitMatrix represents matrices with dense, sparse and categorical parts, allowing for a significant speedup in matrix multiplications.
  • StandardizedMatrix efficiently and sparsely represents a matrix that has had its columns normalized to have mean zero and variance one. Even if the underlying matrix is sparse, such a normalized matrix will be dense. However, by storing the scaling and shifting factors separately, StandardizedMatrix retains the original matrix sparsity.
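The ideas behind CategoricalMatrix and StandardizedMatrix can be sketched in plain NumPy. The following is only an illustration of the underlying arithmetic under simple assumptions, not tabmat's implementation; the function names and the col_means / col_stds parameters are hypothetical. A one-hot matrix is stored as one integer code per row, and a standardized matrix is never materialized because the centering and scaling are applied to the vector instead.

```python
import numpy as np


def categorical_matvec(codes, n_categories, v):
    """Multiply the one-hot matrix implied by `codes` with `v`.

    Row i has a single 1 in column codes[i], so (X @ v)[i] == v[codes[i]].
    """
    codes = np.asarray(codes)
    v = np.asarray(v, dtype=float)
    if v.shape != (n_categories,):
        raise ValueError("v must have one entry per category")
    return v[codes]


def categorical_sandwich(codes, n_categories, d):
    """X.T @ diag(d) @ X for a one-hot X is diagonal.

    Entry (k, k) is the sum of d over the rows that fall in category k.
    """
    return np.diag(np.bincount(codes, weights=d, minlength=n_categories))


def standardized_matvec(X, col_means, col_stds, v):
    """Compute ((X - col_means) / col_stds) @ v without forming that matrix.

    Only X itself is multiplied, so a sparse X would stay sparse.
    """
    w = np.asarray(v, dtype=float) / col_stds
    # (X - mean) @ w == X @ w - mean @ w, and mean @ w is a scalar.
    return X @ w - col_means @ w


# Categorical example: 5 rows, 3 categories.
codes = np.array([2, 0, 1, 2, 2])
v_cat = np.array([10.0, 20.0, 30.0])
d = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
cat_result = categorical_matvec(codes, 3, v_cat)
cat_sandwich = categorical_sandwich(codes, 3, d)

# Standardization example: a mostly-zero matrix.
X = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 0.0], [0.0, 4.0]])
means, stds = X.mean(axis=0), X.std(axis=0)
v = np.array([1.0, -2.0])
std_result = standardized_matvec(X, means, stds, v)
```

In both cases the result matches what the explicit dense matrix would give, but the work is proportional to the number of rows (categorical) or to the non-zeros of the original matrix (standardized).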

Benchmarks

See here for detailed benchmarking.

API documentation

See here for detailed API documentation.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tabmat-3.1.12.tar.gz (2.1 MB)

Uploaded Source

Built Distributions

tabmat-3.1.12-cp312-cp312-win_amd64.whl (649.2 kB)

Uploaded CPython 3.12 Windows x86-64

tabmat-3.1.12-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (6.6 MB)

Uploaded CPython 3.12 manylinux: glibc 2.17+ ARM64

tabmat-3.1.12-cp312-cp312-macosx_11_0_arm64.whl (1.6 MB)

Uploaded CPython 3.12 macOS 11.0+ ARM64

tabmat-3.1.12-cp312-cp312-macosx_10_9_x86_64.whl (1.8 MB)

Uploaded CPython 3.12 macOS 10.9+ x86-64

tabmat-3.1.12-cp311-cp311-win_amd64.whl (649.1 kB)

Uploaded CPython 3.11 Windows x86-64

tabmat-3.1.12-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (6.7 MB)

Uploaded CPython 3.11 manylinux: glibc 2.17+ ARM64

tabmat-3.1.12-cp311-cp311-macosx_11_0_arm64.whl (1.6 MB)

Uploaded CPython 3.11 macOS 11.0+ ARM64

tabmat-3.1.12-cp311-cp311-macosx_10_9_x86_64.whl (1.8 MB)

Uploaded CPython 3.11 macOS 10.9+ x86-64

tabmat-3.1.12-cp310-cp310-win_amd64.whl (647.9 kB)

Uploaded CPython 3.10 Windows x86-64

tabmat-3.1.12-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (6.5 MB)

Uploaded CPython 3.10 manylinux: glibc 2.17+ ARM64

tabmat-3.1.12-cp310-cp310-macosx_11_0_arm64.whl (1.6 MB)

Uploaded CPython 3.10 macOS 11.0+ ARM64

tabmat-3.1.12-cp310-cp310-macosx_10_9_x86_64.whl (1.8 MB)

Uploaded CPython 3.10 macOS 10.9+ x86-64

tabmat-3.1.12-cp39-cp39-win_amd64.whl (649.0 kB)

Uploaded CPython 3.9 Windows x86-64

tabmat-3.1.12-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (6.5 MB)

Uploaded CPython 3.9 manylinux: glibc 2.17+ ARM64

tabmat-3.1.12-cp39-cp39-macosx_11_0_arm64.whl (1.6 MB)

Uploaded CPython 3.9 macOS 11.0+ ARM64

tabmat-3.1.12-cp39-cp39-macosx_10_9_x86_64.whl (1.8 MB)

Uploaded CPython 3.9 macOS 10.9+ x86-64

File details

Details for the file tabmat-3.1.12.tar.gz.

File metadata

  • Download URL: tabmat-3.1.12.tar.gz
  • Upload date:
  • Size: 2.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.6

File hashes

Hashes for tabmat-3.1.12.tar.gz
Algorithm Hash digest
SHA256 d48d16bb1f01bdec43312a361a0b0f5cb4cd3f9ed82beded23e9f3a48358572c
MD5 4e293b780590e524007f15ea3490a507
BLAKE2b-256 d65749c4cb758ef1f4ea22b2a3d27dabcdb3bc3192df4b3a2f3da195497e4776

See more details on using hashes here.

File details

Details for the file tabmat-3.1.12-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: tabmat-3.1.12-cp312-cp312-win_amd64.whl
  • Upload date:
  • Size: 649.2 kB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.6

File hashes

Hashes for tabmat-3.1.12-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 5835d50474d45b8611e02475a795fccd82c7849bae9ec89e6a4cac3ae74332da
MD5 7697ef81e6e30ea4ba2bab7b6bd70984
BLAKE2b-256 1acf8b4936885c4cc7381c9869eb3fed882ae6663ce9686ad0593c0b9bd3758f

File details

Details for the file tabmat-3.1.12-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for tabmat-3.1.12-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 764a6eb007e666864fb180f62b1bd37e8f58ee09c915e86840ffde7ad14eeddf
MD5 13f726409603d831ba53be2f206aa872
BLAKE2b-256 7e908cb8471389300ce680e16006e9c3441ff12cee3f32892efea99bc15f6cf0

File details

Details for the file tabmat-3.1.12-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for tabmat-3.1.12-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 199c69029071d8c6459c89d5027b11a751143a3f15f6a8bbf71cf18e6d271f87
MD5 7e67ca4016b93ff9bc33f28ef87d19b4
BLAKE2b-256 edb9c2843849525473cf362bad0bdaa30500976835a332142738eecf9e8600e0

File details

Details for the file tabmat-3.1.12-cp312-cp312-macosx_10_9_x86_64.whl.

File hashes

Hashes for tabmat-3.1.12-cp312-cp312-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 6b7dbb96d8bf4a622db84e7a1466d09a79ef746ad1f4a118a4d32df80b871ab0
MD5 a4db324e54ba89ec150324370d845325
BLAKE2b-256 641483bba0c63e5f7c721d1c395c3f32801471c3c17819240285af1051ebee49

File details

Details for the file tabmat-3.1.12-cp311-cp311-win_amd64.whl.

File metadata

  • Download URL: tabmat-3.1.12-cp311-cp311-win_amd64.whl
  • Upload date:
  • Size: 649.1 kB
  • Tags: CPython 3.11, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.6

File hashes

Hashes for tabmat-3.1.12-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 5f9cec00fffb3a9487050d5ceba8c6d70cf173be9f2beb2ee2f72c77cb694034
MD5 88a81bdef75b6fa93840e6505f881e28
BLAKE2b-256 9fc51da84839f05cc3cb5914c46ecc078e8eb2f0512ca0d423c4ce0a35f34f6d

File details

Details for the file tabmat-3.1.12-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for tabmat-3.1.12-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 ac6baa2e91d430b36f9503fc5461e7e6609e1a47d535faa2d999842a0787500d
MD5 ccce19e9da1f332d0669f994f77c356c
BLAKE2b-256 82bb9fc7e754bb79d020bab4040689d758d8b1000e54d3f5702472ce4ceca069

File details

Details for the file tabmat-3.1.12-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for tabmat-3.1.12-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 d61a6ca05a73998322aedc96b760a56b436abda68f2c338b8ed428e353c3c9fb
MD5 a432a5f6680b1f147a892c7e4f58b2a5
BLAKE2b-256 563c8ea6cd875b53c8d0099019b01c3ffb85b56bb1d8f19839ea4af6dea68099

File details

Details for the file tabmat-3.1.12-cp311-cp311-macosx_10_9_x86_64.whl.

File hashes

Hashes for tabmat-3.1.12-cp311-cp311-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 76ef2f717b430c9fdb12007c5769bbf792afa94a3696c6c61249e74d98dd529c
MD5 101b92197fa3bfcd8ff660c51b800165
BLAKE2b-256 eb3c92c01c79161ae2e234df026fe51e276a717445c9af73df75264ea56de715

File details

Details for the file tabmat-3.1.12-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: tabmat-3.1.12-cp310-cp310-win_amd64.whl
  • Upload date:
  • Size: 647.9 kB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.6

File hashes

Hashes for tabmat-3.1.12-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 886a99a713bb6f6ba5fd6a603e29a5f7542c9ad15544bfed326c1d36ef69c5b0
MD5 70dbb6528102be085b10102be3218f2c
BLAKE2b-256 21b2b887c082160b5b537aedd6955a2e837e093ace66d9570f60a725cd6595c3

File details

Details for the file tabmat-3.1.12-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for tabmat-3.1.12-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 50599f87167bd0cc7458c30d1d2cd0750ce4b7fb00f09ae5b6384d62f633b256
MD5 f493aeebe3ba1c0bbbe78d568da80776
BLAKE2b-256 68e1f1f8a072f26a2372ecce85f61f7d5d43bce5997253b54d7faec468495fef

File details

Details for the file tabmat-3.1.12-cp310-cp310-macosx_11_0_arm64.whl.

File hashes

Hashes for tabmat-3.1.12-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 b128dc5b3db051893446492b1b61707787ead9ad609b8da933f32a751cb737d7
MD5 380add05e486914c3e8dbc060085380d
BLAKE2b-256 0898711b5bd58610fd4139517823a0339280d150a528b7cedc0450b3cd0ffa6c

File details

Details for the file tabmat-3.1.12-cp310-cp310-macosx_10_9_x86_64.whl.

File hashes

Hashes for tabmat-3.1.12-cp310-cp310-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 017e139055be247a2b316df4a6b2a9ca603f7e5ffac0548777c4b904f4b5c572
MD5 78da639b1019fa8e3055ba749c10732a
BLAKE2b-256 61c53e475ba74d11ff5035f2abf8f6d53474b688324a6636fa39b003a9d42f02

File details

Details for the file tabmat-3.1.12-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: tabmat-3.1.12-cp39-cp39-win_amd64.whl
  • Upload date:
  • Size: 649.0 kB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.6

File hashes

Hashes for tabmat-3.1.12-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 625ed89059aba8e7580609162735013f9af53c9b8dfce75ead345fc74d0ecf68
MD5 20f48e57021916b06b24109e54270745
BLAKE2b-256 0cf91dc73c68b48a78dae8bc30ddfd3f6a78c12ac16acbe77b58f5c77210d29c

File details

Details for the file tabmat-3.1.12-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for tabmat-3.1.12-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 db4e6325e93057cd9921e126c8b365600a142a4f06a0c28cbbe922991c462020
MD5 2596f6bdd06cab56791afbd46549125c
BLAKE2b-256 7f357c847258f2bd833d5735e1c33b70a5f2d2cda3450a7cbb0311f52f1f3c9c

File details

Details for the file tabmat-3.1.12-cp39-cp39-macosx_11_0_arm64.whl.

File hashes

Hashes for tabmat-3.1.12-cp39-cp39-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 66b412126c4025d77137fb2d9e73ccb758c4b404194df7b5ec3180746df15b37
MD5 d98973d520bce4aae57a810ae33ed15b
BLAKE2b-256 eac67884d915465eabc3a1042f4468b813818ce1c64c83e2f5c4cc0eab347105

File details

Details for the file tabmat-3.1.12-cp39-cp39-macosx_10_9_x86_64.whl.

File hashes

Hashes for tabmat-3.1.12-cp39-cp39-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 c1c6bb02a7d24b0f0eaf83b43fe488a2ad1e6b49eef5d2fd5f1080af6aaf127d
MD5 1808ec3a57df6b3dda7e7963d47e2f39
BLAKE2b-256 8ffba11129a1a898f969f7f88b7aad8b75f9b80682a513410d8e21842684cda7
