Efficient matrix representations for working with tabular data.

Installation

Simply install via conda-forge!

conda install -c conda-forge tabmat

Getting Started

The easiest way to start with tabmat is to use the convenience constructor tabmat.from_pandas.

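import numpy as np
import pandas as pd
import tabmat as tm

dense_array = np.random.normal(size=(100, 1))
# Illustrative completion: wrap the array in a DataFrame and convert it.
df = pd.DataFrame(dense_array, columns=["x"])
X = tm.from_pandas(df)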

Use case

TL;DR: We provide matrix classes for efficiently building statistical algorithms with data that is partially dense, partially sparse and partially categorical.

Data used in economics, actuarial science, and many other fields is often tabular, containing rows and columns. Such data often shares further properties:

  • It often is very sparse.
  • It often contains a mix of dense and sparse columns.
  • It often contains categorical data, processed into many columns of indicator values created by "one-hot encoding."

High-performance statistical applications often require fast computation of certain operations, such as

  • Computing sandwich products of the data, transpose(X) @ diag(d) @ X. A sandwich product shows up in the solution to weighted least squares, as well as in the Hessian of the likelihood in generalized linear models such as Poisson regression.
  • Matrix-vector products, possibly on only a subset of the rows or columns. For example, when limiting computation to an "active set" in an L1-penalized coordinate descent implementation, we may only need to compute a matrix-vector product on a small subset of the columns.
  • Computing all operations on standardized predictors which have mean zero and standard deviation one. This helps with numerical stability and optimizer efficiency in a wide range of machine learning algorithms.
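
The first two operations can be illustrated with plain NumPy. The sketch below is not tabmat's implementation; it only shows what a sandwich product and a column-restricted matrix-vector product compute. The names X, d, coef and active, and the random data, are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols = 200, 5
X = rng.normal(size=(n_rows, n_cols))    # design matrix
d = rng.uniform(0.5, 2.0, size=n_rows)   # per-row weights, the diagonal of diag(d)

# Naive sandwich product: materializes an n_rows x n_rows diagonal matrix.
naive = X.T @ np.diag(d) @ X

# Equivalent form that scales each row of X by its weight instead.
sandwich = X.T @ (d[:, None] * X)

# Matrix-vector product restricted to an "active set" of columns.
active = np.array([0, 3])
coef = rng.normal(size=n_cols)
partial = X[:, active] @ coef[active]

# Same result as a full product in which all inactive coefficients are zero.
masked = np.zeros(n_cols)
masked[active] = coef[active]
full = X @ masked
```

The second form of the sandwich product avoids building the diagonal matrix, and the restricted product touches only the active columns. Savings of this kind are what a specialized matrix class can exploit.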

This library and its design

We designed this library with the above use cases in mind. We built this library first for estimating generalized linear models, but expect it will be useful in a variety of econometric and statistical use cases. This library was born out of our need for speed, and its API is motivated by our desire to work with a single, unified matrix interface inside our statistical algorithms.

Design principles:

  • Speed and memory efficiency are paramount.
  • You don't need to sacrifice functionality by using this library: DenseMatrix and SparseMatrix subclass np.ndarray and scipy.sparse.csc_matrix respectively, and inherit behavior from those classes wherever it is not improved on.
  • As much as possible, syntax follows NumPy syntax, and dimension-reducing operations (like sum) return NumPy arrays, following NumPy conventions about the dimensions of results. The aim is to make these classes as close as possible to being drop-in replacements for numpy.ndarray. This is not always possible, however, due to the differing APIs of numpy.ndarray and scipy.sparse.
  • Other operations, such as toarray, mimic SciPy sparse syntax.
  • All matrix classes support matrix-vector products, sandwich products, and getcol.

Individual subclasses may support significantly more operations.

Matrix types

  • DenseMatrix represents dense matrices, subclassing numpy.ndarray. It additionally supports methods getcol, toarray, sandwich, standardize, and unstandardize.
  • SparseMatrix represents column-major sparse data, subclassing scipy.sparse.csc_matrix. It additionally supports methods sandwich and standardize.
  • CategoricalMatrix represents one-hot encoded categorical matrices. Because all the non-zeros in these matrices are ones and because each row has only one non-zero, the data can be represented and multiplied much more efficiently than a generic sparse matrix.
  • SplitMatrix represents matrices with dense, sparse and categorical parts, allowing for a significant speedup in matrix multiplications.
  • StandardizedMatrix efficiently and sparsely represents a matrix that has had its columns normalized to have mean zero and variance one. Even if the underlying matrix is sparse, such a normalized matrix will be dense. However, by storing the scaling and shifting factors separately, StandardizedMatrix retains the original matrix sparsity.
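
The ideas behind CategoricalMatrix and StandardizedMatrix can be sketched with NumPy alone. This is only an illustration of the underlying arithmetic under assumed names (codes, one_hot, mean, std) and random data, not the library's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cats = 1000, 4

# Categorical data: store one integer code per row instead of a one-hot matrix.
codes = rng.integers(0, n_cats, size=n_rows)
one_hot = np.eye(n_cats)[codes]          # explicit (n_rows, n_cats) encoding
v = rng.normal(size=n_cats)
d = rng.uniform(0.5, 2.0, size=n_rows)

# With one non-zero per row, X @ v reduces to a lookup by category code.
matvec_fast = v[codes]
# The sandwich product is diagonal: the sum of weights within each category.
sandwich_diag = np.bincount(codes, weights=d, minlength=n_cats)

# Standardized data: keep a mostly-zero X untouched, store shift and scale apart.
X = rng.normal(size=(n_rows, 3)) * (rng.random((n_rows, 3)) < 0.05)
mean = X.mean(axis=0)
std = X.std(axis=0)
w = rng.normal(size=3)

# Explicit standardization produces a dense matrix.
dense_result = ((X - mean) / std) @ w
# Lazy version: multiply the original X, then correct with a scalar shift term.
lazy_result = X @ (w / std) - (mean / std) @ w
```

In the lazy version, X is only ever multiplied in its original form, so a sparse storage format for X would stay sparse while the product still matches the standardized one.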

Benchmarks

See here for detailed benchmarking.

API documentation

See here for detailed API documentation.
