Efficient matrix representations for working with tabular data.

Installation

Simply install via conda-forge!

conda install -c conda-forge tabmat

Getting Started

The easiest way to start with tabmat is to use the convenience constructor tabmat.from_pandas.

import numpy as np
import tabmat as tm

# A 100 x 1 column of standard-normal draws to use as dense input data.
dense_array = np.random.normal(size=(100, 1))

Use case

TL;DR: We provide matrix classes for efficiently building statistical algorithms with data that is partially dense, partially sparse and partially categorical.

Data used in economics, actuarial science, and many other fields is often tabular, containing rows and columns. Such data frequently shares further properties:

  • It is often very sparse.
  • It often contains a mix of dense and sparse columns.
  • It often contains categorical data, processed into many columns of indicator values created by "one-hot encoding."

High-performance statistical applications often require fast computation of certain operations, such as

  • Computing sandwich products of the data, transpose(X) @ diag(d) @ X. A sandwich product shows up in the solution to weighted least squares, as well as in the Hessian of the likelihood in generalized linear models such as Poisson regression.
  • Matrix-vector products, possibly on only a subset of the rows or columns. For example, when limiting computation to an "active set" in an L1-penalized coordinate descent implementation, we may only need to compute a matrix-vector product on a small subset of the columns.
  • Computing all operations on standardized predictors which have mean zero and standard deviation one. This helps with numerical stability and optimizer efficiency in a wide range of machine learning algorithms.
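
The first two operations can be illustrated without any library. The following is a minimal sketch in plain Python, not tabmat's implementation: the helper names sandwich and matvec and the cols argument are chosen here for illustration, and matrices are plain lists of rows.

```python
def sandwich(X, d):
    """Return transpose(X) @ diag(d) @ X for a list-of-rows matrix X."""
    n_cols = len(X[0])
    out = [[0.0] * n_cols for _ in range(n_cols)]
    for row, weight in zip(X, d):
        for i in range(n_cols):
            scaled = weight * row[i]
            if scaled == 0.0:
                continue  # zero entries contribute nothing to row i of the result
            for j in range(n_cols):
                out[i][j] += scaled * row[j]
    return out


def matvec(X, v, cols=None):
    """Return X @ v, using only the columns listed in `cols` (all if None)."""
    if cols is None:
        cols = range(len(v))
    return [sum(row[j] * v[j] for j in cols) for row in X]


X = [[1.0, 2.0], [0.0, 1.0], [3.0, 0.0]]
d = [0.5, 2.0, 1.0]  # one weight per row
v = [2.0, 10.0]      # one coefficient per column

hessian_like = sandwich(X, d)
full_product = matvec(X, v)
active_only = matvec(X, v, cols=[0])  # pretend only column 0 is in the active set
```

Skipping zero entries in the inner loop is the simplest form of the idea that sparse and categorical columns should not pay the cost of dense arithmetic.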

This library and its design

We designed this library with the above use cases in mind. We built it first for estimating generalized linear models, but expect it to be useful in a variety of econometric and statistical use cases. The library was born out of our need for speed, and its unified API reflects our desire to work with a single matrix interface inside our statistical algorithms.

Design principles:

  • Speed and memory efficiency are paramount.
  • You don't need to sacrifice functionality by using this library: DenseMatrix and SparseMatrix subclass np.ndarray and scipy.sparse.csc_matrix respectively, and inherit behavior from those classes wherever it is not improved on.
  • As much as possible, syntax follows NumPy syntax, and dimension-reducing operations (like sum) return NumPy arrays, following NumPy conventions for the dimensions of results. The aim is to make these classes as close as possible to being drop-in replacements for numpy.ndarray. This is not always possible, however, due to the differing APIs of numpy.ndarray and scipy.sparse.
  • Other operations, such as toarray, mimic SciPy sparse syntax.
  • All matrix classes support matrix-vector products, sandwich products, and getcol.

Individual subclasses may support significantly more operations.
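
To make the idea of a unified matrix API concrete, here is a small sketch in plain Python. It is not tabmat's class hierarchy: MatrixLike, DenseBlock, OneHotBlock and SplitBlocks are hypothetical names invented for this illustration, and only a matrix-vector product is shown. The point is that an algorithm can call the same method on every block while each block uses its own fast path.

```python
from typing import List, Protocol, Sequence


class MatrixLike(Protocol):
    """Hypothetical minimal interface shared by every block type."""

    n_cols: int

    def matvec(self, v: Sequence[float]) -> List[float]: ...


class DenseBlock:
    def __init__(self, rows: List[List[float]]):
        self.rows = rows
        self.n_cols = len(rows[0])

    def matvec(self, v):
        return [sum(x * c for x, c in zip(row, v)) for row in self.rows]


class OneHotBlock:
    """One integer category code per row instead of a full indicator matrix."""

    def __init__(self, codes: List[int], n_categories: int):
        self.codes = codes
        self.n_cols = n_categories

    def matvec(self, v):
        # Each row has a single 1, so the product is just a lookup.
        return [v[code] for code in self.codes]


class SplitBlocks:
    """Column-wise concatenation of blocks; each block uses its own fast path."""

    def __init__(self, blocks: List[MatrixLike]):
        self.blocks = blocks
        self.n_cols = sum(b.n_cols for b in blocks)

    def matvec(self, v):
        out = None
        start = 0
        for block in self.blocks:
            part = block.matvec(v[start:start + block.n_cols])
            out = part if out is None else [a + b for a, b in zip(out, part)]
            start += block.n_cols
        return out


dense = DenseBlock([[1.0, 2.0], [0.0, 1.0], [3.0, 0.0]])
categorical = OneHotBlock(codes=[0, 2, 1], n_categories=3)
combined = SplitBlocks([dense, categorical])
result = combined.matvec([1.0, 1.0, 10.0, 20.0, 30.0])
```

Here the combined matrix has five columns (two dense, three one-hot), and the caller never needs to know which block handles which columns.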

Matrix types

  • DenseMatrix represents dense matrices, subclassing numpy.ndarray. It additionally supports the methods getcol, toarray, sandwich, standardize, and unstandardize.
  • SparseMatrix represents column-major sparse data, subclassing scipy.sparse.csc_matrix. It additionally supports methods sandwich and standardize.
  • CategoricalMatrix represents one-hot encoded categorical matrices. Because all the non-zeros in these matrices are ones and because each row has only one non-zero, the data can be represented and multiplied much more efficiently than a generic sparse matrix.
  • SplitMatrix represents matrices with dense, sparse and categorical parts, allowing for a significant speedup in matrix multiplications.
  • StandardizedMatrix efficiently and sparsely represents a matrix that has had its columns normalized to have mean zero and variance one. Even if the underlying matrix is sparse, such a normalized matrix will be dense. However, by storing the scaling and shifting factors separately, StandardizedMatrix retains the original matrix sparsity.
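
Two of the claims above can be checked with a few lines of plain Python. This is a sketch under stated assumptions, not tabmat's code: the function names, the integer-code storage for categories, and the dict-per-column sparse storage are all chosen here for illustration. It shows why a one-hot sandwich product is cheap (it is diagonal) and how a standardized matrix-vector product can be computed while touching only the stored non-zeros.

```python
from math import sqrt


def categorical_sandwich_diagonal(codes, d, n_categories):
    """Diagonal of transpose(X) @ diag(d) @ X for a one-hot matrix X.

    Each row has a single 1, so distinct columns never overlap and the
    sandwich product is diagonal: entry k is the sum of d over the rows in
    category k.
    """
    diagonal = [0.0] * n_categories
    for code, weight in zip(codes, d):
        diagonal[code] += weight
    return diagonal


def standardized_matvec(columns, n_rows, v):
    """Compute Z @ v where Z is the column-standardized version of X.

    `columns` holds X column by column as {row_index: value} dicts with only
    the non-zeros stored. Z is never materialized: since
    Z[:, j] = (X[:, j] - mean_j) / std_j, we have
    Z @ v = X @ (v / std) - sum_j(mean_j * v_j / std_j).
    """
    out = [0.0] * n_rows
    offset = 0.0
    for column, v_j in zip(columns, v):
        mean = sum(column.values()) / n_rows
        mean_sq = sum(x * x for x in column.values()) / n_rows
        std = sqrt(mean_sq - mean * mean)  # population standard deviation
        scaled = v_j / std
        for row, x in column.items():
            out[row] += x * scaled  # touches stored non-zeros only
        offset += mean * scaled     # dense shift, kept as one scalar
    return [value - offset for value in out]


diag = categorical_sandwich_diagonal(
    codes=[0, 2, 1, 2], d=[1.0, 2.0, 3.0, 4.0], n_categories=3
)

# Columns [2, 0, 0, 2] and [4, 0, 4, 0], stored sparsely.
columns = [{0: 2.0, 3: 2.0}, {0: 4.0, 2: 4.0}]
product = standardized_matvec(columns, n_rows=4, v=[3.0, 5.0])
```

The sketch assumes every column has non-zero variance; a constant column would need special handling before dividing by its standard deviation.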

[Figure: Wide data set]

Benchmarks

See here for detailed benchmarking.

API documentation

See here for detailed API documentation.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • tabmat-4.0.0a3.tar.gz (2.2 MB), Source

Built Distributions

  • tabmat-4.0.0a3-cp312-cp312-win_amd64.whl (663.0 kB), CPython 3.12, Windows x86-64
  • tabmat-4.0.0a3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.3 MB), CPython 3.12, manylinux: glibc 2.17+ x86-64
  • tabmat-4.0.0a3-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (7.2 MB), CPython 3.12, manylinux: glibc 2.17+ ARM64
  • tabmat-4.0.0a3-cp312-cp312-macosx_11_0_arm64.whl (1.6 MB), CPython 3.12, macOS 11.0+ ARM64
  • tabmat-4.0.0a3-cp312-cp312-macosx_10_9_x86_64.whl (1.8 MB), CPython 3.12, macOS 10.9+ x86-64
  • tabmat-4.0.0a3-cp311-cp311-win_amd64.whl (665.5 kB), CPython 3.11, Windows x86-64
  • tabmat-4.0.0a3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.4 MB), CPython 3.11, manylinux: glibc 2.17+ x86-64
  • tabmat-4.0.0a3-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (7.4 MB), CPython 3.11, manylinux: glibc 2.17+ ARM64
  • tabmat-4.0.0a3-cp311-cp311-macosx_11_0_arm64.whl (1.6 MB), CPython 3.11, macOS 11.0+ ARM64
  • tabmat-4.0.0a3-cp311-cp311-macosx_10_9_x86_64.whl (1.8 MB), CPython 3.11, macOS 10.9+ x86-64
  • tabmat-4.0.0a3-cp310-cp310-win_amd64.whl (664.9 kB), CPython 3.10, Windows x86-64
  • tabmat-4.0.0a3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.2 MB), CPython 3.10, manylinux: glibc 2.17+ x86-64
  • tabmat-4.0.0a3-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (7.1 MB), CPython 3.10, manylinux: glibc 2.17+ ARM64
  • tabmat-4.0.0a3-cp310-cp310-macosx_11_0_arm64.whl (1.6 MB), CPython 3.10, macOS 11.0+ ARM64
  • tabmat-4.0.0a3-cp310-cp310-macosx_10_9_x86_64.whl (1.8 MB), CPython 3.10, macOS 10.9+ x86-64
  • tabmat-4.0.0a3-cp39-cp39-win_amd64.whl (665.9 kB), CPython 3.9, Windows x86-64
  • tabmat-4.0.0a3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.2 MB), CPython 3.9, manylinux: glibc 2.17+ x86-64
  • tabmat-4.0.0a3-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (7.1 MB), CPython 3.9, manylinux: glibc 2.17+ ARM64
  • tabmat-4.0.0a3-cp39-cp39-macosx_11_0_arm64.whl (1.6 MB), CPython 3.9, macOS 11.0+ ARM64
  • tabmat-4.0.0a3-cp39-cp39-macosx_10_9_x86_64.whl (1.8 MB), CPython 3.9, macOS 10.9+ x86-64

File details

Details for the file tabmat-4.0.0a3.tar.gz.

File metadata

  • Download URL: tabmat-4.0.0a3.tar.gz
  • Upload date:
  • Size: 2.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for tabmat-4.0.0a3.tar.gz
Algorithm Hash digest
SHA256 c795687f66d38a8d2ebbb4b89905c49e2b233b1300ad031846f239c365dc7f2a
MD5 aab702aeded17f85e18b1d2f955051cd
BLAKE2b-256 73261a27898c4d8cb2fda448c1c56beda7f14406ddf4b1b1f568c60fefa98fa6

See more details on using hashes here.

File details

Details for the file tabmat-4.0.0a3-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: tabmat-4.0.0a3-cp312-cp312-win_amd64.whl
  • Upload date:
  • Size: 663.0 kB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for tabmat-4.0.0a3-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 c534aedc9c875ff266d5977b5a5bc7f64d71b71ee1fc4b1d229042009062a89b
MD5 78fc69db346a22602bb2432fc1ec68ef
BLAKE2b-256 163df1b54807805561ac392968f0c25855f9ede526ffeb09c42186058601f3e4

File details

Details for the file tabmat-4.0.0a3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for tabmat-4.0.0a3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 6838638fa944b2b8618495110514492904a1da011322a478bcf918f11224c923
MD5 5a4d9b569c0b33ac2fd0abe7e303289f
BLAKE2b-256 92ad78cfb0975ee997b8a71a91ea9c8758093befcbc9a21f6d02645803d8e7a1

File details

Details for the file tabmat-4.0.0a3-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for tabmat-4.0.0a3-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 62afe65270832286e4f2cd79fc2ad806850d791ac1e1012c22e56d12ee645f49
MD5 c404225505ae58e71282d14c5979ffe7
BLAKE2b-256 cd7b010d8c27c0fc4de43b37a1b49050864f48963e6787e0cc5a16b2078b326f

File details

Details for the file tabmat-4.0.0a3-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for tabmat-4.0.0a3-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 f4883ba27bee6fdfbcaae48ecbe39b9c1bf5b3ab382013b4944c45c79af0525f
MD5 bd2608a6b0304b73b26204f9b6b2de7f
BLAKE2b-256 2a01b14c5f516ee6a07d96899c56d64cf266402c8a89e89a2e1429a21619c320

File details

Details for the file tabmat-4.0.0a3-cp312-cp312-macosx_10_9_x86_64.whl.

File hashes

Hashes for tabmat-4.0.0a3-cp312-cp312-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 5758094e3a830ff063e4a4025c169f0a1ff6002472f387db7e5d4fb4df8f073d
MD5 7979c2255c0faf77c950926394f15880
BLAKE2b-256 b3f657e786e8094fd408d8484a2ec552f6d7ce3804f3d701063cc6db0aac3022

File details

Details for the file tabmat-4.0.0a3-cp311-cp311-win_amd64.whl.

File metadata

  • Download URL: tabmat-4.0.0a3-cp311-cp311-win_amd64.whl
  • Upload date:
  • Size: 665.5 kB
  • Tags: CPython 3.11, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for tabmat-4.0.0a3-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 7505d4f60920b138585378197fa68a87966161b5c6d3dcb17d08ff9fabad9603
MD5 b121734c8c000a1a21d5b2e1418542c6
BLAKE2b-256 b02606e4cf9d4f6cfba95e4bd130fd22798408067c83b8b9e3e8ade90ea04864

File details

Details for the file tabmat-4.0.0a3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for tabmat-4.0.0a3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 059a7fd8e2dd50801d7f8e52d58af3e0ac65506123013985ebc503b3fb9bf9f7
MD5 c68ddc6b6eca3e2644eb1cabe8a18271
BLAKE2b-256 d54cfbc41f1f696236f75132991c3254a72b8b41fe1103d0ed7698fa13d1211b

File details

Details for the file tabmat-4.0.0a3-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for tabmat-4.0.0a3-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 e80c0e847d4acd8959b5833a4ec468e443263c5b99f0a196c68fc7aa3b445d35
MD5 bdaf45c1de177a3a9ac583bd0e023a1f
BLAKE2b-256 113af920af0b224471593141af382b334763e6cd3148b82cbd5feadd1ab085aa

File details

Details for the file tabmat-4.0.0a3-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for tabmat-4.0.0a3-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 e98013c38bb8f8f4ea544cd7a7578e5229c3f432eb6f6bc9397e5e73cc35580d
MD5 61c9d47b9c2dc9a401e3fe9ec2df4ce4
BLAKE2b-256 0274ec81730aed5459e235fd089150d0f14e72d06715c25a5288f45457eb12c6

File details

Details for the file tabmat-4.0.0a3-cp311-cp311-macosx_10_9_x86_64.whl.

File hashes

Hashes for tabmat-4.0.0a3-cp311-cp311-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 b32cc7230a99125851e158fa91fff4e6b154a1f91d3f8f870b3124273c76b245
MD5 1a77e38d996ba0a2463876660b5f9cbb
BLAKE2b-256 f99a62afcd4d675dd2ac8eb5d4355add6d47696217234f6c6e3350a4b5eadca4

File details

Details for the file tabmat-4.0.0a3-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: tabmat-4.0.0a3-cp310-cp310-win_amd64.whl
  • Upload date:
  • Size: 664.9 kB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for tabmat-4.0.0a3-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 1558728433cab91b441b48f4d1c5ccebc0d6e061849a1a6dc9e7675511b126e2
MD5 9ccb4f69b4c71bd8f68b83407b2c3f52
BLAKE2b-256 7569400a19179432f5073ba2da4b2404f193c8c6ad55b1fee1cb67a467efbd8a

File details

Details for the file tabmat-4.0.0a3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for tabmat-4.0.0a3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 9b9a05583fbccde480d83b3c78fd4bba238ec5441f56cc64e9ba1e77bac796ee
MD5 2a7a67705b9e50038e834d5907071ff7
BLAKE2b-256 8fbd770c40662ba6d0002fecb1aaf66eb86475d45393879a954ac63af85dcfc5

File details

Details for the file tabmat-4.0.0a3-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for tabmat-4.0.0a3-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 c8567b5fc5946f5d4669892fb718fca25f90be0e3562264ba9c28aca822a25e6
MD5 c4e6851640e1f48fe8e47cb713639298
BLAKE2b-256 5b51f6b23bfa388c6c0ba96d7faa755ab14ad5bc33bebe8b3b173f5fdf6e9b39

File details

Details for the file tabmat-4.0.0a3-cp310-cp310-macosx_11_0_arm64.whl.

File hashes

Hashes for tabmat-4.0.0a3-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 11eb757ba98f4d4a337e1004c576e80374168476e83fe16e045fb9c3493ccc9f
MD5 5b74d2a2f67657d0caebc8151550a5f4
BLAKE2b-256 8bc15d6b5bcfa59424117806e185cf93e58b1fa14225c2bc6778baa15dee84f8

File details

Details for the file tabmat-4.0.0a3-cp310-cp310-macosx_10_9_x86_64.whl.

File hashes

Hashes for tabmat-4.0.0a3-cp310-cp310-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 0133e73d522d89e206b94b9a889d7bbca0ace2f946ae07475a6f84f8e802a6c2
MD5 3251891841bac5343e56f412f6a3574d
BLAKE2b-256 173d6f6f0c470e1da90cbfd65b1184c41427ac8a70d32e5bc8f76215c65b34f9

File details

Details for the file tabmat-4.0.0a3-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: tabmat-4.0.0a3-cp39-cp39-win_amd64.whl
  • Upload date:
  • Size: 665.9 kB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for tabmat-4.0.0a3-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 913527abf7ad58454421a9cbd7bf97de0e6edfa8e4f74817fca8f9a3ad4d8bf4
MD5 4d0675002bf0de798739b86fd74f4bbd
BLAKE2b-256 f7a8a66d4be40440406b210c10c86ba87d6e0e14d2aad4ad17e40fc25dfef53d

File details

Details for the file tabmat-4.0.0a3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for tabmat-4.0.0a3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 2d24cee2f21f471351148bd38544f6918139c933faebdbb9b5cf0b9a68d09980
MD5 b6239e97ad0c2b5124002fe37cf5eb53
BLAKE2b-256 24f5e89786136da4505e98ddfff843d75118daaf710a821a7bf2d8c4242b28e6

File details

Details for the file tabmat-4.0.0a3-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for tabmat-4.0.0a3-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 1837ae5aea65095b17e8a329ac4f23b5d3ed2ba3037ae6e480ac5b7f4764fd09
MD5 07107b4e043a0169bbd7c845f7681c5a
BLAKE2b-256 b8f10d633dd61fdc40cc6adcb4a2c164ad74f2a23f3588423207f042d4b33741

File details

Details for the file tabmat-4.0.0a3-cp39-cp39-macosx_11_0_arm64.whl.

File hashes

Hashes for tabmat-4.0.0a3-cp39-cp39-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 e195ecf546acf4821cff6bc91de2be5403e6ec998010017f3d2b89fcad52f76f
MD5 8cc0083f33fccb757ddf920531596465
BLAKE2b-256 3149e0562e7ccbab29dda3a3b02238f3a67cab6d7067d1cdebcc6512e8655d97

File details

Details for the file tabmat-4.0.0a3-cp39-cp39-macosx_10_9_x86_64.whl.

File hashes

Hashes for tabmat-4.0.0a3-cp39-cp39-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 27019382716756a88703996067039370eeadf90fcded4d205223e500b62cb41d
MD5 e2307f7adf05d0328b415bd961545b9a
BLAKE2b-256 a7f9febfbcfe158e339146c987fb82b5d3d2662cc30f4f3a1abac91b35726429
