Efficient matrix representations for working with tabular data.

Installation

Simply install via conda-forge!

conda install -c conda-forge tabmat

Getting Started

The easiest way to start with tabmat is to use the convenience constructor tabmat.from_pandas.

import tabmat as tm
import numpy as np

dense_array = np.random.normal(size=(100, 1))

Use case

TL;DR: We provide matrix classes for efficiently building statistical algorithms with data that is partially dense, partially sparse and partially categorical.

Data used in economics, actuarial science, and many other fields is often tabular, containing rows and columns. Such data commonly has further properties:

  • It often is very sparse.
  • It often contains a mix of dense and sparse columns.
  • It often contains categorical data, processed into many columns of indicator values created by "one-hot encoding."

High-performance statistical applications often require fast computation of certain operations, such as

  • Computing sandwich products of the data, transpose(X) @ diag(d) @ X. A sandwich product shows up in the solution to weighted least squares, as well as in the Hessian of the likelihood in generalized linear models such as Poisson regression.
  • Matrix-vector products, possibly on only a subset of the rows or columns. For example, when limiting computation to an "active set" in an L1-penalized coordinate descent implementation, we may only need to compute a matrix-vector product on a small subset of the columns.
  • Computing all operations on standardized predictors which have mean zero and standard deviation one. This helps with numerical stability and optimizer efficiency in a wide range of machine learning algorithms.
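
As a rough illustration of the first two operations, the sketch below uses plain NumPy rather than tabmat itself. The helper names `sandwich` and `matvec_on_columns` are hypothetical and chosen for this example only; they show the arithmetic being described, not how the library implements it.

```python
import numpy as np


def sandwich(X, d):
    """Return X.T @ diag(d) @ X without materializing the n-by-n diag(d)."""
    # Scaling the rows of X by d is equivalent to diag(d) @ X.
    return X.T @ (d[:, None] * X)


def matvec_on_columns(X, v, cols):
    """Return X[:, cols] @ v[cols], a product restricted to an active set."""
    return X[:, cols] @ v[cols]


rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
d = rng.uniform(0.5, 2.0, size=6)  # positive weights
v = rng.normal(size=3)
y = rng.normal(size=6)

# The shortcut agrees with the textbook definition.
assert np.allclose(sandwich(X, d), X.T @ np.diag(d) @ X)

# Weighted least squares: solve (X' D X) beta = X' D y.
beta = np.linalg.solve(sandwich(X, d), X.T @ (d * y))
sqrt_d = np.sqrt(d)
beta_ref = np.linalg.lstsq(sqrt_d[:, None] * X, sqrt_d * y, rcond=None)[0]
assert np.allclose(beta, beta_ref)

# Restricting to columns 0 and 2 matches zeroing the inactive coefficient.
active = np.array([0, 2])
v_masked = v.copy()
v_masked[1] = 0.0
assert np.allclose(matvec_on_columns(X, v, active), X @ v_masked)
```

The same identities hold whatever the storage format of X, which is why a matrix class can specialize them for dense, sparse, or categorical data.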

This library and its design

We designed this library with the above use cases in mind. We built it first for estimating generalized linear models, but expect it to be useful in a variety of econometric and statistical use cases. The library was born out of our need for speed, and its single API reflects our desire to work with one matrix interface inside our statistical algorithms.

Design principles:

  • Speed and memory efficiency are paramount.
  • You don't need to sacrifice functionality by using this library: DenseMatrix and SparseMatrix subclass np.ndarray and scipy.sparse.csc_matrix respectively, and inherit behavior from those classes wherever it is not improved on.
  • As much as possible, syntax follows NumPy syntax, and dimension-reducing operations (like sum) return NumPy arrays, following NumPy conventions about the dimensions of results. The aim is to make these classes as close as possible to being drop-in replacements for numpy.ndarray. This is not always possible, however, due to the differing APIs of numpy.ndarray and scipy.sparse.
  • Other operations, such as toarray, mimic SciPy sparse syntax.
  • All matrix classes support matrix-vector products, sandwich products, and getcol.

Individual subclasses may support significantly more operations.

Matrix types

  • DenseMatrix represents dense matrices, subclassing numpy.ndarray. It additionally supports methods getcol, toarray, sandwich, standardize, and unstandardize.
  • SparseMatrix represents column-major sparse data, subclassing scipy.sparse.csc_matrix. It additionally supports methods sandwich and standardize.
  • CategoricalMatrix represents one-hot encoded categorical matrices. Because all the non-zeros in these matrices are ones and because each row has only one non-zero, the data can be represented and multiplied much more efficiently than a generic sparse matrix.
  • SplitMatrix represents matrices with dense, sparse, and categorical parts, allowing for a significant speedup in matrix multiplications.
  • StandardizedMatrix efficiently and sparsely represents a matrix that has had its columns normalized to have mean zero and variance one. Even if the underlying matrix is sparse, such a normalized matrix will be dense. However, by storing the scaling and shifting factors separately, StandardizedMatrix retains the original matrix sparsity.
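
To make the last two ideas in this list concrete, here is a minimal NumPy sketch. It is not tabmat's implementation: the function names and the (codes, column means, column standard deviations) layout are assumptions made for illustration. It shows why a one-hot matrix can be multiplied by simple indexing, and how a standardized matrix-vector product can be computed while leaving the underlying matrix untouched.

```python
import numpy as np

# --- One-hot categorical column stored as integer codes ---
codes = np.array([2, 0, 1, 2, 0])  # category index of each row
n_categories = 3


def categorical_matvec(codes, v):
    """Equivalent to one_hot(codes) @ v: each row picks one entry of v."""
    return v[codes]


def categorical_sandwich(codes, d, n_categories):
    """Equivalent to one_hot.T @ diag(d) @ one_hot, which is diagonal."""
    return np.diag(np.bincount(codes, weights=d, minlength=n_categories))


one_hot = np.eye(n_categories)[codes]  # explicit matrix, for checking only
v_cat = np.array([10.0, 20.0, 30.0])
d = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
assert np.allclose(categorical_matvec(codes, v_cat), one_hot @ v_cat)
assert np.allclose(
    categorical_sandwich(codes, d, n_categories),
    one_hot.T @ np.diag(d) @ one_hot,
)


# --- Standardized matrix kept as (X, column means, column std devs) ---
def standardized_matvec(X, col_means, col_stds, v):
    """Compute ((X - col_means) / col_stds) @ v without centering X."""
    scaled_v = v / col_stds
    # X @ scaled_v only touches X; the shift collapses to a single scalar.
    return X @ scaled_v - col_means @ scaled_v


# A mostly-zero matrix standing in for sparse data.
X = np.array(
    [[0.0, 1.0, 0.0],
     [0.0, 0.0, 2.0],
     [3.0, 0.0, 0.0],
     [0.0, 0.0, 0.0],
     [0.0, 4.0, 0.0]]
)
mu, sigma = X.mean(axis=0), X.std(axis=0)
v = np.array([1.0, -2.0, 0.5])
assert np.allclose(standardized_matvec(X, mu, sigma, v), ((X - mu) / sigma) @ v)
```

In the standardized case the dense matrix `(X - mu) / sigma` is built only to check the result; the function itself multiplies by the original X, so the same approach works when X is stored in a sparse format.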

[Figure: Wide data set]

Benchmarks

See here for detailed benchmarking.

API documentation

See here for detailed API documentation.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • tabmat-3.1.13.tar.gz (2.1 MB): Source

Built Distributions

  • tabmat-3.1.13-cp312-cp312-win_amd64.whl (649.2 kB): CPython 3.12, Windows x86-64
  • tabmat-3.1.13-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.7 MB): CPython 3.12, manylinux: glibc 2.17+ x86-64
  • tabmat-3.1.13-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (6.6 MB): CPython 3.12, manylinux: glibc 2.17+ ARM64
  • tabmat-3.1.13-cp312-cp312-macosx_11_0_arm64.whl (1.6 MB): CPython 3.12, macOS 11.0+ ARM64
  • tabmat-3.1.13-cp312-cp312-macosx_10_9_x86_64.whl (1.8 MB): CPython 3.12, macOS 10.9+ x86-64
  • tabmat-3.1.13-cp311-cp311-win_amd64.whl (649.1 kB): CPython 3.11, Windows x86-64
  • tabmat-3.1.13-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.8 MB): CPython 3.11, manylinux: glibc 2.17+ x86-64
  • tabmat-3.1.13-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (6.7 MB): CPython 3.11, manylinux: glibc 2.17+ ARM64
  • tabmat-3.1.13-cp311-cp311-macosx_11_0_arm64.whl (1.6 MB): CPython 3.11, macOS 11.0+ ARM64
  • tabmat-3.1.13-cp311-cp311-macosx_10_9_x86_64.whl (1.8 MB): CPython 3.11, macOS 10.9+ x86-64
  • tabmat-3.1.13-cp310-cp310-win_amd64.whl (647.9 kB): CPython 3.10, Windows x86-64
  • tabmat-3.1.13-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.5 MB): CPython 3.10, manylinux: glibc 2.17+ x86-64
  • tabmat-3.1.13-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (6.5 MB): CPython 3.10, manylinux: glibc 2.17+ ARM64
  • tabmat-3.1.13-cp310-cp310-macosx_11_0_arm64.whl (1.6 MB): CPython 3.10, macOS 11.0+ ARM64
  • tabmat-3.1.13-cp310-cp310-macosx_10_9_x86_64.whl (1.8 MB): CPython 3.10, macOS 10.9+ x86-64
  • tabmat-3.1.13-cp39-cp39-win_amd64.whl (649.0 kB): CPython 3.9, Windows x86-64
  • tabmat-3.1.13-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (6.5 MB): CPython 3.9, manylinux: glibc 2.17+ x86-64
  • tabmat-3.1.13-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (6.5 MB): CPython 3.9, manylinux: glibc 2.17+ ARM64
  • tabmat-3.1.13-cp39-cp39-macosx_11_0_arm64.whl (1.6 MB): CPython 3.9, macOS 11.0+ ARM64
  • tabmat-3.1.13-cp39-cp39-macosx_10_9_x86_64.whl (1.8 MB): CPython 3.9, macOS 10.9+ x86-64

File details

Details for the file tabmat-3.1.13.tar.gz.

File metadata

  • Download URL: tabmat-3.1.13.tar.gz
  • Upload date:
  • Size: 2.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/4.0.2 CPython/3.11.6

File hashes

Hashes for tabmat-3.1.13.tar.gz
Algorithm Hash digest
SHA256 6d3d8bfc52e4c63f6248cc74e0958d6f0faed471e34725f7d756404855208d14
MD5 5ae299a918f7d1d51bd0213393080e01
BLAKE2b-256 b2c1015b7f89c15beb459485842105e09471681b5fbbb5f34799ce4a6745f016

See more details on using hashes here.

File details

Details for the file tabmat-3.1.13-cp312-cp312-win_amd64.whl.

File metadata

  • Download URL: tabmat-3.1.13-cp312-cp312-win_amd64.whl
  • Upload date:
  • Size: 649.2 kB
  • Tags: CPython 3.12, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/4.0.2 CPython/3.11.6

File hashes

Hashes for tabmat-3.1.13-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 0b4cd016f7809610985a0afb09caa3ec417973f18d6caa822609b4e56e50491d
MD5 8a6cc2353437aef99904a7687289fd7f
BLAKE2b-256 489b5092fc236f0a03da197202a20277bc6845bb56d5b954b1d14265b1fe6ebe

File details

Details for the file tabmat-3.1.13-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for tabmat-3.1.13-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 01bdd29a2a4fa8b47941089ecccf068bbd3232b3df050f3d34765a3751acfe57
MD5 12194d46af5003792d75dfbd32a093bc
BLAKE2b-256 add36c3b2312d85f6f399b182d81df235224170fa968c744d020d3f0ccc29092

File details

Details for the file tabmat-3.1.13-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for tabmat-3.1.13-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 c58922cec5497fff9c20c4b948a0fb3899680d7704cd4a734f2c41f321cee250
MD5 22e1510c1bc18972c71ca4cf9fd86b3e
BLAKE2b-256 2c0d8af82340f1011d77e285d968df4bffc99e778415023688d4843c6cf83d8c

File details

Details for the file tabmat-3.1.13-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for tabmat-3.1.13-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 6a810c3da8deec8d5ea42757d2e04821baeac8683ef1f2ae1b556eab9b1e7596
MD5 380e730663fa03093d2c58d816f8a509
BLAKE2b-256 3bfcb0c220e4ba4b460b4585df4dc34b0436f3d2f913e605a32eee1d20dda37c

File details

Details for the file tabmat-3.1.13-cp312-cp312-macosx_10_9_x86_64.whl.

File hashes

Hashes for tabmat-3.1.13-cp312-cp312-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 0dcdf6b0b40ade9eaa7b33796d15a5cf08261bb584c668db49e80e5d44e56d38
MD5 197cad68e55c6ead6e6be4f62e6166a3
BLAKE2b-256 9f67e4d6dfdd0835d790632901b9c4c417b41f0f0ab0efbaaa56d2561b75a280

File details

Details for the file tabmat-3.1.13-cp311-cp311-win_amd64.whl.

File metadata

  • Download URL: tabmat-3.1.13-cp311-cp311-win_amd64.whl
  • Upload date:
  • Size: 649.1 kB
  • Tags: CPython 3.11, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/4.0.2 CPython/3.11.6

File hashes

Hashes for tabmat-3.1.13-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 3a98edcddcfb7713f220ee75bd04883129f02d5a37c97900d52c66637552324f
MD5 63238438bc46b3cca7e0b61fbf168a46
BLAKE2b-256 f380f684badcd8abf43063a2256906de98c5e265917059412111fe5cdfb19e5c

File details

Details for the file tabmat-3.1.13-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for tabmat-3.1.13-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 bcd6a9d6d333a2e470b2e0ea4a8ee521801dc244ccdfccf47b67e28678ff65cc
MD5 2fba7c64048db1ac01965d8fcedb8687
BLAKE2b-256 c8d15411caa0d00b2294782cdf595062ef10569716e7da42c7ffb8f761690d2e

File details

Details for the file tabmat-3.1.13-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for tabmat-3.1.13-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 b3eca3dd8cb39f2f58c161464e854f901390fb58cdce2cd7ebdcd1bfc5669260
MD5 0fc8feb479cf7fde586f41e52af49c5f
BLAKE2b-256 749f25e86629805cc8617d5225e6d7875830d344fb60a772a631eaa014671469

File details

Details for the file tabmat-3.1.13-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for tabmat-3.1.13-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 6c62b56c8f79a8a3a78b6d6cd0e50ff09ed559cca5f8244d38e8b002223e45c1
MD5 c70a35a7f3d56e8365887e9cd8f1fc96
BLAKE2b-256 cb5d7992757c7099ebb18dac70041c19ef971ab9237760852e14e1b5a27499a3

File details

Details for the file tabmat-3.1.13-cp311-cp311-macosx_10_9_x86_64.whl.

File hashes

Hashes for tabmat-3.1.13-cp311-cp311-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 5fee225d506c7b697b02f2f1aa8af79f5a24e76bdea522d798466981dd3efdf1
MD5 03998a30bd2dcf2193955b2e2c0c7910
BLAKE2b-256 776fd79556388b06b8c5078e8e09617e22b981d4af5dafd842fbf818385693a1

File details

Details for the file tabmat-3.1.13-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: tabmat-3.1.13-cp310-cp310-win_amd64.whl
  • Upload date:
  • Size: 647.9 kB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/4.0.2 CPython/3.11.6

File hashes

Hashes for tabmat-3.1.13-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 eec343b09cbe8739837a75dd56099d24fae929f534d37c7b0c2311b608dc15f0
MD5 922fb72a0a31a0452990dc9d70b48608
BLAKE2b-256 175a4bc0a6461042c76b72c8d483e0ee7800ca9758b645c70e2d8532f821ddba

File details

Details for the file tabmat-3.1.13-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for tabmat-3.1.13-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 ed9e44bab3d2e949d766090736d49fc12b66023b5e1c43383fc875bf6a210c9c
MD5 da4bf73067f480df541181e91df0c4e9
BLAKE2b-256 2a180001bef77602092fbf1ffac51912a9db2da90a38fb0841b90e67a07c9891

File details

Details for the file tabmat-3.1.13-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for tabmat-3.1.13-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 1b7b327ad24883938cb3dfa6791a4d79b8ba26c226a3ce148cfd4cbd5e699fd4
MD5 b04273f2b7d32cb0558c927c152768f2
BLAKE2b-256 fa24225cb65175015ea629b97e427fe3610f0d733d560286d62c036b5b222dc1

File details

Details for the file tabmat-3.1.13-cp310-cp310-macosx_11_0_arm64.whl.

File hashes

Hashes for tabmat-3.1.13-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 0256120feaf0b6e9fcac8f1102585276ea77271cba8a23454ed9ffdee3c18574
MD5 c5fa08fee77033637733eb378fff22e1
BLAKE2b-256 cdd02a2a78f909bd84523f6d37c1c975936f1435e91f41863c939bf3d42ad680

File details

Details for the file tabmat-3.1.13-cp310-cp310-macosx_10_9_x86_64.whl.

File hashes

Hashes for tabmat-3.1.13-cp310-cp310-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 8b244dcd56aa9875b5ef1f8dcdbfe90ab69612be8e7c611b16effc3b35d6c707
MD5 cb77164c19dfac2925ffc4f2b99f139b
BLAKE2b-256 f4294be5b35709d8733020958559dd5d436bb3efeb6baf100c860db512b171bb

File details

Details for the file tabmat-3.1.13-cp39-cp39-win_amd64.whl.

File metadata

  • Download URL: tabmat-3.1.13-cp39-cp39-win_amd64.whl
  • Upload date:
  • Size: 649.0 kB
  • Tags: CPython 3.9, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/4.0.2 CPython/3.11.6

File hashes

Hashes for tabmat-3.1.13-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 f753cf6e8803b4bbe250e9b1756ee8f7b7efc311fc7d27214cb627eb368e2944
MD5 7066351b2424ebe5c37772c20fe48d4c
BLAKE2b-256 8e7e78dad4ac24a0bde7640c1f8f66208dcdc6c07c8ffc9b4bf006ceb27ae810

File details

Details for the file tabmat-3.1.13-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for tabmat-3.1.13-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 da0cb71bf956773f3be15d07082f2ab63ae375a7b9599f76e003f5998712e11c
MD5 0e3a30a4ec45aafade3873f6c9f2bc4d
BLAKE2b-256 ebd82fb0c7db210f58f47f204836f0d3d262ac05f908269b7508c8a69b7d7a02

File details

Details for the file tabmat-3.1.13-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for tabmat-3.1.13-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 da3ef2372ca79317693fff909bb1849d82b81b77438ee6d371877f048465489f
MD5 aeb3502521bc3e52d8c110e148c35962
BLAKE2b-256 ff5a60af87fb19a439185d3af068b52ff064641fe74253b5c29b176f27c6145a

File details

Details for the file tabmat-3.1.13-cp39-cp39-macosx_11_0_arm64.whl.

File hashes

Hashes for tabmat-3.1.13-cp39-cp39-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 3a819f1886b92b2ead829c233d06ad363ca94f8accf2908cabc338d90069a62e
MD5 7525dd43b24d4da3247cb178f6dac662
BLAKE2b-256 7f31aca85126f602fd64390c9a16e1f02879bda13d352de31be16a2d61f98ee4

File details

Details for the file tabmat-3.1.13-cp39-cp39-macosx_10_9_x86_64.whl.

File hashes

Hashes for tabmat-3.1.13-cp39-cp39-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 f3a49bd92d86f525d5655a4376d25dd17200d536f8b9c7b6eace3ce0f98af8da
MD5 7c461e1318d52c63dac567d83f19c8e0
BLAKE2b-256 3ac23ec4c9c565b8d0f74727015e339cc109d9d902886c46dac74af351f60de4
