
Efficient matrix representations for working with tabular data.

Installation

Simply install via conda-forge!

conda install -c conda-forge tabmat

Getting Started

The easiest way to start with tabmat is to use the convenience constructor tabmat.from_pandas.

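import numpy as np
import pandas as pd
import tabmat as tm

dense_array = np.random.normal(size=(100, 1))

# Assumed continuation: the column name "x" is illustrative only.
df = pd.DataFrame({"x": dense_array[:, 0]})
X = tm.from_pandas(df)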

Use case

TL;DR: We provide matrix classes for efficiently building statistical algorithms with data that is partially dense, partially sparse and partially categorical.

Data used in economics, actuarial science, and many other fields is often tabular, containing rows and columns. Such data commonly has further properties:

  • It often is very sparse.
  • It often contains a mix of dense and sparse columns.
  • It often contains categorical data, processed into many columns of indicator values created by "one-hot encoding."

High-performance statistical applications often require fast computation of certain operations, such as

  • Computing sandwich products of the data, transpose(X) @ diag(d) @ X. A sandwich product shows up in the solution to weighted least squares, as well as in the Hessian of the likelihood in generalized linear models such as Poisson regression.
  • Matrix-vector products, possibly on only a subset of the rows or columns. For example, when limiting computation to an "active set" in an L1-penalized coordinate descent implementation, we may only need to compute a matrix-vector product on a small subset of the columns.
  • Computing all operations on standardized predictors which have mean zero and standard deviation one. This helps with numerical stability and optimizer efficiency in a wide range of machine learning algorithms.
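As a rough illustration of the first two operations, here is a minimal NumPy sketch. It is not tabmat's implementation; the helper names sandwich and matvec_cols and the toy data are assumptions made for this example only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))        # design matrix: 6 rows, 3 columns
d = rng.uniform(0.5, 2.0, size=6)  # one positive weight per row
y = rng.normal(size=6)             # response vector
v = rng.normal(size=3)             # coefficient-like vector


def sandwich(X, d):
    """Compute transpose(X) @ diag(d) @ X without building the n x n diagonal."""
    return (X.T * d) @ X


def matvec_cols(X, v, cols):
    """Matrix-vector product restricted to the columns listed in cols."""
    return X[:, cols] @ v[cols]


S = sandwich(X, d)

# Weighted least squares: solve (X' D X) beta = X' D y.
beta = np.linalg.solve(S, X.T @ (d * y))

# "Active set" product: only columns 0 and 2 take part.
active = np.array([0, 2])
partial = matvec_cols(X, v, active)
```

The sandwich helper scales the columns of X.T by d through broadcasting, which avoids allocating diag(d). A specialized library can go further by exploiting sparsity or categorical structure in X.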

This library and its design

We designed this library with the above use cases in mind. We built it first for estimating generalized linear models, but expect it to be useful in a variety of econometric and statistical use cases. The library was born out of our need for speed, and its unified API is motivated by our desire to work with a single matrix interface inside our statistical algorithms.

Design principles:

  • Speed and memory efficiency are paramount.
  • You don't need to sacrifice functionality by using this library: DenseMatrix and SparseMatrix subclass np.ndarray and scipy.sparse.csc_matrix respectively, and inherit behavior from those classes wherever it is not improved on.
  • As much as possible, syntax follows NumPy syntax, and dimension-reducing operations (like sum) return NumPy arrays, following NumPy conventions about the dimensions of results. The aim is to make these classes as close as possible to being drop-in replacements for numpy.ndarray. This is not always possible, however, due to the differing APIs of numpy.ndarray and scipy.sparse.
  • Other operations, such as toarray, mimic SciPy sparse syntax.
  • All matrix classes support matrix-vector products, sandwich products, and getcol.

Individual subclasses may support significantly more operations.

Matrix types

  • DenseMatrix represents dense matrices, subclassing numpy.ndarray. It additionally supports methods getcol, toarray, sandwich, standardize, and unstandardize.
  • SparseMatrix represents column-major sparse data, subclassing scipy.sparse.csc_matrix. It additionally supports methods sandwich and standardize.
  • CategoricalMatrix represents one-hot encoded categorical matrices. Because all the non-zeros in these matrices are ones and because each row has only one non-zero, the data can be represented and multiplied much more efficiently than a generic sparse matrix.
  • SplitMatrix represents matrices with dense, sparse and categorical parts, allowing for a significant speedup in matrix multiplications.
  • StandardizedMatrix efficiently and sparsely represents a matrix that has had its columns normalized to have mean zero and variance one. Even if the underlying matrix is sparse, such a normalized matrix will be dense. However, by storing the scaling and shifting factors separately, StandardizedMatrix retains the original matrix sparsity.
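The ideas behind CategoricalMatrix and StandardizedMatrix can be sketched in plain NumPy. This is a conceptual illustration under our own assumptions, not the library's code: the function names, the toy data, and the use of the population standard deviation are all choices made for this example.

```python
import numpy as np

# Categorical idea: keep one integer code per row instead of a one-hot matrix.
codes = np.array([2, 0, 1, 2, 2, 0])  # category index of each row
n_categories = 3
v = np.array([10.0, 20.0, 30.0])
w = np.ones(len(codes))


def categorical_matvec(codes, v):
    """one_hot(codes) @ v is a lookup, because each row holds a single 1."""
    return v[codes]


def categorical_transpose_matvec(codes, w, n_categories):
    """one_hot(codes).T @ w sums the entries of w within each category."""
    return np.bincount(codes, weights=w, minlength=n_categories)


# Standardized idea: keep the raw (mostly zero) matrix plus per-column factors.
X = np.array([
    [0.0, 0.0, 1.0],
    [2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 3.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 4.0],
])
mean = X.mean(axis=0)
std = X.std(axis=0)
mult = 1.0 / std     # per-column scaling factor
shift = -mean / std  # per-column shift, so that Z = X * mult + shift


def standardized_matvec(X, shift, mult, u):
    """Compute Z @ u without materializing the dense standardized matrix Z."""
    return X @ (mult * u) + shift @ u


u = np.array([1.0, -2.0, 0.5])
cat_result = categorical_matvec(codes, v)
cat_counts = categorical_transpose_matvec(codes, w, n_categories)
std_result = standardized_matvec(X, shift, mult, u)
```

In both cases the product touches only the compact representation: an integer vector for the categorical case, and the original matrix plus two length-k vectors for the standardized case. With a truly sparse X, the term X @ (mult * u) would stay a sparse product.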

Benchmarks

See here for detailed benchmarking.

API documentation

See here for detailed API documentation.
