Zero-Inflated Negative Binomial regression for scRNA-seq (maintained fork of tensorzinb by Tao Cui & Tingting Wang)
This is a fork of the original tensorzinb package, updated for compatibility with recent versions of TensorFlow so it runs on modern hardware, and extended to accept sparse input (useful for extremely zero-inflated data).
Zero-inflated Negative Binomial Model using TensorFlow
TensorZINB is a Python module that uses TensorFlow to efficiently fit negative binomial (NB) and zero-inflated negative binomial (ZINB) models. One of its key strengths is its ability to accurately calculate the NB/ZINB log likelihood. It can also be used for differentially expressed gene (DEG) analysis of single-cell RNA sequencing (scRNA-seq) data. The package distinguishes itself by its numerical stability, its support for processing datasets in batches, and its faster computation compared to other existing NB/ZINB solvers. To guarantee the reliability of its results, TensorZINB has been rigorously tested against various statistical packages. TensorZINB supports separate feature sets on the negative binomial and zero-inflated (logit) components. Furthermore, it allows common features with shared weights across multiple subjects within a batch.
The negative binomial distribution is $$NB(y;\mu,\theta)=\frac{\Gamma(y+\theta)}{\Gamma(\theta)\Gamma(y+1)}\left( \frac{\theta}{\theta+\mu}\right)^\theta\left(\frac{\mu}{\theta+\mu}\right)^y$$ where $\mu$ is the mean and $\theta$ is the dispersion parameter. For zero-inflated models, the counts are modelled as a mixture of the Bernoulli distribution and count distribution, i.e.,
$$ Pr(Y=0)=\pi+(1-\pi)NB(0),\ Pr(Y=y)=(1-\pi)NB(y),y>0. $$
We use the following model parameterization
$$ \log \mu_g = X_{\mu}\beta_{g,\mu}+Z_{\mu}\alpha_{\mu}, \quad \operatorname{logit} \pi_g = X_{\pi}\beta_{g,\pi}+Z_{\pi}\alpha_{\pi}, \quad \log \theta_g = \beta_{g,\theta}, $$
where $\mu_g$ is the mean of subject $g$, $X_{\mu}$, $Z_{\mu}$, $X_{\pi}$ and $Z_{\pi}$ are feature matrices, $\beta_{g,\mu}$ and $\beta_{g,\pi}$ are coefficients for each subject $g$, $\alpha_{\mu}$ and $\alpha_{\pi}$ are common coefficients shared across all subjects.
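As an illustration only (not part of the package API), the NB and ZINB probabilities above can be evaluated with scipy's negative binomial, taking $n=\theta$ and $p=\theta/(\theta+\mu)$:

import numpy as np
from scipy import stats

def zinb_pmf(y, mu, theta, pi):
    """P(Y = y) for the zero-inflated negative binomial mixture defined above."""
    nb = stats.nbinom.pmf(y, theta, theta / (theta + mu))
    return np.where(y == 0, pi + (1 - pi) * nb, (1 - pi) * nb)

print(zinb_pmf(np.arange(4), mu=2.0, theta=1.5, pi=0.3))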
Installation
pip install tensorzinb-plusplus
Model initialization
TensorZINB(
endog, # counts data: number of samples x number of subjects
exog, # observed variables for the negative binomial part
exog_c=None, # common observed variables across all subjects for the nb part
exog_infl=None, # observed variables for the logit part
exog_infl_c=None, # common observed variables across all subjects for the logit part
same_dispersion=False, # whether all subjects use the same dispersion
nb_only=False, # fit the negative binomial only, without the zero-inflation (logit) part
)
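For example, a minimal construction sketch with synthetic data (the import path and data shapes are assumptions for illustration):

import numpy as np
from tensorzinb import TensorZINB  # import path assumed

rng = np.random.default_rng(0)
n_samples, n_subjects, n_features = 200, 5, 2

endog = rng.poisson(2.0, size=(n_samples, n_subjects))    # counts: samples x subjects (genes)
exog = np.hstack([np.ones((n_samples, 1)),                 # intercept column
                  rng.normal(size=(n_samples, n_features))])

model = TensorZINB(endog, exog, exog_infl=exog)            # same features used for the logit part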
Model fit
TensorZINB.fit(
init_weights={}, # initial model weights. If empty, init_method is used to find init weights
init_method="poi", # initialization method: `poi` for Poisson and `nb` for negative binomial
device_type="CPU", # device_type: `CPU` or `GPU`
device_name=None, # None or one from `tf.config.list_logical_devices()`
return_history=False, # whether to return the loss and weights history during training
epochs=5000, # maximum number of epochs to run
learning_rate=0.008, # start learning rate
num_epoch_skip=3, # number of epochs to skip learning rate reduction
is_early_stop=True, # whether to use early stopping
min_delta_early_stop=0.05,# minimum change in loss to qualify as an improvement
patience_early_stop=50, # number of epochs with no improvement after which training will be stopped
factor_reduce_lr=0.8, # factor by which the learning rate will be reduced
patience_reduce_lr=10, # number of epochs with no improvement after which learning rate will be reduced
min_lr=0.001, # lower bound on the learning rate
reset_keras_session=False,# reset keras session at the beginning
)
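Continuing the sketch above, a typical fit call on CPU (assuming fit() returns the results dictionary described in the next section):

res = model.fit(
    device_type="CPU",
    learning_rate=0.008,
    epochs=5000,
)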
Model results
{
"llf_total": # sum of log likelihood across all subjects
"llfs": # an array contains log likelihood for each subject
"aic_total": # sum of AIC across all subjects
"aics": # an array contains AIC for each subject
"df_model_total": # total degree of freedom of all subjects
"df_model": # degree of freedom for each subject
"weights": # model weights
"cpu_time": # total computing time for all subjects
"num_sample": # number of samples
"epochs": # number of epochs run
"loss_history": # loss history over epochs if return_history=True
"weights_history": # weights history over epochs if return_history=True
}
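The fitted quantities can then be read directly from the returned dictionary, for example:

print(res["llf_total"])   # total log likelihood across all subjects
print(res["aics"])        # per-subject AIC values
weights = res["weights"]  # fitted model weights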
DEG Analysis
LRTest provides a utility for scRNA-seq DEG analysis. It runs the likelihood ratio test (LRT) by computing the log likelihood difference between models fitted with and without the test conditions.
To construct a LRTest object, we use
LRTest(
df_data, # count data frame. columns: subjects (genes), rows: samples
df_feature, # feature data frame. columns: features, rows: samples
conditions, # list of features to test for DEG, e.g., diagnosis
nb_features, # list of features for the negative binomial model
nb_features_c=None, # list of common features for the negative binomial model
infl_features=None, # list of features for the zero inflated (logit) model
infl_features_c=None, # list of common features for the zero inflated (logit) model
add_intercept=True, # whether to add an intercept. False if df_feature already contains an intercept
nb_only=False, # whether to fit negative binomial only, without zero inflation
same_dispersion=False, # whether all subjects use the same dispersion
)
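A minimal construction sketch with synthetic dataframes (the import path and column names such as 'diagnosis' and 'age' are hypothetical):

import numpy as np
import pandas as pd
from tensorzinb import LRTest  # import path assumed

rng = np.random.default_rng(0)
n_samples = 300

# Rows are cells (samples); columns are genes (subjects).
df_data = pd.DataFrame(rng.poisson(2.0, size=(n_samples, 3)),
                       columns=["gene1", "gene2", "gene3"])
# Rows are the same samples; columns are covariates.
df_feature = pd.DataFrame({
    "diagnosis": rng.integers(0, 2, n_samples),
    "age": rng.normal(size=n_samples),
})

lrt = LRTest(
    df_data,
    df_feature,
    conditions=["diagnosis"],  # feature(s) tested for differential expression
    nb_features=["age"],       # covariates kept in both the null and full models
    infl_features=["age"],
)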
We then call LRTest.run to run the likelihood ratio test
LRTest.run(
learning_rate=0.008, # learning rate
epochs=5000, # number of epochs run
)
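Continuing the sketch above, the call looks like:

dfr = lrt.run(learning_rate=0.008, epochs=5000)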
LRTest.run returns a result dataframe dfr with the following columns:
[
"ll0": # log likelihood without conditions
"aic0": # AIC without conditions
"df0": # degree of freedom without conditions
"cpu_time0": # computing time for each subject without conditions
"ll1": # log likelihood without conditions
"aic1": # AIC with conditions
"df1": # degree of freedom with conditions
"cpu_time1": # computing time for each subject with conditions
"lld": # ll1 - ll0
"aicd": # aic1 - aic0
"pvalue": # p-value: 1 - stats.chi2.cdf(2 * lld, df1 - df0)
]
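For reference, the reported p-value is the chi-squared tail probability of the likelihood ratio statistic; it can be reproduced from the result columns with scipy (a sketch, assuming dfr is the dataframe returned by run):

from scipy import stats

# Likelihood ratio statistic and its chi-squared tail probability with df1 - df0 degrees of freedom
lr_stat = 2 * (dfr["ll1"] - dfr["ll0"])
pvalue = 1 - stats.chi2.cdf(lr_stat, dfr["df1"] - dfr["df0"])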
tensorzinb.utils provides utility functions:
- normalize_features: normalize scRNA-seq features by removing the mean and scaling to unit variance.
- correct_pvalues_for_multiple_testing: correct p-values for multiple testing in Python, equivalent to p.adjust in R.
We can further correct p-values for multiple testing by calling correct_pvalues_for_multiple_testing(dfr['pvalue']).
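One possible way to apply these helpers (the exact call signatures are assumptions based on the descriptions above):

from tensorzinb.utils import normalize_features, correct_pvalues_for_multiple_testing

# Assumed usage: normalize the feature matrix before fitting (zero mean, unit variance)
df_feature_norm = normalize_features(df_feature)

# Adjust the LRT p-values for multiple testing (analogous to p.adjust in R)
dfr["pvalue_adj"] = correct_pvalues_for_multiple_testing(dfr["pvalue"])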
Example
An example code to show how to use TensorZINB and LRTest to perform DEG analysis can be found at examples/deg_example.ipynb. The example runs DEG analysis on a sample dataset with 17 clusters and 20 genes in each cluster.
Tests
In tests/tensorzinb.ipynb, we show several tests:
- validate the Poisson weights initialization.
- compare with statsmodels for the negative binomial model (without zero-inflation) to make sure the results match.
- show statsmodels is not numerically stable for the zero-inflated negative binomial: statsmodels can only return results when initialized with TensorZINB results, while TensorZINB results match the true parameters used to generate the samples.
More tests can be found in https://github.com/wanglab-georgetown/countmodels/blob/main/tests/zinb_test.ipynb
Troubleshooting
Run on Apple silicon
To run TensorFlow on Apple silicon (M1, M2, etc.), install it using the following:
conda install -c apple tensorflow-deps
python -m pip install tensorflow-macos==2.9.2
python -m pip install tensorflow-metal==0.5.1
Feature normalization
If the solver cannot return correct results, please ensure the features in $X$ are normalized, e.g., using StandardScaler(). Please refer to the example in examples/deg_example.ipynb.
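For example, a minimal normalization step with scikit-learn (X here is a placeholder for your raw feature matrix):

import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.random.default_rng(0).normal(loc=5.0, scale=2.0, size=(100, 3))  # placeholder feature matrix
X_scaled = StandardScaler().fit_transform(X)  # each column now has zero mean and unit variance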
Sparse input note
The sparse-input path runs in TensorFlow eager mode to avoid legacy graph serialization limits with large SparseTensor inputs. Optional minibatch training for very large datasets is not implemented yet and remains future work.
Reference
Cui, T., Wang, T. A Comprehensive Assessment of Hurdle and Zero-inflated Models for Single Cell RNA-sequencing Analysis, Briefings in Bioinformatics, July 2023. https://doi.org/10.1093/bib/bbad272
Support and Contribution
If you encounter any bugs while using the code, please open an issue on the GitHub repository.