Basic Informatics and Gene Statistics from Unnormalized Reads, a feature selection and correlation package for scRNAseq
BigSur
BigSur is a package for principled, robust single-cell transcriptomics normalization, feature selection, and correlation calculation. This README includes a quick summary of what BigSur can be used for, along with small code examples to get started.
What is BigSur?
Basic Informatics and Gene Statistics from Unnormalized Reads (BigSur) is an analytical model of single-cell transcriptomics (scRNA-seq) data. This model can be used to select features and calculate correlation, taking into account the biological and technical noise inherent in scRNA-seq.
- The importance of feature selection, along with results showing that BigSur performs as well as, if not better than, Seurat and scanpy feature selection, is shown in Dollinger et al. 2025.
- The pitfalls of using Pearson's Correlation Coefficients (PCCs) to calculate correlations in scRNA-seq data and the corrections made to PCCs to account for the noise and sparsity in these data are shown in Silkwood et al. 2023.
Updates
03/10/25
Pip package has been updated with the correlation code.
10/15/25
The GitHub repository now includes the code to calculate correlations. See below for the quickstart. The tutorial for the correlations will be uploaded soon. The pip package does not currently include the correlations code.
Installation
The easiest way to install BigSur is via pip:
conda create -n bigsur_env python pip
conda activate bigsur_env
pip install bigsur
Alternatively, you can clone the GitHub repo. We've included an environment file for conda environment installation; the only packages we require that aren't installed with scanpy are mpmath and numexpr. For example:
In terminal:
cd bigsur_dir #directory to clone to
git clone https://github.com/landerlabcode/BigSur.git
conda env create -f environment.yml -n bigsur
A note about the virtual environment
This environment contains all packages required to reproduce any result of the paper. If you want a lightweight conda environment (or, alternatively, if the environment file is causing issues), you can create a sufficient conda environment as follows:
In terminal:
conda create -n bigsur -c conda-forge scanpy mpmath numexpr ipykernel python-igraph leidenalg
Usage
Feature selection
Usage for feature selection is detailed in the example notebook.
TL;DR:
import sys
sys.path.append(bigsur_dir) # directory where git repo was cloned, not necessary if BigSur was installed using pip
from BigSur.feature_selection import mcfano_feature_selection as mcfano
Replace sc.pp.highly_variable_genes(adata) in your pipeline with mcfano(adata, layer='counts'), where the UMIs are in adata.layers['counts'].
And that's it! You can read more about how to use BigSur for feature selection, and in particular how to optimize cutoffs for a given dataset, in the example notebook.
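For intuition, the mcFano statistic that BigSur uses for feature selection builds on the classical Fano factor (variance divided by mean) computed per gene. The sketch below computes the plain, uncorrected Fano factor with numpy on a toy count matrix; it is only an illustration of the underlying statistic, not BigSur's modified corrected version, which additionally accounts for technical noise and sequencing depth:

```python
import numpy as np

# Toy cells x genes UMI count matrix (rows: cells, columns: genes).
counts = np.array([
    [0, 5, 1],
    [1, 7, 0],
    [0, 6, 2],
    [2, 8, 1],
], dtype=float)

# Classical Fano factor per gene: variance / mean.
# Purely Poisson-distributed counts give a Fano factor near 1, so genes
# with a Fano factor well above 1 are overdispersed candidates for
# feature selection.
mean = counts.mean(axis=0)
var = counts.var(axis=0)  # population variance (ddof=0)
fano = var / mean

print(fano)
```

In real data, raw Fano factors are confounded by sequencing-depth differences between cells, which is one motivation for the corrections BigSur applies.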
Correlations
To calculate correlations on data contained within an AnnData object adata, where the UMIs are stored in adata.layers['counts'], run the following commands:
import sys
sys.path.append(bigsur_dir) # directory where git repo was cloned
from BigSur.correlations import calculate_correlations
calculate_correlations(adata, layer='counts', cv=None, verbose=2, write_out=write_out_folder, previously_run=False, store_intermediate_results=True)
By default, the function stores the mcPCCs and the BH-corrected $p$-values in adata.varm. Both these matrices are lower-triangular and sparse. Given the potential size of these files, we recommend saving the mcPCCs and BH-corrected $p$-values to disk, by specifying a folder to write to, using the write_out parameter. See the docstring for more details.
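Because these matrices are lower-triangular, the value for a gene pair (i, j) lives at row max(i, j), column min(i, j). The sketch below shows how to look up a pair symmetrically from a lower-triangular scipy sparse matrix; the toy matrix and the `pair_value` helper are illustrative only, not part of BigSur's API:

```python
import numpy as np
from scipy.sparse import csr_matrix, tril

# Toy symmetric correlation matrix for 4 genes.
full = np.array([
    [1.0, 0.2, 0.0, 0.5],
    [0.2, 1.0, 0.3, 0.0],
    [0.0, 0.3, 1.0, 0.1],
    [0.5, 0.0, 0.1, 1.0],
])

# Keep only the strict lower triangle as a sparse matrix, mirroring how a
# lower-triangular store avoids duplicating symmetric values.
lower = csr_matrix(tril(full, k=-1))

def pair_value(mat, i, j):
    """Fetch the value for gene pair (i, j) from a lower-triangular matrix."""
    if i == j:
        raise ValueError("self-pairs are not stored in the strict lower triangle")
    row, col = max(i, j), min(i, j)
    return mat[row, col]

print(pair_value(lower, 0, 3))  # same value regardless of argument order
print(pair_value(lower, 3, 0))
```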
Since the correlations $p$-value calculation can take a long time to run and can require a lot of memory, we've included optional parameters to ensure that intermediate results are saved to disk if the application runs out of memory. The store_intermediate_results parameter tells the function whether to store intermediate results, such as cumulants or coefficients, in the write_out folder. The previously_run parameter tells the function to look in that folder for any intermediate results that were previously generated. If it is likely that the application will run out of memory, we suggest storing the intermediate results; however, some of these files are not sparse matrices and therefore can take a lot of storage space.
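The save-and-resume behavior described above follows a common checkpointing pattern: before recomputing an expensive intermediate, check whether a previous run already wrote it to the output folder. A generic sketch of that pattern (not BigSur's internal code; the `cached_compute` helper and file names are hypothetical):

```python
import tempfile
from pathlib import Path

import numpy as np

def cached_compute(name, compute_fn, out_dir, previously_run=True):
    """Load name.npy from out_dir if it exists (and previously_run is True);
    otherwise compute the result with compute_fn and save it to disk."""
    path = Path(out_dir) / f"{name}.npy"
    if previously_run and path.exists():
        return np.load(path)
    result = compute_fn()
    np.save(path, result)
    return result

with tempfile.TemporaryDirectory() as out_dir:
    # First call computes and writes; second call reloads from disk,
    # so the (different) compute_fn is never invoked.
    a = cached_compute("cumulants", lambda: np.arange(5) ** 2, out_dir)
    b = cached_compute("cumulants", lambda: np.zeros(5), out_dir)
    print(np.array_equal(a, b))  # True: the cached result was reused
```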
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file bigsur-0.0.7.tar.gz.
File metadata
- Download URL: bigsur-0.0.7.tar.gz
- Upload date:
- Size: 22.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b27e7c6f733bfabbd89f487e5d8a36005f6e4799a79454ad57cea58c6090848c |
| MD5 | 7dd4f14300067a1cdce0182b010301cc |
| BLAKE2b-256 | 98e5c12fd43c341565745d0ae6c8af1817ddcc9a7cdca9f9ce353f43ba8d5bea |
File details
Details for the file bigsur-0.0.7-py3-none-any.whl.
File metadata
- Download URL: bigsur-0.0.7-py3-none-any.whl
- Upload date:
- Size: 23.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8d3f2b47534b0edf87a7ace2ed3b606b4e54232e4fb95eba27632566e5e8eb18 |
| MD5 | 8d50800018bede1730f4ab5cba1e65ed |
| BLAKE2b-256 | c9c0457242571552958fcb5a45a68024e4b5164f3d60f24a9c46e13b9cba2c7b |