Single-cell RNA Sequencing Analysis

Metacells 0.9.5 - Single-cell RNA Sequencing Analysis

The metacells package implements the improved metacell algorithm [1] for single-cell RNA sequencing (scRNA-seq) data analysis within the scipy framework, and the projection algorithm based on it [2]. The original metacell algorithm [3] was implemented in R. The Python package contains various algorithmic improvements and scales to larger data sets (millions of cells).

Metacell Analysis

Naively, scRNA-seq data is a set of cell profiles, where for each one, for each gene, we get a count of the mRNA molecules that existed in the cell for that gene. This serves as an indicator of how “expressed” or “active” the gene is.

As in any real world technology, the raw data may suffer from technical artifacts (counting the molecules of two cells in one profile, counting the molecules from ruptured cells, counting only the molecules from the cell nucleus, etc.). This requires pruning the raw data to exclude such artifacts.

With current technology, scRNA-seq data is also very sparse (typically <<10% of the RNA molecules are counted). This introduces large sampling variance on top of the original signal, which itself contains significant inherent biological noise.

Analyzing scRNA-seq data therefore requires processing the profiles in bulk. Classically, this has been done by directly clustering the cells using various methods.

In contrast, the metacell approach groups together profiles of cells in the “same” biological state, using the minimal number of profiles needed for computing robust statistics (in particular, mean gene expression). Each such group is a single “metacell”.

By summing profiles of cells of the “same” state together, each metacell greatly reduces the sampling variance, and provides a more robust estimation of the transcription state. Note a metacell is not a cell type (multiple metacells may belong to the same “type”, or even have the “same” state, if the data sufficiently over-samples this state). Also, a metacell is not a parametric model of the cell state. It is merely a more robust description of some cell state.
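As a rough illustration of why pooling helps, the sketch below assumes UMI counts follow Poisson sampling, so the relative standard error of a gene's observed count is about 1/sqrt(expected count). The Poisson model, the gene fraction and the profile sizes are all assumptions chosen for illustration; this is not code from the package.

```python
import math


def relative_sampling_error(gene_fraction: float, total_umis: int) -> float:
    """Approximate relative standard error of a gene's UMI count,
    assuming Poisson sampling (an assumption made for this sketch)."""
    expected_umis = gene_fraction * total_umis
    return 1.0 / math.sqrt(expected_umis)


gene_fraction = 1e-3      # hypothetical gene: 0.1% of the captured mRNA
single_cell_umis = 2_000  # hypothetical single-cell profile
metacell_umis = 160_000   # hypothetical pooled profile (80x more UMIs)

cell_error = relative_sampling_error(gene_fraction, single_cell_umis)
metacell_error = relative_sampling_error(gene_fraction, metacell_umis)
print(f"single cell: {cell_error:.2f}, metacell: {metacell_error:.3f}")
```

Under these assumptions, pooling 80 times more UMIs shrinks the relative sampling error by a factor of sqrt(80), roughly 9-fold.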

The metacells should therefore be further analyzed as if they were cells, using additional methods to classify cell types, detect cell trajectories and/or lineage, build parametric models for cell behavior, etc. Using metacells as input for such analysis techniques should benefit both from the more robust, less noisy input and from the (~100-fold) reduction in the number of cells to analyze when dealing with large data (e.g. analyzing millions of individual cells).

A common use case is taking a new data set and using an existing atlas with annotations (in particular, “type” annotations) to provide initial annotations for the new data set. As of version 0.9 this capability is provided by this package.

Metacell projection provides a quantitative “projected” genes profile for each query metacell in the atlas, together with a “corrected” one for the same subset of genes shared between the query and the atlas. Actual correction is optional, to be used only if there are technological differences between the data sets, e.g. 10X v2 vs. 10X v3. This allows performing a quantitative comparison between the projected and corrected gene expression profiles for determining whether the query metacell is a novel state that does not exist in the atlas, or, if it does match an atlas state, analyze any differences it may still have. This serves both for quality control and for quantitative analysis of perturbed systems (e.g. knockouts or disease models) in comparison to a baseline atlas.

Terminology and Results Format

NOTE: Version 0.9 breaks compatibility with version 0.8 when it comes to some APIs and the names and semantics of the result annotations. See below for the description of updated results (and how they differ from version 0.8). The new format is meant to improve the usability of the system in downstream analysis pipelines. For convenience we also list here the results of the new projection pipeline added in version 0.9.*. Versions 0.9.1 and 0.9.2 contain some bug fixes. Version 0.9.3 allows specifying target UMIs for the metacells, in addition to the target size in cells, and adaptively tries to satisfy both. This should produce better-sized metacells “out of the box” compared to the 0.9.[0-2] versions. Version 0.9.4 contains minor bug fixes and updates for newer versions of dependency packages. Version 0.9.5 improves recovery from unstable convex solvers and fixes further edge cases (see History).

If you have existing metacell data that was computed using version 0.8, you can use the provided conversion script to migrate your data to the format described below, while preserving any additional annotations you may have created for your data (e.g. metacells type annotations). The script will not modify your existing data files, so you can examine the results and tweak them if necessary.

In an upcoming version we will migrate from using AnnData to using daf to represent the data (h5ad files will still be supported, either directly through an adapter or via a conversion process). This will again unavoidably break API compatibility, but will provide many advantages over the restricted AnnData APIs.

We apologize for the inconvenience.

Metacells Computation

In theory, the only inputs required for metacell analysis are cell gene profiles with a UMIs count per gene per cell. In practice, a key part of the analysis is specifying lists of genes for special treatment. We use the following terminology for these lists:

excluded_gene, excluded_cell masks

Excluded genes (and/or cells) are totally ignored by the algorithm (e.g. mitochondrial genes, cells with too few UMIs).

Deciding on the “right” list of excluded genes (and cells) is crucial for creating high-quality metacells. We rely on the analyst to provide this list based on prior biological knowledge. To support this supervised task, we provide the exclude_genes and exclude_cells functions which implement “reasonable” strategies for detecting some (not all) of the genes and cells to exclude. For example, these will exclude any genes found by find_bursty_lonely_genes (called find_noisy_lonely_genes in v0.8). Additional considerations might be to use relate_genes to (manually) exclude genes that are highly correlated with known-to-need-to-be-excluded genes, or exclude any cells that are marked as doublets, etc.

Currently the 1st step of the processing must be to create a “clean” data set which lacks the excluded genes and cells (e.g. using extract_clean_data). When we switch to daf we’ll just stay with the original data set and apply the exclusion masks to the rest of the algorithm.

lateral_gene mask

Lateral genes are forbidden from being selected for computing cells similarity (e.g., cell cycle genes). In version 0.8 these were called “forbidden” genes. Lateral genes are still counted towards the total UMIs count when computing gene expression levels for cells similarity. In addition, lateral genes are still used to compute deviant (outlier) cells. That is, each computed metacell should still have a consistent gene expression level even for lateral genes.

The motivation is that we don’t want the algorithm to even try to create metacells based on these genes. Since these genes may be very strong (again, cell cycle), they would overcome the cell-type genes we are interested in, resulting in for example an “M-state” metacell which combines cells from several (similar) cell types.

Deciding on the “right” list of lateral genes is crucial for creating high-quality metacells. We rely on the analyst to provide this list based on prior biological knowledge. To support this supervised task, we provide the relate_genes pipeline for identifying genes closely related to known lateral genes, so they can be added to the list.

noisy_gene mask

Noisy genes are given more freedom when computing deviant (outlier) cells. That is, we don’t expect the expression level of such genes in the cells in the same metacell to be as consistent as we do for regular (non-noisy) genes. Note this isn’t related to the question of whether the gene is lateral or not. That is, a gene may be lateral, noisy, both, or neither.

The motivation is that some genes are inherently bursty and therefore cause many cells which are otherwise a good match for their metacell to be marked as deviant (outliers). This can be detected by examining the deviant_fold matrix (see below).

Deciding on the “right” list of noisy genes is again crucial for creating high-quality metacells (and minimizing the fraction of outlier cells). Again we rely on the analyst here.

Having determined the inputs and possibly tweaking the hyper-parameters (a favorite one is the target_metacell_size, which by default is 160K UMIs; this may be reduced for small data sets and may be increased for larger data sets), one typically runs divide_and_conquer_pipeline to obtain the following:

metacell (index) vs. metacell_name (string) per cell

The result of computing metacells for a set of cells with the above assigns each cell a metacell index. We also give each metacell a name of the format M<index>.<checksum> where the checksum reflects the cells grouped into the metacell. This protects the analyst from mistakenly applying metadata assigned to metacells from an old computation to different newly computed metacells.
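The protective effect of the checksum can be sketched as follows. The function name and the checksum itself (eight hex digits of an MD5 digest over the sorted cell names) are stand-ins invented for this example; the text above does not describe how the package computes its checksum.

```python
import hashlib
from typing import Sequence


def metacell_name(index: int, cell_names: Sequence[str]) -> str:
    """Build a name of the form M<index>.<checksum> from the member cells.

    The checksum used here is a hypothetical stand-in, not the package's.
    """
    joined = "\n".join(sorted(cell_names)).encode("utf-8")
    checksum = hashlib.md5(joined).hexdigest()[:8]
    return f"M{index}.{checksum}"


# Same index, same cells (in any order): same name.
first_run = metacell_name(7, ["cell_a", "cell_b", "cell_c"])
reordered = metacell_name(7, ["cell_c", "cell_a", "cell_b"])

# Same index, different cells: a different name, so metadata keyed by
# the old name will not silently attach to the new metacell.
second_run = metacell_name(7, ["cell_a", "cell_b", "cell_d"])
print(first_run, second_run)
```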

We provide functions (convey_obs_to_group, convey_group_to_obs) for conveying between per-cell and per-metacell annotations, which all currently use the metacell integer indices (this will change when we switch to daf). The metacell string names are safer to use, especially when slicing the data.
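A minimal sketch of what conveying by integer index means, using plain Python lists. The two helpers below are hypothetical stand-ins written for this example; they are not the package's convey_obs_to_group and convey_group_to_obs, whose signatures are not shown in this text.

```python
def group_values_to_cells(metacell_of_cell, value_of_metacell, outlier_value=None):
    """Copy a per-metacell value to each cell via the cell's metacell index.

    Cells with a negative index (outliers) receive outlier_value.
    """
    return [
        value_of_metacell[index] if index >= 0 else outlier_value
        for index in metacell_of_cell
    ]


def count_cells_per_metacell(metacell_of_cell, metacells_count):
    """Aggregate from cells to metacells: count the cells grouped into each."""
    counts = [0] * metacells_count
    for index in metacell_of_cell:
        if index >= 0:
            counts[index] += 1
    return counts


metacell_of_cell = [0, 1, 1, -1, 0, 1]    # -1 marks an outlier cell
type_of_metacell = ["T-cell", "B-cell"]   # hypothetical per-metacell annotation

type_of_cell = group_values_to_cells(
    metacell_of_cell, type_of_metacell, outlier_value="Outliers"
)
grouped = count_cells_per_metacell(metacell_of_cell, len(type_of_metacell))
print(type_of_cell, grouped)
```

Because the lookup is purely positional, slicing either data set invalidates the indices, which is why the string names are the safer key.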

dissolved cells mask

Whether the cell was in a candidate metacell that was dissolved due to being too small (too few cells and/or total UMIs). This may aid quality control when there are a large number of outliers; lowering the target_metacell_size may help avoid this.

selected_gene mask

Whether each gene was ever selected to be used to compute the similarity between cells to compute the metacells. When using the divide-and-conquer algorithm, this mask is different for each pile (especially in the second phase when piles are homogeneous). This mask is the union of all the masks used in all the piles. It is useful for ensuring no should-be-lateral genes were selected as this would reduce the quality of the metacells. If such genes exist, add them to the lateral_gene mask and recompute the metacells.

Having computed the metacells, the next step is to run collect_metacells to create a new AnnData object for them (when using daf, they will be created in the same dataset for easier analysis), which will contain all the per-gene metadata, and also:

X per gene per metacell

Once the metacells have been computed (typically using divide_and_conquer_pipeline), we can collect the gene expression levels profile for each one. The main motivation for computing metacells is that they allow for a robust estimation of the gene expression level, and therefore we by default compute a matrix of gene fractions (which sum to 1) in each metacell, rather than providing a UMIs count for each. This simplifies the further analysis of the computed metacells (this is known as e_gc in the old R metacells package).

Note that the expression level of noisy genes is less reliable, as we do not guarantee the cells in each metacell have a consistent expression level for such genes. Our estimator therefore uses a normal weighted mean for most genes and a normalized geometric mean for the noisy genes. Since the sizes of the cells collected into the same metacell may vary, our estimator also ensures one large cell doesn’t dominate the results. That is, the computed fractions are not simply “sum of the gene UMIs in all cells divided by the sum of all gene UMIs in all cells”.
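To see why the naive pooled ratio is problematic, the sketch below compares it with an equal-weight mean of per-cell fractions on made-up numbers. Neither function is the package's estimator (which is described above only in outline); the comparison only shows how one large cell can dominate a pooled ratio.

```python
def pooled_fraction(gene_umis, total_umis):
    """Naive estimate: sum of the gene's UMIs over the sum of all UMIs."""
    return sum(gene_umis) / sum(total_umis)


def mean_of_fractions(gene_umis, total_umis):
    """Equal-weight mean of the per-cell fractions (illustrative alternative)."""
    fractions = [gene / total for gene, total in zip(gene_umis, total_umis)]
    return sum(fractions) / len(fractions)


# Hypothetical metacell: three small cells that agree with each other,
# and one large cell with a 10x higher fraction of the gene.
gene_umis = [2, 2, 2, 400]
total_umis = [1_000, 1_000, 1_000, 20_000]

pooled = pooled_fraction(gene_umis, total_umis)
averaged = mean_of_fractions(gene_umis, total_umis)
print(f"pooled: {pooled:.4f}, mean of fractions: {averaged:.4f}")
```

The pooled value lands close to the large cell's fraction (0.02) even though three of the four cells have a fraction of 0.002.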

grouped per metacell

The number of cells grouped into each metacell.

total_umis per metacell, and per gene per metacell

We still provide the total UMIs count for each gene in each metacell, and the total UMIs in each metacell. Note that the estimated fraction of each gene in the metacell is not its total UMIs divided by the metacell’s total UMIs; the actual estimator is more complex.

The total UMIs are important to ensure that analysis is meaningful. For example, comparing expression levels of lowly-expressed genes in two metacells will yield wildly inaccurate results unless a sufficient number of UMIs were used (the sum of UMIs of the gene in both compared metacells). The functions provided here for computing fold factors (log base 2 of the ratio) and related comparisons automatically ignore cases when this sum is below some threshold (40) by considering the effective fold factor to be 0 (that is, “no difference”).
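The thresholding rule above can be sketched as a small function. The function name, the parameter names and the regularization term added to both fractions are assumptions made for this example; only the log base 2 ratio, the threshold of 40 and the fallback to 0 come from the text.

```python
import math


def fold_factor(fraction_a, fraction_b, umis_a, umis_b,
                min_total_umis=40, regularization=1e-5):
    """Log2 ratio of two expression fractions, or 0 if too few UMIs back it.

    umis_a and umis_b are the gene's total UMIs in each compared metacell.
    The regularization term is an assumption of this sketch; it keeps the
    ratio finite when a fraction is 0.
    """
    if umis_a + umis_b < min_total_umis:
        return 0.0  # not enough evidence: treat as "no difference"
    return math.log2((fraction_a + regularization) / (fraction_b + regularization))


# 90 UMIs in total: the 8x difference is reported (close to 3).
well_supported = fold_factor(8e-3, 1e-3, umis_a=80, umis_b=10)

# Same 8x ratio, but only 9 UMIs in total: reported as 0.
poorly_supported = fold_factor(8e-5, 1e-5, umis_a=8, umis_b=1)
print(well_supported, poorly_supported)
```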

metacells_level per cell or metacell

This is 0 for rare gene module metacells, 1 for metacells computed from the main piles in the 2nd divide-and-conquer phase and 2 for metacells computed for their outliers.

If using divide_and_conquer_pipeline, the following are also computed (but not by the simple compute_divide_and_conquer_metacells):

rare_gene_module_<N> mask (for N = 0, …)

A mask of the genes combined into each of the detected “rare gene modules”. This is done in (expensive) pre-processing before the full divide-and-conquer algorithm to increase the sensitivity of the method, by creating metacells computed only from cells that express each rare gene module.

rare_gene mask

A mask of all the genes in all the rare gene modules, for convenience.

rare_gene_module per cell or metacell

The index of the rare gene module each cell or metacell expresses (or negative for the common case it expresses none of them).

rare_cell, rare_metacell masks

A mask of all the cells or metacells expressing any of the rare gene modules, for convenience.

In theory one is free to use the metacells for further analysis, but it is prudent to perform quality control first. One obvious measure is the number of outlier cells (with a negative metacell index and a metacell name of Outliers). In addition, one should compute and look at the following (an easy way to compute all of them at once is to call compute_for_mcview; this will change in the future):

most_similar, most_similar_name per cell (computed by compute_outliers_most_similar)

For each outlier cell (whose metacell index is -1 and metacell name is Outliers), the index and name of the metacell which is the “most similar” to the cell (has highest correlation).

deviant_fold per gene per cell (computed by compute_deviant_folds)

For each cell, for each gene, the deviant_fold holds the fold factor (log base 2) between the expression level of the gene in the cell and the metacell it belongs to (or the most similar metacell for outlier cells). This uses the same (strong) normalization factor we use when computing deviant (outlier) cells, so for outliers, you should see some (non-excluded, non-noisy) genes with a fold factor above 3 (8x), or some (non-excluded, noisy) genes with a fold factor above 5 (32x), which justify why we haven’t merged that cell into a metacell; for cells grouped into metacells, you shouldn’t see (many) such genes. If there is a large number of outlier cells and a few non-noisy genes have a high fold factor for many of them, you should consider marking these genes as noisy and recomputing the metacells. If they are already marked as noisy, you may want to completely exclude them.
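The two thresholds can be applied to one cell's row of deviant_fold as sketched below. The function and parameter names are hypothetical, and comparing the absolute value of the fold is an assumption of this sketch; the thresholds 3 and 5 are the ones quoted above.

```python
def strongly_deviant_genes(deviant_folds, noisy_mask,
                           min_fold=3.0, min_noisy_fold=5.0):
    """Indices of genes whose fold factor exceeds the relevant threshold.

    deviant_folds: one cell's fold factors (log base 2), one per gene.
    noisy_mask: whether each gene is marked as noisy.
    """
    return [
        gene
        for gene, (fold, is_noisy) in enumerate(zip(deviant_folds, noisy_mask))
        if abs(fold) > (min_noisy_fold if is_noisy else min_fold)
    ]


# Hypothetical outlier cell with four (non-excluded) genes.
folds = [0.5, -3.4, 4.2, 5.6]
noisy = [False, False, True, True]

justifying = strongly_deviant_genes(folds, noisy)
print(justifying)
```

Here the second gene (non-noisy, beyond 3) and the fourth gene (noisy, beyond 5) justify the outlier status, while the third gene's fold of 4.2 is tolerated because the gene is noisy.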

inner_fold per gene per metacell (computed by compute_inner_folds)

For each metacell, for each gene, the inner_fold is the strongest (highest absolute value) deviant_fold of any of the cells contained in the metacell. Both this and the inner_stdev_log below can be used for quality control over the consistency of the gene expression in the metacell.
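The reduction from deviant_fold to inner_fold can be sketched with nested lists. This is an illustration of the definition above, not the package's compute_inner_folds; the function name and data layout are choices made for this example.

```python
def inner_folds(deviant_fold_rows, metacell_of_cell, metacells_count):
    """For each metacell and gene, keep the deviant fold with the largest
    absolute value among the metacell's cells (sign preserved)."""
    genes_count = len(deviant_fold_rows[0])
    result = [[0.0] * genes_count for _ in range(metacells_count)]
    for cell_folds, metacell in zip(deviant_fold_rows, metacell_of_cell):
        if metacell < 0:
            continue  # outlier cells belong to no metacell
        strongest = result[metacell]
        for gene, fold in enumerate(cell_folds):
            if abs(fold) > abs(strongest[gene]):
                strongest[gene] = fold
    return result


# Hypothetical data: 4 cells x 2 genes; the last cell is an outlier.
deviant_fold = [
    [0.2, -1.5],
    [-0.7, 0.3],
    [2.1, 0.1],
    [4.0, 4.0],
]
metacell_of_cell = [0, 0, 1, -1]

per_metacell = inner_folds(deviant_fold, metacell_of_cell, metacells_count=2)
print(per_metacell)
```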

significant_inner_folds_count per gene

For each gene, the number of metacells in which there’s at least one cell with a high deviant_fold (that is, where the inner_fold is high). This helps in identifying troublesome genes, which can be then marked as noisy, lateral or even excluded, depending on their biological significance.

inner_stdev_log per gene per metacell (computed by compute_inner_stdev_logs)

For each metacell, for each gene, the standard deviation of the log (base 2) of the fraction of the gene across the cells of the metacell. Ideally, the standard deviation should be ~1/3rd of the deviants_min_gene_fold_factor (which is 3 by default), indicating that almost all cells are within that maximal fold factor. In practice we may see higher values; the lower, the better. Both this and the inner_fold above can be used for quality control over the consistency of the gene expression in the metacell.
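For a single gene in a single metacell, the quantity can be sketched as below. The regularization added before taking the log and the use of the population standard deviation are assumptions of this example, not details taken from compute_inner_stdev_logs.

```python
import math
import statistics


def inner_stdev_log(gene_fractions, regularization=1e-5):
    """Standard deviation of log2(fraction) across the cells of one metacell.

    gene_fractions: the gene's fraction in each cell of the metacell.
    The regularization (an assumption here) avoids log2(0) for cells with
    no UMIs of the gene.
    """
    logs = [math.log2(fraction + regularization) for fraction in gene_fractions]
    return statistics.pstdev(logs)


# Hypothetical metacell of three cells whose fractions differ by 2x steps.
stdev = inner_stdev_log([1e-3, 2e-3, 4e-3], regularization=0.0)

# Compare against 1/3rd of the default fold factor threshold of 3.
target = 3 / 3
print(f"stdev of log2 fractions: {stdev:.3f}, target: {target:.1f}")
```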

marker_gene mask (computed by find_metacells_marker_genes)

Given the computed metacells, we can identify genes that have a sufficient number of effective UMIs (in some metacells) and also have a wide range of expressions (between different metacells). These genes serve as markers for identifying the “type” of the metacell (or, more generally, the “gene programs” that are active in each metacell).

Typically, analysis groups the marker genes into “gene modules” (or, more generally, “gene programs”), and then uses the notion of “type X expresses the gene module/programs Y, Z, …”. As of version 0.9, collecting such gene modules (or programs) is left to the analyst with little or no direct support in this package, other than providing the rare gene modules (which by definition would apply only to a small subset of the metacells).

x, y per metacell (computed by compute_umap_by_markers)

A common and generally effective way to visualize the computed metacells is to project them to a 2D view. Currently we do this by giving UMAP a distance metric between metacells based on a logistic function based on the expression levels of the marker genes. In version 0.8 this was based on picking (some of) the selected genes.

This view is good for quality control. If it forces “unrelated” cell types together, this might mean that more genes should be made lateral, or noisy, or even excluded; or maybe the data contains a metacell of doublets; or metacells mixing cells from different types, if too many genes were marked as lateral or noisy, or excluded. It takes a surprisingly small number of such doublet/mixture metacells to mess up the UMAP projection.

Also, one shouldn’t read too much from the 2D layout, as by definition it can’t express the “true” structure of the data. Looking at specific gene-gene plots gives much more robust insight into the actual differences between the metacell types, helps identify doublets, etc.

obs_outgoing_weights per metacell per metacell (also computed by compute_umap_by_markers)

The (sparse) matrix of weights of the graph used to generate the x and y 2D projection. This graph is very sparse, that is, has a very low degree for the nodes. It is meant to be used only in conjunction with the 2D coordinates for visualization, and should not be used by any downstream analysis to determine which metacells are “near” each other for any other purpose.

Metacells Projection

For the use case of projecting metacells we use the following terminology:

atlas

A set of metacells with associated metadata, most importantly a type annotation per metacell. In addition, the atlas may provide an essential_gene_of_<type> mask for each type. Successfully projecting a query metacell to a given type requires that the query’s expression of the type’s essential genes matches the atlas. We also use the metadata listed above (specifically, lateral_gene, noisy_gene and marker_gene).

query

A set of metacells with minimal associated metadata, specifically without a type. This may optionally contain its own lateral_gene, noisy_gene and/or even marker_gene annotations.

ignored_gene mask, ignored_gene_of_<type> mask

A set of genes to not even try to match between the query and the atlas. In general the projection matches only a subset of the genes (that are common to the atlas and the query). However, the analyst has the option to force additional genes to be ignored, either in general or only when projecting metacells of a specific type. Manually ignoring specific genes which are known not to match (e.g., due to the query being some experiment, e.g. a knockout or a disease model) can improve the quality of the projection for the genes which do match.

Given these two input data sets, the projection_pipeline computes the following (inside the query AnnData object):

atlas_gene mask

A mask of the query genes that also exist in the atlas. We match genes by their name; if projecting query data from a different technology, we expect the caller to modify the query gene names to match the atlas before projecting it.

atlas_lateral_gene, atlas_noisy_gene, atlas_marker_gene, essential_gene_of_<type> masks

These masks are copied from the atlas to the query (restricting them to the common atlas_gene subset).

projected_noisy_gene

The mask of the genes that were considered “noisy” when computing the projection. By default this is the union of the noisy atlas and query genes.

corrected_fraction per gene per query metacell

For each atlas_gene, its fraction in each query metacell, out of only the atlas genes. This may be further corrected (see below) if projecting between different scRNA-seq technologies (e.g. 10X v2 and 10X v3). For non-atlas_gene this is 0.

projected_fraction per gene per query metacell

For each atlas_gene, its fraction in its projection on the atlas. This projection is computed as a weighted average of some atlas metacells (see below), which are all sufficiently close to each other (in terms of gene expression), so averaging them is reasonable to capture the fact the query metacell may be along some position on some gradient that isn’t an exact match for any specific atlas metacell. For non-atlas_gene this is 0.
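The weighted average itself can be sketched as follows. Representing one query metacell's row of the weights matrix as a dictionary from atlas metacell index to weight, and assuming the weights sum to 1, are choices made for this example; how the weights are chosen is the job of the projection algorithm and is not shown.

```python
def projected_fractions(weights, atlas_fractions):
    """Weighted average of the gene fractions of a few atlas metacells.

    weights: {atlas metacell index: weight}, assumed to sum to 1.
    atlas_fractions: per atlas metacell, the list of per-gene fractions.
    """
    genes_count = len(atlas_fractions[0])
    projected = [0.0] * genes_count
    for atlas_index, weight in weights.items():
        for gene in range(genes_count):
            projected[gene] += weight * atlas_fractions[atlas_index][gene]
    return projected


# Hypothetical atlas: 3 metacells x 2 genes. The first two are close to
# each other (along a gradient); the third is a very different state.
atlas_fractions = [
    [0.010, 0.002],
    [0.020, 0.004],
    [0.500, 0.100],
]
# The query metacell sits three quarters of the way from the first to the second.
weights = {0: 0.25, 1: 0.75}

projection = projected_fractions(weights, atlas_fractions)
print(projection)
```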

total_atlas_umis per query metacell

The total UMIs of the atlas_gene in each query metacell. This is used in the analysis as described for total_umis above, that is, to ensure comparing expression levels will ignore cases where the total number of UMIs of both compared gene profiles is too low to make a reliable determination. In such cases we take the fold factor to be 0.

weights per query metacell per atlas metacell

The weights used to compute the projected_fraction. Due to AnnData limitations this is returned as a separate object, but in daf we should be able to store this directly into the query object.

In theory, this would be enough for looking at the query metacells and comparing them to the atlas, and to project metadata from the atlas to the query (e.g., the metacell type) using convey_atlas_to_query. In practice, there is a significant amount of quality control one needs to apply before accepting these results, which we compute as follows:

correction_factor per gene

If projecting a query on an atlas with different technologies (e.g., 10X v3 to 10X v2), an automatically computed factor by which we multiply the query gene fractions to compensate for the systematic difference between the technologies (1.0 for uncorrected genes and 0.0 for non-atlas_gene).

projected_type per query metacell

For each query metacell, the best atlas type we can assign to it based on its projection. Note this does not indicate that the query metacell is “truly” of this type; to make this determination one needs to look at the quality control data below.

projected_secondary_type per query metacell

In some cases, a query metacell may fail to project well to a single region of the atlas, but does project well to a combination of two distinct atlas regions. This may be due to the query metacell containing doublets, or a mixture of cells which match different atlas regions (e.g. due to sparsity of data in the query data set). Either way, if this happens, we place here the type that best describes the secondary region the query metacell was projected to; otherwise this would be the empty string. Note that the weights matrix above does not distinguish between the regions.

fitted_gene_of_<type> mask

For each type, the genes that were projected well from the query to the atlas for most metacells of that type; any atlas_gene outside this mask failed to project well from the query to the atlas for most metacells of this type. For non-atlas_gene this is set to False.

Whether failing to project well some of the atlas_gene for most metacells of some projected_type indicates that they aren’t “truly” of that type is a decision which only the analyst can make, based on prior biological knowledge of the relevant genes.

fitted mask per gene per query metacell

For each atlas_gene for each query metacell, whether the gene was expected to be projected well, based on the query metacell projected_type (and the projected_secondary_type, if any). For non-atlas_gene this is set to False. This does not guarantee the gene was actually projected well.

misfit mask per gene per query metacell

For each atlas_gene for each query metacell, whether the corrected_fraction of the gene was significantly different from the projected_fraction (that is, whether the gene was not projected well for this metacell). For non-atlas_gene this is set to False, to make it easier to identify problematic genes.

This is expected to be rare for fitted genes and common for the rest of the atlas_gene. If too many fitted genes are also misfit, then one should be suspicious whether the query metacell is “truly” of the projected_type.

essential mask per gene per query metacell

Which of the atlas_gene were also listed in the essential_gene_of_<type> for the projected_type (and also the projected_secondary_type, if any) of each query metacell.

If an essential gene is also a misfit gene, then one should be very suspicious whether the query metacell is “truly” of the projected_type.

projected_correlation per query metacell

The correlation between the corrected_fraction and the projected_fraction for only the fitted genes expression levels of each query metacell. This serves as a very rough estimator for the quality of the projection for this query metacell (e.g. can be used to compute R^2 values).

In general we expect high correlation (more than 0.9 in most metacells) since we restricted the fitted genes mask only to genes we projected well.

projected_correlation per gene

For every gene (not only fitted genes), the correlation between the corrected_fraction and projected_fraction across all the query metacells. In general we expect high correlation for fitted genes and low correlation for the rest.

projected_fold per gene per query metacell

The fold factor between the corrected_fraction and the projected_fraction. If the absolute value of this is high (3 for an 8x ratio) then the gene was not projected well for this metacell. This will be 0 for non-atlas_gene, or if the total number of UMIs is too low.

It is expected this would have low values for most fitted genes and high values for the rest of the atlas_gene, but specific values will vary from one query metacell to another. This allows the analyst to make fine-grained determination about the quality of the projection, and/or identify quantitative differences between the query and the atlas (e.g., when studying perturbed systems such as knockouts or disease models).

similar mask per query metacell

A conservative determination of whether the query metacell is “similar” to its projection on the atlas. This is based on whether the number of misfit genes for the query metacell is low enough (by default, up to 3 genes), and also on whether at least 75% of the essential genes of the query metacell were not misfit genes. Note that this explicitly allows for a projected_secondary_type, that is, a metacell of doublets will be “similar” to the atlas, but a metacell of a novel state missing from the atlas will be “dissimilar”.

The final determination of whether to accept the projection is, as always, up to the analyst, based on prior biological knowledge, the context of the collection of the query (and atlas) data sets, etc. The analyst need not (indeed, should not) blindly accept the similar determination without examining the rest of the quality control data listed above.
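The rule behind the similar mask can be sketched for a single query metacell as below. The function and parameter names are hypothetical, as is the treatment of a metacell with no essential genes; only the defaults (up to 3 misfit genes, at least 75% of essential genes not misfit) come from the description above.

```python
def is_similar(misfit, essential, max_misfit_genes=3, min_essential_fraction=0.75):
    """Conservative similarity test for one query metacell.

    misfit, essential: per-gene boolean masks for this metacell.
    """
    misfit_count = sum(misfit)
    essential_count = sum(essential)
    essential_not_misfit = sum(
        1 for is_misfit, is_essential in zip(misfit, essential)
        if is_essential and not is_misfit
    )
    # Assumption of this sketch: with no essential genes, this condition holds.
    essential_ok = (
        essential_count == 0
        or essential_not_misfit >= min_essential_fraction * essential_count
    )
    return misfit_count <= max_misfit_genes and essential_ok


essential = [True, True, True, True, False, False]

# 2 misfit genes, 3 of 4 essential genes (75%) fit: similar.
accepted = is_similar([False, True, False, False, True, False], essential)

# 2 misfit genes, but only 2 of 4 essential genes (50%) fit: dissimilar.
rejected = is_similar([True, True, False, False, False, False], essential)
print(accepted, rejected)
```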

Installation

In short: pip install metacells. Note that metacells requires many “heavy” dependencies, most notably numpy, pandas, scipy, scanpy, which pip should automatically install for you. If you are running inside a conda environment, you might prefer to use it to first install these dependencies, instead of having pip install them from PyPI.

Note that metacells only runs natively on Linux and MacOS. To run it on a Windows computer, you must activate Windows Subsystem for Linux and install metacells within it.

The metacells package contains extensions written in C++. The metacells distribution provides pre-compiled Python wheels for both Linux and MacOS, so installing it using pip should not require a C++ compilation step.

Note that for X86 CPUs, these pre-compiled wheels were built to use AVX2 (Haswell/Excavator CPUs or newer), and will not work on older CPUs which are limited to SSE. Also, these wheels will not make use of any newer instructions (such as AVX512), even if available. While these wheels may not be the perfect match for the machine you are running on, they are expected to work well for most machines.

To see the native capabilities of your machine, you can grep flags /proc/cpuinfo | head -1 which will give you a long list of supported CPU features in an arbitrary order, which may include sse, avx2, avx512, etc. You can therefore simply grep avx2 /proc/cpuinfo | head -1 to test whether AVX2 is supported by your machine.
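The same check can be scripted in Python, for example before deciding which installation route to use. This is a small sketch that only reads /proc/cpuinfo, so it applies to Linux on X86; the helper names are made up for this example.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            return set(line.split(":", 1)[1].split())
    return set()


def supports_avx2(path="/proc/cpuinfo"):
    """True or False if the flags could be read, None if they could not
    (for example, on systems without /proc/cpuinfo)."""
    try:
        with open(path, encoding="utf-8") as cpuinfo_file:
            return "avx2" in cpu_flags(cpuinfo_file.read())
    except OSError:
        return None


print("AVX2 supported:", supports_avx2())
```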

You can avoid installing the pre-compiled wheel by running pip install metacells --no-binary :all:. This will force pip to compile the C++ extensions locally on your machine, optimizing for its native capabilities, whatever these may be. This will take much longer but may give you somewhat faster results (note: the results will not be exactly the same as when running the precompiled wheel due to differences in floating-point rounding). Also, this requires you to have a C++ compiler which supports C++14 installed (either g++ or clang). Installing a C++ compiler depends on your specific system (using conda may make this less painful).

Vignettes

The latest vignettes can be found here.

References

Please cite the references appropriately in case they are used:

License (MIT)

Copyright © 2020-2023 Weizmann Institute of Science

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

History

0.5

  • First published version.

0.6

  • More robust graph partition.

  • Allow forcing feature genes.

  • Rename “project” to “convey” to prepare for addition of atlas projection functionality.

0.7.0

  • Switch to new project template.

  • Fix some edge cases in the pipeline.

  • Switch to using psutil for detecting system resources.

  • Fix binary wheel issues.

  • Give up on using travis-ci.

0.8.0

  • Add inner fold factor computation for metacells quality control.

  • Add deviant fold factor computation for metacells quality control.

  • Add projection of query data onto an atlas.

  • Self-adjusting pile sizes.

  • Add convenience function for computing data for MCView.

  • Better control over filtering using absolute fold factors.

  • Fix edge case in computing noisy lonely genes.

  • Additional outliers certificates.

  • Stricter deviants detection policy.

0.9.0

  • Improved and published projection algorithm.

  • Restrict number of rare gene candidates.

  • Tighter control over metacells size and internal quality.

  • Improved divide-and-conquer strategy.

  • Base deviants (outliers) on gaps between cells.

  • Terminology changes (see the README for details).

  • Projection!

0.9.1

  • Fix build for python 3.11.

  • More robust gene selection, KNN graph creation, and metacells partition.

  • More thorough binary wheels.

0.9.2

  • Fix numpy compatibility issue.

  • Fix K of UMAP skeleton KNN graph (broken in 0.9.1).

0.9.3

  • Allow specifying both target UMIs and target size (in cells) for the metacells, and adaptively try to satisfy both. This should produce better-sized metacells “out of the box” compared to 0.9.[0-2].

0.9.4

  • Fix minor bug in regularization of metacell fractions.

  • Fix issue with canonical sparse matrices after downsampling (probably due to scipy.sparse updates).

  • Fix using deprecated AnnData APIs.

0.9.5

  • Improve recovery from unstable convex solvers.

  • Fix an edge case in computing deviant cells using the gaps policy.

  • It turns out that threadpoolctl is not thread-safe (the irony!), which caused deadlocks. Work around this.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • metacells-0.9.5.tar.gz (400.2 kB); Source

Built Distributions

  • metacells-0.9.5-cp312-cp312-musllinux_1_1_x86_64.whl (82.4 MB); CPython 3.12; musllinux: musl 1.1+ x86-64
  • metacells-0.9.5-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (82.1 MB); CPython 3.12; manylinux: glibc 2.17+ x86-64
  • metacells-0.9.5-cp312-cp312-macosx_11_0_arm64.whl (5.8 MB); CPython 3.12; macOS 11.0+ ARM64
  • metacells-0.9.5-cp312-cp312-macosx_10_9_x86_64.whl (6.0 MB); CPython 3.12; macOS 10.9+ x86-64
  • metacells-0.9.5-cp311-cp311-musllinux_1_1_x86_64.whl (82.9 MB); CPython 3.11; musllinux: musl 1.1+ x86-64
  • metacells-0.9.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (82.1 MB); CPython 3.11; manylinux: glibc 2.17+ x86-64
  • metacells-0.9.5-cp311-cp311-macosx_11_0_arm64.whl (5.8 MB); CPython 3.11; macOS 11.0+ ARM64
  • metacells-0.9.5-cp311-cp311-macosx_10_9_x86_64.whl (6.0 MB); CPython 3.11; macOS 10.9+ x86-64
  • metacells-0.9.5-cp310-cp310-musllinux_1_1_x86_64.whl (82.6 MB); CPython 3.10; musllinux: musl 1.1+ x86-64
  • metacells-0.9.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (81.7 MB); CPython 3.10; manylinux: glibc 2.17+ x86-64
  • metacells-0.9.5-cp310-cp310-macosx_11_0_arm64.whl (5.8 MB); CPython 3.10; macOS 11.0+ ARM64
  • metacells-0.9.5-cp310-cp310-macosx_10_9_x86_64.whl (6.0 MB); CPython 3.10; macOS 10.9+ x86-64
  • metacells-0.9.5-cp39-cp39-musllinux_1_1_x86_64.whl (82.7 MB); CPython 3.9; musllinux: musl 1.1+ x86-64
  • metacells-0.9.5-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (81.7 MB); CPython 3.9; manylinux: glibc 2.17+ x86-64
  • metacells-0.9.5-cp39-cp39-macosx_11_0_arm64.whl (5.8 MB); CPython 3.9; macOS 11.0+ ARM64
  • metacells-0.9.5-cp39-cp39-macosx_10_9_x86_64.whl (6.0 MB); CPython 3.9; macOS 10.9+ x86-64
  • metacells-0.9.5-cp38-cp38-musllinux_1_1_x86_64.whl (83.3 MB); CPython 3.8; musllinux: musl 1.1+ x86-64
  • metacells-0.9.5-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (82.2 MB); CPython 3.8; manylinux: glibc 2.17+ x86-64
  • metacells-0.9.5-cp38-cp38-macosx_11_0_arm64.whl (5.8 MB); CPython 3.8; macOS 11.0+ ARM64
  • metacells-0.9.5-cp38-cp38-macosx_10_9_x86_64.whl (6.0 MB); CPython 3.8; macOS 10.9+ x86-64
  • metacells-0.9.5-cp37-cp37m-musllinux_1_1_x86_64.whl (82.4 MB); CPython 3.7m; musllinux: musl 1.1+ x86-64
  • metacells-0.9.5-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (82.1 MB); CPython 3.7m; manylinux: glibc 2.17+ x86-64
  • metacells-0.9.5-cp37-cp37m-macosx_10_9_x86_64.whl (5.8 MB); CPython 3.7m; macOS 10.9+ x86-64

File details

Details for the file metacells-0.9.5.tar.gz.

File metadata

  • Download URL: metacells-0.9.5.tar.gz
  • Upload date:
  • Size: 400.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.4

File hashes

Hashes for metacells-0.9.5.tar.gz
Algorithm Hash digest
SHA256 7ac12cba4fffb2972c5204e96097b7f76db99c9ea7534e47dbd55d1f28ca6165
MD5 d6c5d4d7497a89f653096253b2b51da1
BLAKE2b-256 7e5ed9c3a6d295fa54b941a9b038d18fbd027929d65707d64a9fcd227936fd34

See more details on using hashes here.
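A downloaded file can be checked against the SHA256 digest listed for it. The following is a minimal sketch using only the Python standard library; the function names sha256_of_file and matches_published_hash are hypothetical, and the file path in the usage note assumes the archive was downloaded to the current directory.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_sha256: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex digest."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

For example, matches_published_hash("metacells-0.9.5.tar.gz", "7ac12cba4fffb2972c5204e96097b7f76db99c9ea7534e47dbd55d1f28ca6165") should return True for an intact copy of the source distribution.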

File details

Details for the file metacells-0.9.5-cp312-cp312-musllinux_1_1_x86_64.whl.

File hashes

Hashes for metacells-0.9.5-cp312-cp312-musllinux_1_1_x86_64.whl
Algorithm Hash digest
SHA256 eff5531c430b348c9fa1581b825716794557be3950a84fb4ea92f67e928120f9
MD5 aadbf1ad25278150241257b948d9b03f
BLAKE2b-256 41ba02e662b4ab05d9dcf527674b3bdb8080221dc4d2c7ab6b7a665fcb718ab0

File details

Details for the file metacells-0.9.5-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for metacells-0.9.5-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 42ad8481d28f40cbf2f2698254f67c8a1f3c62379f4d8df6225ce795cf960ada
MD5 0fa2665cb4d5d3200be63a6753ac2668
BLAKE2b-256 731918dc454da40cef19624e0b5a7bc7247e7d48dbdf118cea14ddfec48f5efc

File details

Details for the file metacells-0.9.5-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for metacells-0.9.5-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 4ebdb6973e9f375d0bff27f0340ec2d095f3728084f8d3a82bbfc54b2f871eb1
MD5 dddf3a0d35057a0081b1ca6a59017805
BLAKE2b-256 5767693b23960bde30bf28fd5cc2a0e3881030e6cdd09750e529f09698167a9e

File details

Details for the file metacells-0.9.5-cp312-cp312-macosx_10_9_x86_64.whl.

File hashes

Hashes for metacells-0.9.5-cp312-cp312-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 f15f5e2b2ba6f27a11e35e86c5c0fac51064887676c823b6ed48088ff8f59ab3
MD5 11311580ad887ae6f0c575320078ec8c
BLAKE2b-256 4392b3d39d7a38dfff449ae8b1f1efd1b44c87d4777f1b21ee156536f26a01e1

File details

Details for the file metacells-0.9.5-cp311-cp311-musllinux_1_1_x86_64.whl.

File hashes

Hashes for metacells-0.9.5-cp311-cp311-musllinux_1_1_x86_64.whl
Algorithm Hash digest
SHA256 3cc49613d3d7a4568a66ae8cce0f4b644973988dd1fb5368eb5a3d341b281913
MD5 49d1f8bedee012d6fed65e3c52ded56c
BLAKE2b-256 2ac6ff558af5ed1b3e9bfa2d0719e3e9ac3ae2413bc78c53508867e9911e7dbf

File details

Details for the file metacells-0.9.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for metacells-0.9.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 93110c3186bdbeeb4430292665293066977a9cfeb3666998855f94e735fba6a7
MD5 58d3e1069e7e7c11b03fc591fac30793
BLAKE2b-256 def6646130a7112fae68b4db7891888a34bbeec5321b64172bbef06b3c37d828

File details

Details for the file metacells-0.9.5-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for metacells-0.9.5-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 60fbac4e9e670cb1c50d46cea73503087a705e652e68a3be614d41f62b74e3ad
MD5 8841160f136468f0dd078792b93544b9
BLAKE2b-256 63f5fb0bdc0d83864e3d03e24a93df013eceac1ae8fc2c81acd5e02bb3f08c39

File details

Details for the file metacells-0.9.5-cp311-cp311-macosx_10_9_x86_64.whl.

File hashes

Hashes for metacells-0.9.5-cp311-cp311-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 bbbb4cb4337f8610f5b286c46fdbef8a8adbd511d8e5785a43ccdad76eab7445
MD5 c4d4b9a79481653ec7b3012fce9e5f6d
BLAKE2b-256 6fa668980dcbe6640e141489fb3f407e68151e69cb5658da73834a325157d34e

File details

Details for the file metacells-0.9.5-cp310-cp310-musllinux_1_1_x86_64.whl.

File hashes

Hashes for metacells-0.9.5-cp310-cp310-musllinux_1_1_x86_64.whl
Algorithm Hash digest
SHA256 9aa505914753dd540f8926f985946d31632310350510883884038db2550936a6
MD5 c07d0a08dc3baa836a713f0ad2350c42
BLAKE2b-256 a918d6d29116f4358ed412d9ac9014931417e69486ed233fea989806ebce8524

File details

Details for the file metacells-0.9.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for metacells-0.9.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 dec961068d7ed5d7770893f56ee214f6d8629e100f6c3007cddd9b9ea974211e
MD5 b6e3a69133932060f69f484a1e9c188e
BLAKE2b-256 30168d72e8daac1efd50d2eb116e09fecd87c9e34bf8bf69a9229f8ffc8ffe32

File details

Details for the file metacells-0.9.5-cp310-cp310-macosx_11_0_arm64.whl.

File hashes

Hashes for metacells-0.9.5-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 58c9ffb829a820e34caced0cf982956d538235f88efc805965c90f08ba498c39
MD5 4e4ef526acdcf4af49856f1ce55bff82
BLAKE2b-256 f4e274917fcb6d078348c6c5dfb4ee1467ef8dce457321449b645ebb177ee2ef

File details

Details for the file metacells-0.9.5-cp310-cp310-macosx_10_9_x86_64.whl.

File hashes

Hashes for metacells-0.9.5-cp310-cp310-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 5685f5a73a368eb920ecdb9a906c37518912a9dc72b1ab32cf668b255b163b52
MD5 f509c3c46acdf0c7899970970950f302
BLAKE2b-256 902925b534df04d6e0778e1cddf62b9ca1a46ca7a3e21e91c066b26b7e9bb68d

File details

Details for the file metacells-0.9.5-cp39-cp39-musllinux_1_1_x86_64.whl.

File hashes

Hashes for metacells-0.9.5-cp39-cp39-musllinux_1_1_x86_64.whl
Algorithm Hash digest
SHA256 0f6d6167f7b1158f5a9903fb90b868ce797f9e6db4d9916b0b24be7f00e1b946
MD5 28f041339b69e35995e80ffa29b95ef3
BLAKE2b-256 dbcd056833aa8e5ed013583348c0c068ad0d1b17480d5f74f717c3d6f7a07a5e

File details

Details for the file metacells-0.9.5-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for metacells-0.9.5-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 dfa0d0a3c47ab5f37c1f7f170aa44aae25f0736ccb6af71ca85bea2baa840500
MD5 8a8ce63a8a019b74125deb665df96c15
BLAKE2b-256 b3e7740f2a2e078bad48f6f97770829b0001555fa21966783c7b1489105866ed

File details

Details for the file metacells-0.9.5-cp39-cp39-macosx_11_0_arm64.whl.

File hashes

Hashes for metacells-0.9.5-cp39-cp39-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 402aeed6f367975b050599704cf8dba838bbbdd7b405ea5a40e209714557e7a5
MD5 08bcd48e0f89282a26ecc0bed13dc93d
BLAKE2b-256 afd40afa901e2958baafac09de63570fb46bb3984efc04d4eb0e9304370feb57

File details

Details for the file metacells-0.9.5-cp39-cp39-macosx_10_9_x86_64.whl.

File hashes

Hashes for metacells-0.9.5-cp39-cp39-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 9ec1319abcb18834b227de3932281c8c307186c91c7d157f94d51c65da991bd5
MD5 17637c688e4323392c4c4d61e49ec840
BLAKE2b-256 189a3b7c61b1d8282b163a89ddff4f1fc2c56ae149402ba3b10109b74f41157e

File details

Details for the file metacells-0.9.5-cp38-cp38-musllinux_1_1_x86_64.whl.

File hashes

Hashes for metacells-0.9.5-cp38-cp38-musllinux_1_1_x86_64.whl
Algorithm Hash digest
SHA256 b758179f3a7653841c497582c78ecd8e7cb8c3e9c190eb50b4f5771fdcf3cba5
MD5 512aa3cd40d64517bf4339039a375f9f
BLAKE2b-256 40e3246b000771b3094bdf1236606dd1e4c5f1dc452996bb6c1c6da9689c1452

File details

Details for the file metacells-0.9.5-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for metacells-0.9.5-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 3a3c30866b3a45ed71840a37b86aad41a0bed5c890d705b8b888512da6c72f58
MD5 7e93bc8314eb5a7f4d211666a035966d
BLAKE2b-256 4db19a361f2a400d8c94529bbef43b7e7ad6b1ae922bd23c282a0f5689476f08

File details

Details for the file metacells-0.9.5-cp38-cp38-macosx_11_0_arm64.whl.

File hashes

Hashes for metacells-0.9.5-cp38-cp38-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 f29961529e60bccb4d44da12d1c1447609457f893410342a6e0bbcd8ad2760e8
MD5 f410fc50fe866beb071a36be2309661e
BLAKE2b-256 d706e3c85bd876a3861bb546bc2312a14e2034da332ecededcd4ca95260ed411

File details

Details for the file metacells-0.9.5-cp38-cp38-macosx_10_9_x86_64.whl.

File hashes

Hashes for metacells-0.9.5-cp38-cp38-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 d5689d11d5c157d22a09228e9a9cddf7c19aeb6ba915d7dbf5471f2ffcdc1707
MD5 6c9e1bc7280ea5f39797ce2afeb1a8b9
BLAKE2b-256 eccf7056b60ae27ff9950fb87cf19112744b906b98ecceb30a80aeecc15a0040

File details

Details for the file metacells-0.9.5-cp37-cp37m-musllinux_1_1_x86_64.whl.

File hashes

Hashes for metacells-0.9.5-cp37-cp37m-musllinux_1_1_x86_64.whl
Algorithm Hash digest
SHA256 6d7d1fac55d850172e0e5e61f192e1a8ec92fbf46c5b9270ca4968468161721c
MD5 eedfb2037dd95e90142ab4d14701d030
BLAKE2b-256 1007ac4f5d11960de8deb36989c3b745428862a4930a2fdada4a7178f6b5008d

File details

Details for the file metacells-0.9.5-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for metacells-0.9.5-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 d0d10b107705ae7fbea6fb8bc3eb19aa29d6591d26b5bc9d69edbe0c7b2a8ccd
MD5 de9f07f9602db715e93f42b7f9eca147
BLAKE2b-256 e8e1080d65efaae8211878b4228eb51b7123ad0c986e35203dc868c9a9cb9fe2

File details

Details for the file metacells-0.9.5-cp37-cp37m-macosx_10_9_x86_64.whl.

File hashes

Hashes for metacells-0.9.5-cp37-cp37m-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 c6d62081d641f303cff8f069bb16a6e7d6e8422ae392dd081bc74107eb991837
MD5 aff085f1e5febf9926bc8896004b5568
BLAKE2b-256 60f074a1a73ad8c3ecf920b3bb375059ccca3395eb87483b0c9682a3f0556ca1
