Missing-data matrix for RNA-Seq and proteomics QC
mismap-qc
A prettier missing-data matrix for RNA-Seq QC, inspired by missingno. Shows which genes are detected vs missing across samples, with multi-level colour annotations and hierarchical clustering.
Quick start
No virtual environment is needed: demo.py declares its dependencies inline (PEP 723), and uv installs them when the script runs.
uv run demo.py
Or import directly:
import pandas as pd
from mismap_qc import missing_matrix
df = pd.read_csv("data/toy_rnaseq.csv", index_col=0, header=[0, 1, 2])
fig = missing_matrix(df, title="Gene Detection Matrix")
Input format
A pandas DataFrame with:
- Rows = genes (or any features)
- Columns = samples, optionally as a `MultiIndex` for annotation strips
- NaN = missing / not detected
When columns are a MultiIndex, level names automatically become annotation strip labels.
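As a rough illustration of this layout, the sketch below builds a tiny input by hand. The gene names, sample IDs, values, and the `Sample` level name are made up for the example; only the `Medium_Type` and `Medium_Condition` level names appear elsewhere in this README.

```python
import numpy as np
import pandas as pd

# Hypothetical sample metadata: the level names would become annotation strip labels.
columns = pd.MultiIndex.from_tuples(
    [
        ("Fresh", "SF", "S1"),
        ("Fresh", "FBS", "S2"),
        ("Conditioned", "SF", "S3"),
        ("Conditioned", "FBS", "S4"),
    ],
    names=["Medium_Type", "Medium_Condition", "Sample"],
)

# Rows are genes; NaN marks a gene that was not detected in that sample.
df = pd.DataFrame(
    [
        [5.1, 4.8, np.nan, 5.0],
        [np.nan, 2.2, np.nan, 2.9],
        [7.3, 7.1, 6.8, 7.0],
    ],
    index=["GeneA", "GeneB", "GeneC"],
    columns=columns,
)

print(list(df.columns.names))
print(int(df.isna().sum().sum()), "missing cells")
```

A frame shaped like this should be usable wherever `df` appears in the examples in this README.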
missing_matrix() -- static plot
fig = missing_matrix(
    df,
    title="Gene Detection Matrix",
    subtitle="80 genes x 30 samples | 23% missing",
    save="output.png",
)
Layout (top to bottom)
| Component | Description |
|---|---|
| Title + subtitle | Bold title, italic subtitle for metadata |
| Dendrogram | Hierarchical clustering of samples by nullity pattern |
| Annotation strips | One colour bar per MultiIndex column level |
| Nullity matrix | Dark = detected, light = missing |
| Completeness sparkline | Per-sample or per-gene detection rate |
Parameters
Data & labels
| Parameter | Type | Default | Description |
|---|---|---|---|
| `df` | `DataFrame` | required | Genes (rows) x samples (columns). NaN = missing. |
| `title` | `str` | `""` | Bold figure title |
| `subtitle` | `str` | `""` | Italic line below title (e.g. dataset metadata) |
| `label_level` | `int` | `-1` | Which column level to use for x-axis tick labels |
Clustering & sorting
| Parameter | Type | Default | Description |
|---|---|---|---|
| `cluster_samples` | `bool` | `True` | Cluster samples by binary nullity pattern |
| `cluster_method` | `str` | `"average"` | scipy linkage method |
| `show_dendrogram` | `bool` | `True` | Show dendrogram above the matrix |
| `sort_genes` | `str \| None` | `"descending"` | Sort genes by completeness (`"ascending"`, `"descending"`, or `None`) |
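To picture what clustering samples by nullity pattern and sorting genes by completeness involve, here is a minimal sketch. It is not the package's own code: the Hamming metric, the toy data, and reading `"descending"` as "most complete first" are assumptions. Only the `"average"` linkage default comes from the table above.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import pdist

# Toy data: S1/S3 share one missingness pattern, S2/S4 another.
df = pd.DataFrame(
    {
        "S1": [1.0, np.nan, 3.0, np.nan],
        "S2": [1.2, 2.0, 3.1, 4.0],
        "S3": [0.9, np.nan, 2.8, np.nan],
        "S4": [1.1, 2.1, 3.3, 4.2],
    },
    index=["GeneA", "GeneB", "GeneC", "GeneD"],
)

# Binary nullity pattern: one row per sample, 1 = detected, 0 = missing.
pattern = df.notna().T.to_numpy(dtype=int)

# Assumption: Hamming distance between patterns; the package may use another metric.
distances = pdist(pattern, metric="hamming")
tree = linkage(distances, method="average")
sample_order = df.columns[leaves_list(tree)]

# Assumption: "descending" places the most complete genes first.
gene_completeness = df.notna().mean(axis=1)
gene_order = gene_completeness.sort_values(ascending=False).index

clustered = df.loc[gene_order, sample_order]
print(clustered.notna().astype(int))
```

Samples with identical detection patterns end up next to each other, which is what makes block-wise dropout visible in the matrix.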
Annotations
| Parameter | Type | Default | Description |
|---|---|---|---|
| `annotation_levels` | `list[int] \| None` | `None` | Column levels to show as colour bars (default: all except innermost) |
| `annotation_colors` | `dict \| None` | `None` | Custom colours per level (see below) |
Custom annotation colours accept level indices or names as keys:
missing_matrix(
    df,
    annotation_colors={
        "Medium_Type": {"Fresh": "#88CCEE", "Conditioned": "#CC6677"},
        "Medium_Condition": {"SF": "#44AA99", "FBS": "#DDCC77", "AS": "#AA4499"},
    },
)
Unspecified factor levels fall back to built-in palettes.
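The fallback behaviour can be pictured with a small helper like the one below. This is only a sketch of the idea: `resolve_colors` and the `FALLBACK` palette are hypothetical names and values, not part of mismap-qc, and the package's built-in palettes may differ.

```python
from itertools import cycle

# Hypothetical fallback palette; the package's built-in palettes are not listed here.
FALLBACK = ["#332288", "#117733", "#882255", "#999933"]


def resolve_colors(values, custom=None):
    """Map each factor level to a colour, preferring user-supplied entries."""
    custom = custom or {}
    fallback = cycle(FALLBACK)
    colors = {}
    # dict.fromkeys keeps the first occurrence of each level, in order.
    for value in dict.fromkeys(values):
        colors[value] = custom[value] if value in custom else next(fallback)
    return colors


colors = resolve_colors(["SF", "FBS", "AS", "SF"], custom={"SF": "#44AA99"})
print(colors)
```

Here only `SF` is given a custom colour, so `FBS` and `AS` draw from the fallback list in order of first appearance.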
Completeness sparkline
| Parameter | Type | Default | Description |
|---|---|---|---|
| `completeness` | `str` | `"below"` | `"below"` = per-sample (horizontal), `"side"` = per-gene (vertical) |
| `completeness_threshold` | `float \| None` | `None` | Draws a dashed red line at this value (0--1) |
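The two completeness views correspond to detection rates taken along different axes. A minimal pandas sketch, assuming completeness means the fraction of non-NaN cells (the toy data and the 0.5 threshold are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "S1": [1.0, np.nan, 3.0, np.nan],
        "S2": [1.2, 2.0, 3.1, 4.0],
        "S3": [np.nan, np.nan, 2.8, np.nan],
    },
    index=["GeneA", "GeneB", "GeneC", "GeneD"],
)

detected = df.notna()

# "below": one value per sample (fraction of genes detected in that sample).
per_sample = detected.mean(axis=0)

# "side": one value per gene (fraction of samples in which the gene is detected).
per_gene = detected.mean(axis=1)

# Samples falling under an illustrative threshold of 0.5.
threshold = 0.5
low_samples = per_sample[per_sample < threshold].index.tolist()

print(per_sample)
print(per_gene)
print("Below threshold:", low_samples)
```

Values sit in the 0 to 1 range, which matches the scale `completeness_threshold` expects.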
Legends & layout
| Parameter | Type | Default | Description |
|---|---|---|---|
| `legend_loc` | `str` | `"upper right"` | Corner for legends: `"upper right"`, `"upper left"`, `"lower right"`, `"lower left"` |
| `figsize` | `tuple \| None` | `None` | Figure size (auto-calculated if `None`) |
| `color_present` | `str` | `"#2d2d2d"` | Colour for detected cells |
| `color_missing` | `str` | `"#f0f0f0"` | Colour for missing cells |
Font sizes
| Parameter | Type | Default | Description |
|---|---|---|---|
| `fontsize` | `int` | `10` | Base font size (fallback) |
| `fontsize_legend` | `int \| None` | `None` | Legend entries |
| `fontsize_rows` | `int \| None` | `None` | Gene/row labels |
| `fontsize_cols` | `int \| None` | `None` | Sample/column labels |
| `fontsize_annotations` | `int \| None` | `None` | Annotation strip labels |
Group summary
| Parameter | Type | Default | Description |
|---|---|---|---|
| `group_summary` | `int \| str \| None` | `None` | Column level to group by; prints per-group completeness to console |
fig = missing_matrix(df, group_summary="Medium_Condition")
Output:
Group Completeness (Medium_Condition)
--------------------------------
SF 63% (n=10)
AS 80% (n=10)
FBS 88% (n=10)
Only prints when the level has more than one group.
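For readers who want the same numbers as a table rather than console text, a comparable summary can be computed directly with pandas. This is a sketch under assumptions, not the package's implementation: it takes completeness to be the mean detection rate of the samples in each group, and it sorts ascending only to mirror the order in the output above.

```python
import numpy as np
import pandas as pd

# Toy data with a two-level column index; sample IDs are made up.
columns = pd.MultiIndex.from_tuples(
    [("SF", "S1"), ("SF", "S2"), ("FBS", "S3"), ("FBS", "S4")],
    names=["Medium_Condition", "Sample"],
)
df = pd.DataFrame(
    [
        [1.0, np.nan, 3.0, 4.0],
        [np.nan, np.nan, 2.0, 1.0],
        [5.0, 6.0, np.nan, 7.0],
        [8.0, np.nan, 9.0, 1.5],
    ],
    index=["GeneA", "GeneB", "GeneC", "GeneD"],
    columns=columns,
)

level = "Medium_Condition"

# Detection rate per sample, then mean and sample count within each group.
per_sample = df.notna().mean(axis=0)
summary = per_sample.groupby(level=level).agg(["mean", "count"]).sort_values("mean")

# Mirror the documented behaviour: report only when there is more than one group.
if len(summary) > 1:
    for name, row in summary.iterrows():
        print(f"{name:<5}{row['mean']:.0%} (n={int(row['count'])})")
```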
Split by factor
| Parameter | Type | Default | Description |
|---|---|---|---|
| `split_by` | `int \| str \| None` | `None` | Split into side-by-side panels by this column level |
fig = missing_matrix(df, split_by="Medium_Condition", annotation_levels=[0])
Each panel is independently clustered. The split level is automatically removed from annotation strips.
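Conceptually, splitting amounts to partitioning the columns by one level and treating each partition as its own matrix. The sketch below shows that partitioning step with plain pandas; the toy data and the `panels` dict are illustrative, and the package's internals may be organised differently.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [
        ("Fresh", "SF", "S1"),
        ("Fresh", "FBS", "S2"),
        ("Conditioned", "SF", "S3"),
        ("Conditioned", "FBS", "S4"),
    ],
    names=["Medium_Type", "Medium_Condition", "Sample"],
)
df = pd.DataFrame(
    np.arange(8, dtype=float).reshape(2, 4),
    index=["GeneA", "GeneB"],
    columns=columns,
)
df.iloc[0, 2] = np.nan  # one undetected gene, for illustration

split_level = "Medium_Condition"
keys = df.columns.get_level_values(split_level)

panels = {}
for key in keys.unique():
    panel = df.loc[:, keys == key]
    # The split level is constant within a panel, so it carries no information
    # as an annotation strip and can be dropped.
    panels[key] = panel.droplevel(split_level, axis=1)

for key, panel in panels.items():
    print(key, panel.shape, list(panel.columns.names))
```

Each entry in `panels` could then be clustered and sorted on its own, which is why sample order may differ between panels.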
Output
| Parameter | Type | Default | Description |
|---|---|---|---|
| `save` | `str \| None` | `None` | Save figure to this path |
| `dpi` | `int` | `150` | Save resolution |
missing_matrix_html() -- interactive HTML
Plotly-based interactive version with hover tooltips showing gene name, sample ID, all annotation levels, and detection status.
from mismap_qc import missing_matrix_html
missing_matrix_html(
    df,
    title="Gene Detection Matrix (Interactive)",
    subtitle="80 genes x 30 samples",
    completeness_threshold=0.5,
    save="output/interactive.html",
)
Supports the same clustering, sorting, annotation, and completeness options as the static version. Additional parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `width` | `int \| None` | `None` | Plot width in pixels (auto-calculated if `None`) |
| `height` | `int \| None` | `None` | Plot height in pixels (auto-calculated if `None`) |
Requires plotly. Install it with pip install plotly, or run demo.py with uv, which pulls it in through the script's PEP 723 metadata.
Generating toy data
uv run make_toy_data.py
Creates data/toy_rnaseq.csv: 80 genes x 30 samples with structured missingness patterns across 6 groups (Fresh/Conditioned x SF/FBS/AS).
Dependencies
- numpy
- matplotlib
- scipy
- pandas
- plotly (optional, for HTML export only)