Analysis of barcoded, pooled functional genomics assays.
Project description
🍹 bartab
bartab estimates relative fitness from sequencing-based pooled competition experiments.
It is designed for experiments where barcoded strains, guides, mutants, or constructs are grown together, sampled over time, and quantified by NGS read counts or UMI counts. bartab estimates the fitness of each barcode relative to a reference barcode, usually WT, using either spike-in normalisation or measured culture growth.
bartab provides:
- a command-line interface for fitting, plotting, and simulating pooled fitness experiments
- a Python API built around
AnnData - weighted least-squares fitness estimation
- dose-response fitting with a Hill/log-logistic model
- diagnostic plots
- synthetic data simulation
Contents
- Installation
- Conceptual model
- Input tables
- Command-line interface
- Python API
- Output structure
- Interpretation
- Limitations
- Issues, problems, suggestions
- Documentation
Installation
From PyPI
pip install bartab
From source
git clone https://github.com/scbirlab/bartab.git
cd bartab
pip install -e .
For development, install test and documentation dependencies as appropriate for your local setup.
Conceptual model
Interactive tutorial on analysis principles.
Input tables
bartab expects three tables:
- count table
- sample sheet
- barcode sheet
Files may be CSV, TSV, TXT, or other formats supported by carabiner.pd.read_table.
Count table
One row per barcode per sample.
Required columns:
| Meaning | Default CLI option | Typical name |
|---|---|---|
| barcode / strain identifier | --barcode-column |
strain_id |
| sample identifier | --sample-column |
sample_id |
| count value | --count-column |
count |
Example:
| strain_id | sample_id | count |
|---|---|---|
| wt | sample_0 | 12034 |
| mutant_A | sample_0 | 8312 |
| spike | sample_0 | 5021 |
| wt | sample_1 | 18420 |
| mutant_A | sample_1 | 6420 |
| spike | sample_1 | 2100 |
Counts must be non-negative.
Sample sheet
One row per sequencing sample.
Required columns:
| Meaning | Default CLI option | Typical name |
|---|---|---|
| sample identifier | --sample-column |
sample_id |
| culture / biological replicate | --culture-column |
culture_id |
| timepoint | --timepoint-column |
timepoint |
Optional columns:
| Meaning | CLI option | Used for |
|---|---|---|
| concentration | --concentration-column |
dose-response analysis |
| growth measurement | --growth-column |
growth-based normalisation |
| sampled volume | --volume-column |
adaptive-volume correction with spike-in |
Example:
| sample_id | culture_id | timepoint | dose | growth | volume |
|---|---|---|---|---|---|
| sample_0 | rep1 | 0 | 0 | 0.05 | 1.0 |
| sample_1 | rep1 | 1 | 0 | 0.20 | 1.0 |
| sample_2 | rep1 | 2 | 0 | 0.80 | 1.0 |
If --t0 is not supplied, the minimum value in the timepoint column is used as the initial timepoint.
Barcode sheet
One row per barcode.
Required columns:
| Meaning | Default CLI option | Typical name |
|---|---|---|
| barcode / strain identifier | --barcode-column |
strain_id |
Optional metadata columns are preserved in the output AnnData.obs.
Example:
| strain_id | gene | annotation |
|---|---|---|
| wt | WT | reference |
| mutant_A | geneA | deletion mutant |
| spike | spike | non-growing spike-in |
Command-line interface
Run:
bartab --help
bartab has three main commands:
bartab fit # estimate fitness
bartab plot # plot fitted results
bartab sim # simulate pooled competition data
bartab fit
Estimate relative fitness per strain or barcode.
Minimal spike-in example
bartab fit counts.csv \
--sample-sheet sample_meta.csv \
--barcode-sheet strain_meta.csv \
--output results.h5ad \
--reference wt \
--spike-name spike \
--use-spike \
--barcode-column strain_id \
--sample-column sample_id \
--culture-column culture_id \
--count-column count \
--timepoint-column timepoint
This writes:
results.h5ad
results.csv
when the output path ends in .h5ad.
Minimal growth-based example
bartab fit counts.csv \
--sample-sheet sample_meta.csv \
--barcode-sheet strain_meta.csv \
--output results.h5ad \
--reference wt \
--barcode-column strain_id \
--sample-column sample_id \
--culture-column culture_id \
--count-column count \
--timepoint-column timepoint \
--growth-column growth \
--growth-type density
Dose-response example
bartab fit counts.csv \
--sample-sheet sample_meta.csv \
--barcode-sheet strain_meta.csv \
--output dose_response.h5ad \
--reference wt \
--spike-name spike \
--use-spike \
--barcode-column strain_id \
--sample-column sample_id \
--culture-column culture_id \
--count-column count \
--timepoint-column timepoint \
--concentration-column dose
If --concentration-column is supplied, bartab first fits per-concentration WLS fitness estimates and then fits a Hill/log-logistic dose-response model.
Output format
If --output ends in:
| Output suffix | Behaviour |
|---|---|
.h5ad |
writes full annotated AnnData object and a companion .csv table |
.h5ad.gz |
writes compressed AnnData and companion .csv table |
.csv |
writes fitted parameter table only |
.tsv / .txt |
writes fitted parameter table only, tab-separated |
Main fit options
| Option | Default | Description |
|---|---|---|
input |
required | count table |
--output, -o |
required | output file |
--sample-sheet, -c |
required | sample metadata table |
--barcode-sheet, -b |
required | barcode metadata table |
--reference, -r |
required | reference barcode/strain |
--barcode-column, -u |
strain_name |
barcode column in count and barcode tables |
--sample-column, -s |
sample_id |
sample ID column |
--culture-column |
culture_id |
biological replicate/culture column |
--count-column, -a |
count |
count column |
--timepoint-column, -t |
timepoint |
timepoint column |
--t0, -0 |
minimum timepoint | initial timepoint value |
--spike-name, -x |
None |
spike-in barcode name |
--use-spike |
False |
use spike-in to estimate culture expansion |
--growth-column, -d |
OD600 |
growth measurement column |
--growth-type |
density |
either density or generations |
--volume-column |
None |
sampled volume column |
--concentration-column, -k |
None |
concentration column for dose-response |
--pseudocount |
1.0 |
value added to counts before log transform |
--model-type |
WLS |
currently accepted: WLS, OLS, HillFitnessModel |
bartab plot
Generate diagnostic plots from a fitted .h5ad file.
bartab plot results.h5ad \
--output results_plots \
--model-type WLS \
--plot-format png
This writes plots using the supplied prefix:
results_plots_time-count.png
results_plots_time-ratio.png
results_plots_expansion-ratio.png
results_plots_pred-obs.png
results_plots_volcano.png
For dose-response/Hill results:
bartab plot dose_response.h5ad \
--output dose_response_plots \
--model-type HillFitnessModel \
--plot-format png
This additionally writes:
dose_response_plots_dr.png
Plot options
| Option | Default | Description |
|---|---|---|
input |
required | fitted .h5ad file |
--output, -o |
required | filename prefix |
--highlight, -X |
None |
barcode names to highlight |
--model-type |
WLS |
model to plot; WLS, OLS, or HillFitnessModel |
--plot-format |
png |
png, PNG, pdf, or PDF |
bartab sim
Simulate count data from a pooled competition experiment.
The simulator takes a JSON file describing true strain fitness values and writes three CSV files:
<output>_count.csv
<output>_sample_meta.csv
<output>_strain_meta.csv
Single-concentration simulation
Input JSON:
{
"wt": 1.0,
"spike": 0.0,
"mutant_fast": 1.25,
"mutant_slow": 0.5
}
Run:
bartab sim fitness.json \
--output synthetic/single \
--timepoints 8 \
--n-cultures 3 \
--reads-per-barcode 1000 \
--seed 42
Then fit:
bartab fit synthetic/single_count.csv \
--sample-sheet synthetic/single_sample_meta.csv \
--barcode-sheet synthetic/single_strain_meta.csv \
--output synthetic/single_results.h5ad \
--reference wt \
--spike-name spike \
--use-spike \
--barcode-column strain_id \
--sample-column sample_id \
--culture-column replicate \
--count-column count \
--timepoint-column timepoint
Dose-response simulation
For dose-response simulation, the JSON values are interpreted as [IC50, bottom].
Input JSON:
{
"wt": [10000.0, 1.0],
"spike": [10000.0, 0.0],
"sensitive": [10.0, 0.0],
"resistant": [1000.0, 0.0]
}
Run:
bartab sim dose_fitness.json \
--output synthetic/dose \
--n-dose 6 \
--dose-max 1000 \
--dose-fold 2 \
--timepoints 8 \
--n-cultures 3 \
--reads-per-barcode 1000 \
--seed 42
Then fit:
bartab fit synthetic/dose_count.csv \
--sample-sheet synthetic/dose_sample_meta.csv \
--barcode-sheet synthetic/dose_strain_meta.csv \
--output synthetic/dose_results.h5ad \
--reference wt \
--spike-name spike \
--use-spike \
--barcode-column strain_id \
--sample-column sample_id \
--culture-column replicate \
--count-column count \
--timepoint-column timepoint \
--concentration-column dose
Simulation options
| Option | Default | Description |
|---|---|---|
input |
required | JSON file of strain fitness values |
--output, -o |
required | output prefix |
--generate-controls, -z |
0 |
number of neutral controls to generate |
--generate-more, -m |
0 |
number of additional random-fitness strains |
--inoculum, -n |
1000 |
cells per strain in inoculum |
--carrying-capacity, -K |
10.0 |
maximum fold-change supported |
--timepoints, -s |
10 |
number of sampled timepoints |
--max-time, -t |
10 |
final simulated timepoint |
--n-cultures, -r |
3 |
biological replicate cultures |
--reads-per-barcode, -b |
1000 |
mean sequencing depth per barcode per sample |
--seed, -e |
42 |
random seed |
--n-dose, -d |
None |
number of dose-response concentrations |
--dose-max |
1000 |
maximum simulated dose |
--dose-fold |
2 |
dilution spacing parameter |
Python API
The Python API is built around AnnData.
A typical workflow is:
- load count, sample, and barcode tables into
AnnData - compute barcode/reference log-ratios and expansion axis
- fit a model
- inspect
adata.obs - optionally plot or write output
Load data
from bartab.io import load_anndata
adata = load_anndata(
counts="counts.csv",
sample_meta="sample_meta.csv",
strain_meta="strain_meta.csv",
reference="wt",
spike="spike",
count_column="count",
strain_id="strain_id",
sample_id="sample_id",
culture_id="culture_id",
timepoint_column="timepoint",
)
The same function also accepts pandas DataFrame objects:
adata = load_anndata(
counts=counts_df,
sample_meta=sample_df,
strain_meta=strain_df,
reference="wt",
spike="spike",
count_column="count",
strain_id="strain_id",
sample_id="sample_id",
culture_id="culture_id",
timepoint_column="timepoint",
)
For dose-response data:
adata = load_anndata(
counts=counts_df,
sample_meta=sample_df,
strain_meta=strain_df,
reference="wt",
spike="spike",
count_column="count",
strain_id="strain_id",
sample_id="sample_id",
culture_id="culture_id",
timepoint_column="timepoint",
concentration_column="dose",
)
Compute log-ratios
Spike-in normalisation
from bartab.transforms import compute_log_ratios
adata = compute_log_ratios(
adata,
pseudocount=1.0,
use_spike=True,
)
With adaptive-volume correction:
adata = compute_log_ratios(
adata,
pseudocount=1.0,
use_spike=True,
volume_column="volume",
)
Growth-based normalisation
adata = compute_log_ratios(
adata,
pseudocount=1.0,
growth_column="growth",
growth_type="density",
use_spike=False,
)
If the growth column already contains generations:
adata = compute_log_ratios(
adata,
pseudocount=1.0,
growth_column="generations",
growth_type="generations",
use_spike=False,
)
This adds:
adata.layers["__log_ratio__"]
adata.var["__log_expansion__"]
Fit fitness models
Weighted least squares
from bartab.models.anndata import AnnDataWLSModel
model = AnnDataWLSModel()
adata = model.fit(adata)
Results are written into adata.obs with names prefixed by the model name.
For WLS, common output columns include:
WLS:slope
WLS:slope_p
WLS:slope_se
WLS:slope_ci_low
WLS:slope_ci_high
WLS:fitness
WLS:fitness_low
WLS:fitness_high
WLS:nobs
WLS:rsq
WLS:fit_status
Predictions are written to:
adata.layers["WLS:predicted"]
The model name is also appended to:
adata.uns["models_fitted"]
Ordinary least squares
from bartab.models.anndata import AnnDataOLSModel
model = AnnDataOLSModel()
adata = model.fit(adata)
OLS is mainly useful as an unweighted baseline or diagnostic comparison.
Fit dose-response models
For dose-response analysis, first load data with concentration_column set, then compute log-ratios, fit WLS, and fit the Hill model.
from bartab.io import load_anndata
from bartab.transforms import compute_log_ratios
from bartab.models.anndata import AnnDataWLSModel, AnnDataHillModel
adata = load_anndata(
counts="dose_count.csv",
sample_meta="dose_sample_meta.csv",
strain_meta="dose_strain_meta.csv",
reference="wt",
spike="spike",
count_column="count",
strain_id="strain_id",
sample_id="sample_id",
culture_id="replicate",
timepoint_column="timepoint",
concentration_column="dose",
)
adata = compute_log_ratios(
adata,
pseudocount=1.0,
use_spike=True,
)
adata = AnnDataWLSModel().fit(adata)
adata = AnnDataHillModel().fit(
adata,
concentration="dose",
)
Hill model outputs include:
HillFitnessModel:log_ic50
HillFitnessModel:log_ic50_p
HillFitnessModel:log_ic50_se
HillFitnessModel:log_ic50_ci_low
HillFitnessModel:log_ic50_ci_high
HillFitnessModel:ic50
HillFitnessModel:ic50_low
HillFitnessModel:ic50_high
HillFitnessModel:h
HillFitnessModel:h_p
HillFitnessModel:nobs
HillFitnessModel:dof
HillFitnessModel:fit_status
The Hill model uses a log-logistic form fit on log concentration. Concentration-zero samples are omitted from the non-linear fit.
Plot results
Plotting functions live in bartab.plotting.
from bartab.plotting import (
time_vs_count,
time_vs_ratio,
expansion_vs_count,
expansion_vs_ratio,
pred_vs_true,
volcano,
dose_response,
)
Examples:
fig, ax = time_vs_count(
adata,
highlight_barcodes=["mutant_A"],
filename="plots/time_count.png",
)
fig, ax = expansion_vs_ratio(
adata,
highlight_barcodes=["mutant_A"],
filename="plots/expansion_ratio.png",
)
fig, ax = pred_vs_true(
adata,
model_name="WLS",
highlight_barcodes=["mutant_A"],
filename="plots/pred_obs.png",
)
fig, ax = volcano(
adata,
model_name="WLS",
param="fitness",
p="slope_p",
vline=1.0,
highlight_barcodes=["mutant_A"],
filename="plots/volcano.png",
)
For dose-response:
fig, ax = dose_response(
adata,
model_name="WLS",
highlight_barcodes=["sensitive", "resistant"],
filename="plots/dose_response.png",
)
Plot files are saved at high resolution. When a filename is provided, bartab also writes the plotted data alongside the figure through the internal figsaver utility.
Simulate data
The simulator can be used from Python.
from bartab.simulation import calculate_growth_curves, reads_sampler
fitness = {
"wt": 1.0,
"mutant_slow": 0.5,
"mutant_fast": 1.25,
}
t, ref_expansion, growths = calculate_growth_curves(
inoculum=1_000,
fitness=fitness,
inoculum_var=0.1,
carrying_capacity=10,
n_timepoints=8,
max_time=10.0,
ref_key="wt",
seed=42,
)
counts = reads_sampler(
growths,
seq_depth=1_000,
sample_frac=0.1,
reps=3,
variance=0.001,
seed=42,
)
counts has shape:
n_strains × n_replicates × n_timepoints
Low-level model API
The lower-level model classes operate directly on arrays.
import numpy as np
from bartab.models.linear import OLSModel, WLSModel
Y = np.array([
[0.0, -0.5, -1.0],
[0.0, 0.0, 0.0],
])
x = np.array([0.0, 1.0, 2.0])
model = OLSModel()
results, preds = model.fit(Y, x)
results is a list of dictionaries, one per row of Y.
For most users, the AnnData model API is preferable.
Output structure
bartab stores results in an AnnData object.
Main matrix
adata.X
Raw count matrix, with:
rows = barcodes / strains
columns = samples
Barcode metadata
adata.obs
Contains barcode metadata and model outputs.
Important internal columns include:
| Column | Meaning |
|---|---|
__is_reference__ |
whether the barcode is the reference |
__is_spike__ |
whether the barcode is the spike-in |
| model-prefixed columns | fitted parameters and statistics |
Example model columns:
WLS:fitness
WLS:fitness_low
WLS:fitness_high
WLS:slope_p
HillFitnessModel:ic50
HillFitnessModel:log_ic50_p
Sample metadata
adata.var
Contains sample metadata.
Important internal columns include:
| Column | Meaning |
|---|---|
__is_t0__ |
whether the sample is an initial timepoint |
__culture_index__ |
culture/replicate identifier |
__inducer__ |
concentration or "single dose" |
__log_expansion__ |
fitted x-axis: culture expansion |
Layers
adata.layers
Important layers include:
| Layer | Meaning |
|---|---|
__log_ratio__ |
observed barcode/reference log-ratio change |
WLS:predicted |
WLS fitted values |
OLS:predicted |
OLS fitted values |
HillFitnessModel:predicted |
Hill model fitted values |
Unstructured metadata
adata.uns
Includes:
| Key | Meaning |
|---|---|
reference |
reference barcode name |
spike |
spike-in barcode name |
timepoint_column |
timepoint column used |
count_column |
count column used |
concentration_column |
concentration column, if any |
strain_id |
barcode ID column(s) |
sample_id |
sample ID column(s) |
culture_id |
culture/replicate column(s) |
models_fitted |
list of fitted models |
Interpretation
For single-concentration WLS output:
| Output | Interpretation |
|---|---|
WLS:fitness = 1 |
barcode grows like the reference |
WLS:fitness < 1 |
growth disadvantage |
WLS:fitness > 1 |
growth advantage |
WLS:slope_p |
evidence that the barcode deviates from neutral fitness |
WLS:fitness_low, WLS:fitness_high |
confidence interval for relative fitness |
For dose-response output:
| Output | Interpretation |
|---|---|
HillFitnessModel:ic50 |
concentration giving 50% inhibition |
| lower IC50 | greater sensitivity |
| higher IC50 | greater resistance |
HillFitnessModel:h |
fitted Hill/logistic slope |
HillFitnessModel:log_ic50_p |
evidence for non-zero IC50 parameter estimate |
Dose-response estimates are most reliable when the tested concentrations bracket the fitness transition.
Practical guidance
- Use the same barcode identifiers across the count table and barcode sheet.
- Use the same sample identifiers across the count table and sample sheet.
- If using spike-in normalisation, include exactly one spike-in barcode.
- If using growth-based normalisation, growth values must be positive.
- For dose-response analysis, concentrations must be numeric.
- Low-count barcodes can produce unstable estimates.
- Inspect diagnostic plots before interpreting individual hits.
- Check reference and spike-in behaviour before trusting global results.
Limitations
Results may be unreliable when:
- the reference barcode is depleted or poorly counted
- the spike-in grows, dies, degrades, or is sampled inconsistently
- bottlenecks dominate the experiment
- counts are extremely low
- barcode identities are mismatched between tables
- dose-response concentrations do not span the active range
The WLS weights are estimated using an approximate delta-method variance on log count ratios with a method-of-moments dispersion estimate.
Issues, problems, suggestions
Please add bug reports, feature requests, or suggestions to the issue tracker:
https://www.github.com/scbirlab/bartab/issues
Related resources
- Interactive Hugging Face Space: https://huggingface.co/spaces/scbirlab/bartab
- Tutorial Space: https://huggingface.co/spaces/scbirlab/tutorial-seq-fitness
- Source code: https://github.com/scbirlab/bartab
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file bartab-0.0.2.tar.gz.
File metadata
- Download URL: bartab-0.0.2.tar.gz
- Upload date:
- Size: 39.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c72783355086351979396e064f970a000c4583071dbb40a5f234c14eb0bd94ba
|
|
| MD5 |
5836411ed878d1b04db06fa9115b2ed5
|
|
| BLAKE2b-256 |
287dc7c16c20f96fd3c4d6b1221fc9d1fac8e0d9326f09f79181a3f6c6c8e28b
|
File details
Details for the file bartab-0.0.2-py3-none-any.whl.
File metadata
- Download URL: bartab-0.0.2-py3-none-any.whl
- Upload date:
- Size: 36.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
4779863c7527367dd033ab74e0e3f51d838c4a6affc826a973a4b918b5bf36c0
|
|
| MD5 |
2b87fe2ab95915d311faf5ada3edfd2a
|
|
| BLAKE2b-256 |
0c3436f8709121a7b82c5aff1e4c29148c593e530d3a31afd4195eb837d5164a
|