Python helpers for loading and interacting with cfDNAlab output files
cfDNAlab | Python Loaders 
Python helpers for loading cfDNAlab output files.
This package does not install or run the cfDNAlab command-line tool. The CLI is distributed separately as the Rust cfdna binary. Use this Python package after running cfDNAlab to load and analyze output files.
Supported output types are midpoint and end-motif Zarr outputs plus length-count TSV outputs: <prefix>.midpoint_profiles.zarr, <prefix>.end_motifs.zarr, and <prefix>.length_counts.tsv.zst.
NOTE: While the main CLI tool is thoroughly tested and validated, this Python package is still under development and may contain bugs or documentation that reads as overly AI-generated. The core functions should work, and we are actively improving the package over the coming weeks. We decided to share it early to help you use the outputs of the main tool.
Install
These instructions install only the Python loader package. To install the cfdna command-line tool, see the main repository.
Install with pip:
pip install cfdnalab
Install the current development version from GitHub:
pip install "cfdnalab @ git+https://github.com/BesenbacherLab/cfDNAlab.git#subdirectory=py-cfdnalab"
Load Midpoint Profiles
import cfdnalab as cfl
midpoints = cfl.read_midpoints("sample.midpoint_profiles.zarr")
Inspect Metadata
groups = midpoints.group_metadata()
length_bins = midpoints.length_bins()
positions = midpoints.positions()
group_metadata() returns group_idx, group_name, and eligible_intervals. length_bins() and positions() return the corresponding bin indices and half-open bp coordinates.
Extract One Profile
Use groups to select by group name and with_lengths to select the length
bin containing a fragment length in bp:
profile = midpoints.data_frame(groups="LYL1", with_lengths=167)
The returned data frame has one row per midpoint position bin.
Extract A Group Or Length Bin
Use data_frame(groups=...) for all length and position bins in one group.
Use data_frame(with_lengths=...) when you have a fragment length in bp and want the length bin that contains it.
Use with_length_range=(start, end) for all whole length bins overlapping a half-open bp range. Range selection does not split edge bins.
group_data = midpoints.data_frame(groups="LYL1")
length_bin_data = midpoints.data_frame(with_lengths=167)
length_range_data = midpoints.data_frame(with_length_range=(100, 220))
When selecting multiple lengths, each value must fall in a different length
bin. If two lengths fall in the same bin, pass one representative length or use
length_bin_idxs.
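The selection rules above can be sketched in plain Python. This is an illustration only, not cfdnalab's implementation: the bin edges and the helper names bin_containing, bins_overlapping, and check_distinct_bins are hypothetical, and real bin edges would come from length_bins().

```python
# Hypothetical half-open length bins [start, end) in bp.
LENGTH_BINS = [(100, 150), (150, 200), (200, 250)]


def bin_containing(length, bins=LENGTH_BINS):
    """Return the index of the half-open bin that contains `length`."""
    for idx, (start, end) in enumerate(bins):
        if start <= length < end:
            return idx
    raise ValueError(f"length {length} is outside all bins")


def bins_overlapping(start, end, bins=LENGTH_BINS):
    """Return indices of whole bins that overlap the half-open range [start, end)."""
    return [
        idx
        for idx, (bin_start, bin_end) in enumerate(bins)
        if bin_start < end and start < bin_end
    ]


def check_distinct_bins(lengths, bins=LENGTH_BINS):
    """Map lengths to bin indices and reject two lengths in the same bin."""
    idxs = [bin_containing(length, bins) for length in lengths]
    if len(set(idxs)) != len(idxs):
        raise ValueError("two or more lengths fall in the same length bin")
    return idxs


print(bin_containing(167))              # 1
print(bins_overlapping(100, 220))       # [0, 1, 2]
print(check_distinct_bins([120, 167]))  # [0, 1]
```

In this sketch the range (100, 220) returns the whole (200, 250) bin even though only part of it overlaps, which is what "does not split edge bins" means. Passing 160 and 167 together would raise, because both fall in the (150, 200) bin.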
Filter By Eligible Intervals
min_intervals = 100
for _, group in midpoints.group_metadata().iterrows():
    # Skip groups with too few eligible intervals.
    if group["eligible_intervals"] < min_intervals:
        continue
    # Profile for this group in the first length bin.
    profile = midpoints.data_frame(group_idxs=group["group_idx"], length_bin_idxs=0)
Extract NumPy Arrays
profile = midpoints.counts_array(group_idxs=0, length_bin_idxs=0)
group_counts = midpoints.counts_array(groups="LYL1")
length_bin_counts = midpoints.counts_array(with_lengths=167)
length_range_counts = midpoints.counts_array(with_length_range=(100, 220))
counts_array() always returns dimensions in the same order: group, length bin, and midpoint position. Scalar selectors keep their dimension as length one.
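To make the fixed dimension order concrete, here is a small sketch using nested lists in place of a NumPy array. The helpers shape and select and the toy counts are hypothetical. They only mimic the behavior described above, where a scalar selector keeps its dimension with length one.

```python
def shape(nested):
    """Return the shape of a regular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


def select(counts, group_idx=None, length_bin_idx=None):
    """Select along (group, length bin, position) while keeping all three dimensions."""
    groups = counts if group_idx is None else [counts[group_idx]]
    return [
        group if length_bin_idx is None else [group[length_bin_idx]]
        for group in groups
    ]


# Hypothetical counts: 2 groups x 3 length bins x 4 midpoint positions.
counts = [
    [[g * 100 + b * 10 + p for p in range(4)] for b in range(3)]
    for g in range(2)
]

print(shape(select(counts)))                                 # (2, 3, 4)
print(shape(select(counts, group_idx=0)))                    # (1, 3, 4)
print(shape(select(counts, group_idx=0, length_bin_idx=0)))  # (1, 1, 4)
```

If the real return value behaves this way, a single profile arrives with shape (1, 1, n_positions). With a NumPy array, indexing it as profile[0, 0] or calling numpy.squeeze(profile) gives the one-dimensional vector.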
Load End-Motif Counts
import cfdnalab as cfl
ends = cfl.read_end_motifs("sample.end_motifs.zarr")
Storage Mode - Sparse or Dense
Start by checking whether the counts were stored as a dense matrix or sparse COO arrays.
ends.storage_mode()
For sparse output, data_frame() returns stored non-zero motif counts by default. Use sparse_counts_matrix() when you want a SciPy sparse matrix. Pass densify=True only when the zero-filled result is small enough to fit in memory. Densifying only includes observed motifs.
For dense output, data_frame() returns all selected rows and motifs. Use dense_counts_zarr_array() when you want the on-disk Zarr array and dense_counts_array() when you want NumPy counts in memory.
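The difference between the two storage modes can be illustrated without the library. The sketch below expands COO triples (row index, motif index, count) into a zero-filled table whose columns are only the motifs that were actually observed. The function name densify_coo and the toy data are hypothetical, and the on-disk layout of the real store may differ.

```python
def densify_coo(rows, cols, values, n_rows):
    """Expand COO triples into a zero-filled table.

    Only motif indices that appear in `cols` become columns, mirroring the
    note that densifying includes observed motifs only.
    """
    observed = sorted(set(cols))
    col_pos = {motif_idx: pos for pos, motif_idx in enumerate(observed)}
    dense = [[0] * len(observed) for _ in range(n_rows)]
    for row, col, value in zip(rows, cols, values):
        dense[row][col_pos[col]] += value
    return observed, dense


# Hypothetical sparse store: 3 rows, with motif indices 2 and 7 observed.
rows = [0, 0, 2]
cols = [2, 7, 7]
values = [5, 1, 4]

observed, dense = densify_coo(rows, cols, values, n_rows=3)
print(observed)  # [2, 7]
print(dense)     # [[5, 1], [0, 0], [0, 4]]
```

Three stored values become six cells here. With many rows and many observed motifs the dense table grows as rows times motifs, which is why densifying is only advisable for small selections.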
Inspect End-Motif Metadata
motifs = ends.motifs_metadata()
motif_idx = ends.motif_idx("_AA")
ends.has_motif("_AA")
read_end_motifs() returns a mode-specific object.
- Windowed output has window_metadata(), which returns window_idx, chrom, start, end, and blacklisted_fraction.
- Grouped output has group_metadata() and group_idx().
- Every mode has data_frame(), dense_counts_array(), and sparse_counts_matrix().
Extract End-Motif Counts
motif_idx = ends.motif_idx("_AA")
motif_counts = ends.data_frame(motifs="_AA")
Sparse output stays sparse unless you ask for dense arrays:
nonzero_counts = ends.data_frame()
motif_count_matrix = ends.sparse_counts_matrix(motifs="_AA")
motif_count_array = ends.dense_counts_array(motifs="_AA", allow_densify=True)
For dense windowed output:
windows = ends.window_metadata()
window_counts = ends.data_frame(window_idxs=0)
window_count_array = ends.dense_counts_array(window_idxs=0)
For dense grouped output:
groups = ends.group_metadata()
group_idx = ends.group_idx("t-cells")
group_counts = ends.data_frame(groups="t-cells")
group_count_matrix = ends.sparse_counts_matrix(groups="t-cells")
For global output:
global_counts = ends.dense_counts_array(allow_densify=True)
global_data = ends.data_frame(densify=True)
For windowed or grouped outputs, max_blacklisted_fraction keeps rows with blacklisted_fraction at or below the cutoff:
filtered_motif_counts = ends.data_frame(
    motifs="_AA",
    max_blacklisted_fraction=0.1,
)
For sparse stores, prefer data_frame(densify=False) and sparse_counts_matrix() when working with large end-motif outputs. Use densify=True only when the dense result is small enough to fit comfortably in memory.
Load Length Counts
import cfdnalab as cfl
lengths = cfl.read_lengths("sample.length_counts.tsv.zst")
read_lengths() returns a mode-specific object. Windowed output has window_metadata(), grouped output has group_metadata() and group_idx(), and every mode has counts_array() and data_frame().
bins = lengths.length_bins()
lengths.length_bin_idx(167)
counts = lengths.counts_array()
selected_counts = lengths.counts_array(with_length_range=(100, 220))
count_data = lengths.data_frame(value="count")
fraction_data = lengths.data_frame(value="fraction")
density_data = lengths.data_frame(value="density")
wide_density_data = lengths.data_frame(value="density", keep_wide=True)
range_fraction_data = lengths.data_frame(
    with_length_range=(100, 220),
    value="fraction",
    denominator="selected_bins",
)
Use with_lengths for exact fragment lengths, with_length_range=(start, end) for whole bins overlapping a half-open bp range, or length_bin_idxs for direct bin selection. For fraction and density, denominator="all_bins" uses each row's total across all length bins, while denominator="selected_bins" uses only the returned length bins.
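The effect of the two denominators on fractions can be shown with a few lines of arithmetic. This is a sketch under assumptions: the function name fractions and the toy counts are hypothetical, and density is left out because its exact definition is not given here.

```python
def fractions(row_counts, selected_idxs, denominator="all_bins"):
    """Return fractions for the selected bins of one row."""
    selected = [row_counts[idx] for idx in selected_idxs]
    if denominator == "all_bins":
        total = sum(row_counts)   # every length bin in the row
    elif denominator == "selected_bins":
        total = sum(selected)     # only the returned length bins
    else:
        raise ValueError(f"unknown denominator: {denominator!r}")
    return [count / total for count in selected]


# Hypothetical counts for one row across four length bins.
row_counts = [10, 30, 40, 20]
selected_idxs = [1, 2]

all_bins = fractions(row_counts, selected_idxs, "all_bins")
selected_bins = fractions(row_counts, selected_idxs, "selected_bins")

print(all_bins)            # [0.3, 0.4]
print(sum(all_bins))       # below 1 because unselected bins are in the denominator
print(sum(selected_bins))  # 1.0 up to floating-point rounding
```

With "all_bins" the values stay comparable to fractions computed over the full length distribution. With "selected_bins" they are renormalized to sum to one within the selection.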
For global output:
global_counts = lengths.counts_array()
global_data = lengths.data_frame(value="fraction")
For windowed output:
windows = lengths.window_metadata()
window_counts = lengths.counts_array(window_idxs=0)
window_data = lengths.data_frame(window_idxs=0, value="fraction")
selected_windows = lengths.data_frame(
    window_idxs=[0, 2, 3],
    with_length_range=(100, 220),
    value="density",
    keep_wide=True,
)
For windowed or grouped outputs, max_blacklisted_fraction filters selected output rows before counts are returned:
filtered = lengths.data_frame(max_blacklisted_fraction=0.1)
Outputs without a blacklisted_fraction column keep all rows at the default max_blacklisted_fraction=1.0. Stricter cutoffs raise an error as there is no blacklist column to filter on.
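The filtering rule and the missing-column case can be sketched as follows. The function name filter_rows and the row dictionaries are hypothetical, and ValueError is an assumption, since only "an error" is specified above.

```python
def filter_rows(rows, max_blacklisted_fraction=1.0):
    """Keep rows whose blacklisted_fraction is at or below the cutoff."""
    has_column = all("blacklisted_fraction" in row for row in rows)
    if not has_column:
        if max_blacklisted_fraction < 1.0:
            # Assumed exception type; the real package may raise something else.
            raise ValueError("no blacklisted_fraction column to filter on")
        return list(rows)  # default cutoff of 1.0 keeps every row
    return [
        row for row in rows
        if row["blacklisted_fraction"] <= max_blacklisted_fraction
    ]


# Hypothetical metadata rows with and without a blacklist column.
with_column = [
    {"window_idx": 0, "blacklisted_fraction": 0.05},
    {"window_idx": 1, "blacklisted_fraction": 0.40},
]
without_column = [{"window_idx": 0}, {"window_idx": 1}]

print([row["window_idx"] for row in filter_rows(with_column, 0.1)])  # [0]
print(len(filter_rows(without_column)))                              # 2

try:
    filter_rows(without_column, 0.1)
except ValueError as error:
    print(f"error: {error}")
```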
For grouped output:
groups = lengths.group_metadata()
lengths.group_idx("t-cells")
group_counts = lengths.counts_array(groups="t-cells")
group_data = lengths.data_frame(groups="t-cells", value="fraction")
selected_groups = lengths.data_frame(
    groups=["t-cells", "b-cells"],
    with_length_range=(100, 220),
    value="density",
    keep_wide=True,
    max_blacklisted_fraction=0.1,
)