Python helpers for loading and interacting with cfDNAlab output files

cfDNAlab | Python Loaders

Python helpers for loading cfDNAlab output files.

This package does not install or run the cfDNAlab command-line tool. The CLI is distributed separately as the Rust cfdna binary. Use this Python package after running cfDNAlab to load and analyze output files.

The first supported output types are midpoint and end-motif Zarr outputs: <prefix>.midpoint_profiles.zarr and <prefix>.end_motifs.zarr.


Install

These instructions install only the Python loader package. To install the cfdna command-line tool, see the main repository.

Install with pip:

pip install cfdnalab

Install the current development version from GitHub:

pip install "cfdnalab @ git+https://github.com/BesenbacherLab/cfDNAlab.git#subdirectory=py-cfdnalab"
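After installing, it can be useful to confirm which version ended up in the active environment. The sketch below relies only on the standard library's importlib.metadata; the helper name installed_version is made up for illustration and is not part of cfdnalab.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "cfdnalab" is the distribution name used in the pip commands above.
print(installed_version("cfdnalab"))
```

A result of None means the package is not visible to the Python interpreter that ran the check, which often points to a different virtual environment than the one pip installed into.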

Load Midpoint Profiles

import cfdnalab as cfl

midpoints = cfl.read_midpoints("sample.midpoint_profiles.zarr")

Inspect Metadata

groups = midpoints.groups()
length_bins = midpoints.length_bins()
positions = midpoints.positions()

groups() returns group_idx, group_name, and eligible_intervals. length_bins() and positions() return the corresponding bin indices and half-open bp coordinates.
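Half-open means that a bin's start coordinate is included and its end coordinate is excluded, so adjacent bins never overlap. The sketch below illustrates only that convention using made-up bin edges and the standard library; it does not call cfdnalab, and find_bin is a hypothetical helper.

```python
from bisect import bisect_right

# Hypothetical half-open bins [start, end) in bp.
# Real bins would come from length_bins() or positions().
bin_starts = [100, 150, 200]
bin_ends = [150, 200, 250]


def find_bin(value_bp):
    """Return the index of the half-open bin containing value_bp, or None."""
    i = bisect_right(bin_starts, value_bp) - 1
    if i >= 0 and value_bp < bin_ends[i]:
        return i
    return None


print(find_bin(149))  # 0
print(find_bin(150))  # 1: 150 is excluded from [100, 150) and included in [150, 200)
print(find_bin(250))  # None: 250 is the excluded end of the last bin
```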

Extract One Profile

Use group_idx() and length_bin_idx() to convert a group name or a length in bp into the index that the profile helpers expect:

group_idx = midpoints.group_idx("LYL1")
length_bin_idx = midpoints.length_bin_idx(167)

profile = midpoints.data_frame_for_profile(
    group_idx=group_idx,
    length_bin_idx=length_bin_idx,
)

The returned data frame has one row per midpoint position bin.

Filter By Eligible Intervals

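min_intervals = 100

# Collect one profile per group that has enough eligible intervals.
profiles = {}
for _, group in midpoints.groups().iterrows():
    if group["eligible_intervals"] < min_intervals:
        continue

    profiles[group["group_name"]] = midpoints.data_frame_for_profile(
        group_idx=int(group["group_idx"]),
        length_bin_idx=0,
    )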

Extract NumPy Arrays

profile = midpoints.array_for_profile(group_idx=0, length_bin_idx=0)
group_counts = midpoints.array_from_group_idx(group_idx=0)
length_counts = midpoints.array_from_length_bin(length_bin_idx=0)

array() loads the full 3D count tensor into RAM:

counts = midpoints.array()

Prefer the slice helpers when possible.
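A rough size estimate shows why the slice helpers are usually the better choice. The numbers below are hypothetical: the tensor shape is invented, and 8 bytes per value is an assumption about the stored dtype, so substitute the real dimensions of your store.

```python
def dense_bytes(n_groups, n_length_bins, n_positions, bytes_per_value=8):
    """Estimate the in-memory size of a dense count array."""
    return n_groups * n_length_bins * n_positions * bytes_per_value


# Hypothetical store: 20,000 groups x 30 length bins x 2,000 position bins.
full = dense_bytes(20_000, 30, 2_000)

# One profile covers a single group and a single length bin.
one_profile = dense_bytes(1, 1, 2_000)

print(f"{full / 1e9:.1f} GB vs {one_profile / 1e3:.1f} kB")  # 9.6 GB vs 16.0 kB
```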


Load End-Motif Counts

import cfdnalab as cfl

ends = cfl.read_end_motifs("sample.end_motifs.zarr")

Storage Mode - Sparse or Dense

Start by checking whether the counts were stored as a dense matrix or sparse COO arrays.

ends.storage_mode()

For sparse output, sparse_coo_data_frame() is usually the easiest way to inspect or plot the non-zero motif counts. Use sparse_coo() or the sparse slice helpers when you want SciPy sparse matrices. Dense helpers require allow_densify=True on sparse stores so large sparse outputs are not accidentally expanded in memory.

For dense output, the dense_data_frame*() methods are usually the most convenient starting point. Use dense_counts_zarr_array() when you want the on-disk Zarr array and dense_counts_matrix() when you want the full NumPy matrix in memory.

sparse_coo_data_frame() is only available for sparse output.
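For readers unfamiliar with the COO (coordinate) layout, the sketch below shows the idea in plain Python with made-up counts. It is a conceptual illustration only and does not reflect how cfdnalab lays out its arrays on disk; densify here is a hypothetical helper, not a package function.

```python
# Hypothetical counts for 4 rows (for example windows) x 6 motifs, mostly zero.
# COO stores only the non-zero entries as parallel (row, col, value) lists.
rows = [0, 0, 2]
cols = [1, 3, 0]
values = [5, 2, 7]
shape = (4, 6)


def densify(rows, cols, values, shape):
    """Expand COO triplets into a full row-major list of lists."""
    dense = [[0] * shape[1] for _ in range(shape[0])]
    for r, c, v in zip(rows, cols, values):
        dense[r][c] += v
    return dense


dense = densify(rows, cols, values, shape)
print(dense[0])  # [0, 5, 0, 2, 0, 0]

# Numbers stored: three per non-zero entry versus one per cell.
stored_sparse = 3 * len(values)
stored_dense = shape[0] * shape[1]
print(stored_sparse, stored_dense)  # 9 24
```

The dense size grows with the full matrix shape while the sparse size grows only with the number of non-zero counts, which is why densifying a large, mostly empty store can exhaust memory.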

Inspect End-Motif Metadata

motifs = ends.motif_metadata()
ends.has_motif("_AA")

read_end_motifs() returns a mode-specific object.

  • Windowed output has windows().
  • Grouped output has groups() and group_idx().
  • Global output has dense_counts_vec() and dense_data_frame().
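When a script has to handle stores of any mode, one option is to branch on which of the methods listed above the returned object provides. This is a sketch under the assumption that windows() and groups() are absent from the other modes' objects, as the list suggests; end_motif_mode and the stand-in class are hypothetical, and the package may offer a more direct way to query the mode.

```python
def end_motif_mode(ends):
    """Guess the output mode from the methods the loader object exposes."""
    if hasattr(ends, "windows"):
        return "windowed"
    if hasattr(ends, "groups"):
        return "grouped"
    return "global"


class FakeGroupedStore:
    """Stand-in for a grouped loader object, used only for this demonstration."""

    def groups(self):
        return []

    def group_idx(self, name):
        return 0


print(end_motif_mode(FakeGroupedStore()))  # grouped
```

In real use, pass the object returned by read_end_motifs() instead of the stand-in.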

Extract End-Motif Counts

motif_idx = ends.motif_idx("_AA")

motif_counts = ends.dense_data_frame_for_motif_idx(motif_idx)

Sparse output stays sparse unless you ask for dense arrays:

sparse_counts = ends.sparse_coo()
sparse_payload = ends.sparse_coo_data_frame()
motif_array = ends.dense_counts_for_motif("_AA", allow_densify=True)

For dense windowed output:

windows = ends.windows()
window_counts = ends.dense_data_frame_for_window(window_idx=0)

For dense grouped output:

groups = ends.groups()
group_counts = ends.dense_data_frame_for_group("t-cells")

For sparse stores, prefer sparse_coo(), sparse_coo_data_frame(), and the sparse slice helpers when working with large end-motif outputs. Use allow_densify=True only when the dense result is small enough to fit comfortably in memory.
