A Python toolkit for copy number visualization and multi-sample comparison

These details have not been verified by PyPI

Project links

Homepage

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

CNVis

A lightweight Python toolkit for Copy Number Visualization and multi-sample comparison. Designed for publication-quality genome-wide plots with minimal dependencies.

Features

Binned coverage analysis from BedGraph, CSV, or BigWig files
Multi-sample coverage matrices at gene, chromosome arm, or fixed-bin resolution
Publication-quality genome-wide plots with chromosome-proportional layouts
Segment-based smoothing using ASCAT or other segmentation results
Built-in segmentation using PELT or CBS algorithms for quick exploration
Gap filtering with multiple methods (constant fill, neighbor interpolation, removal)
Bundled reference data including hg38 gap regions for easy filtering

Requirements

Python 3.7 or later
pandas, numpy, matplotlib, seaborn
bioframe, pyBigWig
ruptures (optional, for PELT segmentation)

Installation

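Install the released package from PyPI:

pip install cnvis

Or install the latest development version from GitHub:

pip install git+https://github.com/yelingqun/cnvis.git

Or download the zip from GitHub and install it locally:

pip install cnvis-main.zip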

Quick Start

import cnvis as cv
import pandas as pd

# Build a coverage matrix from multiple samples
matrix = cv.coverage_matrix_bins(
    input_files=['sample1.bedgraph', 'sample2.bedgraph'],
    names=['sample1', 'sample2'],
    bins_size=2_000_000
)

# Plot genome-wide coverage
genome_size = pd.read_csv('hg38_genome_size.tsv', sep='\t')
cv.plot_coverage(matrix, genome_size, y_column='sample1')

Workflow Guide

Workflow 1: Multi-Sample Coverage Matrix Analysis

This workflow creates binned coverage matrices for comparing multiple samples.

Step 1: Prepare Input Files

CNVis accepts coverage files in these formats:

BedGraph: Four tab-separated columns (chrom, start, end, value)
CSV: Must contain chrom, start, end, and a value column
BigWig: Standard bigWig format (.bw)
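As a rough illustration of the four-column BedGraph layout described above, the sketch below parses a small in-memory example with the standard library. It is not how CNVis reads files (use `load_coverage_file` for that); the `parse_bedgraph` helper and the sample values are hypothetical.

```python
import csv
import io

# Hypothetical BedGraph content: chrom, start, end, value (tab-separated, no header row).
BEDGRAPH_TEXT = (
    "track type=bedGraph\n"
    "chr1\t0\t1000\t1.8\n"
    "chr1\t1000\t2000\t2.1\n"
    "chr2\t0\t1000\t3.9\n"
)

def parse_bedgraph(handle):
    """Parse BedGraph lines into dicts, skipping track, browser, and comment lines."""
    records = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = row[:4]
        records.append(
            {"chrom": chrom, "start": int(start), "end": int(end), "value": float(value)}
        )
    return records

records = parse_bedgraph(io.StringIO(BEDGRAPH_TEXT))
for rec in records:
    print(rec)
```

A CSV input would carry the same four pieces of information, only with named columns.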

Step 2: Build Coverage Matrix

import cnvis as cv

# List your coverage files and sample names
input_files = [
    'sample1.bedgraph',
    'sample2.bedgraph',
    'sample3.bedgraph'
]
names = ['sample1', 'sample2', 'sample3']

# Create coverage matrix with 2Mb bins
matrix = cv.coverage_matrix_bins(
    input_files=input_files,
    names=names,
    bins_size=2_000_000,      # 2Mb bins
    max_value=8,               # Clip outliers above 8
    normalize_median=True      # Normalize each sample to median=2
)
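To make the three options above more concrete, here is a minimal pure-Python sketch of fixed-size binning, outlier clipping, and median normalization. It only illustrates the idea and should not be read as CNVis's implementation: `bin_coverage` and `normalize_to_median` are hypothetical helpers, and whether CNVis clips before or after averaging is an assumption made here.

```python
from collections import defaultdict
from statistics import median

def bin_coverage(records, bin_size, max_value=None):
    """Average interval values into fixed-size bins keyed by (chrom, bin_start).

    Each interval is assigned to the bin containing its start coordinate,
    which is a simplification: intervals spanning a bin edge are not split.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for chrom, start, end, value in records:
        if max_value is not None:
            value = min(value, max_value)  # clip outliers (assumed to happen before averaging)
        key = (chrom, (start // bin_size) * bin_size)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

def normalize_to_median(binned, target=2.0):
    """Rescale bin values so that their median equals `target`."""
    med = median(binned.values())
    return {key: value * target / med for key, value in binned.items()}

# Hypothetical raw coverage: (chrom, start, end, value)
records = [
    ("chr1", 0, 500, 10.0),
    ("chr1", 500, 1000, 30.0),
    ("chr1", 1000, 1500, 20.0),
    ("chr1", 2000, 2500, 100.0),  # outlier, clipped to 40
]
binned = bin_coverage(records, bin_size=1000, max_value=40.0)
normalized = normalize_to_median(binned, target=2.0)
print(normalized)
```

With a target median of 2, a typical diploid bin sits at 2 and the clipped outlier bin lands at 4 in this toy example.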

Alternative binning options:

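import pandas as pd

# By chromosome arms (p/q arms)
matrix_arms = cv.coverage_matrix_arms(input_files, names, genome='hg38')

# By gene regions (expects columns chrom, start, end, name).
# read_csv treats the first line as a header; for a headerless BED file,
# pass header=None and names=['chrom', 'start', 'end', 'name'].
genes_df = pd.read_csv('genes.bed', sep='\t')
matrix_genes = cv.coverage_matrix_genes(input_files, names, genes=genes_df)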

Step 3: Filter Genomic Gaps

Remove or interpolate coverage in problematic regions (centromeres, gaps, etc.):

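import pandas as pd
# importlib.resources.files requires Python 3.9+;
# on Python 3.7/3.8 the importlib_resources backport provides files()
from importlib.resources import files

# Load bundled hg38 gap regions (included with cnvis)
gap_file = files('cnvis.data').joinpath('hgTables_gap_hg38.tsv')
gap_df = pd.read_csv(gap_file, sep='\t')[['chrom', 'chromStart', 'chromEnd']]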

# Or load your own gap file
# gap_df = pd.read_csv('hg38_gaps.tsv', sep='\t')[['chrom', 'chromStart', 'chromEnd']]

# Filter gaps with 100kb buffer
matrix_filtered = cv.filter_gaps(
    matrix,
    gap_df,
    buffer=100_000,        # Extend gap regions by 100kb
    method='neighbor',     # 'neighbor', 'constant', or 'remove'
    gap_value=2,           # Value to use if method='constant'
    window=3               # Window size for neighbor interpolation
)

Gap filtering methods:

'neighbor': Interpolate using neighboring bin values (recommended)
'constant': Fill with a fixed value (default: 2)
'remove': Drop gap bins entirely from the DataFrame
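The three methods can be sketched on a plain list of bin values. This is an illustrative approximation rather than the `filter_gaps` implementation: the `fill_gaps` helper is hypothetical, and taking the median of clean bins within `window` positions is only one plausible reading of "neighbor interpolation".

```python
from statistics import median

def fill_gaps(values, is_gap, method="neighbor", gap_value=2.0, window=3):
    """Handle flagged bins in one chromosome's ordered list of bin values."""
    if method == "remove":
        return [v for v, gap in zip(values, is_gap) if not gap]
    if method == "constant":
        return [gap_value if gap else v for v, gap in zip(values, is_gap)]
    if method == "neighbor":
        result = []
        for i, (v, gap) in enumerate(zip(values, is_gap)):
            if not gap:
                result.append(v)
                continue
            lo, hi = max(0, i - window), min(len(values), i + window + 1)
            neighbors = [values[j] for j in range(lo, hi) if not is_gap[j]]
            # Fall back to the constant when no clean neighbor is within range.
            result.append(median(neighbors) if neighbors else gap_value)
        return result
    raise ValueError(f"unknown method: {method!r}")

# Hypothetical bins: the two middle bins overlap a gap region.
values = [2.0, 2.1, 0.0, 0.1, 1.9, 2.0]
is_gap = [False, False, True, True, False, False]

for method in ("neighbor", "constant", "remove"):
    print(method, fill_gaps(values, is_gap, method=method))
```

Note that 'remove' shortens the data, so downstream code that relies on evenly spaced bins may prefer one of the fill methods.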

Step 4: Visualize Coverage

Single sample plot:

import pandas as pd

# Load genome size file (chrom, size columns)
genome_size = pd.read_csv('hg38_genome_size.tsv', sep='\t')

# Simple plot without color mapping
cv.plot_coverage_points(
    matrix_filtered,
    genome_size,
    y_column='sample1',
    ylim=(0, 4.5),
    alpha=0.3,
    ylabel='Copy Number'
)

# Or with copy number color mapping
palette = cv.get_cn_palette()  # Get default CN color palette
matrix_filtered['color'] = matrix_filtered['sample1'].apply(cv.categorize_cn_color)

cv.plot_coverage_points(
    matrix_filtered,
    genome_size,
    y_column='sample1',
    hue_column='color',
    palette=palette,
    ylim=(0, 4.5),
    alpha=0.3,
    ylabel='Copy Number'
)

Multi-sample comparison:

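# Note: the 'color' column above was computed from sample1,
# so it reflects sample1's copy number states.
cv.plot_coverage_multi(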
    matrix_filtered,
    genome_size,
    y_columns=['sample1', 'sample2', 'sample3'],
    ylabels=['Sample 1', 'Sample 2', 'Sample 3'],
    chrom_column='chrom',
    x1_column='start',
    hue_column='color',
    palette=palette,
    ylim=(0, 4.5),
    alpha=0.3,
    showX=False
)

Plot specific chromosomes:

cv.plot_coverage_multi(
    matrix_filtered,
    genome_size,
    y_columns=['sample1', 'sample2'],
    chrom=['chr1', 'chr2', 'chr3'],  # Only these chromosomes
    chrom_column='chrom',
    x1_column='start',
    hue_column='color',
    palette=palette
)

Workflow 2: Segment-Based Smoothing with ASCAT

This workflow integrates ASCAT segmentation results to smooth coverage data.

Step 1: Load and Smooth Coverage

import cnvis as cv
import pandas as pd

# Load coverage data
cov = cv.load_coverage_file('sample.csv')

# Load segment data
segment = pd.read_csv('sample.segments.txt', sep='\t')

# Smooth toward segment medians
# smooth=0.9 means 90% toward segment median, 10% original value
cov = cv.smooth_with_segments(
    cov,                         # Coverage DataFrame
    segment,                     # Segment DataFrame
    column='value',              # Input column name
    result_column='value_smoothed',  # Output column name
    smooth=0.9                   # Smoothing factor (0-1)
)
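The comment above suggests a linear blend between each value and its segment median. A minimal sketch of that reading follows; the formula and the `smooth_toward_median` helper are assumptions for illustration, not taken from the CNVis source.

```python
from statistics import median

def smooth_toward_median(values, segment_ids, smooth=0.9):
    """Blend each value with the median of its segment.

    smoothed = smooth * segment_median + (1 - smooth) * value
    """
    by_segment = {}
    for value, seg in zip(values, segment_ids):
        by_segment.setdefault(seg, []).append(value)
    medians = {seg: median(vals) for seg, vals in by_segment.items()}
    return [
        smooth * medians[seg] + (1 - smooth) * value
        for value, seg in zip(values, segment_ids)
    ]

# Hypothetical bins belonging to two segments, A and B.
values = [1.0, 1.2, 0.8, 2.0, 2.4, 1.6]
segment_ids = ["A", "A", "A", "B", "B", "B"]
smoothed = smooth_toward_median(values, segment_ids, smooth=0.9)
print([round(v, 2) for v in smoothed])
```

Under this reading, `smooth=0` leaves the data unchanged and `smooth=1` collapses every bin onto its segment median, so values near 0.9 keep a small amount of the original scatter visible in the plot.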

Step 2: Filter with Blood/Normal Control (Optional)

# Load blood/normal coverage for filtering
blood_cov = cv.load_coverage_file('blood_sample.csv')

# Filter out bins with abnormal blood coverage
cov_filtered = cv.filter_cov(
    cov,
    blood_cov,
    value_column='value',
    chrom_column='chrom'
)
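One way to picture control-based filtering is to drop bins where the control sample itself looks unusual. The sketch below is purely illustrative: the actual rule used by `filter_cov` is not documented here, and the `filter_by_control` helper and its `tolerance` parameter are hypothetical.

```python
from statistics import median

def filter_by_control(sample, control, tolerance=0.25):
    """Keep sample bins whose control value lies within `tolerance`
    (as a fraction) of the control median. Both inputs map bin key -> value."""
    center = median(control.values())
    keep = {
        key
        for key, value in control.items()
        if abs(value - center) <= tolerance * center
    }
    return {key: value for key, value in sample.items() if key in keep}

# Hypothetical bins keyed by (chrom, start); the third control bin is abnormal.
control = {("chr1", 0): 1.0, ("chr1", 1000): 1.05, ("chr1", 2000): 3.0, ("chr1", 3000): 0.95}
sample = {("chr1", 0): 1.4, ("chr1", 1000): 1.5, ("chr1", 2000): 4.0, ("chr1", 3000): 0.6}

filtered = filter_by_control(sample, control)
print(sorted(filtered))
```

The rationale is that a bin behaving oddly in a normal control is more likely a mapping or reference artifact than a real copy number change in the sample of interest.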

Step 3: Convert to Copy Number and Assign Colors

# Convert normalized coverage to copy number (diploid = 2)
cov_filtered['cn'] = (cov_filtered['value_smoothed'] * 2).clip(upper=8)

# Calculate segment median for color assignment
cov_filtered['segment_median'] = cov_filtered.groupby('segment')['cn'].transform('median')

# Assign colors based on copy number state
cov_filtered['color'] = cov_filtered['segment_median'].apply(cv.categorize_cn_color)
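The conversion and categorization can be sketched without pandas. The thresholds and category names below are hypothetical (the cutoffs borrow the `low=1.25` and `high=2.75` defaults listed for `matrix2comut` in the API reference); the categories returned by `categorize_cn_color` may differ.

```python
def to_copy_number(normalized_value, ploidy=2, upper=8):
    """Scale normalized coverage (1.0 = diploid) to copy number and cap it."""
    return min(normalized_value * ploidy, upper)

def categorize_cn(cn, low=1.25, high=2.75):
    """Assign a coarse state to a copy number value (hypothetical labels)."""
    if cn < low:
        return "loss"
    if cn > high:
        return "gain"
    return "neutral"

# Hypothetical normalized coverage values.
normalized = [0.5, 1.0, 1.5, 6.0]
copy_number = [to_copy_number(v) for v in normalized]
states = [categorize_cn(cn) for cn in copy_number]
print(list(zip(copy_number, states)))
```

Coloring by the segment median rather than by each bin's own value, as done above, keeps a segment uniformly colored even when individual bins scatter across a threshold.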

Step 4: Plot Smoothed Coverage

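palette = cv.get_cn_palette()  # Get default CN color palette

# Load genome size file (chrom, size columns)
genome_size = pd.read_csv('hg38_genome_size.tsv', sep='\t')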

cv.plot_coverage(
    cov_filtered,
    genome_size,
    y_column='cn',
    s=1,                        # Point size
    ylim=(0, 4.5),
    alpha=0.3,
    hue_column='color',
    palette=palette,
    figsize=(5, 0.8),
    ylabel='Copy Number'
)

Workflow 3: Quick Segmentation with Built-in Algorithms

For quick exploration without external tools like ASCAT, CNVis provides built-in segmentation.

Step 1: Load and Normalize Coverage

import cnvis as cv

# Load coverage data
cov = cv.load_coverage_file('sample.bedgraph')

# Normalize (clip outliers, normalize to median=1)
cov = cv.normalize_coverage(cov, max_value=8, normalize_median=True)

Step 2: Run Segmentation

# PELT algorithm (fast, recommended for exploration; requires the optional ruptures package)
segments = cv.segment_coverage(cov, method='pelt', penalty=3)

# Or CBS algorithm (classic CNV method, slower but well-established)
segments = cv.segment_coverage(cov, method='cbs', alpha=0.01)

Method comparison:

'pelt': Fast change-point detection using the ruptures library. Good for quick exploration.
'cbs': Circular Binary Segmentation, the classic algorithm for array CGH data (Olshen et al., 2004). Uses permutation tests for significance.

Common parameters:

penalty: For PELT, higher values = fewer breakpoints (default: 3)
alpha: For CBS, significance level (default: 0.01)
min_size: Minimum segment size in bins (default: 5)
merge_segments: Merge adjacent segments that aren't statistically different (default: True)
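To show how a penalty and a minimum segment size shape the result, here is a small recursive binary segmentation in pure Python. This is a deliberately simpler technique than either PELT or CBS and is not what `segment_coverage` runs; it only illustrates why a higher penalty yields fewer breakpoints. All names are hypothetical, and `min_size` is assumed to be at least 1.

```python
def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def binary_segmentation(values, penalty=3.0, min_size=5, offset=0):
    """Return sorted breakpoint indices found by recursive best-split search.

    A split is accepted only if it reduces the squared error by more than
    `penalty` and leaves at least `min_size` points on each side.
    """
    n = len(values)
    total = sse(values) if n else 0.0
    best_gain, best_split = 0.0, None
    for split in range(min_size, n - min_size + 1):
        gain = total - sse(values[:split]) - sse(values[split:])
        if gain > best_gain:
            best_gain, best_split = gain, split
    if best_split is None or best_gain <= penalty:
        return []
    left = binary_segmentation(values[:best_split], penalty, min_size, offset)
    right = binary_segmentation(
        values[best_split:], penalty, min_size, offset + best_split
    )
    return left + [offset + best_split] + right

# Hypothetical copy number signal with changes at bins 10 and 20.
signal = [2.0] * 10 + [3.0] * 10 + [1.0] * 10
breakpoints = binary_segmentation(signal, penalty=3.0, min_size=5)
print(breakpoints)
print(binary_segmentation(signal, penalty=30.0, min_size=5))
```

Raising the penalty until it exceeds the best achievable error reduction suppresses all breakpoints, which mirrors the "higher values = fewer breakpoints" behavior described for PELT.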

Step 3: Visualize Segments

import pandas as pd

genome_size = pd.read_csv('hg38_genome_size.tsv', sep='\t')

# Plot segments as horizontal lines
cv.plot_segments(segments, genome_size, y_column='cn', ylim=(0, 4.5))

API Reference

Coverage Processing Functions

Function	Description
`normalize_coverage(track, max_value=8, normalize_median=True, target_median=1.0)`	Clip and/or normalize coverage values
`filter_gaps(df, gap, buffer=500_000, method='constant')`	Filter genomic gap regions (methods: 'constant', 'neighbor', 'remove')
`filter_cov(cov, blood_cov)`	Filter using control sample
`smooth_with_segments(cov, segment, smooth=0.9)`	Segment-based smoothing
`segment_coverage(cov, method='pelt')`	Segment coverage using PELT or CBS algorithm
`merge_similar_segments(segments, p_threshold=0.05)`	Merge adjacent segments that aren't statistically different

Coverage Matrix Functions

Function	Description
`coverage_matrix_bins(input_files, names, bins_size=2_000_000)`	Create matrix with fixed-size bins
`coverage_matrix_arms(input_files, names, genome='hg38')`	Create matrix by chromosome arms
`coverage_matrix_genes(input_files, names, genes)`	Create matrix by gene regions
`coverage_by_bins(input_file, name, bins)`	Process single sample
`matrix2comut(matrix, low=1.25, high=2.75)`	Convert to CoMut format

Plotting Functions

Function	Description
`plot_coverage(df, genome_size, y_column, ...)`	Single-sample genome-wide plot (main function)
`plot_coverage_points(df, genome_size, y_column, ...)`	Scatter plot wrapper (simplified API)
`plot_coverage_lines(df, genome_size, y_column, ...)`	Line segment wrapper (simplified API)
`plot_segments(segments, genome_size, y_column='cn', ...)`	Plot segmentation results as horizontal lines
`plot_coverage_multi(df, genome_size, y_columns, ...)`	Multi-sample stacked plots
`categorize_cn_color(value)`	Map CN value to color category
`get_cn_palette()`	Get default CN color palette
`extract_highlighted_coverage(df, highlight_df, ...)`	Extract coverage from highlighted regions

Utility Functions

Function	Description
`load_coverage_file(input_file, chrom_col, start_col, end_col, value_col)`	Load BedGraph/CSV/TSV/BigWig file
`genome_range(version='GRCh38')`	Get chromosome ranges
`genome_bins(coord_df, bin_size)`	Generate genomic bins

Bundled Reference Data

CNVis includes reference data files for hg38:

from importlib.resources import files

# hg38 gap regions (centromeres, telomeres, scaffold gaps)
gap_file = files('cnvis.data').joinpath('hgTables_gap_hg38.tsv')

# GRCh38 chromosome sizes
genome_file = files('cnvis.data').joinpath('GRCh38.genome.size.tsv')
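A chromosome sizes table is what makes a chromosome-proportional, genome-wide x-axis possible: each chromosome is shifted by the total length of the chromosomes before it. The sketch below shows that idea with made-up sizes; the helper names are hypothetical and this is not the plotting code used by CNVis.

```python
from itertools import accumulate

# Hypothetical chromosome sizes; real values would come from the genome size table.
chrom_sizes = [("chr1", 1000), ("chr2", 800), ("chr3", 600)]

def cumulative_offsets(sizes):
    """Map each chromosome to the genome-wide coordinate where it starts."""
    names = [name for name, _ in sizes]
    starts = [0] + list(accumulate(size for _, size in sizes))[:-1]
    return dict(zip(names, starts))

offsets = cumulative_offsets(chrom_sizes)

def genome_x(chrom, position):
    """Convert a per-chromosome position into a genome-wide x coordinate."""
    return offsets[chrom] + position

print(offsets)
print(genome_x("chr2", 50))
```

Because the offsets are cumulative sums of the real sizes, each chromosome occupies a share of the x-axis proportional to its length.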

Plot Customization

Styling Options

# Style: spine separators between chromosomes
cv.plot_coverage_multi(df, genome_size, y_columns, style='spine')

# Style: alternating background colors
cv.plot_coverage_multi(
    df, genome_size, y_columns,
    style='facecolor',
    facecolor_odd='#e6f2ff',
    facecolor_even='#ffffff'
)

Common Parameters

Parameter	Description
`ylim`	Y-axis limits, e.g., `(0, 4.5)`
`alpha`	Point transparency (0-1)
`s`	Point size
`figsize`	Figure size as `(width, height)`
`showX`	Show x-axis labels
`ylabel`	Y-axis label
`highlight_df`	DataFrame of regions to highlight
`highlight_color`	Color for highlighted regions

Plot Type Selection

CNVis provides wrapper functions for common plot types:

# Scatter plot (points) - best for binned coverage data
cv.plot_coverage_points(df, genome_size, y_column='value', alpha=0.3)

# Line segments - best for segment-level data with start/end coordinates
cv.plot_coverage_lines(df, genome_size, y_column='value', x2_column='end')

# Full control - use the main function directly
cv.plot_coverage(df, genome_size, y_column='value', x2_column='end')  # add further keyword arguments as needed

Example Notebooks

See the notebooks/ directory for complete examples:

segmentation_algorithms_explained.ipynb - In-depth guide to PELT and CBS algorithms
test_segments.ipynb - Quick segmentation usage examples
test_coverage_matrix_2m.ipynb - Multi-sample coverage analysis
test_coverage_plot_smoothed.ipynb - Segment-based smoothing
test_coverage_matrix_plot_hic_vs_wgs.ipynb - HiC vs WGS comparison
test_pacbio_coverage_plot.ipynb - Long-read coverage plotting

Release history

0.2.0: Mar 25, 2026
0.1.0: Jan 28, 2026 (this version)

Download files

Source Distribution

cnvis-0.1.0.tar.gz (40.1 kB), uploaded Jan 28, 2026

Built Distribution

cnvis-0.1.0-py3-none-any.whl (36.3 kB, Python 3), uploaded Jan 28, 2026

File details

Details for the file cnvis-0.1.0.tar.gz.

File metadata

Download URL: cnvis-0.1.0.tar.gz
Upload date: Jan 28, 2026
Size: 40.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.0

File hashes

Hashes for cnvis-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`07bc5df8b8fadf606181b9aade6b4041574b03bbb2631a476f8ee665de25b2cb`
MD5	`e325ce27c023c9411cc87570d402c2d3`
BLAKE2b-256	`99bd60d940bb462770d98684e7eb506a5cf59d4bff322085a7d22d26310e40c4`
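A downloaded file can be checked against a published digest with the standard library's `hashlib`. The sketch below defines a hypothetical `sha256_of_file` helper and demonstrates it on a temporary file, since the actual distribution is not assumed to be present locally.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Demonstration on a temporary file containing the bytes b"abc".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
    demo_path = tmp.name

demo_digest = sha256_of_file(demo_path)
os.remove(demo_path)
print(demo_digest)
```

For the real check, compare `sha256_of_file('cnvis-0.1.0.tar.gz')` with the SHA256 value listed in the table above; any mismatch means the download should not be installed.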

File details

Details for the file cnvis-0.1.0-py3-none-any.whl.

File metadata

Download URL: cnvis-0.1.0-py3-none-any.whl
Upload date: Jan 28, 2026
Size: 36.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.0

File hashes

Hashes for cnvis-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ea5e20df60def7301d0af32a315baab649b323eaa8ddd32e19e5c8c067822d00`
MD5	`468f001bd924e06f83c479e92b0510aa`
BLAKE2b-256	`58fc1387c2013f21fd00021b533fb1460bd0c3f3623d4f3290375f3ee567fed1`
