Synthetic Hi-C / Micro-C / 3C triplet FASTQ benchmark generator and .pairs recovery analyser.

bench3c

bench3c is a synthetic benchmark generator for 3C-derived sequencing workflows. It generates controlled Hi-C-like and Micro-C-like paired-end FASTQ reads containing known triplet structures, then evaluates whether a mapping / splitting / .pairs reconstruction pipeline recovers the expected fragments.

The benchmark is mainly designed to test preprocessing tools for chimeric or multiplex reads in Hi-C, Micro-C, Pore-C-like or split-read 3C workflows.

Model

The benchmark encodes a known triplet of genomic fragments:

R1: AAAAAAABBBBBBBBB
R2: CCCCCCCCCCCCCCCC

The read name stores the true genomic coordinates:

@chrA-startA-endA:chrB-startB-endB::chrC-startC-endC

This allows downstream analysis to compare the expected fragment lengths with the observed alignments recovered in .pairsam files.
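As an illustration of how this encoding can be consumed, the sketch below parses a read name into its three fragments and derives the expected lengths. The names `Fragment`, `parse_fragment` and `parse_truth_name` are hypothetical helpers, not part of bench3c, and the length is taken as end minus start because the coordinate convention (0- or 1-based) is not stated here.

```python
from typing import NamedTuple


class Fragment(NamedTuple):
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # Assumption: length is end - start; the coordinate convention
        # (0- or 1-based, open or closed) is not stated in the README.
        return self.end - self.start


def parse_fragment(token: str) -> Fragment:
    # rsplit keeps chromosome names that themselves contain '-' intact.
    chrom, start, end = token.rsplit("-", 2)
    return Fragment(chrom, int(start), int(end))


def parse_truth_name(name: str) -> tuple[Fragment, Fragment, Fragment]:
    """Parse 'chrA-sA-eA:chrB-sB-eB::chrC-sC-eC' into fragments A, B and C."""
    # Drop the FASTQ '@' prefix and any comment after the first whitespace.
    name = name.lstrip("@").split()[0]
    r1_part, sep, c_token = name.partition("::")
    if not sep:
        raise ValueError(f"missing '::' separator in {name!r}")
    a_token, sep, b_token = r1_part.partition(":")
    if not sep:
        raise ValueError(f"missing ':' separator in {name!r}")
    return parse_fragment(a_token), parse_fragment(b_token), parse_fragment(c_token)


a, b, c = parse_truth_name("@chr1-100-170:chr2-5000-5080::chr3-900-1050")
print(a.length, b.length, c.length)  # 70 80 150
```

Chromosome names containing ':' would break this simple split; a real parser would need to account for them.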

Modes

bench3c has three main modes:

--hic: generate Hi-C-like triplet reads from a digested genome.
--microc: generate Micro-C-like triplet reads directly from a FASTA.
--analyse: analyse a .pairsam.gz or .pairsam file and compare recovered alignments to the encoded truth.

Installation

From source:

git clone <repo-url>
cd <repo>
uv sync
uv run bench3c --help

Or with pip, from PyPI:

pip install bench3c
bench3c --help

Hi-C simulation

Generate Hi-C-like paired-end FASTQ reads:

bench3c --hic \
  --fasta genome.fa \
  --site GATC \
  --out bench/hic_sim \
  --number-reads-pairs 100000 \
  --read-len 150 \
  --min-piece 50 \
  --max-jump 300

If --fasta is not provided, bench3c can generate a random FASTA. If --digested is not provided, the genome is digested internally using --site.
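The internal digestion step can be pictured with a minimal sketch like the one below. It is not bench3c's implementation: the function name `digest`, the 0-based half-open coordinates, and cutting at the first base of each site (as for ^GATC enzymes such as DpnII or MboI) are assumptions made for illustration.

```python
def digest(sequence: str, site: str = "GATC") -> list[tuple[int, int]]:
    """Return (start, end) fragments, 0-based half-open, cut at each site start."""
    if not site:
        raise ValueError("restriction site must be non-empty")
    seq = sequence.upper()
    site = site.upper()

    # Collect every occurrence of the site, including overlapping ones.
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos)
        pos = seq.find(site, pos + 1)

    # Fragment boundaries: sequence start, each internal cut, sequence end.
    bounds = [0] + [c for c in cuts if c > 0] + [len(seq)]
    return [(s, e) for s, e in zip(bounds, bounds[1:]) if e > s]


fragments = digest("AAGATCTTTGATCCC")
print(fragments)  # [(0, 2), (2, 9), (9, 15)]
```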

Typical outputs:

bench/hic_sim_R1.fq
bench/hic_sim_R2.fq

Micro-C simulation

Generate Micro-C-like paired-end FASTQ reads:

bench3c --microc \
  --fasta genome.fa \
  --out bench/microc_sim \
  --number-reads-pairs 100000 \
  --read-len 150 \
  --min-piece 50

Add non-chimeric pairs:

bench3c --microc \
  --fasta genome.fa \
  --out bench/microc_mixed \
  --number-reads-pairs 100000 \
  --read-len 150 \
  --prop-nonchimeric 0.2

Analysis mode

Analyse a .pairs or .pairsam file (plain or gzipped) after mapping and reconstruction:

bench3c --analyse \
  --pairs output.pairs.gz \
  --read-len 150 \
  --analyse-out-dir benchmark_results \
  --condition my_pipeline

The analyser expects read names following the truth-encoding format:

chrA-startA-endA:chrB-startB-endB::chrC-startC-endC

It also expects recoverable alignment information, typically through sam1 and sam2 columns in the .pairs / .pairsam file.
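To check that a file carries these columns before running the analyser, a small reader along the following lines may help. It is a sketch under stated assumptions, not bench3c code: it assumes a standard `#columns:` header line that names `readID`, `sam1` and `sam2`, and the helpers `open_text` and `iter_sam_columns` are hypothetical.

```python
import gzip
import io
from typing import Iterator, TextIO


def open_text(path: str) -> TextIO:
    """Open a plain or gzipped text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def iter_sam_columns(handle: TextIO) -> Iterator[tuple[str, str, str]]:
    """Yield (readID, sam1, sam2) for each record of a .pairs / .pairsam stream."""
    columns = None
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("#"):
            # Column names come from the '#columns:' header line.
            if line.startswith("#columns:"):
                columns = line[len("#columns:"):].split()
            continue
        if not line:
            continue
        if columns is None:
            raise ValueError("no '#columns:' header line found")
        row = dict(zip(columns, line.split("\t")))
        for required in ("readID", "sam1", "sam2"):
            if required not in row:
                raise ValueError(f"column {required!r} is missing")
        yield row["readID"], row["sam1"], row["sam2"]


# Toy in-memory example; a real file would be opened with open_text(path).
demo = io.StringIO(
    "## pairs format v1.0\n"
    "#columns: readID chrom1 pos1 chrom2 pos2 strand1 strand2 pair_type sam1 sam2\n"
    "r1\tchr1\t10\tchr2\t20\t+\t-\tUU\tSAM_A\tSAM_B\n"
)
rows = list(iter_sam_columns(demo))
print(rows)
```

The `sam1` and `sam2` values are returned as raw strings; decoding the alignments inside them is left to the pipeline that produced the file.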

Analysis outputs

The analysis mode writes summary tables and plots, including:

<condition>_read_recovery.tsv
<condition>_problem_fragments.tsv
<condition>_problem_summary.tsv
<condition>_cut_summary.tsv
<condition>_histogram.pdf
<condition>_tolerance_curve.pdf

Depending on the current version, additional outputs may include chimeric-size summaries and chimeric-specific histograms.
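For scripted benchmarks it can be handy to verify that a run produced the files listed above. The sketch below only builds the expected paths from the output directory and condition name. The helpers `expected_outputs` and `missing_outputs` are hypothetical, and the assumption that files sit directly inside the output directory is not confirmed here.

```python
from pathlib import Path

# Suffixes taken from the list of analysis outputs above.
SUFFIXES = [
    "_read_recovery.tsv",
    "_problem_fragments.tsv",
    "_problem_summary.tsv",
    "_cut_summary.tsv",
    "_histogram.pdf",
    "_tolerance_curve.pdf",
]


def expected_outputs(out_dir: str, condition: str) -> list[Path]:
    """Paths the analysis mode is expected to write for one condition."""
    return [Path(out_dir) / f"{condition}{suffix}" for suffix in SUFFIXES]


def missing_outputs(out_dir: str, condition: str) -> list[Path]:
    """Expected paths that do not exist on disk."""
    return [p for p in expected_outputs(out_dir, condition) if not p.is_file()]


paths = expected_outputs("benchmark_results", "my_pipeline")
for path in paths:
    print(path)
```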

Typical benchmark workflow

# 1. Generate synthetic reads
bench3c --microc \
  --fasta genome.fa \
  --out bench/microc \
  --number-reads-pairs 10000 \
  --read-len 150

# 2. Map the reads with your mapper
bwa mem -SP genome.fa bench/microc_R1.fq bench/microc_R2.fq > bench/microc.sam

# 3. Convert the mappings to .pairs with your pipeline

# 4. Analyse fragment recovery
bench3c --analyse \
  --pairs bench/output.pairs.gz \
  --read-len 150 \
  --analyse-out-dir bench/results \
  --condition my_pipeline
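When several conditions are benchmarked, steps 1 and 4 can be driven from a script. The sketch below only assembles the command lines, using the flags shown above. The function names are hypothetical, and the mapping and .pairs conversion steps are left to your own pipeline.

```python
import shlex


def generate_cmd(fasta: str, out_prefix: str, n_pairs: int, read_len: int) -> list[str]:
    """Command line for step 1 (Micro-C-like read generation)."""
    return [
        "bench3c", "--microc",
        "--fasta", fasta,
        "--out", out_prefix,
        "--number-reads-pairs", str(n_pairs),
        "--read-len", str(read_len),
    ]


def analyse_cmd(pairs: str, read_len: int, out_dir: str, condition: str) -> list[str]:
    """Command line for step 4 (fragment recovery analysis)."""
    return [
        "bench3c", "--analyse",
        "--pairs", pairs,
        "--read-len", str(read_len),
        "--analyse-out-dir", out_dir,
        "--condition", condition,
    ]


gen = generate_cmd("genome.fa", "bench/microc", 10000, 150)
ana = analyse_cmd("bench/output.pairs.gz", 150, "bench/results", "my_pipeline")
for cmd in (gen, ana):
    print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to execute it.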

Interpretation

A read is perfect when every expected fragment is recovered and no extra fragment is observed.

Common failure classes:

missing: an expected fragment was not recovered.
too_short: the observed fragment is shorter than the truth.
too_long: the observed fragment is longer than the truth.
over_split: extra observed fragments were recovered.
under_split_or_missing: one or more truth fragments were not recovered.
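The classes above can be illustrated with a small classifier. This is a sketch of the idea, not the analyser's actual logic: it assumes observed fragments have already been matched to truth fragments (with `None` for an unmatched one), that lengths are compared within a tolerance in bases, and it uses a hypothetical `ok` label for fragments within tolerance.

```python
from typing import Optional


def classify_fragment(expected_len: int, observed_len: Optional[int],
                      tolerance: int = 0) -> str:
    """Label one truth fragment from its expected and observed lengths."""
    if observed_len is None:
        return "missing"
    diff = observed_len - expected_len
    if diff < -tolerance:
        return "too_short"
    if diff > tolerance:
        return "too_long"
    return "ok"  # hypothetical label, not a documented bench3c class


def classify_read(expected: list[int], matched: list[Optional[int]],
                  n_extra: int = 0, tolerance: int = 0) -> dict:
    """Combine per-fragment labels into read-level failure classes."""
    fragments = [classify_fragment(e, o, tolerance)
                 for e, o in zip(expected, matched)]
    read = []
    if n_extra > 0:
        read.append("over_split")
    if "missing" in fragments:
        read.append("under_split_or_missing")
    perfect = not read and all(label == "ok" for label in fragments)
    return {"fragments": fragments, "read": read, "perfect": perfect}


result = classify_read([70, 80, 150], [70, 75, None], tolerance=2)
print(result)
```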

Limitations

bench3c is a controlled synthetic benchmark. It does not fully model all experimental biases of real Hi-C or Micro-C libraries, such as PCR duplicates, GC bias, mappability bias, restriction efficiency, ligation bias, base-quality degradation, optical duplicates, or complex multi-mapping.

It is intended to test whether a pipeline can recover known chimeric or multiplex structures under controlled conditions.

License

AGPL-3.0-or-later.

Release history

0.0.4 (this version): May 21, 2026
0.0.3: May 21, 2026
0.0.2: May 21, 2026
0.0.1: May 20, 2026

Download files

Source Distribution

bench3c-0.0.4.tar.gz (43.2 kB), uploaded May 21, 2026, Source

Built Distribution

bench3c-0.0.4-py3-none-any.whl (44.6 kB), uploaded May 21, 2026, Python 3

File details

Details for the file bench3c-0.0.4.tar.gz.

File metadata

Download URL: bench3c-0.0.4.tar.gz
Upload date: May 21, 2026
Size: 43.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for bench3c-0.0.4.tar.gz
Algorithm	Hash digest
SHA256	`5da894c48c43fa5cd0cc40b96865d76e5d7cf3210a927ca748d2f1c4ad4d92b9`
MD5	`4bc1eabc32f6e238f82293a287cc9b4b`
BLAKE2b-256	`a56880386a71e7e39fd8a4e7b1e9193ab2cb34eb4d421295f5d7b0336e583f96`

File details

Details for the file bench3c-0.0.4-py3-none-any.whl.

File metadata

Download URL: bench3c-0.0.4-py3-none-any.whl
Upload date: May 21, 2026
Size: 44.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.13

File hashes

Hashes for bench3c-0.0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fb3a4702597e156920218c867043ac28ecba03f53e591d22c733aa02ab6d71a3`
MD5	`3e3b68abe854effafb25ee4b05829ff8`
BLAKE2b-256	`aae4d75176643449469eb40df3198934d055fa718c90ff90be375b8803a49650`
