Instant file previews for genomics data

peek-bio

Instant file previews for genomics data. One command, any format.

# via bioconda (recommended)
conda install -c bioconda peek-bio

# or via pip
pip install peek-bio

peek demo

What it does

Point peek at a file and get a structured summary: row counts, column types, quality scores, variant stats, mapping rates, QC warnings. No scripts, no notebooks, no googling command flags.

$ peek deseq2_results.csv

 deseq2_results.csv — >10,553 x 7 (CSV, comma-separated)
 ────────────────────────────────────────────────────────────────────
 Columns:
                   str    0610005C13Rik, 0610009B22Rik, ...  (1,000 unique)
   baseMean        float  3.92 … 1983.92  (median: 25.32, mean: 59.32)  ⡇⡀⡀⡀⡀⡀⡀⡀⡀⡀
   log2FoldChange  float  -3.29 … 3.60  (median: -0.02, mean: -0.04)    ⡀⡀⡀⡀⡀⡇⡄⡀⡀⡀⡀⡀
   lfcSE           float  0.11 … 1.23  (median: 0.35, mean: 0.40)       ⡄⡇⡆⡄⡄⡄⡀⡀⡀⡀⡀⡀
   stat            float  -5.94 … 8.10  (median: -0.06, mean: -0.11)    ⡀⡀⡀⡀⡇⡆⡄⡀⡀⡀⡀
   pvalue          float  5.46e-16 … 1.00  (median: 0.37, mean: 0.45)   ⡇⡄⡄⡄⡄⡄⡄⡀⡄⡄⡄⡄
   padj            float  3.42e-13 … 1.00  (median: 0.95, mean: 0.81)   ⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡇

 Missing:  pvalue (1)
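The per-column figures in a table summary (range, median, mean, missing count) can be reproduced with the standard library. This is a minimal sketch, not peek's own code: the function name `summarize_numeric_column`, the set of markers treated as missing, and the tiny input table are all assumptions made for illustration.

```python
import csv
import io
import statistics


def summarize_numeric_column(rows, column):
    """Return min, max, median, mean and missing count for one CSV column."""
    values = []
    missing = 0
    for row in rows:
        cell = (row.get(column) or "").strip()
        if cell in ("", "NA", "NaN"):
            missing += 1  # assumption: empty cells and NA markers count as missing
            continue
        values.append(float(cell))
    return {
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
        "mean": statistics.fmean(values),
        "missing": missing,
    }


# Tiny made-up table, not real DESeq2 output.
text = "gene,pvalue\ng1,0.5\ng2,NA\ng3,0.1\ng4,0.3\n"
rows = list(csv.DictReader(io.StringIO(text)))
summary = summarize_numeric_column(rows, "pvalue")
print(summary)
```

With this input, one of the four `pvalue` cells is reported as missing, which mirrors the shape of the `Missing:  pvalue (1)` line.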

$ peek NA12878.bam

 NA12878.bam — 61,614 reads (BAM, indexed)
 ────────────────────────────────────────────────────────────────────
 Reference:  3366 sequences, 3.2 Gb  [GRCh38 (with alts)]
 Reads:  60,749 mapped (98.6%), 865 unmapped
 Flags:  0.1% duplicates, 1.5% supplementary
 Paired:  yes (2×250 bp)
 Insert size:  mean 449  median 428  range 100–999  ⡀⡀⡀⡀⡆⡇⡄⡄⡄⡀⡀⡀
 Read groups:  3  (NA12878, NA12878, NA12878)
 Sort order:  coordinate
 Programs:  bwamem, MarkDuplicates, GATK ApplyBQSR
 MAPQ:  mean 55.3  median 60  ⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡇

$ peek ERR188273_chrX_1.fq.gz

 ERR188273_chrX_1.fq.gz — 30,531 reads, 2.3 Mb (FASTQ, Phred+33)
 ────────────────────────────────────────────────────────────────────
 Read length:  all 75 bp
 Quality:  mean Q36.7  median Q38  range Q2–Q41  ⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡆⡇
 GC content:  48.9%
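For readers unfamiliar with the FASTQ numbers, the sketch below shows how Phred+33 quality characters map to Q scores and how a GC fraction can be computed. The helper names are hypothetical, and ignoring N bases in the GC denominator is an assumption; peek may average or count differently.

```python
def phred33_scores(quality_line):
    """Decode a FASTQ quality string (Phred+33) into integer Q scores."""
    return [ord(ch) - 33 for ch in quality_line]


def gc_fraction(sequence):
    """Fraction of G/C among A/C/G/T bases (assumption: N and others ignored)."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


scores = phred33_scores("II5#")  # 'I' -> 40, '5' -> 20, '#' -> 2
print(scores, sum(scores) / len(scores))
print(gc_fraction("GATTACAN"))   # 2 G/C bases out of 7 called bases
```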

$ peek clinvar.vcf.gz

 clinvar.vcf.gz — 4,403,650 variants (VCF)
 ────────────────────────────────────────────────────────────────────
 Variants:  4,103,565 snps, 93,659 insertions, 194,377 deletions, 12,049 complexes
 Ts/Tv:  1.69
 FILTER:  4,403,650 PASS
 Chroms:  32 total — top: 1 (398,195), 2 (384,641), 17 (265,676)
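The variant type counts can be understood through a length-based classification of each REF/ALT pair. The rules below follow a common VCF convention (indels share their first base with the reference allele) and are an assumption; the exact rules peek applies are not documented here, and `classify_variant` is a hypothetical name.

```python
def classify_variant(ref, alt):
    """Classify one REF/ALT pair by allele length (a common convention)."""
    if len(ref) == 1 and len(alt) == 1:
        return "snp"
    if len(ref) == 1 and len(alt) > 1 and alt.startswith(ref):
        return "insertion"
    if len(alt) == 1 and len(ref) > 1 and ref.startswith(alt):
        return "deletion"
    return "complex"  # everything else, e.g. multi-base substitutions


for ref, alt in [("A", "G"), ("A", "ATG"), ("ATG", "A"), ("AT", "GC")]:
    print(ref, alt, classify_variant(ref, alt))
```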

$ peek filtered_feature_bc_matrix/matrix.mtx.gz

 matrix.mtx.gz (12.3 MB) — 8,421 cells x 33,538 genes (Matrix Market, coordinate, integer)
 ────────────────────────────────────────────────────────────────────
 Non-zero:  17,438,362 entries (93.8% sparse)
 Cells:  8,421
 Features:  33,538
 Mean nnz/cell:  2,071
 Feature types:  33,538 Gene Expression
 Companions:  barcodes.tsv.gz, features.tsv.gz
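The sparsity and mean nnz/cell figures follow directly from the matrix dimensions and the non-zero count. A short check of that arithmetic, using only the numbers shown in the matrix.mtx.gz example:

```python
cells = 8_421
genes = 33_538
nonzero = 17_438_362

total = cells * genes                  # number of cells in the dense matrix
sparsity = 1 - nonzero / total         # fraction of entries that are zero
mean_nnz_per_cell = nonzero / cells    # average non-zero entries per cell

print(f"{sparsity:.1%} sparse")                # 93.8% sparse
print(f"{mean_nnz_per_cell:,.0f} nnz/cell")    # 2,071 nnz/cell
```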

Supported formats

Core (no extra dependencies):

Format	Extensions
CSV/TSV	`.csv`, `.tsv`, `.txt`
BED	`.bed`, `.narrowPeak`, `.broadPeak`, `.bedGraph`
FASTA	`.fa`, `.fasta`
FASTQ	`.fq`, `.fastq`
VCF	`.vcf`, `.vcf.gz`
MTX	`.mtx`
GTF/GFF	`.gtf`, `.gff`, `.gff3`

Optional (install what you need):

Format	Extensions	Install
SAM/BAM/CRAM	`.sam`, `.bam`, `.cram`	`pip install peek-bio[bam]`
Excel	`.xlsx`, `.xls`	`pip install peek-bio[excel]`
BigWig	`.bw`, `.bigwig`	`pip install peek-bio[bigwig]`
H5AD	`.h5ad`	`pip install peek-bio[h5ad]`

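Or install everything: `pip install peek-bio[all]`. In shells that expand square brackets, such as zsh, quote the argument: `pip install 'peek-bio[all]'`.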

Files with non-standard extensions (or no extension at all) are detected automatically from their content.
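Content-based detection usually means looking at the first bytes of a file rather than its name. The sketch below illustrates the general idea with a handful of well-known signatures (gzip magic bytes, the BAM magic string, the VCF and Matrix Market header lines). It is not peek's implementation; `sniff_format`, the set of checks, and their order are assumptions.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"
SAM_HEADER_TAGS = (b"@HD\t", b"@SQ\t", b"@RG\t", b"@PG\t", b"@CO\t")


def sniff_format(path):
    """Guess a file format from its first bytes; returns a label or 'unknown'."""
    with open(path, "rb") as handle:
        is_gzip = handle.read(2) == GZIP_MAGIC
    opener = gzip.open if is_gzip else open  # BAM is BGZF, which gzip can read
    with opener(path, "rb") as handle:
        head = handle.read(64)
    if head.startswith(b"BAM\x01"):
        return "BAM"
    if head.startswith(b"##fileformat=VCF"):
        return "VCF"
    if head.startswith(b"%%MatrixMarket"):
        return "MTX"
    if head.startswith(SAM_HEADER_TAGS):
        return "SAM"
    if head.startswith(b">"):
        return "FASTA"
    if head.startswith(b"@"):
        return "FASTQ"
    return "unknown"


# Demo: a gzipped FASTQ record stored under a name with no extension.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mystery_file")
    with gzip.open(path, "wb") as out:
        out.write(b"@r1\nACGT\n+\nIIII\n")
    detected = sniff_format(path)
print(detected)  # FASTQ
```

A real detector would need more care for tabular formats such as BED or GTF, which have no magic bytes and must be recognized from column structure.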

Directory scan

Point peek at a folder to get an instant inventory of all genomics files:

$ peek data/

 data/ — 30 genomics files, 3.1 GB
 ────────────────────────────────────────────────────────────────────
   1 FASTA  all_ref_sva.fa
   11 BAM/SAM/CRAM  (3 indexed)
   1 VCF  candidate_EOPC_variants.vcf.gz
   4 BED  CEBPG.bed, ENCFF363RKC.bed, ...
   3 GTF/GFF  fimo_HP.gff, fimo_cobound.gff, gencode.v38.basic.annotation.gtf.gz
   1 BigWig  k562_MNase.bw
   2 H5AD  neurips_bmmc.h5ad, pbmc68k.h5ad
   2 Excel  Oct4_RS-matrix_Rep1-Apr-2021.xlsx, nature_genetics_supp.xlsx
   5 CSV/TSV

The scan detects FASTQ pairs (R1/R2) and indexed BAMs, and it skips hidden files.
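One way to detect R1/R2 pairs is to match a mate marker just before the FASTQ extension and group files that share the rest of the name. This is a sketch under assumed naming conventions (`_R1`/`_R2` or `_1`/`_2`); the pattern and the function `find_fastq_pairs` are hypothetical, and peek's own matching rules may differ (for example for Illumina names ending in `_001`).

```python
import re
from collections import defaultdict

# Mate marker (_R1, _R2, _1, _2, or the same with a dot) right before the extension.
MATE_PATTERN = re.compile(
    r"^(?P<stem>.+?)[._]R?(?P<mate>[12])(?P<ext>\.(?:fq|fastq)(?:\.gz)?)$"
)


def find_fastq_pairs(filenames):
    """Group FASTQ filenames into (R1, R2) pairs by shared stem and extension."""
    mates = defaultdict(dict)
    for name in filenames:
        if name.startswith("."):
            continue  # skip hidden files
        match = MATE_PATTERN.match(name)
        if match:
            mates[(match["stem"], match["ext"])][match["mate"]] = name
    return sorted(
        (found["1"], found["2"])
        for found in mates.values()
        if "1" in found and "2" in found
    )


names = ["sample_R1.fq.gz", "sample_R2.fq.gz", "ERR188273_chrX_1.fq.gz",
         ".hidden_R1.fq", "notes.txt"]
print(find_fastq_pairs(names))
```

Here only the `sample` files form a pair; `ERR188273_chrX_1.fq.gz` has no mate in the list and is left unpaired.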

Paired FASTQ comparison

Give peek two FASTQ files and it automatically compares them side by side:

$ peek sample_R1.fq.gz sample_R2.fq.gz

 Paired FASTQ Comparison
 ────────────────────────────────────────────────────────────────────
                 sample_R1.fq.gz         sample_R2.fq.gz
          Reads  1,204,881               1,204,881               ✓
       Total bp  90.4 Mb                 90.4 Mb
    Read length  75 bp                   75 bp                   ✓
   Mean quality  Q36.7                   Q34.2                   ✓
     GC content  48.9%                   49.1%                   ✓
       Encoding  Phred+33                Phred+33                ✓

Mismatched read counts (broken pairing) are flagged with a QC warning.
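The read-count check can be approximated by counting records in each file and comparing the totals. The sketch assumes the standard four-line FASTQ record layout and picks the opener from the `.gz` suffix; `count_fastq_reads` and `pairing_warning` are hypothetical names, and the warning text is made up rather than copied from peek.

```python
import gzip
import os
import tempfile


def count_fastq_reads(path):
    """Count reads assuming the standard 4-line FASTQ record layout."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        lines = sum(1 for _ in handle)
    return lines // 4


def pairing_warning(r1_path, r2_path):
    """Return a warning string if the two files differ in read count, else None."""
    n1, n2 = count_fastq_reads(r1_path), count_fastq_reads(r2_path)
    if n1 != n2:
        return f"read count mismatch: {n1} vs {n2}"
    return None


# Demo on two tiny made-up files: R2 is missing one read.
record = "@r{0}\nACGT\n+\nIIII\n"
with tempfile.TemporaryDirectory() as tmp:
    r1 = os.path.join(tmp, "demo_R1.fq.gz")
    r2 = os.path.join(tmp, "demo_R2.fq.gz")
    with gzip.open(r1, "wt") as out:
        out.write("".join(record.format(i) for i in range(3)))
    with gzip.open(r2, "wt") as out:
        out.write("".join(record.format(i) for i in range(2)))
    warning = pairing_warning(r1, r2)
print(warning)  # read count mismatch: 3 vs 2
```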

Explain mode

New to bioinformatics? Add --explain for plain-English annotations of every metric:

$ peek --explain variants.vcf

 Ts/Tv:  1.85
   ↳ Transition/transversion ratio. Transitions (A<>G, C<>T) are chemically
     favored over transversions (all other changes). Whole-genome ~2.0, exome ~2.8.
     A low ratio can indicate sequencing artifacts or contamination.

 MAPQ:  mean 55.3  median 60
   ↳ Mapping quality: confidence that each read is aligned to the correct position.
     60 = near-certain. 0 = equally likely at multiple locations. Below 20 = ambiguous.
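Both metrics explained above have simple definitions that can be written out. The sketch below counts transitions and transversions over a list of single-base changes and converts a MAPQ value to an error probability using the Phred-scaled definition from the SAM specification (MAPQ = -10 log10 of the probability that the mapping position is wrong). The function names and the example SNP list are illustrative assumptions.

```python
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ts_tv_ratio(snps):
    """Compute Ts/Tv from an iterable of (ref, alt) single-base pairs."""
    ts = tv = 0
    for ref, alt in snps:
        if (ref.upper(), alt.upper()) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    return ts / tv if tv else float("inf")


def mapq_error_probability(mapq):
    """Phred-scaled MAPQ -> probability that the mapping position is wrong."""
    return 10 ** (-mapq / 10)


# Made-up SNPs: 3 transitions and 2 transversions.
snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("G", "T")]
print(ts_tv_ratio(snps))            # 1.5
print(mapq_error_probability(20))   # about 1 in 100
print(mapq_error_probability(60))   # about 1 in a million
```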

QC warnings

peek flags common issues automatically:

Unusual GC content (outside 25-65%)
High N content in assemblies (>20%)
Low mean base quality in FASTQ (<Q20)
Adapter contamination in FASTQ (>5%)
Low mapping rate in BAM/SAM (<80%)
Low mean MAPQ (<20)
High duplicate rate (>30%)
Ts/Tv ratio out of range in VCF
Low genotype rate in multi-sample VCF (<90%)
No gene features or missing gene_id in GTF
Single-chromosome GTF (possible subset)
Columns with >50% missing data
Mixed-type columns (numbers and strings mixed together)
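The numeric thresholds in this list can be read as a small rule table. The sketch below encodes the ones with explicit cutoffs and applies them to a dictionary of metrics. The metric key names, the rule structure, and the message format are assumptions for illustration, not peek's internal representation.

```python
# (metric name, predicate that is True when the value looks problematic, message)
QC_RULES = [
    ("gc_percent", lambda v: v < 25 or v > 65, "Unusual GC content"),
    ("n_percent", lambda v: v > 20, "High N content"),
    ("mean_base_quality", lambda v: v < 20, "Low mean base quality"),
    ("adapter_percent", lambda v: v > 5, "Adapter contamination"),
    ("mapping_rate_percent", lambda v: v < 80, "Low mapping rate"),
    ("mean_mapq", lambda v: v < 20, "Low MAPQ"),
    ("duplicate_percent", lambda v: v > 30, "High duplicate rate"),
    ("genotype_rate_percent", lambda v: v < 90, "Low genotype rate"),
    ("missing_percent", lambda v: v > 50, "Mostly missing column"),
]


def qc_warnings(metrics):
    """Return messages for every rule whose metric is present and out of range."""
    return [
        f"{message} ({name}={metrics[name]})"
        for name, is_bad, message in QC_RULES
        if name in metrics and is_bad(metrics[name])
    ]


# Values from the NA12878.bam example, then one made-up out-of-range value.
bam_metrics = {"mapping_rate_percent": 98.6, "mean_mapq": 55.3, "duplicate_percent": 0.1}
print(qc_warnings(bam_metrics))            # []
print(qc_warnings({"gc_percent": 71.0}))   # ['Unusual GC content (gc_percent=71.0)']
```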

Usage

peek FILE [FILE ...]          # preview one or more files
peek DIRECTORY                # scan a folder for genomics files
peek -r DIRECTORY             # scan recursively (include subdirectories)
peek R1.fq R2.fq              # compare paired FASTQ files
peek https://example.com/f.vcf  # preview a file from a URL
peek --explain FILE           # add plain-English explanations
peek --head 20 FILE           # show 20 preview rows instead of 5
peek --no-color FILE          # plain text output (no ANSI colors)
peek --formats                # list all supported formats + install status
peek --version                # print version

Compressed files (.gz) are handled transparently. URLs are downloaded to a temp file and cleaned up automatically.
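The download-then-clean-up behaviour for URLs can be sketched as a context manager around a temporary file. This is one possible approach using only the standard library, not necessarily how peek does it; `downloaded` is a hypothetical helper, and the demo uses a local `file://` URL so that it runs without network access.

```python
import contextlib
import os
import pathlib
import shutil
import tempfile
import urllib.parse
import urllib.request


@contextlib.contextmanager
def downloaded(url):
    """Download `url` to a temporary file, yield its path, delete it afterwards."""
    # Keep the original file name as a suffix so extensions such as .vcf.gz survive.
    name = os.path.basename(urllib.parse.urlparse(url).path)
    fd, tmp_path = tempfile.mkstemp(suffix="_" + name)
    try:
        with os.fdopen(fd, "wb") as out, urllib.request.urlopen(url) as response:
            shutil.copyfileobj(response, out)
        yield tmp_path
    finally:
        os.remove(tmp_path)  # runs even if the caller raises


# Demo without network access: a file:// URL pointing at a local temp file.
with tempfile.TemporaryDirectory() as tmp:
    source = pathlib.Path(tmp) / "f.vcf"
    source.write_text("##fileformat=VCFv4.2\n")
    with downloaded(source.as_uri()) as path:
        first_line = pathlib.Path(path).read_text().splitlines()[0]
        existed_inside = os.path.exists(path)
    cleaned_up = not os.path.exists(path)
print(first_line, existed_inside, cleaned_up)
```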

License

MIT

Release history

0.2.0 (this version): May 20, 2026
0.1.0: Apr 15, 2026

Download files

Source Distribution

peek_bio-0.2.0.tar.gz (103.5 kB), uploaded May 20, 2026, Source

Built Distribution

peek_bio-0.2.0-py3-none-any.whl (85.9 kB), uploaded May 20, 2026, Python 3

File details

Details for the file peek_bio-0.2.0.tar.gz.

File metadata

Download URL: peek_bio-0.2.0.tar.gz
Upload date: May 20, 2026
Size: 103.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for peek_bio-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`d71096cd0c9ef1d887670cf18d45c902f70ce38507750b52545a22c530565d9f`
MD5	`00510e855454deb3506c8618a97ffb9e`
BLAKE2b-256	`5664c5eeb41c4b00918c1bd65551f3bdb5aa830783218f9639c3fd480b632419`

File details

Details for the file peek_bio-0.2.0-py3-none-any.whl.

File metadata

Download URL: peek_bio-0.2.0-py3-none-any.whl
Upload date: May 20, 2026
Size: 85.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for peek_bio-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`57a77b5d65f7299d8fbc4adbcc52f0db566553ec7cb8e024ac5019a18dce1c82`
MD5	`28dca48deba2809e24cdff0a1eb11ad2`
BLAKE2b-256	`038b9c05e89c149720d45a388ba3eb0294c29499059b23570436356cfa6e5b00`
