peek-bio
Instant file previews for genomics data. One command, any format.
# via bioconda (recommended)
conda install -c bioconda peek-bio
# or via pip
pip install peek-bio
What it does
Point peek at a file and get a structured summary: row counts, column types,
quality scores, variant stats, mapping rates, QC warnings. No scripts, no
notebooks, no googling command flags.
$ peek deseq2_results.csv
deseq2_results.csv — >10,553 x 7 (CSV, comma-separated)
────────────────────────────────────────────────────────────────────
Columns:
str 0610005C13Rik, 0610009B22Rik, ... (1,000 unique)
baseMean float 3.92 … 1983.92 (median: 25.32, mean: 59.32) ⡇⡀⡀⡀⡀⡀⡀⡀⡀⡀
log2FoldChange float -3.29 … 3.60 (median: -0.02, mean: -0.04) ⡀⡀⡀⡀⡀⡇⡄⡀⡀⡀⡀⡀
lfcSE float 0.11 … 1.23 (median: 0.35, mean: 0.40) ⡄⡇⡆⡄⡄⡄⡀⡀⡀⡀⡀⡀
stat float -5.94 … 8.10 (median: -0.06, mean: -0.11) ⡀⡀⡀⡀⡇⡆⡄⡀⡀⡀⡀
pvalue float 5.46e-16 … 1.00 (median: 0.37, mean: 0.45) ⡇⡄⡄⡄⡄⡄⡄⡀⡄⡄⡄⡄
padj float 3.42e-13 … 1.00 (median: 0.95, mean: 0.81) ⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡇
Missing: pvalue (1)
$ peek NA12878.bam
NA12878.bam — 61,614 reads (BAM, indexed)
────────────────────────────────────────────────────────────────────
Reference: 3366 sequences, 3.2 Gb [GRCh38 (with alts)]
Reads: 60,749 mapped (98.6%), 865 unmapped
Flags: 0.1% duplicates, 1.5% supplementary
Paired: yes (2×250 bp)
Insert size: mean 449 median 428 range 100–999 ⡀⡀⡀⡀⡆⡇⡄⡄⡄⡀⡀⡀
Read groups: 3 (NA12878, NA12878, NA12878)
Sort order: coordinate
Programs: bwamem, MarkDuplicates, GATK ApplyBQSR
MAPQ: mean 55.3 median 60 ⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡇
$ peek ERR188273_chrX_1.fq.gz
ERR188273_chrX_1.fq.gz — 30,531 reads, 2.3 Mb (FASTQ, Phred+33)
────────────────────────────────────────────────────────────────────
Read length: all 75 bp
Quality: mean Q36.7 median Q38 range Q2–Q41 ⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡆⡇
GC content: 48.9%
$ peek clinvar.vcf.gz
clinvar.vcf.gz — 4,403,650 variants (VCF)
────────────────────────────────────────────────────────────────────
Variants: 4,103,565 snps, 93,659 insertions, 194,377 deletions, 12,049 complexes
Ts/Tv: 1.69
FILTER: 4,403,650 PASS
Chroms: 32 total — top: 1 (398,195), 2 (384,641), 17 (265,676)
$ peek filtered_feature_bc_matrix/matrix.mtx.gz
matrix.mtx.gz (12.3 MB) — 8,421 cells x 33,538 genes (Matrix Market, coordinate, integer)
────────────────────────────────────────────────────────────────────
Non-zero: 17,438,362 entries (93.8% sparse)
Cells: 8,421
Features: 33,538
Mean nnz/cell: 2,071
Feature types: 33,538 Gene Expression
Companions: barcodes.tsv.gz, features.tsv.gz
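The sparsity and per-cell figures in the matrix summary follow directly from the three counts in the header line. The sketch below reproduces that arithmetic. The function name `sparsity_summary` is hypothetical and this is not taken from peek's source; it only shows how the reported numbers relate to each other.

```python
def sparsity_summary(n_cells: int, n_features: int, nnz: int) -> dict:
    """Derive sparsity and mean non-zeros per cell from matrix dimensions.

    Illustrative helper only; the name and return shape are assumptions.
    """
    total_entries = n_cells * n_features
    return {
        # Share of entries that are zero, as a percentage.
        "sparse_pct": round(100 * (1 - nnz / total_entries), 1),
        # Average number of non-zero features per cell.
        "mean_nnz_per_cell": round(nnz / n_cells),
    }


# Counts taken from the matrix.mtx.gz example above.
summary = sparsity_summary(n_cells=8_421, n_features=33_538, nnz=17_438_362)
print(summary)
```

With the counts from the example, this gives 93.8% sparse and a mean of 2,071 non-zero entries per cell, matching the preview.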
Supported formats
Core (no extra dependencies):
| Format | Extensions |
|---|---|
| CSV/TSV | .csv, .tsv, .txt |
| BED | .bed, .narrowPeak, .broadPeak, .bedGraph |
| FASTA | .fa, .fasta |
| FASTQ | .fq, .fastq |
| VCF | .vcf, .vcf.gz |
| MTX | .mtx |
| GTF/GFF | .gtf, .gff, .gff3 |
Optional (install what you need):
| Format | Extensions | Install |
|---|---|---|
| SAM/BAM/CRAM | .sam, .bam, .cram | pip install peek-bio[bam] |
| Excel | .xlsx, .xls | pip install peek-bio[excel] |
| BigWig | .bw, .bigwig | pip install peek-bio[bigwig] |
| H5AD | .h5ad | pip install peek-bio[h5ad] |
Or install everything: pip install peek-bio[all]
Files with non-standard extensions (or no extension at all) are detected automatically from their content.
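Content-based detection usually means looking at the first bytes of a file rather than its name. The following is a minimal sketch of that idea, not peek's actual detection logic: the function names `read_head` and `sniff_format` are hypothetical, and the rules cover only a few formats with well-known leading signatures.

```python
import gzip


def read_head(path, n=512) -> bytes:
    """Return the first n bytes of a file, decompressing gzip/BGZF if needed."""
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"  # gzip magic number
    opener = gzip.open if is_gzip else open
    with opener(path, "rb") as fh:
        return fh.read(n)


def sniff_format(head: bytes) -> str:
    """Guess a format from leading bytes (illustrative rules only)."""
    if head.startswith(b"BAM\x01"):           # BAM magic after decompression
        return "BAM"
    if head.startswith(b"##fileformat=VCF"):  # mandatory first VCF header line
        return "VCF"
    if head.startswith(b"%%MatrixMarket"):    # Matrix Market banner
        return "MTX"
    if head.startswith(b">"):                 # FASTA record header
        return "FASTA"
    if head.startswith(b"@"):
        # FASTQ records are 4 lines: @id, sequence, +, qualities.
        lines = head.split(b"\n")
        if len(lines) >= 3 and lines[2].startswith(b"+"):
            return "FASTQ"
    return "unknown"
```

For example, `sniff_format(read_head("reads.dat"))` would return `"FASTQ"` for a gzipped FASTQ file saved under an unrelated extension. Tabular formats such as CSV or BED have no signature and would need heuristics beyond this sketch.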
Directory scan
Point peek at a folder to get an instant inventory of all genomics files:
$ peek data/
data/ — 30 genomics files, 3.1 GB
────────────────────────────────────────────────────────────────────
1 FASTA all_ref_sva.fa
11 BAM/SAM/CRAM (3 indexed)
1 VCF candidate_EOPC_variants.vcf.gz
4 BED CEBPG.bed, ENCFF363RKC.bed, ...
3 GTF/GFF fimo_HP.gff, fimo_cobound.gff, gencode.v38.basic.annotation.gtf.gz
1 BigWig k562_MNase.bw
2 H5AD neurips_bmmc.h5ad, pbmc68k.h5ad
2 Excel Oct4_RS-matrix_Rep1-Apr-2021.xlsx, nature_genetics_supp.xlsx
5 CSV/TSV
The scan detects FASTQ pairs (R1/R2) and indexed BAMs, and skips hidden files.
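One way to find R1/R2 pairs in a directory listing is to match file names against a mate-number pattern. The sketch below assumes a naming rule (an `R1`/`R2` or `1`/`2` token directly before the FASTQ extension); both the pattern and the function name `find_fastq_pairs` are hypothetical, and peek's own matching rules may differ.

```python
import re
from collections import defaultdict

# Assumed naming rule: <stem>_R1.fq.gz / <stem>_R2.fq.gz (also _1/_2, .fastq).
PAIR_RE = re.compile(
    r"^(?P<stem>.+?)[._]R?(?P<mate>[12])(?P<ext>\.(?:fq|fastq)(?:\.gz)?)$"
)


def find_fastq_pairs(names):
    """Return sorted (R1, R2) tuples for names that differ only in mate number."""
    mates = defaultdict(dict)
    for name in names:
        m = PAIR_RE.match(name)
        if m:
            # Group by everything except the mate number.
            mates[(m["stem"], m["ext"])][m["mate"]] = name
    return sorted(
        (group["1"], group["2"])
        for group in mates.values()
        if "1" in group and "2" in group
    )


listing = ["sample_R1.fq.gz", "sample_R2.fq.gz", "ERR188273_chrX_1.fq.gz", "notes.txt"]
print(find_fastq_pairs(listing))
```

Here only `sample_R1.fq.gz` and `sample_R2.fq.gz` are reported as a pair; `ERR188273_chrX_1.fq.gz` has no matching mate in the listing.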
Paired FASTQ comparison
Give peek two FASTQ files and it automatically compares them side by side:
$ peek sample_R1.fq.gz sample_R2.fq.gz
Paired FASTQ Comparison
────────────────────────────────────────────────────────────────────
sample_R1.fq.gz sample_R2.fq.gz
Reads 1,204,881 1,204,881 ✓
Total bp 90.4 Mb 90.4 Mb
Read length 75 bp 75 bp ✓
Mean quality Q36.7 Q34.2 ✓
GC content 48.9% 49.1% ✓
Encoding Phred+33 Phred+33 ✓
Mismatched read counts (broken pairing) are flagged with a QC warning.
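The rows in the comparison can be derived from a single pass over each file. The sketch below is an assumption about how such numbers could be computed, not peek's implementation: it expects strict 4-line FASTQ records and Phred+33 encoding, takes the plain arithmetic mean of per-base quality scores, and uses the hypothetical names `fastq_stats` and `pairing_ok`.

```python
import gzip


def fastq_stats(path, offset=33):
    """Summarize a FASTQ file with 4-line records (Phred+33 by default)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    reads = bases = gc = qual_sum = 0
    with opener(path, "rt") as fh:
        for i, line in enumerate(fh):
            line = line.rstrip("\n")
            if i % 4 == 1:    # sequence line
                reads += 1
                bases += len(line)
                seq = line.upper()
                gc += seq.count("G") + seq.count("C")
            elif i % 4 == 3:  # quality line: ASCII code minus offset = Phred score
                qual_sum += sum(ord(c) - offset for c in line)
    return {
        "reads": reads,
        "total_bp": bases,
        "mean_quality": qual_sum / bases if bases else 0.0,
        "gc_pct": 100 * gc / bases if bases else 0.0,
    }


def pairing_ok(r1_stats, r2_stats):
    """Paired files should contain the same number of reads."""
    return r1_stats["reads"] == r2_stats["reads"]
```

Calling `pairing_ok(fastq_stats("sample_R1.fq.gz"), fastq_stats("sample_R2.fq.gz"))` returns `False` when the read counts differ, which is the broken-pairing case described above.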
Explain mode
New to bioinformatics? Add --explain for plain-English annotations of every metric:
$ peek --explain variants.vcf
Ts/Tv: 1.85
↳ Transition/transversion ratio. Transitions (A<>G, C<>T) are chemically
favored over transversions (all other changes). Whole-genome ~2.0, exome ~2.8.
A low ratio can indicate sequencing artifacts or contamination.
MAPQ: mean 55.3 median 60
↳ Mapping quality: confidence that each read is aligned to the correct position.
60 = near-certain. 0 = equally likely at multiple locations. Below 20 = ambiguous.
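To make the Ts/Tv definition concrete, here is a small sketch that classifies single-nucleotide changes and computes the ratio. It is a generic illustration of the metric, not peek's code; the function name `ts_tv_ratio` is hypothetical, and indels or non-ACGT alleles are simply skipped.

```python
BASES = {"A", "C", "G", "T"}
# Transitions swap purine with purine (A<>G) or pyrimidine with pyrimidine (C<>T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ts_tv_ratio(snps):
    """Compute transitions / transversions from (ref, alt) allele pairs.

    Returns None when there are no transversions to divide by.
    """
    ts = tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue  # skip indels, symbolic alleles, and non-variants
        if (ref, alt) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    return ts / tv if tv else None


# Two transitions, one transversion, one deletion (ignored).
print(ts_tv_ratio([("A", "G"), ("C", "T"), ("A", "C"), ("AT", "A")]))
```

The example prints `2.0`: two transitions divided by one transversion.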
QC warnings
peek flags common issues automatically:
- Unusual GC content (outside 25-65%)
- High N content in assemblies (>20%)
- Low mean base quality in FASTQ (<Q20)
- Adapter contamination in FASTQ (>5%)
- Low mapping rate in BAM/SAM (<80%)
- Low MAPQ scores (<20 mean)
- High duplicate rate (>30%)
- Ts/Tv ratio out of range in VCF
- Low genotype rate in multi-sample VCF (<90%)
- No gene features or missing gene_id in GTF
- Single-chromosome GTF (possible subset)
- Columns with >50% missing data
- Mixed-type columns (numbers and strings mixed together)
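The numeric thresholds in this list can be expressed as a simple rule table. The sketch below is one possible encoding, not peek's internal design: the metric keys (`gc_pct`, `mapped_pct`, and so on) and the function name `qc_warnings` are hypothetical, and rules without a stated cutoff in the list (such as the Ts/Tv range) are left out.

```python
# (metric key, "is this value bad?" predicate, warning text from the list above).
# The metric keys are assumed names for illustration.
RULES = [
    ("gc_pct",          lambda v: v < 25 or v > 65, "Unusual GC content (outside 25-65%)"),
    ("n_pct",           lambda v: v > 20,           "High N content in assemblies (>20%)"),
    ("mean_quality",    lambda v: v < 20,           "Low mean base quality in FASTQ (<Q20)"),
    ("adapter_pct",     lambda v: v > 5,            "Adapter contamination in FASTQ (>5%)"),
    ("mapped_pct",      lambda v: v < 80,           "Low mapping rate in BAM/SAM (<80%)"),
    ("mean_mapq",       lambda v: v < 20,           "Low MAPQ scores (<20 mean)"),
    ("duplicate_pct",   lambda v: v > 30,           "High duplicate rate (>30%)"),
    ("genotype_pct",    lambda v: v < 90,           "Low genotype rate in multi-sample VCF (<90%)"),
    ("max_missing_pct", lambda v: v > 50,           "Columns with >50% missing data"),
]


def qc_warnings(metrics: dict) -> list:
    """Return warning messages for every metric present that breaks its rule."""
    return [msg for key, is_bad, msg in RULES if key in metrics and is_bad(metrics[key])]


# Values from the BAM example above pass; a poor run would not.
print(qc_warnings({"mapped_pct": 98.6, "mean_mapq": 55.3, "duplicate_pct": 0.1}))
print(qc_warnings({"mapped_pct": 72.0, "duplicate_pct": 35.0}))
```

The first call prints an empty list; the second reports the low mapping rate and the high duplicate rate.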
Usage
peek FILE [FILE ...] # preview one or more files
peek DIRECTORY # scan a folder for genomics files
peek -r DIRECTORY # scan recursively (include subdirectories)
peek R1.fq R2.fq # compare paired FASTQ files
peek https://example.com/f.vcf # preview a file from a URL
peek --explain FILE # add plain-English explanations
peek --head 20 FILE # show 20 preview rows instead of 5
peek --no-color FILE # plain text output (no ANSI colors)
peek --formats # list all supported formats + install status
peek --version # print version
Compressed files (.gz) are handled transparently. URLs are downloaded to a temp file and cleaned up automatically.
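If you want to capture a preview from a script, one option is to call the command through `subprocess`. The sketch below uses only the flags listed above; the helper names `build_peek_command` and `run_peek` are hypothetical, and it assumes (without confirmation from the documentation) that the flags can be combined and that peek exits with a non-zero status on failure.

```python
import shutil
import subprocess


def build_peek_command(path, explain=False, head=None, color=False):
    """Assemble a peek command line from documented flags (combination is assumed)."""
    cmd = ["peek"]
    if explain:
        cmd.append("--explain")
    if head is not None:
        cmd += ["--head", str(head)]
    if not color:
        cmd.append("--no-color")  # plain text is easier to log or parse
    cmd.append(str(path))
    return cmd


def run_peek(path, **options):
    """Run peek and return its stdout as text; requires peek on PATH."""
    if shutil.which("peek") is None:
        raise RuntimeError("peek executable not found on PATH")
    result = subprocess.run(
        build_peek_command(path, **options),
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

For example, `run_peek("variants.vcf", explain=True)` would run `peek --explain --no-color variants.vcf` and return the preview as a string.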
License
MIT