peek-bio
Instant file previews for genomics data. One command, any format.
pip install peek-bio
What it does
Point peek at a file and get a structured summary: row counts, column types,
quality scores, variant stats, mapping rates, QC warnings. No scripts, no
notebooks, no googling command flags.
$ peek deseq2_results.csv
deseq2_results.csv — >10,553 x 7 (CSV, comma-separated)
────────────────────────────────────────────────────────────────────
Columns:
str 0610005C13Rik, 0610009B22Rik, ... (1,000 unique)
baseMean float 3.92 … 1983.92 (median: 25.32, mean: 59.32) ⡇⡀⡀⡀⡀⡀⡀⡀⡀⡀
log2FoldChange float -3.29 … 3.60 (median: -0.02, mean: -0.04) ⡀⡀⡀⡀⡀⡇⡄⡀⡀⡀⡀⡀
lfcSE float 0.11 … 1.23 (median: 0.35, mean: 0.40) ⡄⡇⡆⡄⡄⡄⡀⡀⡀⡀⡀⡀
stat float -5.94 … 8.10 (median: -0.06, mean: -0.11) ⡀⡀⡀⡀⡇⡆⡄⡀⡀⡀⡀
pvalue float 5.46e-16 … 1.00 (median: 0.37, mean: 0.45) ⡇⡄⡄⡄⡄⡄⡄⡀⡄⡄⡄⡄
padj float 3.42e-13 … 1.00 (median: 0.95, mean: 0.81) ⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡇
Missing: pvalue (1)
$ peek NA12878.bam
NA12878.bam — 61,614 reads (BAM, indexed)
────────────────────────────────────────────────────────────────────
Reference: 3366 sequences, 3.2 Gb [GRCh38 (with alts)]
Reads: 60,749 mapped (98.6%), 865 unmapped
Flags: 0.1% duplicates, 1.5% supplementary
Paired: yes (2×250 bp)
Insert size: mean 449 median 428 range 100–999 ⡀⡀⡀⡀⡆⡇⡄⡄⡄⡀⡀⡀
Read groups: 3 (NA12878, NA12878, NA12878)
Sort order: coordinate
Programs: bwamem, MarkDuplicates, GATK ApplyBQSR
MAPQ: mean 55.3 median 60 ⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡇
$ peek ERR188273_chrX_1.fq.gz
ERR188273_chrX_1.fq.gz — 30,531 reads, 2.3 Mb (FASTQ, Phred+33)
────────────────────────────────────────────────────────────────────
Read length: all 75 bp
Quality: mean Q36.7 median Q38 range Q2–Q41 ⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡆⡇
GC content: 48.9%
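As a rough illustration of where the FASTQ numbers above come from, the sketch below decodes Phred+33 quality strings (ASCII code minus 33) and counts G/C bases. It is not peek's implementation: the function name `fastq_summary` and the in-memory `(sequence, quality)` input are assumptions made for the example, and whether peek averages per base or per read is not stated here.

```python
from statistics import mean, median


def fastq_summary(records):
    """Summarise (sequence, quality_string) pairs, assuming Phred+33 encoding.

    Hypothetical helper for illustration; statistics are computed per base.
    """
    quals = []
    gc = 0
    bases = 0
    for seq, qual in records:
        # Phred+33: quality score = ASCII code of the character minus 33
        quals.extend(ord(c) - 33 for c in qual)
        upper = seq.upper()
        gc += upper.count("G") + upper.count("C")
        bases += len(seq)
    return {
        "mean_q": mean(quals),
        "median_q": median(quals),
        "min_q": min(quals),
        "max_q": max(quals),
        "gc_pct": 100.0 * gc / bases,
    }


# Two toy reads: 'I' encodes Q40 and '!' encodes Q0 in Phred+33
reads = [("GATTACA", "IIIIIII"), ("GGCC", "!!II")]
summary = fastq_summary(reads)
```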
$ peek clinvar.vcf.gz
clinvar.vcf.gz — 4,403,650 variants (VCF)
────────────────────────────────────────────────────────────────────
Variants: 4,103,565 snps, 93,659 insertions, 194,377 deletions, 12,049 complexes
Ts/Tv: 1.69
FILTER: 4,403,650 PASS
Chroms: 32 total — top: 1 (398,195), 2 (384,641), 17 (265,676)
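The Ts/Tv line in the VCF summary above is the ratio of transitions (A<->G, C<->T) to transversions (every other single-base change). A minimal sketch of that calculation follows; the function name `ts_tv_ratio` is hypothetical, and skipping indels and non-ACGT alleles is an assumption, since how peek treats multi-allelic or symbolic records is not described here.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def ts_tv_ratio(snps):
    """Return transitions / transversions for an iterable of (ref, alt) pairs.

    Hypothetical helper: anything that is not a single-base ACGT change is skipped.
    """
    ts = tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if ref == alt or ref not in BASES or alt not in BASES:
            continue  # indel, symbolic allele, or no change
        if {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
            ts += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            tv += 1  # purine<->pyrimidine
    return ts / tv if tv else float("nan")


# Two transitions, two transversions, one deletion (ignored)
calls = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T"), ("AT", "A")]
ratio = ts_tv_ratio(calls)
```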
Supported formats
Core (no extra dependencies):
| Format | Extensions |
|---|---|
| CSV/TSV | .csv, .tsv, .txt |
| BED | .bed, .narrowPeak, .broadPeak, .bedGraph |
| FASTA | .fa, .fasta |
| FASTQ | .fq, .fastq |
| VCF | .vcf, .vcf.gz |
| GTF/GFF | .gtf, .gff, .gff3 |
Optional (install what you need):
| Format | Extensions | Install |
|---|---|---|
| SAM/BAM/CRAM | .sam, .bam, .cram | pip install peek-bio[bam] |
| Excel | .xlsx, .xls | pip install peek-bio[excel] |
| BigWig | .bw, .bigwig | pip install peek-bio[bigwig] |
| H5AD | .h5ad | pip install peek-bio[h5ad] |
Or install everything: pip install peek-bio[all]
QC warnings
peek flags common issues automatically:
- Unusual GC content (outside 25-65%)
- High N content in assemblies (>20%)
- Low mean base quality in FASTQ (<Q20)
- Adapter contamination in FASTQ (>5%)
- Low mapping rate in BAM/SAM (<80%)
- Low MAPQ scores (<20 mean)
- High duplicate rate (>30%)
- Ts/Tv ratio out of range in VCF
- Low genotype rate in multi-sample VCF (<90%)
- No gene features or missing gene_id in GTF
- Single-chromosome GTF (possible subset)
- Columns with >50% missing data
- Mixed-type columns (numbers and strings mixed together)
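The numeric thresholds in the list above can be pictured as a small table of checks applied to whatever metrics a file yields. The sketch below is illustrative only: the metric keys, the `CHECKS` table, and `qc_warnings` are hypothetical names, and whether peek treats each boundary as inclusive or exclusive is an assumption.

```python
# (metric key, "is this value a problem?" predicate, warning text)
# Keys are hypothetical; thresholds are the ones listed in this README.
CHECKS = [
    ("gc_pct", lambda v: v < 25 or v > 65, "Unusual GC content (outside 25-65%)"),
    ("n_pct", lambda v: v > 20, "High N content (>20%)"),
    ("mean_base_quality", lambda v: v < 20, "Low mean base quality (<Q20)"),
    ("adapter_pct", lambda v: v > 5, "Adapter contamination (>5%)"),
    ("mapping_rate_pct", lambda v: v < 80, "Low mapping rate (<80%)"),
    ("mean_mapq", lambda v: v < 20, "Low MAPQ scores (<20 mean)"),
    ("duplicate_pct", lambda v: v > 30, "High duplicate rate (>30%)"),
    ("genotype_rate_pct", lambda v: v < 90, "Low genotype rate (<90%)"),
    ("missing_pct", lambda v: v > 50, "Column with >50% missing data"),
]


def qc_warnings(metrics):
    """Return warning strings for every metric present that trips its check."""
    return [msg for key, is_bad, msg in CHECKS
            if key in metrics and is_bad(metrics[key])]


# Values taken from the NA12878.bam example: nothing should be flagged
bam_metrics = {"mapping_rate_pct": 98.6, "mean_mapq": 55.3, "duplicate_pct": 0.1}
bam_flags = qc_warnings(bam_metrics)

# A made-up low-quality FASTQ: both metrics trip their thresholds
fastq_flags = qc_warnings({"gc_pct": 70.0, "mean_base_quality": 18.0})
```

Keeping the thresholds in one table makes each rule easy to audit against the list; metrics a format does not produce are simply skipped.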
Usage
peek FILE [FILE ...] # preview one or more files
peek --head 20 FILE # show 20 preview rows instead of 5
peek --no-color FILE # plain text output (no ANSI colors)
peek --formats # list all supported formats + install status
peek --version # print version
Compressed files (.gz) are handled transparently.
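One common way to get this kind of transparent handling is to check the first two bytes of the file for the gzip magic number rather than trusting the extension. The sketch below shows that approach with the standard library; `open_text` is a hypothetical helper, not peek's API, and peek may detect compression differently.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_text(path):
    """Open a file for text reading, decompressing if it starts with the gzip magic.

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt")
    return open(path, "rt")
```

bgzip-compressed files are multi-member gzip streams, so the same check and `gzip.open` call read them as well.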
License
MIT
File details
Details for the file peek_bio-0.1.0.tar.gz.
File metadata
- Download URL: peek_bio-0.1.0.tar.gz
- Upload date:
- Size: 46.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.8.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7441dcabcb946f6bde65077d390ba58c7149d712b0a36c3032d103b8b2ab3edf |
| MD5 | 3f05f0c14b48f7148e0e904225ba3326 |
| BLAKE2b-256 | 7a4cb84923ca820298e6ea5d4e54570d4916957531c869bc201c5619641e9977 |
File details
Details for the file peek_bio-0.1.0-py3-none-any.whl.
File metadata
- Download URL: peek_bio-0.1.0-py3-none-any.whl
- Upload date:
- Size: 42.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.8.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 52ef83a87c8652a2f52afa83878d0abf7b5199a6674a90dd7e0cad7816c20358 |
| MD5 | 1f736603c7c20a2b2aba0534d4b64f48 |
| BLAKE2b-256 | 110b410fbbf0ccd9e0528014661174fca7f1b331bf443cccb94aa4fa0ce42503 |