peek-bio

Instant file previews for genomics data. One command, any format.

pip install peek-bio

What it does

Point peek at a file and get a structured summary: row counts, column types, quality scores, variant stats, mapping rates, QC warnings. No scripts, no notebooks, no googling command flags.

$ peek deseq2_results.csv

 deseq2_results.csv — >10,553 x 7 (CSV, comma-separated)
 ────────────────────────────────────────────────────────────────────
 Columns:
                   str    0610005C13Rik, 0610009B22Rik, ...  (1,000 unique)
   baseMean        float  3.92 … 1983.92  (median: 25.32, mean: 59.32)  ⡇⡀⡀⡀⡀⡀⡀⡀⡀⡀
   log2FoldChange  float  -3.29 … 3.60  (median: -0.02, mean: -0.04)    ⡀⡀⡀⡀⡀⡇⡄⡀⡀⡀⡀⡀
   lfcSE           float  0.11 … 1.23  (median: 0.35, mean: 0.40)       ⡄⡇⡆⡄⡄⡄⡀⡀⡀⡀⡀⡀
   stat            float  -5.94 … 8.10  (median: -0.06, mean: -0.11)    ⡀⡀⡀⡀⡇⡆⡄⡀⡀⡀⡀
   pvalue          float  5.46e-16 … 1.00  (median: 0.37, mean: 0.45)   ⡇⡄⡄⡄⡄⡄⡄⡀⡄⡄⡄⡄
   padj            float  3.42e-13 … 1.00  (median: 0.95, mean: 0.81)   ⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡇

 Missing:  pvalue (1)

$ peek NA12878.bam

 NA12878.bam — 61,614 reads (BAM, indexed)
 ────────────────────────────────────────────────────────────────────
 Reference:  3366 sequences, 3.2 Gb  [GRCh38 (with alts)]
 Reads:  60,749 mapped (98.6%), 865 unmapped
 Flags:  0.1% duplicates, 1.5% supplementary
 Paired:  yes (2×250 bp)
 Insert size:  mean 449  median 428  range 100–999  ⡀⡀⡀⡀⡆⡇⡄⡄⡄⡀⡀⡀
 Read groups:  3  (NA12878, NA12878, NA12878)
 Sort order:  coordinate
 Programs:  bwamem, MarkDuplicates, GATK ApplyBQSR
 MAPQ:  mean 55.3  median 60  ⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡇

$ peek ERR188273_chrX_1.fq.gz

 ERR188273_chrX_1.fq.gz — 30,531 reads, 2.3 Mb (FASTQ, Phred+33)
 ────────────────────────────────────────────────────────────────────
 Read length:  all 75 bp
 Quality:  mean Q36.7  median Q38  range Q2–Q41  ⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡀⡆⡇
 GC content:  48.9%

$ peek clinvar.vcf.gz

 clinvar.vcf.gz — 4,403,650 variants (VCF)
 ────────────────────────────────────────────────────────────────────
 Variants:  4,103,565 snps, 93,659 insertions, 194,377 deletions, 12,049 complexes
 Ts/Tv:  1.69
 FILTER:  4,403,650 PASS
 Chroms:  32 total — top: 1 (398,195), 2 (384,641), 17 (265,676)
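
For readers unfamiliar with the Ts/Tv figure in the VCF summary: it is the ratio of transitions (purine to purine, or pyrimidine to pyrimidine) to transversions among single-nucleotide variants. The following is a minimal sketch of that calculation, not peek's own implementation; the function names and the (ref, alt) input shape are assumptions made for illustration.

```python
# Illustrative only: hypothetical helpers, not part of peek's API.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def is_transition(ref, alt):
    """True when both bases belong to the same chemical class."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snps):
    """Compute Ts/Tv from an iterable of (ref, alt) allele pairs.

    Anything that is not a single-base A/C/G/T substitution
    (indels, symbolic alleles, ref == alt) is skipped.
    """
    ts = tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        if is_transition(ref, alt):
            ts += 1
        else:
            tv += 1
    # Avoid division by zero when no transversions were seen.
    return ts / tv if tv else float("nan")
```

With two transitions and one transversion, for example `ts_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")])`, the ratio is 2.0.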

Supported formats

Core (no extra dependencies):

Format     Extensions
CSV/TSV    .csv, .tsv, .txt
BED        .bed, .narrowPeak, .broadPeak, .bedGraph
FASTA      .fa, .fasta
FASTQ      .fq, .fastq
VCF        .vcf, .vcf.gz
GTF/GFF    .gtf, .gff, .gff3

Optional (install what you need):

Format          Extensions           Install
SAM/BAM/CRAM    .sam, .bam, .cram    pip install peek-bio[bam]
Excel           .xlsx, .xls          pip install peek-bio[excel]
BigWig          .bw, .bigwig         pip install peek-bio[bigwig]
H5AD            .h5ad                pip install peek-bio[h5ad]

Or install everything: pip install peek-bio[all]
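
If you want to check from Python which optional backends are importable in the current environment, something like the sketch below may help. The module names in the mapping (pysam, openpyxl, pyBigWig, anndata) are assumptions about what each extra pulls in; this README does not state them, so adjust the mapping to match your install.

```python
from importlib.util import find_spec

# ASSUMPTION: guessed mapping from peek-bio extras to importable modules.
# The README does not list the actual dependencies behind each extra.
OPTIONAL_MODULES = {
    "bam": "pysam",
    "excel": "openpyxl",
    "bigwig": "pyBigWig",
    "h5ad": "anndata",
}


def extras_status(modules=OPTIONAL_MODULES):
    """Return {extra_name: True/False} depending on whether the module is importable."""
    return {extra: find_spec(module) is not None for extra, module in modules.items()}


if __name__ == "__main__":
    for extra, available in extras_status().items():
        print(f"{extra:8s} {'installed' if available else 'missing'}")
```

The built-in `peek --formats` command reports install status directly, so this is only useful when you need the answer inside a script.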

QC warnings

peek flags common issues automatically:

  • Unusual GC content (outside 25-65%)
  • High N content in assemblies (>20%)
  • Low mean base quality in FASTQ (<Q20)
  • Adapter contamination in FASTQ (>5%)
  • Low mapping rate in BAM/SAM (<80%)
  • Low MAPQ scores (<20 mean)
  • High duplicate rate (>30%)
  • Ts/Tv ratio out of range in VCF
  • Low genotype rate in multi-sample VCF (<90%)
  • No gene features or missing gene_id in GTF
  • Single-chromosome GTF (possible subset)
  • Columns with >50% missing data
  • Mixed-type columns (numbers and strings mixed together)
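
Several of the checks above are simple threshold comparisons. As a rough sketch of how such rules could be expressed (not peek's actual code), the snippet below applies five of the listed thresholds to a dictionary of metrics. The metric key names and the `qc_warnings` function are hypothetical.

```python
# Illustrative only: thresholds come from the list above; the key names
# and this function are hypothetical, not peek's API.
RULES = [
    ("gc_percent", lambda v: v < 25 or v > 65, "unusual GC content"),
    ("mean_base_quality", lambda v: v < 20, "low mean base quality"),
    ("mapping_rate_percent", lambda v: v < 80, "low mapping rate"),
    ("mean_mapq", lambda v: v < 20, "low mean MAPQ"),
    ("duplicate_percent", lambda v: v > 30, "high duplicate rate"),
]


def qc_warnings(metrics):
    """Return a warning string for each metric that crosses its threshold.

    Metrics missing from the dict are ignored, so the same function can be
    used for FASTQ-style and BAM-style summaries.
    """
    warnings = []
    for key, is_bad, message in RULES:
        value = metrics.get(key)
        if value is not None and is_bad(value):
            warnings.append(f"{message} ({key}={value})")
    return warnings
```

Fed the values from the NA12878.bam example (98.6% mapped, mean MAPQ 55.3, 0.1% duplicates), this returns an empty list, which matches the absence of warnings in that output.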

Usage

peek FILE [FILE ...]          # preview one or more files
peek --head 20 FILE           # show 20 preview rows instead of 5
peek --no-color FILE          # plain text output (no ANSI colors)
peek --formats                # list all supported formats + install status
peek --version                # print version

Compressed files (.gz) are handled transparently.
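
One common way to handle gzip transparently is to look at the first two bytes of the file rather than trusting the extension. The sketch below shows that approach using only the standard library; it is an illustration of the general technique, and `open_maybe_gzip` is a hypothetical name, not a function exposed by peek.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def open_maybe_gzip(path):
    """Open a file in text mode, decompressing if it looks like gzip.

    Hypothetical helper: peek's real implementation may differ.
    """
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt")
    return open(path, "rt")
```

Callers can then read `reads.fq` and `reads.fq.gz` through the same code path, for example `with open_maybe_gzip(path) as fh: first = fh.readline()`.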

License

MIT
