Efficient variant screening of RNA-seq and DNA-seq data using k-mer-based alignment against a reference of known variants.

varseek

varseek is a free, open-source command-line tool and Python package that provides variant screening of RNA-seq and DNA-seq data using k-mer-based alignment against a reference of known variants. The name comes from "seeking variants" or, alternatively, "seeing k-variants" (where a "k-variant" is defined as a k-mer containing a variant).
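To make the "k-variant" definition concrete, here is a minimal sketch that lists every k-mer overlapping a variant position in a toy sequence. The function name, the toy sequence, and the choice of k = 4 are illustrative assumptions; this is not varseek's implementation.

```python
def kmers_containing_variant(sequence: str, variant_pos: int, k: int) -> list[str]:
    """Return every k-mer of `sequence` that overlaps 0-based index `variant_pos`."""
    if not 0 <= variant_pos < len(sequence):
        raise ValueError("variant_pos is outside the sequence")
    # A k-mer starting at `start` covers indices start .. start + k - 1,
    # so it contains the variant when start <= variant_pos <= start + k - 1.
    first_start = max(0, variant_pos - k + 1)
    last_start = min(variant_pos, len(sequence) - k)
    return [sequence[s:s + k] for s in range(first_start, last_start + 1)]


# Toy reference (hypothetical) with a single-base substitution G -> T at index 5.
reference = "ACGTAGCTAGC"
variant_pos = 5
mutant = reference[:variant_pos] + "T" + reference[variant_pos + 1:]

k_variants = kmers_containing_variant(mutant, variant_pos, k=4)
print(k_variants)  # ['GTAT', 'TATC', 'ATCT', 'TCTA']
```

Away from the sequence ends, a single-base substitution is contained in exactly k distinct k-mer windows, which is why the toy example yields four k-variants.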

A standard workflow uses two commands: varseek ref and varseek count. varseek ref generates a variant-containing reference sequence (VCRS) index that serves as the basis for variant screening. varseek count pseudoaligns RNA-seq or DNA-seq reads against the VCRS index and generates a variant count matrix, which can be used for downstream analysis. Each of these two commands wraps other commands from the varseek and kb-python packages, as described below.
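The idea of counting reads against a variant-containing reference can be illustrated with a toy sketch. It builds a k-mer index from two hypothetical variant-containing sequences and counts the reads whose k-mers are compatible with exactly one variant. All names and sequences are assumptions made for illustration; the real tools use a far more efficient index and handle ambiguous reads differently, so treat this as a conceptual model only.

```python
from collections import Counter


def build_kmer_index(vcrs: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map each k-mer to the set of variant IDs whose sequence contains it."""
    index: dict[str, set[str]] = {}
    for variant_id, seq in vcrs.items():
        for i in range(len(seq) - k + 1):
            index.setdefault(seq[i:i + k], set()).add(variant_id)
    return index


def count_reads(reads: list[str], index: dict[str, set[str]], k: int) -> Counter:
    """Count reads whose indexed k-mers agree on exactly one variant."""
    counts: Counter = Counter()
    for read in reads:
        compatible = None
        for i in range(len(read) - k + 1):
            hits = index.get(read[i:i + k])
            if hits is None:
                continue  # k-mer absent from the index; ignore it
            compatible = set(hits) if compatible is None else compatible & hits
        if compatible is not None and len(compatible) == 1:
            counts[next(iter(compatible))] += 1
    return counts


# Hypothetical variant-containing sequences and reads.
vcrs = {"var1": "ACGTATCTAGC", "var2": "GGCATTCCGAA"}
reads = ["GTATCTA", "ATCTAGC", "CATTCC", "TTTTTTT"]

index = build_kmer_index(vcrs, k=4)
counts = count_reads(reads, index, k=4)
print(dict(counts))  # {'var1': 2, 'var2': 1}
```

Each sample processed this way contributes one row of per-variant counts; stacking the rows from several samples gives a simple analogue of a variant count matrix.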

The functions of varseek are described in the table below.

| Description | Bash | Python (with import varseek as vk) |
|---|---|---|
| Build a variant-containing reference sequence (VCRS) FASTA file | vk build ... | vk.build(...) |
| Describe the VCRS reference in a dataframe for filtering | vk info ... | vk.info(...) |
| Filter the VCRS file based on the CSV generated from varseek info | vk filter ... | vk.filter(...) |
| Preprocess the FASTQ files before pseudoalignment | vk fastqpp ... | vk.fastqpp(...) |
| Process the variant count matrix | vk clean ... | vk.clean(...) |
| Analyze the variant count matrix results | vk summarize ... | vk.summarize(...) |
| Wrap vk build, vk info, vk filter, and kb ref | vk ref ... | vk.ref(...) |
| Wrap vk fastqpp, kb count, vk clean, and vk summarize | vk count ... | vk.count(...) |
| Create a synthetic RNA-seq dataset with variant-containing reads | vk sim ... | vk.sim(...) |

After aligning and generating a variant count matrix with varseek, you can explore the data using our pre-built notebooks. The notebooks are described in the table below.

| Description | Notebook |
|---|---|
| Preprocessing the variant count matrix | 3_matrix_preprocessing.ipynb |
| Sequence visualization of variants | 4_1_variant_analysis_sequence_visualization.ipynb |
| Heatmap visualization of variant patterns | 4_2_variant_analysis_heatmaps.ipynb |
| Protein-level variant analysis | 4_3_variant_analysis_protein_variant.ipynb |
| Heatmap analysis of gene expression | 5_1_gene_analysis_heatmaps.ipynb |
| Drug-target analysis for genes | 5_2_gene_analysis_drugs.ipynb |
| Pathway analysis using Enrichr | 6_1_pathway_analysis_enrichr.ipynb |
| Gene Ontology enrichment analysis (GOEA) | 6_2_pathway_analysis_goea.ipynb |

You can find more examples of how to use varseek in the GitHub repository for our preprint: pachterlab/RLSRWP_2025.

If you use varseek in a publication, please cite the following study:

PAPER CITATION

Read the article here: PAPER DOI

Installation

pip install varseek
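After installing, one way to confirm that the package is visible to the current Python environment is to query its metadata with the standard library. This is a small sketch; the helper name installed_version is an assumption made for illustration and is not part of varseek.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if varseek is installed, otherwise None.
print(installed_version("varseek"))
```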

🪄 Quick start guide

1. Acquire a reference

Choose one of the options below:

a. Download a pre-built reference

  • (optional) View all downloadable references: vk ref --list_downloadable_references
  • vk ref --download --variants VARIANTS --sequences SEQUENCES

b. Make a custom reference: screen for user-defined variants

  • vk ref --variants VARIANTS --sequences SEQUENCES ...

c. Customize the reference-building process: adjust how the VCRS is filtered (e.g., add information by which to filter, add custom filtering logic, or tune filtering parameters based on the results of intermediate steps)

  • vk build --variants VARIANTS --sequences SEQUENCES ...
  • (optional) vk info --input_dir INPUT_DIR ...
  • (optional) vk filter --input_dir INPUT_DIR ...
  • kb ref --workflow custom --index INDEX ...
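The steps of option c can be chained from Python with the standard library. The sketch below assembles the four commands in order and, by default, only prints them. The flags are copied verbatim from the list above and have not been checked against either tool's command-line interface; the arguments elided with "..." are not filled in, so the commands are likely incomplete as written. The function names and the placeholder file names are assumptions made for illustration.

```python
import shlex
import subprocess


def custom_reference_commands(variants, sequences, input_dir, index,
                              run_info=True, run_filter=True):
    """Assemble the option-c commands in order; the optional steps can be skipped."""
    commands = [["vk", "build", "--variants", variants, "--sequences", sequences]]
    if run_info:
        commands.append(["vk", "info", "--input_dir", input_dir])
    if run_filter:
        commands.append(["vk", "filter", "--input_dir", input_dir])
    commands.append(["kb", "ref", "--workflow", "custom", "--index", index])
    return commands


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the chain at the first failing step.
            subprocess.run(cmd, check=True)


# Placeholder paths (hypothetical); replace them with real inputs.
commands = custom_reference_commands(
    variants="VARIANTS", sequences="SEQUENCES",
    input_dir="INPUT_DIR", index="INDEX",
)
run_all(commands)  # dry run: prints the four commands without executing them
```

Keeping the dry run as the default makes it easy to inspect the full command sequence, and to add the elided arguments, before anything is executed.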

2. Screen for variants

Choose one of the options below:

a. Standard workflow

  • (optional) FASTQ quality control
  • vk count --index INDEX --t2g T2G ... --fastqs FASTQ1 FASTQ2...
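Putting option 1b and option 2a together, the standard workflow reduces to two commands. The sketch below only builds and prints them, using the flags shown in this guide. The placeholder values and the function name are assumptions made for illustration, the arguments elided with "..." are left out, and the source does not state where vk ref writes the index and t2g files, so those paths must be supplied by the user.

```python
import shlex


def standard_workflow(variants, sequences, index, t2g, fastqs):
    """Return the vk ref and vk count commands for the standard workflow."""
    ref_cmd = ["vk", "ref", "--variants", variants, "--sequences", sequences]
    count_cmd = ["vk", "count", "--index", index, "--t2g", t2g, "--fastqs", *fastqs]
    return [ref_cmd, count_cmd]


# Placeholder values (hypothetical); replace them with real inputs.
workflow = standard_workflow(
    variants="VARIANTS",
    sequences="SEQUENCES",
    index="INDEX",
    t2g="T2G",
    fastqs=["FASTQ1", "FASTQ2"],
)
for cmd in workflow:
    print(shlex.join(cmd))
```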

b. Customize the variant screening process: additional FASTQ preprocessing and custom count matrix processing

  • (optional) FASTQ quality control
  • (optional) vk fastqpp ... --fastqs FASTQ1 FASTQ2...
  • kb count --index INDEX --t2g T2G ... --fastqs FASTQ1 FASTQ2...
  • (optional) kb count --index REFERENCE_INDEX --t2g REFERENCE_T2G ... --fastqs FASTQ1 FASTQ2...
  • (optional) vk clean --adata ADATA ...
  • (optional) vk summarize --adata ADATA ...

Examples for getting started: pachterlab/varseek

Manuscript: ...

Repository for manuscript figures: pachterlab/RLSRP_2025
