Efficient variant screening of RNA-seq and DNA-seq data using k-mer-based alignment against a reference of known variants.

varseek

varseek is a free, open-source command-line tool and Python package that provides variant screening of RNA-seq and DNA-seq data using k-mer-based alignment against a reference of known variants. The name comes from "seeking variants" or, alternatively, "seeing k-variants" (where a "k-variant" is defined as a k-mer containing a variant).
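To make the "k-variant" idea concrete, here is a minimal sketch that lists every k-mer overlapping a single-base substitution. It only illustrates the definition above and is not varseek's implementation; the function name `k_variants` and its parameters are hypothetical.

```python
from typing import List


def k_variants(reference: str, pos: int, alt: str, k: int) -> List[str]:
    """Return every k-mer of the mutated sequence that contains the variant.

    Hypothetical helper for illustration only (not part of varseek).
    reference: reference sequence
    pos:       0-based position of the substituted base
    alt:       alternate base
    k:         k-mer length
    """
    if not 0 <= pos < len(reference):
        raise ValueError("pos is outside the reference sequence")
    if len(alt) != 1:
        raise ValueError("this sketch handles single-base substitutions only")

    mutated = reference[:pos] + alt + reference[pos + 1:]

    # A k-mer starting at i covers positions i..i+k-1, so it contains pos
    # when pos-k+1 <= i <= pos. Clip that range to valid start positions.
    first = max(0, pos - k + 1)
    last = min(pos, len(mutated) - k)
    return [mutated[i:i + k] for i in range(first, last + 1)]


kmers = k_variants("ACGTACGTAC", pos=4, alt="G", k=3)
print(kmers)  # ['GTG', 'TGC', 'GCG']
```

Each substitution yields at most k such k-mers, fewer when the variant sits near the end of the sequence.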

A standard workflow uses two commands: varseek ref and varseek count. varseek ref generates a variant-containing reference sequence (VCRS) index that serves as the basis for variant calling. varseek count pseudoaligns RNA-seq or DNA-seq reads against the VCRS index and generates a variant count matrix, which can be used for downstream analysis. Each command wraps other steps from the varseek and kb-python packages, as described below.
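The sketch below shows the general idea of counting reads against an index of variant-containing sequences. It is a toy exact k-mer lookup, far simpler than the pseudoalignment that kb-python performs, and every name in it (`build_index`, `count_variants`, the example sequences) is hypothetical.

```python
from collections import Counter
from typing import Dict, List


def build_index(vcrs: Dict[str, str], k: int) -> Dict[str, str]:
    """Map each k-mer to the one variant whose sequence contains it."""
    index: Dict[str, str] = {}
    ambiguous = set()
    for variant_id, seq in vcrs.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in index and index[kmer] != variant_id:
                ambiguous.add(kmer)  # shared by several variants
            index[kmer] = variant_id
    # Drop k-mers that cannot be assigned to a single variant.
    for kmer in ambiguous:
        del index[kmer]
    return index


def count_variants(reads: List[str], index: Dict[str, str], k: int) -> Counter:
    """Count reads whose indexed k-mers all point to the same variant."""
    counts: Counter = Counter()
    for read in reads:
        hits = set()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in index:
                hits.add(index[kmer])
        if len(hits) == 1:  # skip reads with no hit or conflicting hits
            counts[hits.pop()] += 1
    return counts


k = 5
vcrs = {"var1": "ACGTGCG", "var2": "TTGACCA"}  # made-up sequences
index = build_index(vcrs, k)
reads = ["AACGTGCGT", "GGTTGACCAT", "CCCCCCCCC", "ACGTGCGAA"]
counts = count_variants(reads, index, k)
print(dict(counts))  # {'var1': 2, 'var2': 1}
```

With several samples, one such count vector per sample would form the rows of a variant count matrix.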

The functions of varseek are described in the table below.

| Description | Bash | Python (with import varseek as vk) |
|---|---|---|
| Build a variant-containing reference sequence (VCRS) fasta file | vk build ... | vk.build(...) |
| Describe the VCRS reference in a dataframe for filtering | vk info ... | vk.info(...) |
| Filter the VCRS file based on the CSV generated from varseek info | vk filter ... | vk.filter(...) |
| Preprocess the FASTQ files before pseudoalignment | vk fastqpp ... | vk.fastqpp(...) |
| Process the variant count matrix | vk clean ... | vk.clean(...) |
| Analyze the variant count matrix results | vk summarize ... | vk.summarize(...) |
| Wrap vk build, vk info, vk filter, and kb ref | vk ref ... | vk.ref(...) |
| Wrap vk fastqpp, kb count, vk clean, and vk summarize | vk count ... | vk.count(...) |
| Create synthetic RNA-seq dataset with variant-containing reads | vk sim ... | vk.sim(...) |

After aligning and generating a variant count matrix with varseek, you can explore the data using our pre-built notebooks. The notebooks are described in the table below.

| Description | Notebook |
|---|---|
| Preprocessing the variant count matrix | 3_matrix_preprocessing.ipynb |
| Sequence visualization of variants | 4_1_variant_analysis_sequence_visualization.ipynb |
| Heatmap visualization of variant patterns | 4_2_variant_analysis_heatmaps.ipynb |
| Protein-level variant analysis | 4_3_variant_analysis_protein_variant.ipynb |
| Heatmap analysis of gene expression | 5_1_gene_analysis_heatmaps.ipynb |
| Drug-target analysis for genes | 5_2_gene_analysis_drugs.ipynb |
| Pathway analysis using Enrichr | 6_1_pathway_analysis_enrichr.ipynb |
| Gene Ontology enrichment analysis (GOEA) | 6_2_pathway_analysis_goea.ipynb |

You can find more examples of how to use varseek in the GitHub repository for our preprint: pachterlab/RLSRWP_2025.

If you use varseek in a publication, please cite the following study:

PAPER CITATION

Read the article here: PAPER DOI

Installation

pip install varseek

🪄 Quick start guide

1. Acquire a Reference

Choose one of the options below:

a. Download a Pre-built Reference

  • (optional) View all downloadable references: vk ref --list_downloadable_references
  • vk ref --download --variants VARIANTS --sequences SEQUENCES

b. Make a Custom Reference: screen for user-defined variants

  • vk ref --variants VARIANTS --sequences SEQUENCES ...

c. Customize the Reference Building Process: adjust the VCRS filtering process (e.g., add information by which to filter, add custom filtering logic, or tune filtering parameters based on the results of intermediate steps)

  • vk build --variants VARIANTS --sequences SEQUENCES ...
  • (optional) vk info --input_dir INPUT_DIR ...
  • (optional) vk filter --input_dir INPUT_DIR ...
  • kb ref --workflow custom --index INDEX ...
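The steps in option c can be sketched as an ordered list of commands. This minimal example only assembles and prints the command lines, using the flags shown in the list above; it does not run anything. The helper name `reference_commands` is hypothetical, the upper-case values are placeholders, and the arguments elided with "..." in the list are likely still required in practice.

```python
import shlex
from typing import List


def reference_commands(
    variants: str,
    sequences: str,
    input_dir: str,
    index: str,
    run_info_and_filter: bool = True,
) -> List[List[str]]:
    """Assemble the custom reference-building steps in order.

    Hypothetical helper: it uses only the flags shown in the quick start
    and leaves out every argument elided there with "...".
    """
    commands = [["vk", "build", "--variants", variants, "--sequences", sequences]]
    if run_info_and_filter:
        # Both steps are marked optional in the quick start.
        commands.append(["vk", "info", "--input_dir", input_dir])
        commands.append(["vk", "filter", "--input_dir", input_dir])
    commands.append(["kb", "ref", "--workflow", "custom", "--index", index])
    return commands


# Placeholder values; replace with real paths before running anything.
commands = reference_commands("VARIANTS", "SEQUENCES", "INPUT_DIR", "INDEX")
for command in commands:
    print(shlex.join(command))
```

Keeping each command as a list of arguments makes it straightforward to pass to subprocess.run later without shell quoting problems.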

2. Screen for variants

Choose one of the options below:

a. Standard workflow

  • (optional) FASTQ quality control
  • vk count --index INDEX --t2g T2G ... --fastqs FASTQ1 FASTQ2...

b. Customize the variant screening process: additional FASTQ preprocessing, custom count matrix processing

  • (optional) FASTQ quality control
  • (optional) vk fastqpp ... --fastqs FASTQ1 FASTQ2...
  • kb count --index INDEX --t2g T2G ... --fastqs FASTQ1 FASTQ2...
  • (optional) kb count --index REFERENCE_INDEX --t2g REFERENCE_T2G ... --fastqs FASTQ1 FASTQ2...
  • (optional) vk clean --adata ADATA ...
  • (optional) vk summarize --adata ADATA ...
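The custom screening steps above can be sketched as a function that turns each optional step on or off. As written it only builds and prints the command lines. The helper name `screening_commands` is hypothetical, the flags are copied as they appear in the list above (check them against `vk --help` and `kb count --help`), and the arguments elided with "..." are left out.

```python
import shlex
from typing import List, Optional, Sequence


def screening_commands(
    index: str,
    t2g: str,
    fastqs: Sequence[str],
    adata: str,
    *,
    preprocess: bool = False,
    reference_index: Optional[str] = None,
    reference_t2g: Optional[str] = None,
    clean: bool = True,
    summarize: bool = True,
) -> List[List[str]]:
    """Assemble the custom screening steps in the order listed above.

    Hypothetical helper: flags are taken from the quick start as written,
    and arguments elided there with "..." are not filled in.
    """
    fastq_list = list(fastqs)
    commands: List[List[str]] = []
    if preprocess:
        commands.append(["vk", "fastqpp", "--fastqs", *fastq_list])
    commands.append(
        ["kb", "count", "--index", index, "--t2g", t2g, "--fastqs", *fastq_list]
    )
    if reference_index and reference_t2g:
        # Optional second alignment against a separate reference index.
        commands.append(
            ["kb", "count", "--index", reference_index,
             "--t2g", reference_t2g, "--fastqs", *fastq_list]
        )
    if clean:
        commands.append(["vk", "clean", "--adata", adata])
    if summarize:
        commands.append(["vk", "summarize", "--adata", adata])
    return commands


# Placeholder values; replace with real paths before running anything.
commands = screening_commands(
    "INDEX", "T2G", ["FASTQ1", "FASTQ2"], "ADATA", preprocess=True
)
for command in commands:
    print(shlex.join(command))
```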

Examples for getting started: GitHub - pachterlab/varseek

Manuscript: ...

Repository for manuscript figures: GitHub - pachterlab/RLSRWP_2025


