VDJ-Insights provides a robust framework for the accurate annotation of complex genomic immune regions.

License
- OSI Approved :: MIT License
Operating System
- POSIX :: Linux
Programming Language
- Python :: 3
Topic
- Scientific/Engineering :: Bio-Informatics

VDJ-Insights

Introduction

VDJ-Insights is a robust software package for accurate annotation of the V, D, and J gene segments within immunoglobulin (IG) and T-cell receptor (TCR) genomic regions. In addition to segment annotation, it evaluates gene functionality, detects recombination signal sequences (RSS), and annotates complementarity-determining regions 1 and 2 (CDR1 and CDR2). These features extend the utility of VDJ-Insights beyond gene annotation, providing a powerful framework for functional immunogenetics and enabling evolutionary and comparative analyses at individual, population, and species levels.

Installation

VDJ-Insights is currently only supported on Linux systems. Before running the pipeline, please ensure that Python (version 3.7 or higher) and Conda are installed on your system. You can install VDJ-Insights using one of the following methods:

Option 1: Clone the repository

Clone the VDJ-Insights repository:

git clone https://github.com/BPRC-CGR/VDJ-insights

Navigate to the repository directory:
```
cd VDJ-insights
```

Run the pipeline using Python's -m option:

python -m vdj_insights <annotation|html> [arguments]

Note: When cloning the repository, the pipeline must always be executed using the python -m option. This ensures that Python correctly recognizes the package structure and runs the pipeline without additional installation steps.

Option 2: Install via pip

Use pip to install VDJ-Insights:
```
pip install vdj_insights
```

Run the pipeline:

vdj_insights <annotation|html> [arguments]

Using VDJ-Insights

Use the following command to run the annotation script:

python -m vdj_insights annotation (-a <assembly_directory> | -i <region_directory>) -l <library_directory/library.fasta> -r <receptor_type> -s <species_name> -f <flanking_genes> -t <threads> -m <mapping_tool> -M <metadata_file> -o <output_directory> --default

Required Arguments:

| Argument | Description | Example |
| --- | --- | --- |
| `-r`, `--receptor-type` | Type of receptor to analyze. Choices: `IG` (immunoglobulin) or `TR` (T-cell receptor). Required when using `--default`. | `-r TR` |
| `-i`, `--input` or `-a`, `--assembly` | Directory containing either extracted sequence regions (`--input`), that is, sequences of the region of interest already isolated from a genome assembly, or complete genome assembly files (`--assembly`). | `-i /path/to/region` or `-a /path/to/assembly` |
| `-l`, `--library` | Path to the FASTA library file containing reference V(D)J segment sequences. | `-l /path/to/library.fasta` |
| `-f`, `--flanking-genes` | Flanking genes in JSON format, given as key-value pairs that map each region to its two flanking genes. If only one flanking gene is present, use `"-"` as a placeholder for the missing side. | `-f '{"IGH": ["PACS2", "-"], "IGK": ["RPIA", "PAX8"], "IGL": ["GNAZ", "TOP3B"]}'` |
| `-s`, `--species` | Scientific species name (e.g., `Homo sapiens`). | `-s "Homo sapiens"` |
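
Because the `-f` value is a JSON object that the shell must pass as a single argument, it can be convenient to generate it rather than type it by hand. The following is a minimal sketch, not part of VDJ-Insights: the helper name `flanking_genes_argument` and the two-entries-per-region check are assumptions made for illustration, based only on the description above.

```python
import json
import shlex

# Region-to-flanking-gene mapping; "-" marks a missing flanking gene.
flanking_genes = {
    "IGH": ["PACS2", "-"],
    "IGK": ["RPIA", "PAX8"],
}


def flanking_genes_argument(mapping):
    """Serialize the mapping to JSON (hypothetical helper, not a VDJ-Insights API)."""
    for region, genes in mapping.items():
        # Assumption: each region lists exactly two sides, with "-" as a placeholder.
        if len(genes) != 2:
            raise ValueError(f"{region}: expected 2 flanking genes, got {len(genes)}")
    return json.dumps(mapping)


value = flanking_genes_argument(flanking_genes)
# shlex.quote wraps the JSON in single quotes so the shell keeps it as one argument.
print("-f", shlex.quote(value))
```

The printed text can be pasted into the command line after the other arguments.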

Optional Arguments:

| Argument | Description | Example |
| --- | --- | --- |
| `-M`, `--metadata` | Path to the metadata file (`.xlsx`); an example template is available for download. | `-M metadata.xlsx` |
| `-o`, `--output` | Output directory for the results (Default: `annotation_results`). | `-o /path/to/output` |
| `-m`, `--mapping-tool` | Available mapping tools: `minimap2`, `bowtie`, `bowtie2`. (Default: all). | `-m minimap2` |
| `-t`, `--threads` | Number of threads for parallel processing (Default: `8`). | `-t 16` |
| `--default` | Use default settings (cannot be used with `--flanking-genes`). | `--default` |
| `-S`, `--scaffolding` | Path to reference genome (FASTA). Only supported for phased assembly files. | `-S /path/to/reference.fasta` |

Important notes

- If using the `-i/--input` flag, do not specify `-f/--flanking-genes`, as flanking genes are only required when defining regions of interest from a complete genome assembly using `-a/--assembly`.
- If using the `-i/--input` flag, input file(s) should be named in the format `<sample-name>_<region>.fasta` and must be located in the indicated directory.
- If using the `--default` flag, do not specify `-f/--flanking-genes`, as they are mutually exclusive.
- If using the `--default` flag, the annotation tool automatically downloads the appropriate V(D)J gene segment library based on the specified receptor type (`-r`) and species (`-s`). There is no need to define flanking genes manually or provide a local library file.
- If using the `--scaffolding` flag, RagTag scaffolding requires a phased assembly as input. If the input assembly contains contigs of both haplotypes, it should be phased beforehand.
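
The naming and flag rules above can be checked before starting a long run. The sketch below is illustrative only and is not how VDJ-Insights validates its input: both function names are hypothetical, and it assumes that the region is the text after the last underscore in the file name.

```python
from pathlib import Path


def split_region_filename(path):
    """Split '<sample-name>_<region>.fasta' into (sample, region).

    Assumption: the region is everything after the last underscore,
    so sample names may themselves contain underscores.
    """
    p = Path(path)
    if p.suffix != ".fasta":
        raise ValueError(f"{p.name}: expected a .fasta file")
    sample, sep, region = p.stem.rpartition("_")
    if not sep or not sample or not region:
        raise ValueError(f"{p.name}: expected <sample-name>_<region>.fasta")
    return sample, region


def check_flag_combination(use_input, use_default, flanking_genes):
    """Return a list of conflicts with the notes above (empty if none)."""
    problems = []
    if use_input and flanking_genes:
        problems.append("-f/--flanking-genes applies to -a/--assembly, not -i/--input")
    if use_default and flanking_genes:
        problems.append("--default and -f/--flanking-genes are mutually exclusive")
    return problems


print(split_region_filename("/path/to/region/Sample_001_IGH.fasta"))
print(check_flag_combination(use_input=False, use_default=True, flanking_genes=None))
```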

Example

Download the T2T-CHM13v2.0 assembly file from the T2T Consortium (GCA_009914755.4) using the following command:

wget https://ftp.ensembl.org/pub/rapid-release/species/Homo_sapiens/GCA_009914755.4/ensembl/genome/Homo_sapiens-GCA_009914755.4-unmasked.fa.gz

Extract the assembly file:

gunzip Homo_sapiens-GCA_009914755.4-unmasked.fa.gz
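
If `gunzip` is not available, or the extraction step is part of a Python script, the same result can be obtained with the standard library. This is a minimal sketch under the assumption that the file ends in `.gz`; the function name `gunzip` is chosen here only to mirror the command.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src, keep_original=False):
    """Decompress 'name.gz' to 'name' in the same directory and return the new path."""
    src = Path(src)
    if src.suffix != ".gz":
        raise ValueError(f"{src.name}: expected a .gz file")
    dst = src.with_suffix("")  # drops only the trailing ".gz"
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # streams in chunks, so large assemblies fit in memory
    if not keep_original:
        src.unlink()  # the gunzip command also removes the compressed file by default
    return dst


# Usage (file name taken from the download step above):
# gunzip("Homo_sapiens-GCA_009914755.4-unmasked.fa.gz")
```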

Run VDJ-Insights using the T2T assembly:

python -m vdj_insights annotation -a /path/to/Homo_sapiens-GCA_009914755.4-unmasked.fa -r IG -s "Homo sapiens" --default

Or, if VDJ-Insights was installed via pip:

vdj_insights annotation -a /path/to/Homo_sapiens-GCA_009914755.4-unmasked.fa -r IG -s "Homo sapiens" --default

When the --default flag is used, VDJ-Insights automatically downloads the appropriate V(D)J segment library for the specified receptor type (-r) and species (-s) from IMGT, when available. It is not necessary to specify flanking genes or provide a local library file.
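
When several assemblies are annotated in a row, the command can be assembled in Python and handed to `subprocess`. The sketch below only builds and prints the argument list. The helper `annotation_command` is hypothetical, it assumes the pip-installed `vdj_insights` entry point, and it uses only flags listed in the tables above.

```python
import shlex


def annotation_command(assembly, receptor_type, species, threads=None, output=None):
    """Build the argument list for an annotation run that relies on --default."""
    if receptor_type not in ("IG", "TR"):
        raise ValueError("receptor_type must be 'IG' or 'TR'")
    cmd = ["vdj_insights", "annotation", "-a", str(assembly),
           "-r", receptor_type, "-s", species, "--default"]
    # -f and -l are left out on purpose: --default makes them unnecessary.
    if threads is not None:
        cmd += ["-t", str(threads)]
    if output is not None:
        cmd += ["-o", str(output)]
    return cmd


cmd = annotation_command("/path/to/Homo_sapiens-GCA_009914755.4-unmasked.fa",
                         "IG", "Homo sapiens", threads=16)
# Print a shell-safe version for inspection; the species name gets quoted.
print(" ".join(shlex.quote(part) for part in cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with values such as `Homo sapiens`.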

Annotation results

The results generated by VDJ-Insights are stored in the annotation directory. This directory includes the following Excel files:

- `annotation_report_known.xlsx` contains information on known V, D, and J gene segments, including recombination signal sequences.
- `annotation_report_novel.xlsx` contains information on novel V, D, and J gene segments, including recombination signal sequences.
- `annotation_report_all.xlsx` combines information on both known and novel V, D, and J gene segments.
- `tmp/blast_results.xlsx` contains the BLAST search results used for validation of annotations.
- `tmp/report.xlsx` provides a summary of the overall findings from the alignment analyses.

Each annotation report (known or novel) includes the following columns, providing detailed information about the identified segments:

| Column | Explanation | Example |
| --- | --- | --- |
| Sample | The name of the sample. | `Sample_001` |
| Haplotype | The haplotype ID (maternal or paternal). | `1` or `mat` |
| Region | The annotated region. | `IGHV` |
| Segment | The gene segment type. | `V` |
| Start coord | The start coordinate on the annotated contig. | `12345` |
| End coord | The end coordinate on the annotated contig. | `12789` |
| Strand | Segment orientation: `+` indicates 5' to 3' direction, and `-` indicates 3' to 5' direction. | `+` |
| Library name | The closest reference gene segment name associated with the identified segment. | `IGHV3-23*01` |
| Target name | The name assigned to the novel gene segment, based on the closest reference gene, with "like" appended to indicate similarity. | `IGHV3-23-like` |
| Short name | The gene name, as defined by IMGT nomenclature standards. | `IGHV3*01` |
| Similar references | Other reference gene segments sharing the same start and end coordinates; the best match is selected based on the mutation count and the reference gene name. | `IGHV3-33*02` |
| Target sequence | The nucleotide sequence of the novel gene segment. | `ATGGTGCAAGC...` |
| Library sequence | The nucleotide sequence of the closest reference gene segment. | `ATGGTGCAAAC...` |
| Mismatches | The total number of mismatches observed between the novel segment and the reference sequence. | `3` |
| % Mismatches of total alignment | The percentage of mismatches relative to the total alignment length between the identified segment and the reference. | `1.5%` |
| % identity | The percentage of identical bases between the identified segment and the reference over the full alignment. | `98.5%` |
| BTOP | BLAST traceback string that describes the exact location of substitutions, insertions, and deletions in the alignment. | `10A5G3T` |
| SNPs | The number of single nucleotide polymorphisms (SNPs) relative to the reference. | `2` |
| Insertions | The number of insertions relative to the reference. | `1` |
| Deletions | The number of deletions relative to the reference. | `0` |
| Mapping tool | The name(s) of the mapping tool(s) used for gene segment annotation. | `Minimap2` |
| Function | The functional classification of the segment: "F/ORF" for functional/open reading frame, "P" for potentially functional/open reading frame, or "pseudogene" if an early stop codon is detected. | `F/ORF` |
| Status | Indicates whether the gene segment is classified as Known or Novel. | `Novel` |
| Message | A generated message for the segment if stop codons are detected at critical positions. | `The STOP-CODON at the 3' end of the V-REGION can be deleted by rearrangement` |
| Population | The population group associated with the sample, if metadata is provided. | `Dutch` |
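
The relationship between the `Mismatches`, `% Mismatches of total alignment`, and `% identity` columns can be illustrated with a small calculation. This is a sketch of the arithmetic only, not the code VDJ-Insights uses. The alignment length of 200 is a hypothetical value chosen so that 3 mismatches reproduce the example percentages in the table, and it assumes every non-identical position is counted as a mismatch.

```python
def mismatch_percentages(mismatches, alignment_length):
    """Return (% mismatches, % identity) over the full alignment.

    Assumption: every non-identical aligned position counts as a mismatch,
    so the two percentages add up to 100.
    """
    if alignment_length <= 0:
        raise ValueError("alignment_length must be positive")
    if not 0 <= mismatches <= alignment_length:
        raise ValueError("mismatches must lie between 0 and alignment_length")
    pct_mismatch = 100.0 * mismatches / alignment_length
    return pct_mismatch, 100.0 - pct_mismatch


# Hypothetical alignment of 200 positions with 3 mismatches.
pct_mismatch, pct_identity = mismatch_percentages(3, 200)
print(f"{pct_mismatch:.1f}% mismatches, {pct_identity:.1f}% identity")
# prints "1.5% mismatches, 98.5% identity"
```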

Web interface report

The pipeline includes an interactive web interface for visualizing and exploring the annotation results. The web-based Flask report can be generated and opened using one of the following commands:

python -m vdj_insights html -i /path/to/output --show

Or, if VDJ-Insights was installed via pip:

vdj_insights html -i /path/to/output --show

Citing VDJ-Insights

If VDJ-Insights contributes to your research, please cite:

Acknowledgements

VDJ-Insights was developed by the department of Comparative Genetics & Refinement of the Biomedical Primate Research Centre (BPRC) in Rijswijk, the Netherlands.

This version: 0.1.0 (Jul 21, 2025)
