Antibody numbering software

RIOT - Rapid Immunoglobulin Overview Tool

Have some raw antibody sequences? Find matching germlines, perform numbering and get results in a familiar AIRR format!

RIOT supports both nucleotide and amino acid sequences as well as all major schemes: KABAT, CHOTHIA, MARTIN and IMGT.

MOTIVATION

Antibodies are a cornerstone of the immune system, playing a pivotal role in identifying and neutralizing infections caused by bacteria, viruses, and other pathogens. Understanding their structure and function can provide insights into both the body's natural defenses and the principles behind many therapeutic interventions, including vaccines and antibody-based drugs. The analysis and annotation of antibody sequences, including the identification of variable, diversity, joining, and constant genes, as well as the delineation of framework regions and complementarity-determining regions, are essential for this understanding. Analyzing large volumes of antibody sequences is now routine in antibody discovery and requires fast and accurate tools. While there are existing tools designed for the annotation and numbering of antibody sequences, they often have limitations such as being restricted to either nucleotide or amino acid sequences, reliance on non-uniform germline databases, or slow execution times. Here we present the Rapid Immunoglobulin Overview Tool (RIOT), a novel open source solution for antibody numbering that addresses these shortcomings. RIOT handles both nucleotide and amino acid sequences, comes with a free germline database, and is computationally efficient. We hope the tool will facilitate rapid annotation of antibody sequencing outputs, to the benefit of understanding antibody biology and discovering novel therapeutics.

Requirements

  • Python ^3.10

Quickstart

> pip install riot-na

> riot_na -s GGGCGTTTTGGCAC...

{
    "sequence_header": "-",
    "sequence": "GGGCGTTTTGGCAC...",
    "numbering_scheme": "imgt",
    "locus": "igh",
    "stop_codon": False,
    "vj_in_frame": True,
    "v_frameshift": False,
    "j_frameshift": False,
    "productive": True,
    "rev_comp": False,
    "complete_vdj": True,
    "v_call": "IGHV1-69*01",
    "d_call": "IGHD3-3*01",
    "j_call": "IGHJ6*02",
    "c_call": "IGHM",
    "v_frame": 0,
    ...
}

Installation

RIOT is distributed as prebuilt binary wheels for all major platforms. Just run in your chosen virtualenv:

pip install riot-na

Usage

CLI

Usage: riot_na [OPTIONS]

Options:
  -f, --input-file PATH           Path to input FASTA file.
  -s, --sequence TEXT             Input sequence.
  -o, --output-file PATH          Path to output CSV file. If not specified,
                                  stdout is used.
  --scheme [kabat|chothia|imgt|martin]
                                  Which numbering scheme should be used: IMGT,
                                  KABAT, CHOTHIA, MARTIN. Default IMGT
  --species [human|mouse]         Which species germline sequences should be
                                  used. Default is all species.
  --input-type [nt|aa]            What kind of sequences are provided on
                                  input. Default is nucleotide sequences.
  -p, --ncpu INTEGER              Number of parallel processes to use. Default
                                  is number of physical cores.
  -e, --extend_alignment BOOLEAN  Include unaligned beginning of the query
                                  sequence in numbering.This option impacts
                                  only amino acid sequences passed with -s option.
  --help                          Show this message and exit.

Examples:

Run on single sequence and print output to stdout:

riot_na -s <sequence>

Run on single sequence and save output to csv:

riot_na -s <sequence> -o result.csv

Run on fasta file:

riot_na -f input.fasta -o results.csv
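
The CSV written with -o can be post-processed with the Python standard library. The following is a minimal sketch, not part of RIOT: it assumes the CSV column headers match the field names described in the Data format section and that booleans are serialized as the strings "True" and "False". The sample rows and the helper `productive_rows` are hypothetical.

```python
import csv
import io

# Hypothetical two-row excerpt; a real RIOT CSV has many more columns.
SAMPLE_CSV = """sequence_header,locus,productive,v_call
seq1,igh,True,IGHV1-69*01
seq2,igk,False,IGKV1-5*03
"""


def productive_rows(handle):
    """Yield rows whose `productive` column holds the string 'True' (assumed encoding)."""
    for row in csv.DictReader(handle):
        if row.get("productive") == "True":
            yield row


# With a real file, replace the StringIO with open("results.csv", newline="").
rows = list(productive_rows(io.StringIO(SAMPLE_CSV)))
print([row["sequence_header"] for row in rows])
```

If the boolean encoding in your output differs, only the comparison inside `productive_rows` needs to change.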

API Nucleotides

from riot_na import create_riot_nt, Organism, Scheme, RiotNumberingNT, AirrRearrangementEntryNT

riot_nt: RiotNumberingNT = create_riot_nt(allowed_species = [Organism.HOMO_SAPIENS])
airr_result: AirrRearrangementEntryNT = riot_nt.run_on_sequence(
                    header = "SRR13857054.957936",
                    query_sequence = "GAACCAAACTGACTGTCCTAGGCCAGCCCAAGTCTTCGCCATCAGTCACCCTGTTTCCACCTTCCCCTGAAGAGCTAAAAAAA",
                    scheme = Scheme.KABAT
                )

API Amino Acids

from riot_na import create_riot_aa, Organism, Scheme, RiotNumberingAA, AirrRearrangementEntryAA

riot_aa: RiotNumberingAA = create_riot_aa(allowed_species = [Organism.HOMO_SAPIENS])
airr_result: AirrRearrangementEntryAA = riot_aa.run_on_sequence(
                    header = "SRR13385915.5101835",
                    query_sequence = "QVTLKESGPVLVKPTETLTLTCTVSGFSLSNARMGVSWIRQPPGKALEWLAHIFSNDEKSYSTSLKSRLTISKDTSKSQVVLTMTNMDPGDTATYYCARRGGTIFGVVIILVRRPPL",
                    scheme = Scheme.KABAT,
                    extend_alignment = True
                )

Germline database

RIOT uses OGRDB as the primary source of germline alleles. The database version as of 22.01.2024 was used. C genes are imported from the IgBLAST FTP site: https://ftp.ncbi.nih.gov/blast/executables/igblast/release/database/ncbi_human_c_genes.tar

Data format

This section describes the fields of numbering result object AirrRearrangementEntry(AA). It is based on AIRR Rearrangement Schema format extended by 7 columns highlighted in the table (bold) and emptied of the unnecessary ones. There are also some differences in fields’ definitions, so the AIRR format specification should be treated only as a loose reference. Description of the original AIRR format can be found here.

Attributes:

  1. All fields in the format are required (always present)
  2. All fields are nullable, with the exception of sequence_header and sequence
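
The two attributes above can be checked mechanically once an entry is available as a dictionary. This is only a sketch under that assumption: `check_entry` is a hypothetical helper, not a RIOT function, and the example entry is heavily shortened.

```python
def check_entry(entry, expected_fields, non_nullable=("sequence_header", "sequence")):
    """Return a list of problems; an empty list means the entry conforms."""
    # Every field must be present, even if its value is None.
    problems = [f"missing field: {name}" for name in expected_fields if name not in entry]
    # Only the fields named in non_nullable must hold a value.
    problems += [
        f"null in non-nullable field: {name}"
        for name in non_nullable
        if name in entry and entry[name] is None
    ]
    return problems


# Hypothetical, shortened entry; a real one carries every field of the format.
entry = {"sequence_header": "seq1", "sequence": "GGGCGTTTTGGCAC", "d_call": None}
report = check_entry(entry, ["sequence_header", "sequence", "d_call", "v_call"])
print(report)
```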

AirrRearrangementNT field definitions

Name Type Definition
sequence_header string Fasta header for given input sequence (when numbering a FASTA file) or value of sequence_header parameter (when using RiotNumberingNT API).
sequence string The query nucleotide sequence. Usually, this is the unmodified input sequence, but can be reverse complemented if needed.
sequence_aa string Translated query sequence.
numbering_scheme enum ["imgt", "kabat", "chothia", "martin"] Used numbering scheme, default is "imgt".
locus string Gene locus (chain type).
stop_codon boolean True if the aligned sequence contains a stop codon.
vj_in_frame boolean True if the V and J gene alignments are in-frame. In detail: the distance between the v_alignment reading frame and the j_alignment reading frame is divisible by 3.
v_frameshift boolean True if the V gene in the query nucleotide sequence contains a translational frameshift relative to the frame of the V gene reference sequence. In other words: the sum of insertions and deletions between consecutive matches in the alignment is not divisible by 3.
j_frameshift boolean True if the J gene in the query nucleotide sequence contains a translational frameshift relative to the frame of the J gene reference sequence. In other words: the sum of insertions and deletions between consecutive matches in the alignment is not divisible by 3.
productive boolean True if the V(D)J sequence is predicted to be productive. In detail: stop_codon is False and vj_in_frame is True.
rev_comp boolean True if the alignment is on the opposite strand (reverse complemented) with respect to the query sequence. If True, indicates the sequence field contains a reverse complemented original query sequence.
complete_vdj boolean True if the sequence alignment spans the entire V(D)J region. Meaning, sequence_alignment includes both the first V gene codon that encodes the mature polypeptide chain (i.e., after the leader sequence) and the last complete codon of the J gene (i.e., before the J-C splice site). This does not require an absence of deletions within the internal FWR and CDR regions of the alignment.
v_call string V gene with allele.
d_call string D gene with allele.
j_call string J gene with allele.
c_call string Constant region gene with allele.
v_frame enum [0, 1, 2] V frame offset from v_alignment_start.
j_frame enum [0, 1, 2] J frame offset from j_alignment_start.
sequence_alignment string Gapped alignment of query sequence spanning V-J segment aligned to germline, reverse complemented if needed.
germline_alignment string Gapped aligned germline sequence spanning the same region as the sequence_alignment field (V(D)J region). Segments between matched germlines are gapped to match query sequence length.
sequence_alignment_aa string Amino acid translation of the sequence_alignment.
germline_alignment_aa string Amino acid translation of the germline_alignment.
v_alignment_start integer Start position of the V gene alignment in sequence_alignment (1-based closed interval).
v_alignment_end integer End position of the V gene alignment in sequence_alignment (1-based closed interval).
d_alignment_start integer Start position of the D gene alignment in sequence_alignment (1-based closed interval).
d_alignment_end integer End position of the D gene alignment in sequence_alignment (1-based closed interval).
j_alignment_start integer Start position of the J gene alignment in sequence_alignment (1-based closed interval).
j_alignment_end integer End position of the J gene alignment in sequence_alignment (1-based closed interval).
c_alignment_start integer Start position of the C gene alignment in sequence_alignment (1-based closed interval).
c_alignment_end integer End position of the C gene alignment in sequence_alignment (1-based closed interval).
v_sequence_alignment string Aligned portion of query sequence assigned to the V gene.
v_sequence_alignment_aa string Amino acid translation of the v_sequence_alignment field.
v_germline_alignment string Aligned V gene germline sequence.
v_germline_alignment_aa string Aligned amino acid V gene germline sequence.
d_sequence_alignment string Aligned portion of query sequence assigned to the D gene.
d_germline_alignment string Aligned D gene germline sequence.
j_sequence_alignment string Aligned portion of query sequence assigned to the J gene.
j_sequence_alignment_aa string Amino acid translation of the j_sequence_alignment field.
j_germline_alignment string Aligned J gene germline sequence.
j_germline_alignment_aa string Aligned amino acid J gene germline sequence.
c_sequence_alignment string Aligned portion of query sequence assigned to the constant region.
c_germline_alignment string Aligned constant region germline sequence.
fwr1 string Nucleotide sequence of the aligned FWR1 region.
fwr1_aa string Amino acid translation of the fwr1 field.
cdr1 string Nucleotide sequence of the aligned CDR1 region.
cdr1_aa string Amino acid translation of the cdr1 field.
fwr2 string Nucleotide sequence of the aligned FWR2 region.
fwr2_aa string Amino acid translation of the fwr2 field.
cdr2 string Nucleotide sequence of the aligned CDR2 region.
cdr2_aa string Amino acid translation of the cdr2 field.
fwr3 string Nucleotide sequence of the aligned FWR3 region.
fwr3_aa string Amino acid translation of the fwr3 field.
cdr3 string Nucleotide sequence of the aligned CDR3 region.
cdr3_aa string Amino acid translation of the cdr3 field.
fwr4 string Nucleotide sequence of the aligned FWR4 region.
fwr4_aa string Amino acid translation of the fwr4 field.
junction string Junction region nucleotide sequence, where the junction is defined as the CDR3 plus the two flanking conserved codons.
junction_aa string Amino acid translation of the junction.
junction_length integer Number of nucleotides in the junction sequence.
junction_aa_length integer Number of amino acids in the junction sequence.
v_score number Alignment score (Smith-Waterman) for the V gene.
d_score number Alignment score (Smith-Waterman) for the D gene alignment.
j_score number Alignment score (Smith-Waterman) for the J gene alignment.
c_score number Alignment score (Smith-Waterman) for the C gene alignment.
v_cigar string CIGAR string for the V gene alignment.
d_cigar string CIGAR string for the D gene alignment.
j_cigar string CIGAR string for the J gene alignment.
c_cigar string CIGAR string for the C gene alignment.
v_support number V gene alignment E-value. Note: Every value less than 1.4e-45 will appear as 0.0 (due to single-precision floating point standard limitation)
d_support number D gene alignment E-value. Note: Every value less than 1.4e-45 will appear as 0.0 (due to single-precision floating point standard limitation)
j_support number J gene alignment E-value. Note: Every value less than 1.4e-45 will appear as 0.0 (due to single-precision floating point standard limitation)
c_support number C gene alignment E-value. Note: Every value less than 1.4e-45 will appear as 0.0 (due to single-precision floating point standard limitation)
v_identity number Fractional identity for the V gene alignment.
d_identity number Fractional identity for the D gene alignment.
j_identity number Fractional identity for the J gene alignment.
c_identity number Fractional identity for the C gene alignment.
v_sequence_start integer Start position of the V gene in the query sequence (1-based closed interval).
v_sequence_end integer End position of the V gene in the query sequence (1-based closed interval).
d_sequence_start integer Start position of the D gene in the query sequence (1-based closed interval).
d_sequence_end integer End position of the D gene in the query sequence (1-based closed interval).
j_sequence_start integer Start position of the J gene in the query sequence (1-based closed interval).
j_sequence_end integer End position of the J gene in the query sequence (1-based closed interval).
c_sequence_start integer Start position of the C gene in the query sequence (1-based closed interval).
c_sequence_end integer End position of the C gene in the query sequence (1-based closed interval).
v_germline_start integer Alignment start position in the V gene reference sequence (1-based closed interval).
v_germline_end integer Alignment end position in the V gene reference sequence (1-based closed interval).
d_germline_start integer Alignment start position in the D gene reference sequence (1-based closed interval).
d_germline_end integer Alignment end position in the D gene reference sequence (1-based closed interval).
j_germline_start integer Alignment start position in the J gene reference sequence (1-based closed interval).
j_germline_end integer Alignment end position in the J gene reference sequence (1-based closed interval).
c_germline_start integer Alignment start position in the C gene reference sequence (1-based closed interval).
c_germline_end integer Alignment end position in the C gene reference sequence (1-based closed interval).
fwr1_start integer FWR1 start position in the query sequence (1-based closed interval).
fwr1_end integer FWR1 end position in the query sequence (1-based closed interval).
cdr1_start integer CDR1 start position in the query sequence (1-based closed interval).
cdr1_end integer CDR1 end position in the query sequence (1-based closed interval).
fwr2_start integer FWR2 start position in the query sequence (1-based closed interval).
fwr2_end integer FWR2 end position in the query sequence (1-based closed interval).
cdr2_start integer CDR2 start position in the query sequence (1-based closed interval).
cdr2_end integer CDR2 end position in the query sequence (1-based closed interval).
fwr3_start integer FWR3 start position in the query sequence (1-based closed interval).
fwr3_end integer FWR3 end position in the query sequence (1-based closed interval).
cdr3_start integer CDR3 start position in the query sequence (1-based closed interval).
cdr3_end integer CDR3 end position in the query sequence (1-based closed interval).
fwr4_start integer FWR4 start position in the query sequence (1-based closed interval).
fwr4_end integer FWR4 end position in the query sequence (1-based closed interval).
sequence_aa_scheme_cigar string CIGAR string defining sequence_aa to scheme alignment.
scheme_residue_mapping json string Scheme numbering of sequence_alignment_aa - positions not present in this sequence are not included.
positional_scheme_mapping json string Mapping from absolute residue position in sequence_alignment_aa (0-based) to corresponding scheme position.
exc string Exception (if any) thrown during ANARCI numbering.
additional_validation_flags json string JSON string containing additional validation flags.
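
All position fields in the table above are 1-based closed intervals, whereas Python slices are 0-based and half-open. A minimal sketch of the conversion follows; `slice_closed` is a hypothetical helper, not part of riot_na, and the sequence is made up for illustration.

```python
def slice_closed(sequence: str, start: int, end: int) -> str:
    """Return the substring covered by a 1-based closed interval [start, end]."""
    if start < 1 or end < start:
        raise ValueError(f"invalid 1-based closed interval: [{start}, {end}]")
    # Shift the start down by one; the end stays, because Python excludes it.
    return sequence[start - 1:end]


query = "ACGTACGTAC"
region = slice_closed(query, 3, 6)  # positions 3, 4, 5 and 6
print(region)
```

The same conversion applies to fields such as v_sequence_start and v_sequence_end when extracting a segment from sequence. Missing offsets (reported as -1 in the validation flag descriptions) need to be handled before slicing, which is why the helper rejects them.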

Additional validation flags

The following table describes additional validation flags calculated alongside the main fields. The last 5 flags, regarding conserved residues, apply only when using the IMGT scheme.

Field name AIRR fields required for calculation Description
regions_in_aligned_sequence all regions (fwr1, cdr1, fwr2 …); sequence_alignment True if all region sequences, concatenated, are present in sequence_alignment.
regions_aa_in_aligned_sequence_aa all _aa (fwr1_aa, cdr1_aa, …); sequence_alignment_aa True if all region_aa sequences, concatenated, are present in sequence_alignment_aa.
translated_regions_in_aligned_sequence_aa all regions (fwr1, cdr1, fwr2 …); sequence_alignment_aa; v_frame True if all region sequences, concatenated and translated using v_frame, are present in sequence_alignment_aa.
correct_vj_in_frame v_alignment_start; v_frame; j_alignment_start; j_frame True if vj_in_frame is equal to: the distance between the v_alignment translation frame and the j_alignment translation frame is divisible by 3.
cdr3_in_junction cdr3; junction; cdr3_aa; junction_aa True if cdr3 is present in junction and cdr3_aa is present in junction_aa.
locus_as_in_v_gene locus; v_call True if locus is consistent with the one specified in V gene (v_call).
v_gene_alignment sequence; v_sequence_start; v_sequence_end; v_sequence_alignment True if v_sequence_alignment is equal to substring in sequence from position v_sequence_start to v_sequence_end.
j_gene_alignment sequence; j_sequence_start; j_sequence_end; j_sequence_alignment True if j_sequence_alignment is equal to substring in sequence from position j_sequence_start to j_sequence_end.
c_gene_alignment sequence; c_sequence_start; c_sequence_end; c_sequence_alignment True if c_sequence_alignment is equal to substring in sequence from position c_sequence_start to c_sequence_end.
no_negative_offsets_inside_v_alignment fwr1_start; fwr1_end; cdr1_start; cdr1_end; fwr2_start; fwr2_end; cdr2_start; cdr2_end; fwr3_start; fwr3_end; cdr3_start True if there is no negative (missing) offset inside the V alignment. Example of a violation: fwr1_start == 1; fwr1_end == 35; cdr1_start == -1; cdr1_end == 65.
no_negative_offsets_inside_j_alignment cdr3_end; fwr4_start; fwr4_end True if there is no negative (missing) offset inside the J alignment. Example of a violation: cdr3_end == 293; fwr4_start == -1; fwr4_end == 326.
consecutive_offsets all _start and _end True if consecutive region_start and region_end offsets are ascending and no region_start is greater than the corresponding region_end.
no_empty_cdr3 cdr3 True if cdr3 is present.
primary_sequence_in_sequence_alignment_aa sequence_alignment_aa; scheme_residue_mapping True if concatenation of scheme_residue_mapping amino acids results in a sequence that is a part of sequence_alignment_aa and amino acids are in correct order.
no_insertion_next_to_deletion_aa sequence_aa_scheme_cigar True if there are no insertions next to deletions - indicates correct CIGARs merging process.
insertions_in_correct_places scheme_residue_mapping; numbering_scheme; locus True if insertions are on scheme-allowed positions.
correct_fwr1_offsets sequence; v_sequence_start; fwr1_start; fwr1_end; fwr1 True if fwr1 is equal to substring in sequence cut from position fwr1_start up to fwr1_end. If fwr1_start is -1 (missing), v_sequence_start is used as a starting offset instead.
correct_cdr1_offsets sequence; cdr1_start; cdr1_end; cdr1 True if cdr1 is equal to substring in sequence cut from position cdr1_start up to cdr1_end.
correct_fwr2_offsets sequence; fwr2_start; fwr2_end; fwr2 True if fwr2 is equal to substring in sequence cut from position fwr2_start up to fwr2_end.
correct_cdr2_offsets sequence; cdr2_start; cdr2_end; cdr2 True if cdr2 is equal to substring in sequence cut from position cdr2_start up to cdr2_end.
correct_fwr3_offsets sequence; fwr3_start; fwr3_end; fwr3 True if fwr3 is equal to substring in sequence cut from position fwr3_start up to fwr3_end.
correct_cdr3_offsets sequence; cdr3_start; cdr3_end; cdr3 True if cdr3 is equal to substring in sequence cut from position cdr3_start up to cdr3_end.
correct_fwr4_offsets sequence; j_sequence_end; fwr4_start; fwr4_end; fwr4 True if fwr4 is equal to substring in sequence cut from position fwr4_start up to fwr4_end. If fwr4_end is -1 (missing), j_sequence_end is used as an ending offset instead.
no_empty_fwr1_in_v v_sequence_alignment; fwr1 True if fwr1 is present.
no_empty_cdr1_in_v v_sequence_alignment; cdr1 True if cdr1 is present.
no_empty_fwr2_in_v v_sequence_alignment; fwr2 True if fwr2 is present.
no_empty_cdr2_in_v v_sequence_alignment; cdr2 True if cdr2 is present.
no_empty_fwr3_in_v v_sequence_alignment; fwr3 True if fwr3 is present.
no_empty_fwr4_in_j j_sequence_alignment; fwr4 True if fwr4 is present.
conserved_C23_present imgt_residue_mapping True if conserved Cysteine on IMGT position 23 is present.
conserved_W41_present imgt_residue_mapping True if conserved Tryptophan on IMGT position 41 is present.
conserved_C104_present imgt_residue_mapping True if conserved Cysteine on IMGT position 104 is present.
conserved_W118_heavy_present imgt_residue_mapping True if conserved Tryptophan on IMGT position 118 is present (heavy chain only).
conserved_F118_light_present imgt_residue_mapping True if conserved Phenylalanine on IMGT position 118 is present (light chain only).
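
The correct_vj_in_frame flag and the productive field can be recomputed from the fields they depend on. The sketch below is an interpretation, not RIOT's implementation: it assumes that the reading frame position of each gene is its alignment start plus its frame offset, and both function names are hypothetical.

```python
def vj_in_frame(v_alignment_start: int, v_frame: int,
                j_alignment_start: int, j_frame: int) -> bool:
    """True if the V and J reading frames are a multiple of 3 nucleotides apart."""
    v_frame_pos = v_alignment_start + v_frame  # assumed first in-frame V position
    j_frame_pos = j_alignment_start + j_frame  # assumed first in-frame J position
    return (j_frame_pos - v_frame_pos) % 3 == 0


def is_productive(stop_codon: bool, in_frame: bool) -> bool:
    """Productive as defined above: no stop codon and V and J in frame."""
    return (not stop_codon) and in_frame


in_frame = vj_in_frame(1, 0, 298, 0)   # frames are 297 nucleotides apart
out_of_frame = vj_in_frame(1, 0, 298, 2)  # frames are 299 nucleotides apart
print(in_frame, out_of_frame, is_productive(False, in_frame))
```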

AirrRearrangementAA field definitions

The AIRR data format was developed for nucleotide sequences. For the amino acid pipeline, a similar format was created. Most fields are analogous to the nucleotide-based ones, with an _aa suffix in the name.

Name Type Definition
sequence_header string Fasta header for given input sequence (when numbering a FASTA file) or value of sequence_header parameter (when using RiotNumberingAA API).
sequence_aa string The query sequence.
numbering_scheme enum ["imgt", "kabat", "chothia", "martin"] Used numbering scheme, default is "imgt".
locus string Gene locus (chain type).
stop_codon boolean True if the aligned sequence contains a stop codon.
productive boolean True if the V(D)J sequence is predicted to be productive. In detail: stop_codon is False and V and J genes are detected.
complete_vdj boolean True if the sequence alignment spans the entire V(D)J region. That is, the sequence alignment includes both the first amino acid of the V gene and the last amino acid of the J gene (i.e., before the J-C splice site). This does not require an absence of deletions within the internal FWR and CDR regions of the alignment.
v_call string V gene with allele.
j_call string J gene with allele.
germline_alignment_aa string Assembled, aligned, full-length inferred germline sequence spanning the same region as the sequence_alignment_aa field (V-J region).
sequence_alignment_aa string Segment of query sequence spanning V-J aligned to germline.
v_alignment_start_aa integer Start position of the V gene alignment in sequence_alignment_aa (1-based closed interval).
v_alignment_end_aa integer End position of the V gene alignment in sequence_alignment_aa (1-based closed interval).
j_alignment_start_aa integer Start position of the J gene alignment in sequence_alignment_aa (1-based closed interval).
j_alignment_end_aa integer End position of the J gene alignment in sequence_alignment_aa (1-based closed interval).
v_sequence_alignment_aa string Aligned portion of query sequence assigned to the V gene.
v_germline_alignment_aa string Aligned V gene germline sequence.
j_sequence_alignment_aa string Aligned portion of query sequence assigned to the J gene.
j_germline_alignment_aa string Aligned J gene germline sequence.
fwr1_aa string Amino acid sequence of the aligned FWR1 region.
cdr1_aa string Amino acid sequence of the aligned CDR1 region.
fwr2_aa string Amino acid sequence of the aligned FWR2 region.
cdr2_aa string Amino acid sequence of the aligned CDR2 region.
fwr3_aa string Amino acid sequence of the aligned FWR3 region.
cdr3_aa string Amino acid sequence of the aligned CDR3 region.
fwr4_aa string Amino acid sequence of the aligned FWR4 region.
junction_aa string Junction region amino acid sequence, where the junction is defined as the CDR3 plus the two flanking conserved amino acids.
junction_aa_length integer Number of amino acids in the junction sequence.
v_score_aa number Alignment score (Smith-Waterman) for the V gene.
j_score_aa number Alignment score (Smith-Waterman) for the J gene alignment.
v_cigar_aa string CIGAR string for the V gene alignment.
j_cigar_aa string CIGAR string for the J gene alignment.
v_support_aa number V gene alignment E-value. Note: Every value less than 1.4e-45 will appear as 0.0 (due to single-precision floating point standard limitation)
j_support_aa number J gene alignment E-value. Note: Every value less than 1.4e-45 will appear as 0.0 (due to single-precision floating point standard limitation)
v_identity_aa number Fractional identity for the V gene alignment.
j_identity_aa number Fractional identity for the J gene alignment.
v_sequence_start_aa integer Start position of the V gene in the query sequence (1-based closed interval).
v_sequence_end_aa integer End position of the V gene in the query sequence (1-based closed interval).
j_sequence_start_aa integer Start position of the J gene in the query sequence (1-based closed interval).
j_sequence_end_aa integer End position of the J gene in the query sequence (1-based closed interval).
v_germline_start_aa integer Alignment start position in the V gene reference sequence (1-based closed interval).
v_germline_end_aa integer Alignment end position in the V gene reference sequence (1-based closed interval).
j_germline_start_aa integer Alignment start position in the J gene reference sequence (1-based closed interval).
j_germline_end_aa integer Alignment end position in the J gene reference sequence (1-based closed interval).
fwr1_start_aa integer FWR1 start position in the query sequence (1-based closed interval).
fwr1_end_aa integer FWR1 end position in the query sequence (1-based closed interval).
cdr1_start_aa integer CDR1 start position in the query sequence (1-based closed interval).
cdr1_end_aa integer CDR1 end position in the query sequence (1-based closed interval).
fwr2_start_aa integer FWR2 start position in the query sequence (1-based closed interval).
fwr2_end_aa integer FWR2 end position in the query sequence (1-based closed interval).
cdr2_start_aa integer CDR2 start position in the query sequence (1-based closed interval).
cdr2_end_aa integer CDR2 end position in the query sequence (1-based closed interval).
fwr3_start_aa integer FWR3 start position in the query sequence (1-based closed interval).
fwr3_end_aa integer FWR3 end position in the query sequence (1-based closed interval).
cdr3_start_aa integer CDR3 start position in the query sequence (1-based closed interval).
cdr3_end_aa integer CDR3 end position in the query sequence (1-based closed interval).
fwr4_start_aa integer FWR4 start position in the query sequence (1-based closed interval).
fwr4_end_aa integer FWR4 end position in the query sequence (1-based closed interval).
sequence_aa_scheme_cigar string CIGAR string defining sequence_alignment_aa to scheme alignment.
scheme_residue_mapping json string Scheme numbering of sequence_alignment_aa - positions not present in this sequence are not included.
positional_scheme_mapping json string Mapping from absolute residue position in sequence_alignment_aa (0-based) to corresponding scheme position.
exc string Exception (if any) thrown during ANARCI numbering.
additional_validation_flags json string JSON string containing additional validation flags.
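
The note on the support (E-value) fields follows from how single-precision floats behave: the smallest positive float32 value is about 1.4e-45, so sufficiently small E-values collapse to 0.0. The sketch below illustrates this general float32 property with the standard library only. It does not reproduce RIOT's internal computation, and `to_float32` is a hypothetical helper.

```python
import struct


def to_float32(value: float) -> float:
    """Round a Python float (double precision) to single precision and back."""
    return struct.unpack("f", struct.pack("f", value))[0]


tiny = to_float32(1e-50)   # far below the smallest positive float32
small = to_float32(1e-40)  # still representable as a float32 subnormal
print(tiny == 0.0, small > 0.0)
```

In practice, a support value of 0.0 is best read as "smaller than single precision can represent" rather than as an exact zero.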

Additional validation flags (AA)

The following table describes additional validation flags calculated alongside the main fields. The last 5 flags, regarding conserved residues, apply only when using the IMGT scheme.

Field name AIRR fields required for calculation Description
regions_aa_in_aligned_sequence_aa all _aa (fwr1_aa, cdr1_aa, …); sequence_alignment_aa True if all region_aa sequences, concatenated, are present in sequence_alignment_aa.
locus_as_in_v_gene locus; v_call True if locus is consistent with the one specified in V gene (v_call).
v_gene_alignment_aa sequence; v_sequence_start_aa; v_sequence_end_aa; v_sequence_alignment_aa True if v_sequence_alignment_aa is equal to substring in sequence from position v_sequence_start_aa to v_sequence_end_aa.
j_gene_alignment_aa sequence; j_sequence_start_aa; j_sequence_end_aa; j_sequence_alignment_aa True if j_sequence_alignment_aa is equal to substring in sequence from position j_sequence_start_aa to j_sequence_end_aa.
no_negative_offsets_inside_v_alignment_aa fwr1_start_aa; fwr1_end_aa; cdr1_start_aa; cdr1_end_aa; fwr2_start_aa; fwr2_end_aa; cdr2_start_aa; cdr2_end_aa; fwr3_start_aa; fwr3_end_aa; cdr3_start_aa True if there is no negative (missing) offset inside the V alignment. Example of a violation: fwr1_start_aa == 1; fwr1_end_aa == 26; cdr1_start_aa == -1; cdr1_end_aa == 38.
no_negative_offsets_inside_j_alignment_aa cdr3_end_aa; fwr4_start_aa; fwr4_end_aa True if there is no negative (missing) offset inside the J alignment. Example of a violation: cdr3_end_aa == 117; fwr4_start_aa == -1; fwr4_end_aa == 128.
consecutive_offsets_aa all _start_aa and _end_aa True if consecutive region_start_aa and region_end_aa offsets are ascending and no region_start_aa is greater than the corresponding region_end_aa.
no_empty_cdr3_aa cdr3_aa True if cdr3_aa is present.
primary_sequence_in_sequence_alignment_aa sequence_alignment_aa; scheme_residue_mapping True if concatenation of scheme_residue_mapping amino acids results in a sequence that is a part of sequence_alignment_aa and amino acids are in correct order.
no_insertion_next_to_deletion_aa sequence_aa_scheme_cigar True if there are no insertions next to deletions - indicates correct CIGARs merging process.
insertions_in_correct_places scheme_residue_mapping; numbering_scheme; locus True if insertions are on schema-allowed positions.
correct_fwr1_aa_offsets sequence; v_sequence_start_aa; fwr1_start_aa; fwr1_end_aa; fwr1_aa True if fwr1_aa is equal to substring in sequence cut from position fwr1_start_aa up to fwr1_end_aa. If fwr1_start_aa is -1 (missing), v_sequence_start_aa is used as a starting offset instead.
correct_cdr1_aa_offsets sequence; cdr1_start_aa; cdr1_end_aa; cdr1_aa True if cdr1_aa is equal to substring in sequence cut from position cdr1_start_aa up to cdr1_end_aa.
correct_fwr2_aa_offsets sequence; fwr2_start_aa; fwr2_end_aa; fwr2_aa True if fwr2_aa is equal to substring in sequence cut from position fwr2_start_aa up to fwr2_end_aa.
correct_cdr2_aa_offsets sequence; cdr2_start_aa; cdr2_end_aa; cdr2_aa True if cdr2_aa is equal to substring in sequence cut from position cdr2_start_aa up to cdr2_end_aa.
correct_fwr3_aa_offsets sequence; fwr3_start_aa; fwr3_end_aa; fwr3_aa True if fwr3_aa is equal to substring in sequence cut from position fwr3_start_aa up to fwr3_end_aa.
correct_cdr3_aa_offsets sequence; cdr3_start_aa; cdr3_end_aa; cdr3_aa True if cdr3_aa is equal to substring in sequence cut from position cdr3_start_aa up to cdr3_end_aa.
correct_fwr4_aa_offsets sequence; j_sequence_end_aa; fwr4_start_aa; fwr4_end_aa; fwr4_aa True if fwr4_aa is equal to substring in sequence cut from position fwr4_start_aa up to fwr4_end_aa. If fwr4_end_aa is -1 (missing), j_sequence_end_aa is used as an ending offset instead.
no_empty_fwr1_aa_in_v v_sequence_alignment_aa; fwr1_aa True if fwr1_aa is present.
no_empty_cdr1_aa_in_v v_sequence_alignment_aa; cdr1_aa True if cdr1_aa is present.
no_empty_fwr2_aa_in_v v_sequence_alignment_aa; fwr2_aa True if fwr2_aa is present.
no_empty_cdr2_aa_in_v v_sequence_alignment_aa; cdr2_aa True if cdr2_aa is present.
no_empty_fwr3_aa_in_v v_sequence_alignment_aa; fwr3_aa True if fwr3_aa is present.
no_empty_fwr4_aa_in_j j_sequence_alignment_aa; fwr4_aa True if fwr4_aa is present.
conserved_C23_present scheme_residue_mapping True if the conserved cysteine at IMGT position 23 is present.
conserved_W41_present scheme_residue_mapping True if the conserved tryptophan at IMGT position 41 is present.
conserved_C104_present scheme_residue_mapping True if the conserved cysteine at IMGT position 104 is present.
conserved_W118_heavy_present scheme_residue_mapping True if the conserved tryptophan at IMGT position 118 is present (heavy chain only).
conserved_F118_light_present scheme_residue_mapping True if the conserved phenylalanine at IMGT position 118 is present (light chain only).
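
To make the flags above more concrete, here is a minimal sketch of how three of them could be recomputed from a result record. It is not RIOT's own implementation. The helper names are hypothetical, offsets are assumed to be 1-based and inclusive (as in AIRR), and scheme_residue_mapping is assumed to be a dictionary from position label to amino acid. Adjust these assumptions to match the output you actually get.

```python
import re

# Matches one CIGAR operation, e.g. "10M" or "2I".
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def no_insertion_next_to_deletion(cigar: str) -> bool:
    """Return True if no insertion run is directly adjacent to a deletion run."""
    ops = [op for _, op in CIGAR_OP.findall(cigar)]
    return all({a, b} != {"I", "D"} for a, b in zip(ops, ops[1:]))


def region_matches(sequence: str, start: int, end: int, region: str) -> bool:
    """Return True if `region` equals the slice of `sequence` at [start, end].

    Assumption: offsets are 1-based and inclusive.
    """
    if start < 1 or end < start:
        return False
    return sequence[start - 1 : end] == region


def conserved_residue_present(mapping: dict[str, str], position: str, expected: str) -> bool:
    """Return True if `position` is numbered and carries the `expected` residue.

    Assumption: `mapping` maps position labels (e.g. "23") to amino acids.
    """
    return mapping.get(position, "") == expected


if __name__ == "__main__":
    print(no_insertion_next_to_deletion("10M2I5M1D3M"))  # True
    print(no_insertion_next_to_deletion("10M2I1D3M"))    # False
    print(region_matches("QVQLVQSG", 1, 4, "QVQL"))      # True
    print(conserved_residue_present({"23": "C", "41": "W"}, "23", "C"))  # True
```

Checks like these can be handy when post-filtering large result files, for example to keep only records in which every conserved position is present.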

Examples

Sample usage of the software is presented at https://colab.research.google.com/drive/1xKO4udsX5gmnY88eDKWsQaUnHsLuFwVA?usp=sharing. For users who want to run RIOT with a custom database, we provide a Google Colab notebook that shows how to build a custom germline database for RIOT. It is available at https://colab.research.google.com/drive/1VCStUKgZ1ggi2Xf5YV7hFWHxxzP29BjK?usp=sharing.

Development

RIOT uses a prefiltering module written in Rust, so installing from source requires a few extra steps.

# Install Poetry

curl -sSL https://install.python-poetry.org | python3 - --version 1.7.1

# Add `export PATH="/root/.local/bin:$PATH"` to your shell configuration file.

# Download and run the Rust installation script
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y

# Restart shell to reload PATH

# Verify the installation
poetry --version
rustc --version
cargo --version

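# Clone the repository
git clone https://github.com/NaturalAntibody/riot_na
cd riot_na

# Install Python dependencies, build the Rust extension in release mode (-r), then run the install again
poetry install
poetry run maturin develop -r
poetry install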
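
If one of the steps above fails with a "command not found" error, the tool is usually missing from PATH in the current shell. The following sketch (not part of RIOT; `missing_tools` is a hypothetical helper) uses only the Python standard library to report which of the required tools cannot be found.

```python
import shutil


def missing_tools(tools=("poetry", "rustc", "cargo")) -> list[str]:
    """Return the names of the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("poetry, rustc and cargo are all available.")
```

If a tool is reported as missing right after installation, restarting the shell (or re-sourcing the shell configuration file) is typically enough to pick up the updated PATH.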

Citing this work

The code and data in this package are based on the paper below, which will be released once it clears peer review. If you use this package, please cite:

@misc{riot,
      title={RIOT - Rapid Immunoglobulin Overview Tool - rapid annotation of nucleotide and amino acid immunoglobulin sequences using an open germline database},
      author={Paweł Dudzic and Bartosz Janusz and Tadeusz Satława and Dawid Chomicz and Tomasz Gawłowski and Rafał Grabowski and Przemysław Jóźwiak and Mateusz Tarkowski and Maciej Mycielski and Sonia Wróbel and Konrad Krawczyk},
}

Download files

Download the file for your platform.

Source Distribution

riot_na-1.2.3.tar.gz (478.6 kB)

Uploaded Source

Built Distributions

riot_na-1.2.3-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.3 MB)

Uploaded PyPy manylinux: glibc 2.17+ x86-64

riot_na-1.2.3-cp312-none-win_amd64.whl (453.6 kB)

Uploaded CPython 3.12 Windows x86-64

riot_na-1.2.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.3 MB)

Uploaded CPython 3.12 manylinux: glibc 2.17+ x86-64

riot_na-1.2.3-cp312-cp312-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl (851.1 kB)

Uploaded CPython 3.12 macOS 10.12+ universal2 (ARM64, x86-64) macOS 10.12+ x86-64 macOS 11.0+ ARM64

riot_na-1.2.3-cp311-none-win_amd64.whl (453.4 kB)

Uploaded CPython 3.11 Windows x86-64

riot_na-1.2.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.3 MB)

Uploaded CPython 3.11 manylinux: glibc 2.17+ x86-64

riot_na-1.2.3-cp311-cp311-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl (849.8 kB)

Uploaded CPython 3.11 macOS 10.12+ universal2 (ARM64, x86-64) macOS 10.12+ x86-64 macOS 11.0+ ARM64

riot_na-1.2.3-cp310-none-win_amd64.whl (453.4 kB)

Uploaded CPython 3.10 Windows x86-64

riot_na-1.2.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.3 MB)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

riot_na-1.2.3-cp310-cp310-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl (849.7 kB)

Uploaded CPython 3.10 macOS 10.12+ universal2 (ARM64, x86-64) macOS 10.12+ x86-64 macOS 11.0+ ARM64
