DNA Variance to Structure
DV2S: DNA Variant to Structure
A computational tool that maps DNA sequence variations to protein structures, enabling structural interpretation of genetic variants.
Features
- DNA to Protein Mapping: Translate DNA sequences and map variants to protein structures
- Multiple Structure Input: Support for PDB and mmCIF formats
- Flexible Operation Modes:
  - consensus: Generate consensus sequence for structure prediction
  - map: Map alignment to existing protein structure
  - skip: Skip structure processing
- Advanced Structure Prediction: Integration with ESM, Boltz-2, and AlphaFold2
- Quality Control: pLDDT score filtering for predicted structures
Installation
# install for all users
pip install dv2s
# install for current user
pip install dv2s --user
Quick Start
python3 -m dv2s -dna sequences.fasta -nvidia_key key_file -output result
Usage
# use consensus sequence of alignment to predict protein structure and analyze
python3 -m dv2s -dna dna_seq.fasta -output analysis_results
# use nvidia's API to predict protein structure via Boltz-2 model
python3 -m dv2s -dna dna_seq.fasta -output analysis_results -nvidia_key key_file -predict boltz-2
# map DNA alignment to given protein structure's sequence
python3 -m dv2s -dna sequences.fasta -pdb structure.pdb -mode map
# only preprocess and align the input sequences, skip protein structure prediction and analysis
python3 -m dv2s -dna sequences.fasta -mode skip
# reuse a previous protein alignment to skip the alignment step
python3 -m dv2s -dna sequences.fasta -protein_aln sequence.protein.aln -nvidia_key key_file -predict esm-long
Command Line Options
Sequence Input
- -dna: Required. DNA sequences in FASTA format
- -protein_aln: Aligned protein sequences in FASTA format
- -table_id: Translation table ID (default: 1, standard genetic code)
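To illustrate what translation table 1 means in practice, here is a minimal sketch of codon-by-codon translation with the standard genetic code. The function name `translate` and the choice to emit `X` for unrecognized codons are assumptions made for this example; DV2S's own translation code may behave differently.

```python
from itertools import product

# NCBI translation table 1 (standard genetic code), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA codon by codon (hypothetical helper, not the DV2S API).

    Codons that are not in the table (for example, ones containing N)
    become 'X'; a trailing partial codon is ignored.
    """
    dna = dna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


print(translate("ATGGCCTAA"))  # MA*
```

Other table IDs would swap in a different amino-acid string while keeping the same codon order.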
Structure Input
- -mode: Operation mode (consensus, map, skip), default: consensus
  - consensus: Use the alignment to generate a consensus sequence and predict its structure
  - map: Map the alignment to a given protein structure
  - skip: Skip structure prediction and analysis
- -pdb: Protein structure in PDB format
- -mmcif: Protein structure in mmCIF format
- -predict: Structure prediction method (auto, esm, esm-long, boltz-2, alphafold2)
  - auto: Try all methods
  - esm: Use the ESMFold server; only suitable for protein sequences shorter than 400 aa
  - esm-long: Use NVIDIA's ESMFold API; allows longer input (shorter than 1024 aa)
  - boltz-2: Use NVIDIA's Boltz-2 API; allows the longest input (shorter than 4096 aa)
  - alphafold2: Use NVIDIA's AlphaFold2 API; allows the longest input (shorter than 4096 aa) but is slower
- -nvidia_key: NVIDIA API key file for prediction. A text file containing only one line, the API key
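The length limits listed for each prediction method can be checked before submitting a job. The sketch below only encodes the limits stated above; the names `LENGTH_LIMITS` and `candidate_methods` are hypothetical, and this is not a description of how the `auto` mode is implemented.

```python
# Exclusive upper bounds on protein length (aa), as documented for -predict.
LENGTH_LIMITS = {
    "esm": 400,
    "esm-long": 1024,
    "boltz-2": 4096,
    "alphafold2": 4096,
}


def candidate_methods(protein: str) -> list[str]:
    """Return the documented methods whose length limit admits this sequence.

    Gap characters are not counted, on the assumption that a consensus
    or ungapped sequence is what gets submitted for prediction.
    """
    length = len(protein.replace("-", ""))
    return [name for name, limit in LENGTH_LIMITS.items() if length < limit]


print(candidate_methods("M" * 500))  # ['esm-long', 'boltz-2', 'alphafold2']
```

An empty list means the sequence exceeds every documented limit.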
Options
- -mask_low_plddt: Mask residues with low pLDDT scores in predicted structures by setting their B-factor to 0
- -min_plddt: Minimum pLDDT threshold (default: 0.3)
- -n_thread: Number of threads (default: -1, use all CPU cores)
- -output: Output directory for results
- -gene: Gene name, used to retrieve the protein structure from UniProt
- -organism: Organism name (e.g., "Oryza sativa"), used for the UniProt query. Currently, UniProt may return unwanted results.
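As a rough illustration of pLDDT masking, the sketch below zeroes the B-factor field of PDB atom records whose value falls below a threshold. It assumes that the predicted structure stores pLDDT in the B-factor column (PDB columns 61 to 66) on the same scale as the threshold; `mask_low_plddt` here is a hypothetical function, not the DV2S implementation.

```python
def mask_low_plddt(pdb_text: str, min_plddt: float = 0.3) -> str:
    """Set the B-factor to 0 for atoms whose B-factor is below min_plddt.

    Assumes fixed-column PDB format, where the temperature factor
    occupies columns 61-66 (slice [60:66]) of ATOM/HETATM records.
    """
    masked_lines = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 66:
            try:
                b_factor = float(line[60:66])
            except ValueError:
                # Leave malformed records untouched.
                masked_lines.append(line)
                continue
            if b_factor < min_plddt:
                line = line[:60] + f"{0.0:6.2f}" + line[66:]
        masked_lines.append(line)
    return "\n".join(masked_lines)
```

Only the B-factor field is rewritten, so coordinates and all other records pass through unchanged.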
Output
DV2S uses the input filename's prefix as the prefix of its output files. It generates the following outputs:
Summary
- .csv: CSV-format result of the analysis
Sequence
- .dna.fasta: A copy of the input DNA sequences
- .protein.fasta: Translated protein sequences
- .dna_cons.fasta: Consensus sequence generated from the DNA alignment, without gaps
- .protein_cons.fasta: Consensus sequence generated from the protein alignment, without gaps
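One common way to build a gap-free consensus is a per-column majority vote that drops columns where the gap character wins. The sketch below shows that approach; it is an assumption about what "consensus without gaps" means, and the function name `consensus` is hypothetical rather than part of DV2S.

```python
from collections import Counter


def consensus(alignment: list[str], gap: str = "-") -> str:
    """Majority-rule consensus of aligned sequences, with gap columns dropped."""
    if len({len(seq) for seq in alignment}) > 1:
        raise ValueError("aligned sequences must have equal length")
    residues = []
    for column in zip(*alignment):
        # Most frequent symbol in this column (the gap counts as a symbol).
        symbol, _count = Counter(column).most_common(1)[0]
        if symbol != gap:
            residues.append(symbol)
    return "".join(residues)


print(consensus(["ATG-CA", "ATGTCA", "ACG-CA"]))  # ATGCA
```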
Alignment
- .dna.aln: DNA alignment, generated from the input DNA sequences and the protein alignment
- .protein.aln: Protein alignment
- .clean_dna.aln: DNA alignment that excludes invalid protein-coding sequences
- .clean_protein.aln: Protein alignment that excludes invalid protein-coding sequences
Subalign
- .helix_DNA.aln: The subset of sites in the DNA alignment that correspond to protein helix structure. The secondary structure is defined by the protein structure of the consensus sequence
- .strand_DNA.aln: The subset of sites in the DNA alignment that correspond to protein strand structure
- .coil_DNA.aln: The subset of sites in the DNA alignment that correspond to protein coil structure
- .helix_protein.aln: The subset of sites in the protein alignment that correspond to protein helix structure
- .strand_protein.aln: The subset of sites in the protein alignment that correspond to protein strand structure
- .coil_protein.aln: The subset of sites in the protein alignment that correspond to protein coil structure
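The idea behind the subalignments can be sketched as column selection: pick the protein columns with a given secondary-structure label, then take the matching codon (three DNA columns) for each. The example assumes a per-column label string using `H` (helix), `E` (strand), and `C` (coil) is already available and that the DNA alignment is codon-aligned to the protein alignment; `subalign` is a hypothetical helper, not the DV2S API.

```python
def subalign(
    protein_aln: list[str],
    dna_aln: list[str],
    column_ss: str,
    label: str,
) -> tuple[list[str], list[str]]:
    """Extract protein columns with the given label and their codon columns.

    column_ss holds one secondary-structure label per protein alignment
    column; DNA column 3*i..3*i+2 is assumed to encode protein column i.
    """
    columns = [i for i, ss in enumerate(column_ss) if ss == label]
    protein_sub = ["".join(seq[i] for i in columns) for seq in protein_aln]
    dna_sub = ["".join(seq[3 * i:3 * i + 3] for i in columns) for seq in dna_aln]
    return protein_sub, dna_sub


helix_protein, helix_dna = subalign(
    ["MKV", "MRV"], ["ATGAAAGTT", "ATGCGTGTT"], "HHC", "H"
)
print(helix_protein)  # ['MK', 'MR']
print(helix_dna)      # ['ATGAAA', 'ATGCGT']
```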
Protein structure
- .predict.pdb: Protein structure prediction result
- .Consensus_ratio.mmcif: mmCIF-format protein structure file; the B-factor column contains the normalized consensus ratio of the DNA alignment
- .DNA_entropy.mmcif: mmCIF-format protein structure file; the B-factor column contains the normalized entropy of the DNA alignment
- .DNA_Pi.mmcif: mmCIF-format protein structure file; the B-factor column contains the normalized nucleotide diversity (Pi) of the DNA alignment
- .DNA_Pi_omega.mmcif: mmCIF-format protein structure file; the B-factor column contains the normalized Pi-omega (nonsynonymous variance rate / synonymous variance rate) of the DNA alignment. Inf and NaN are normalized to 0 or 5 * max value for visualization
- .protein_entropy.mmcif: mmCIF-format protein structure file; the B-factor column contains the normalized entropy of the protein alignment
- .protein_Pi.mmcif: mmCIF-format protein structure file; the B-factor column contains the normalized diversity (Pi) of the protein alignment
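For readers unfamiliar with per-site entropy, the sketch below computes Shannon entropy for each alignment column and scales the values by the largest one. Both the treatment of gaps as ordinary symbols and the divide-by-maximum normalization are assumptions made for this example; the exact normalization DV2S applies is not specified here, and the function names are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy, in bits, of the symbols in one alignment column."""
    total = len(column)
    return sum(
        (n / total) * math.log2(total / n) for n in Counter(column).values()
    )


def normalized_entropy(alignment: list[str]) -> list[float]:
    """Per-column entropy divided by the largest column entropy.

    Returns all zeros when every column is invariant.
    """
    values = [column_entropy("".join(col)) for col in zip(*alignment)]
    peak = max(values, default=0.0)
    return [v / peak if peak > 0 else 0.0 for v in values]


print(normalized_entropy(["AAA", "ACA", "AGC", "ATC"]))  # [0.0, 1.0, 0.5]
```

Values like these, one per residue, are the kind of per-site score that can be written into the B-factor column for coloring a structure in a viewer.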
Others
- .log: DV2S log file
- .dssp.log: DSSP log file
- .mafft.log: MAFFT log file
Dependencies
- Python 3.12+
- Structure prediction tools or APIs (ESM, AlphaFold2, etc.)
Citation
If you use DV2S in your research, please cite:
[Citation information to be added]
License
This project is licensed under the APL-3 License - see the LICENSE file for details.
Support
For questions and support, please open an issue on GitHub.