
A bioinformatics solution to determine SNP flanking sequences and mutation profile


Mutation profile

This repository contains the script(s) developed during the 2022 Monkeypox outbreak to explore the mutational profiles/signatures of this virus. They can also be applied to other species. It currently comprises the following script(s):

  • get_mutation_profile.py, which rapidly obtains the sequence context (of user-defined size) flanking SNPs of interest and determines their mutational profile according to the user's specifications (e.g. the GA>AA and TC>TT replacements associated with APOBEC3-mediated viral genome editing)
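The core idea of extracting a flanking context can be sketched in a few lines. This is a minimal illustration, not the script's own code: the function name `flanking_context` and the bracketed output format are hypothetical, and flanks are simply truncated at the sequence ends. Only the 1-indexed positions and the default of 5 nucleotides on each side come from this README.

```python
def flanking_context(ref_seq: str, pos: int, before: int = 5, after: int = 5) -> str:
    """Return the reference context around a 1-indexed position (hypothetical helper).

    The focal base is wrapped in brackets; flanks are truncated at the
    sequence ends instead of being padded.
    """
    if not 1 <= pos <= len(ref_seq):
        raise ValueError(f"position {pos} is outside the reference (length {len(ref_seq)})")
    i = pos - 1  # 1-indexed reference position -> 0-indexed string offset
    left = ref_seq[max(0, i - before):i]
    right = ref_seq[i + 1:i + 1 + after]
    return f"{left}[{ref_seq[i]}]{right}"


# Toy sequence, not real data.
print(flanking_context("ACGTACGTAC", 5, before=2, after=2))  # GT[A]CG
```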

Input/Output of get_mutation_profile.py

OPTION 1
Inputs:

  1. TSV file with the columns POS REF ALT (i.e. 1-indexed reference position, reference allele and alternative allele)
  2. Fasta file including the reference genome

Output:

  1. TSV file with the mutation context and profile

OPTION 2
Inputs:

  1. TSV file with the columns ID POS REF ALT (i.e. sample ID, 1-indexed reference position, reference allele and alternative allele)
  2. Fasta file including the reference genome

Outputs:

  1. TSV file with the mutation context and profile for each sample present in the TSV input
  2. TSV file with a summary report for each position of interest including the different patterns observed and their respective frequency

NOTE: For options 1 and 2, the order of the columns in input 1 is not important, but their names (ID, POS, REF, ALT) are!
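Reading the table by header name rather than by column index is what makes the column order irrelevant. The sketch below illustrates that with the standard-library `csv.DictReader`; the function name `read_mutation_table`, the upper-casing of alleles and the error message are assumptions for illustration (the actual script depends on pandas and may parse the file differently).

```python
import csv
import io

REQUIRED_COLUMNS = {"POS", "REF", "ALT"}  # "ID" is optional (Option 2)


def read_mutation_table(handle):
    """Parse a tab-separated mutation list by column name, not position (hypothetical helper)."""
    reader = csv.DictReader(handle, delimiter="\t")
    columns = set(reader.fieldnames or [])
    missing = REQUIRED_COLUMNS - columns
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(sorted(missing))}")
    has_id = "ID" in columns
    rows = []
    for row in reader:
        rows.append({
            "ID": row["ID"] if has_id else None,
            "POS": int(row["POS"]),  # 1-indexed reference position
            "REF": row["REF"].strip().upper(),
            "ALT": row["ALT"].strip().upper(),
        })
    return rows


# Columns deliberately shuffled: only the header names matter.
example = io.StringIO("ALT\tID\tPOS\tREF\nA\tsample1\t100\tG\n")
print(read_mutation_table(example))
```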

OPTION 3
Inputs:

  1. Single-column file with a list of 1-indexed reference positions of interest
  2. Multiple Sequence Alignment (fasta) including the reference genome

Outputs:

  1. TSV file with the mutation context and profile for each sample present in the alignment
  2. TSV file with a summary report for each position of interest including the different patterns observed and their respective frequency
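With an alignment as input, 1-indexed reference positions have to be translated into alignment columns, because gaps in the aligned reference shift the column index. A minimal sketch of that mapping follows, assuming gaps are written as "-" and the alignment is held as a name-to-sequence dict; the helper names are hypothetical and this is not necessarily how the script resolves positions.

```python
def reference_to_alignment_columns(aligned_ref: str) -> dict:
    """Map 1-indexed reference positions to 0-indexed alignment columns."""
    mapping = {}
    ref_pos = 0
    for col, base in enumerate(aligned_ref):
        if base != "-":  # gap columns have no reference coordinate
            ref_pos += 1
            mapping[ref_pos] = col
    return mapping


def bases_at_reference_position(alignment: dict, ref_name: str, pos: int) -> dict:
    """Return each sequence's base at a 1-indexed reference position."""
    col = reference_to_alignment_columns(alignment[ref_name])[pos]
    return {name: seq[col] for name, seq in alignment.items()}


# Toy alignment, not real data.
alignment = {"reference": "AC-GT", "sample1": "ACAGA", "sample2": "AC-GT"}
print(bases_at_reference_position(alignment, "reference", 4))
```

Reference position 4 lands in alignment column 4 (not 3) because the reference carries a gap in column 2.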

TIP: If you do not know your positions of interest, you can run the script alignment_processing.py of ReporTree and it will provide a list of positions of interest according to your specifications.

Dependencies and installation

To run get_mutation_profile.py you will need:

  • biopython
  • pandas

pip installation

pip install mutation-profile
mutation-profile -h

conda installation

conda create -n mutation-profile -c vmixao mutation-profile
conda activate mutation-profile # if you created the conda environment
mutation-profile -h

Usage

  -h, --help            show this help message and exit

Mutation profile:
  Provide input/output specifications

  -f FASTA, --fasta FASTA
                        [MANDATORY] Input sequence file (fasta)
  -m MUTATION, --mutation_list MUTATION
                        [MANDATORY] Input mutation list that can be: 1)
                        single-column file with 1-based reference position
                        information (in this case the fasta file must be a
                        multiple sequence alignment of all the sequences of
                        interest); OR 2) tsv file with the columns POS, REF,
                        and ALT where POS = 1-based reference position. If you
                        want to include information for more than one sample
                        per position, add also the column 'ID' (note that the
                        order of the columns is not important but their name
                        is!)
  -r REF, --reference REF
                        [MANDATORY] Reference sequence name
  -b BEFORE, --before BEFORE
                        [OPTIONAL] Number of nucleotides to report BEFORE the
                        mutation (default = 5)
  -a AFTER, --after AFTER
                        [OPTIONAL] Number of nucleotides to report AFTER the
                        mutation (default = 5)
  -p PROFILES, --profiles PROFILES
                        [OPTIONAL] Comma-separated list of mutational profiles
                        of interest (upper-case!). Default = 'GA>AA,TC>TT'
  -o OUTPUT, --output OUTPUT
                        [OPTIONAL] Tag for output file name. Default =
                        Mutation_profile
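The help text only states that -p takes a comma-separated, upper-case list such as 'GA>AA,TC>TT'. One plausible reading, assumed here rather than taken from the script, is that each profile is a pair of equal-length motifs differing at exactly one base, and that base marks where the SNP sits inside the motif (first base for GA>AA, second for TC>TT). The hypothetical helpers below parse such a string and test a SNP against the reference under that assumption.

```python
def parse_profiles(spec: str = "GA>AA,TC>TT") -> list:
    """Parse a -p style string into (ref_motif, alt_motif, offset) tuples.

    `offset` is the index of the single base that differs between the motifs,
    i.e. where the mutated position is assumed to sit inside the motif.
    """
    profiles = []
    for item in spec.split(","):
        ref_motif, alt_motif = item.strip().split(">")
        diffs = [i for i, (r, a) in enumerate(zip(ref_motif, alt_motif)) if r != a]
        if len(ref_motif) != len(alt_motif) or len(diffs) != 1:
            raise ValueError(f"profile {item!r} must be two equal-length motifs differing at one base")
        profiles.append((ref_motif, alt_motif, diffs[0]))
    return profiles


def matching_profiles(ref_seq: str, pos: int, ref: str, alt: str, profiles: list) -> list:
    """Return the profiles matched by a ref>alt SNP at 1-indexed `pos`."""
    i = pos - 1
    hits = []
    for ref_motif, alt_motif, offset in profiles:
        if ref_motif[offset] != ref or alt_motif[offset] != alt:
            continue  # the substitution itself does not match this profile
        start = i - offset
        end = start + len(ref_motif)
        if start >= 0 and end <= len(ref_seq) and ref_seq[start:end] == ref_motif:
            hits.append(f"{ref_motif}>{alt_motif}")
    return hits


profiles = parse_profiles("GA>AA,TC>TT")
print(matching_profiles("TTGAAC", 3, "G", "A", profiles))  # ['GA>AA']
print(matching_profiles("ATCG", 3, "C", "T", profiles))    # ['TC>TT']
```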

Examples using Monkeypox 2022 outbreak data available at examples/

Option 1 (this option reflects part of the analysis performed in the publication)

Providing a TSV file with the columns POS REF ALT (i.e. 1-indexed reference position, reference allele and alternative allele) and a fasta file including the reference genome (can be the same alignment or a normal fasta sequence).

mutation-profile -f alignment_Figure1B.fasta -m positions_of_interest_POS_REF_ALT.txt -r 'MT903344.1_Monkeypox_virus_isolate_MPXVUK_P2_complete_genome' -b 10 -a 10 -o OPTION1

Output:

  1. TSV file with the mutation context and profile

[Screenshot of the example output (2022-06-17, 15:17:41)]
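Since the -f file for Option 1 may be either the alignment or a plain FASTA, positions only line up with reference coordinates once alignment gaps are removed from the -r sequence. The sketch below shows that step with a minimal standard-library FASTA reader keyed by the first word of each header (the same convention as Biopython's `SeqRecord.id`). The helper names and the gap-stripping step are assumptions; the script itself relies on Biopython.

```python
import io


def read_fasta(handle) -> dict:
    """Minimal FASTA reader: {first word of header: sequence}."""
    records, name, chunks = {}, None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def load_reference(handle, ref_name: str) -> str:
    """Fetch the -r sequence and drop alignment gaps so indices follow reference coordinates."""
    records = read_fasta(handle)
    if ref_name not in records:
        raise KeyError(f"reference {ref_name!r} not found in the FASTA file")
    return records[ref_name].replace("-", "").upper()


# Toy FASTA, not real data.
fasta = io.StringIO(">reference toy sequence\nAC-G\nTA\n>sample1\nACTGTA\n")
print(load_reference(fasta, "reference"))  # ACGTA
```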

Option 2

Providing a TSV file with the columns ID POS REF ALT (i.e. sample ID, 1-indexed reference position, reference allele and alternative allele) and a fasta file including the reference genome (can be the same alignment or a normal fasta sequence).

mutation-profile -f alignment_Figure1B.fasta -m positions_of_interest_ID_POS_REF_ALT.txt -r 'MT903344.1_Monkeypox_virus_isolate_MPXVUK_P2_complete_genome' -b 10 -a 10 -o OPTION2

Outputs:

  1. TSV file with the mutation context and profile for each sample present in the TSV input

[Screenshot of the example output (2022-06-17, 15:21:20)]

  2. TSV file with a summary report for each position of interest including the different patterns observed and their respective frequency

[Screenshot of the example summary report (2022-06-17, 15:23:07)]
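The per-position summary report amounts to counting how often each pattern is observed across samples. The sketch below reports both a count and a proportion per pattern; whether the script's "frequency" column is a count, a proportion or both is not stated here, so treat the output layout, the `summarize_patterns` name and the bracketed pattern strings as assumptions.

```python
from collections import Counter, defaultdict


def summarize_patterns(records):
    """Summarise observed patterns per position.

    `records` is an iterable of (sample_id, pos, pattern) tuples; the result maps
    each position (sorted) to {pattern: (count, proportion)}, most frequent first.
    """
    per_position = defaultdict(Counter)
    for _sample_id, pos, pattern in records:
        per_position[pos][pattern] += 1
    summary = {}
    for pos in sorted(per_position):
        counter = per_position[pos]
        total = sum(counter.values())
        summary[pos] = {pattern: (n, n / total) for pattern, n in counter.most_common()}
    return summary


# Hypothetical per-sample results, not real data.
records = [
    ("sample1", 100, "GT[A]CG"),
    ("sample2", 100, "GT[A]CG"),
    ("sample3", 100, "GT[G]CG"),
    ("sample1", 250, "AT[T]GA"),
]
for pos, patterns in summarize_patterns(records).items():
    print(pos, patterns)
```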

Option 3

Providing a single-column file with a list of 1-indexed reference positions of interest and a fasta Multiple Sequence Alignment including the reference genome.

mutation-profile -f alignment_Figure1B.fasta -m Monkeypox_positions_of_interest.tsv -r 'MT903344.1_Monkeypox_virus_isolate_MPXVUK_P2_complete_genome' -b 10 -a 10 -o OPTION3

Outputs:

  1. TSV file with the mutation context and profile for each sample present in the alignment

[Screenshot of the example output (2022-06-17, 15:31:56)]

  2. TSV file with a summary report for each position of interest including the different patterns observed and their respective frequency

[Screenshot of the example summary report (2022-06-17, 15:33:18)]

TIP: If you do not know your positions of interest, you can run the script alignment_processing.py of ReporTree and it will provide a list of positions of interest according to your specifications. Example:

python ReporTree/scripts/alignment_processing.py -align alignment_Figure1B.fasta -o Monkeypox --use-reference-coords -r 'MT903344.1_Monkeypox_virus_isolate_MPXVUK_P2_complete_genome' --keep-gaps --get-positions-interest
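ReporTree's alignment_processing.py is the documented route for building the positions list. For a quick look without it, a deliberately simplified stand-in is to flag every reference position where at least one sample carries a different base, skipping reference-gap columns and ignoring gaps and Ns in samples. This is not ReporTree's algorithm, and the `variable_reference_positions` helper is hypothetical; its output, one position per line, has the single-column shape that Option 3 expects.

```python
def variable_reference_positions(alignment: dict, ref_name: str, ignore: str = "-N") -> list:
    """List 1-indexed reference positions where any sample differs from the reference.

    Columns that are gaps in the reference (insertions) are skipped because they
    have no reference coordinate; sample bases in `ignore` are not counted as differences.
    """
    ref = alignment[ref_name]
    positions = []
    ref_pos = 0
    for col, ref_base in enumerate(ref):
        if ref_base == "-":
            continue
        ref_pos += 1
        for name, seq in alignment.items():
            base = seq[col].upper()
            if name != ref_name and base not in ignore and base != ref_base.upper():
                positions.append(ref_pos)
                break
    return positions


# Toy alignment, not real data.
alignment = {"reference": "ACGTA", "sample1": "ACGTT", "sample2": "AN-TA"}
positions = variable_reference_positions(alignment, "reference")
print("\n".join(map(str, positions)))  # 5
```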

Citation

If you use this script, please cite the article where it was first described:

Isidro, J., Borges, V., Pinto, M. et al. Phylogenomic characterization and signs of microevolution in the 2022 multi-country outbreak of monkeypox virus.
Nature Medicine (2022). https://doi.org/10.1038/s41591-022-01907-y


Download files


Source Distribution

mutation-profile-0.2.1.tar.gz (21.3 kB)

Uploaded Source

Built Distribution


mutation_profile-0.2.1-py3-none-any.whl (22.1 kB)

Uploaded Python 3

File details

Details for the file mutation-profile-0.2.1.tar.gz.

File metadata

  • Download URL: mutation-profile-0.2.1.tar.gz
  • Upload date:
  • Size: 21.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.12 CPython/3.10.12 Linux/5.15.146.1-microsoft-standard-WSL2

File hashes

Hashes for mutation-profile-0.2.1.tar.gz
Algorithm Hash digest
SHA256 e1bfbafe4f75576ffcc4baac09d355d8c64a7cdd590ddc3fdb75c18644da0579
MD5 2660f910dad5f91ac65da5f7cfacea1f
BLAKE2b-256 c57b706a920a0bfbec7e753ad411e7ce499125ff25f0e89fc98b97df2f629ca2


File details

Details for the file mutation_profile-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: mutation_profile-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 22.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.12 CPython/3.10.12 Linux/5.15.146.1-microsoft-standard-WSL2

File hashes

Hashes for mutation_profile-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 1f1500065e1500101636a7118877478f0b861e99d3ab0b63afa3ca970ddb9af6
MD5 14fd4c2120f7e1679b3cab8d07557265
BLAKE2b-256 22ef37cabb7ca39381d96d778fa9897357be12f64de235c7eec02797e2df5239

