mutation_motif, software for analyses of point mutations, see https://www.ncbi.nlm.nih.gov/pubmed/27974498

Maintainers

GavinHuttley

Project description

Mutation Motif

mutation_motif provides capabilities for analysis of point mutation counts data. It includes commands for preparing sequence data, log-linear analyses of the resulting counts and sequence logo style visualisations. Two different analysis approaches are supported:

- log-linear analysis of neighbourhood base influences on mutation, coupled with a sequence logo like representation of those influences
- log-linear analysis of mutation spectra, the relative proportions of different mutation directions from a starting base. A logo-like visualisation of the latter is also supported.

The models and their applications are described in Zhu, Neeman, Yap and Huttley (2017), Statistical methods for identifying sequence motifs affecting point mutations.

Installation

$ pip install mutation_motif

Note: In order to write Plotly figures to static image files you will need to install Chrome.

The commands

The primary tool is installed as a command line executable, mm.

Preparing data for analyses

The input sequence file format

At present, mm reads a fasta formatted file in which every sequence has the same, odd length and the mutation occurred at the middle base. mm assumes each sequence file contains only sequences that experienced the same point mutation at this central position. For example, seqs-CtoT.fasta contains only sequences that experienced a C → T mutation at the central position, and each sequence has a C at that position. The sequence flanking the mutated base is used to derive a paired "unmutated" reference. The details of this sampling are in Zhu et al.
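The layout requirements above can be checked before running mm. The following is a minimal sketch, not part of mutation_motif: the helper names read_fasta and check_centred are hypothetical, and the parser handles only plain fasta text.

```python
def read_fasta(text):
    """Parse fasta text into a dict of {label: sequence}."""
    seqs = {}
    label = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            label = line[1:]
            seqs[label] = ""
        elif label is not None:
            seqs[label] += line.upper()
    return seqs


def check_centred(seqs, direction):
    """Return the central index, or raise ValueError if the layout is wrong.

    direction is a string such as "CtoT"; its first letter is the base
    expected at the central position of every sequence.
    """
    start_base = direction[0]
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError(f"sequences differ in length: {sorted(lengths)}")
    length = lengths.pop()
    if length % 2 == 0:
        raise ValueError(f"length {length} is even, so there is no central base")
    centre = length // 2
    bad = [name for name, s in seqs.items() if s[centre] != start_base]
    if bad:
        raise ValueError(f"central base is not {start_base} in: {bad}")
    return centre


# Two made-up sequences of length 5 with a C in the middle.
example = """>s1
AACTG
>s2
GTCAA
"""
centre = check_centred(read_fasta(example), "CtoT")
print(centre)  # 0-based index of the central base
```

A check like this catches files that mix mutation directions or contain truncated sequences before any counts are produced.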

Two data preparatory subcommands are available: prep-nbr and prep-spectra.

prep-nbr: converts aligned sequences to counts

prep-nbr converts a fasta formatted alignment of equal length sequences to the required counts table format.

Usage: mm prep-nbr [OPTIONS]

  Export tab delimited counts table from alignment centred on SNP position.

  Output file is written to the same path with just the file suffix changed from
  fasta to txt.

Options:
  -a, --align_path TEXT           fasta aligned file centred on mutated
                                  position.  [required]
  -o, --output_path TEXT          Path to write data.  [required]
  -f, --flank_size INTEGER        Number of bases per side to include.
                                  [required]
  --direction [AtoC|AtoG|AtoT|CtoA|CtoG|CtoT|GtoA|GtoC|GtoT|TtoA|TtoC|TtoG]
                                  Mutation direction.  [required]
  -S, --seed TEXT                 Seed for random number generator (e.g. 17, or
                                  2015-02-13). Defaults to system time.
  -R, --randomise                 Randomises the observed data, observed and
                                  reference counts distributions should match.
  --step [1|2|3]                  Specifies a "frame" for selecting the random
                                  base.  [default: 1]
  -D, --dry_run                   Do a dry run of the analysis without writing
                                  output.
  -F, --force_overwrite           Overwrite existing files.
  --help                          Show this message and exit.
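As an illustration of the options listed above, the sketch below assembles a prep-nbr command line from Python. Only flags shown in the help text are used. The function name prep_nbr_command and the output path counts/ are hypothetical, and whether --output_path expects a directory or a file is an assumption to check against your installed version.

```python
import shlex

# The 12 mutation directions accepted by --direction.
DIRECTIONS = {f"{a}to{b}" for a in "ACGT" for b in "ACGT" if a != b}


def prep_nbr_command(align_path, output_path, flank_size, direction, seed=None):
    """Build the argument list for `mm prep-nbr` (hypothetical helper)."""
    if direction not in DIRECTIONS:
        raise ValueError(f"unknown direction: {direction}")
    cmd = [
        "mm", "prep-nbr",
        "--align_path", align_path,
        "--output_path", output_path,
        "--flank_size", str(flank_size),
        "--direction", direction,
    ]
    if seed is not None:
        # A fixed seed makes the sampling of the reference reproducible.
        cmd += ["--seed", str(seed)]
    return cmd


cmd = prep_nbr_command("seqs-CtoT.fasta", "counts/", 2, "CtoT", seed=17)
print(shlex.join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) once mm is installed.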

prep-spectra: combining mutation counts from multiple files

This command combines the separate counts tables of prep-nbr into a larger table suitable for analyses by ll-spectra.

Usage: mm prep-spectra [OPTIONS]

  export tab delimited combined counts table by appending the 12 mutation
  direction tables, adding a new column ``direction``.

Options:
  -c, --counts_pattern TEXT  glob pattern uniquely identifying all 12 mutation
                             counts files.
  -o, --output_path TEXT     Path to write combined_counts data.
  -s, --strand_symmetric     produces table suitable for strand symmetry test.
  -p, --split_dir TEXT       path to write individual direction strand symmetric
                             tables.
  -D, --dry_run              Do a dry run of the analysis without writing
                             output.
  -F, --force_overwrite      Overwrite existing files.
  --help                     Show this message and exit.
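The idea of appending per-direction tables and adding a direction column can be sketched as follows. This is an illustration of the concept only, not the prep-spectra implementation: the function name combine_counts is hypothetical, the counts are made up, and the real output may differ in its columns and their order.

```python
import csv
import io


def combine_counts(tables):
    """Append tab-delimited counts tables, adding a `direction` column.

    tables maps a direction label (e.g. "CtoT") to the text of one table.
    All tables are assumed to share the same header.
    """
    out = io.StringIO()
    writer = None
    for direction, text in tables.items():
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        for row in reader:
            if writer is None:
                fields = list(reader.fieldnames) + ["direction"]
                writer = csv.DictWriter(
                    out, fieldnames=fields, delimiter="\t", lineterminator="\n"
                )
                writer.writeheader()
            row["direction"] = direction
            writer.writerow(row)
    return out.getvalue()


# Two tiny made-up tables with a flank size of 1.
c_to_t = "count\tpos0\tpos1\tmut\n10\tA\tG\tM\n7\tC\tT\tR\n"
c_to_a = "count\tpos0\tpos1\tmut\n3\tA\tG\tM\n5\tC\tT\tR\n"
combined = combine_counts({"CtoT": c_to_t, "CtoA": c_to_a})
print(combined)
```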

The output counts table format

The counts table format has a simple structure, illustrated by the following:

count	pos0	pos1	pos2	pos3	mut
5663	C	T	T	T	M
2639	G	C	A	T	M
2425	G	C	A	G	M
...	...	...	...	...	...
882	G	G	G	T	R
6932	A	G	T	G	R
10550	A	A	A	A	R

The mutation status must be indicated by R (reference) or M (mutated). In this instance, the flank size is 2 and the mutated base, which is not itself listed, lies between pos1 and pos2. Tables with this format are generated by prep-nbr.
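A table in this format can be read with the Python standard library. The sketch below is one possible reader, assuming the column names shown above; the function name load_counts is hypothetical and the embedded rows are a subset of the illustration.

```python
import csv
import io

# A few rows from the illustration above, tab delimited.
TABLE = (
    "count\tpos0\tpos1\tpos2\tpos3\tmut\n"
    "5663\tC\tT\tT\tT\tM\n"
    "2639\tG\tC\tA\tT\tM\n"
    "882\tG\tG\tG\tT\tR\n"
    "6932\tA\tG\tT\tG\tR\n"
)


def load_counts(text):
    """Return {status: {flanking_bases: count}} for status in M and R."""
    result = {"M": {}, "R": {}}
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    pos_cols = [c for c in reader.fieldnames if c.startswith("pos")]
    for row in reader:
        status = row["mut"]
        if status not in result:
            raise ValueError(f"mut must be M or R, got {status!r}")
        flank = tuple(row[c] for c in pos_cols)
        result[status][flank] = int(row["count"])
    return result


counts = load_counts(TABLE)
totals = {status: sum(by_flank.values()) for status, by_flank in counts.items()}
print(totals)
```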

Statistical analyses of mutations

The log-linear analyses require a counts table from the prep steps. The table contains counts for a specified flank size (maximum of 2 bases, assumed to be either side of the mutated base). The analyses assume the counts all reflect a specific mutation direction (e.g. AtoG) and that counts from a control distribution are also included.

Two subcommands are available: ll-nbr and ll-spectra.

ll-nbr: for detecting the influence of neighbouring bases on mutation

ll-nbr examines the influence of neighbouring bases, considering up to fourth order interactions.

Usage: mm ll-nbr [OPTIONS]

  log-linear analysis of neighbouring base influence on point mutation

  Writes estimated statistics, figures and a run log to the specified directory
  outpath.

  See documentation for count table format requirements.

Options:
  -1, --countsfile TEXT   tab delimited file of counts.
  -o, --outpath TEXT      Directory path to write data.
  -2, --countsfile2 TEXT  second group motif counts file.
  --first_order           Consider only first order effects. Defaults to
                          considering up to 4th order interactions.
  -s, --strand_symmetry   single counts file but second group is strand.
  -g, --group_label TEXT  second group label.
  -r, --group_ref TEXT    reference group value for results presentation.
  -v, --verbose           Display more output.
  -D, --dry_run           Do a dry run of the analysis without writing output.
  --help                  Show this message and exit.

ll-spectra: detect differences in mutation spectra between groups

Contrasts the mutations from specified starting bases between groups.

Usage: mm ll-spectra [OPTIONS]

  log-linear analysis of mutation spectra between groups

Options:
  -1, --countsfile TEXT   tab delimited file of counts.
  -o, --outpath TEXT      Directory path to write data.
  -2, --countsfile2 TEXT  second group motif counts file.
  -s, --strand_symmetry   single counts file but second group is strand.
  -F, --force_overwrite   Overwrite existing files.
  -D, --dry_run           Do a dry run of the analysis without writing output.
  -v, --verbose           Display more output.
  --help                  Show this message and exit.

Visualisation of mutation motifs, or mutation spectra, in a grid is provided by the draw- subcommands.

Evaluating the effect of neighbours on mutation

Sample data files are included as tests/data/counts-CtoT.txt and tests/data/counts-CtoT-ss.txt with the latter being appropriate for analysis of the occurrence of strand asymmetric neighbour effects.

The simple analysis is invoked as:

$ mm ll-nbr -1 path/to/tests/data/counts-CtoT.txt -o path/for/results/

This will write 11 files into the results directory. Files such as 1.pdf and 2.pdf are the mutation motifs for the first and second order effects from the log-linear models. Files ending in .json contain the raw data used to produce these figures and may be used for subsequent analyses, such as generating grids of mutation motifs. The summary files include the full log-linear modelling hierarchy. The .log files track the command used to generate these files, including the input files and the settings used.

Testing for strand symmetry (or asymmetry) is done as:

$ mm ll-nbr -1 path/to/tests/data/counts-CtoT.txt -o path/for/results/ --strand_symmetry

Similar output to the above is generated. The difference here is that the reference group for display is the bases on the + strand.

To compare between groups, such as patient cohorts or chromosomal regions, supply two separate counts files and indicate the second using the -2 command line option.
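The three kinds of ll-nbr invocation described above (simple, strand symmetry, and between groups) can be assembled programmatically. This is a sketch using only flags from the help text. The function name ll_nbr_command and the file names cohort_a.txt and cohort_b.txt are hypothetical.

```python
import shlex


def ll_nbr_command(countsfile, outpath, countsfile2=None,
                   strand_symmetry=False, first_order=False):
    """Build the argument list for `mm ll-nbr` (hypothetical helper)."""
    cmd = ["mm", "ll-nbr", "-1", countsfile, "-o", outpath]
    if countsfile2 is not None:
        cmd += ["-2", countsfile2]  # second group counts file
    if strand_symmetry:
        cmd.append("--strand_symmetry")
    if first_order:
        cmd.append("--first_order")
    return cmd


simple = ll_nbr_command("counts-CtoT.txt", "results/")
strand = ll_nbr_command("counts-CtoT.txt", "results/", strand_symmetry=True)
groups = ll_nbr_command("cohort_a.txt", "results/", countsfile2="cohort_b.txt")

for cmd in (simple, strand, groups):
    print(shlex.join(cmd))
```

Each list can be run with subprocess.run(cmd, check=True) once mm is installed.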

Testing Full Spectra

Testing for strand symmetry requires the combined counts file, produced using the provided all_counts script. A sample file is included as tests/data/counts-combined.txt. In this instance, a test of consistency in mutation spectra between strands is specified.

This analysis is run as:

$ mm ll-spectra -1 path/to/tests/data/counts-combined.txt -o another/path/for/results/ --strand_symmetry

Drawing

mm provides support for drawing either spectra or neighbour mutation motif logos.

Interpreting logos

If the plot is derived from a group comparison, the relative entropy terms (which specify the stack height, letter size and orientation) are taken from the mutated class belonging to group 1 (which is the counts file path assigned to the -1 option). For example, if you specified -1 file_a.txt -2 file_b.txt, then large upright letters in the display indicate an excess in the mutated class from file_a.txt relative to file_b.txt.
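To make the sign convention concrete, the sketch below computes per-base relative entropy terms, p * log2(p / q), for a single position. It only illustrates why an excess in group 1 gives a positive (upright) term and a deficit gives a negative one. It is not the computation used by mm, which is described in Zhu et al. The function name and all counts are hypothetical.

```python
from math import log2


def relative_entropy_terms(group1, group2):
    """Per-base terms p * log2(p / q) at one position.

    group1 and group2 map each base to a count; p and q are the
    corresponding base frequencies in group 1 and group 2.
    """
    total1 = sum(group1.values())
    total2 = sum(group2.values())
    terms = {}
    for base in "ACGT":
        p = group1.get(base, 0) / total1
        q = group2.get(base, 0) / total2
        if p == 0:
            terms[base] = 0.0  # convention: 0 * log(0) = 0
        elif q == 0:
            raise ValueError(f"base {base} has zero count in group 2")
        else:
            terms[base] = p * log2(p / q)
    return terms


# Made-up counts: A and G are in excess in group 1, C and T are depleted.
file_a = {"A": 40, "C": 10, "G": 30, "T": 20}
file_b = {"A": 25, "C": 25, "G": 25, "T": 25}
terms = relative_entropy_terms(file_a, file_b)
for base, value in terms.items():
    orientation = "upright" if value > 0 else "inverted"
    print(base, round(value, 4), orientation)
```

The terms sum to the relative entropy between the two distributions, which is never negative.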

Release history

- 2026.3.23 (Mar 23, 2026), this version
- 2025.7.17 (Jul 16, 2025)
- 2025.1.28 (Jan 27, 2025)
- 2024.9.22 (Sep 22, 2024)

Download files

Source Distribution

mutation_motif-2026.3.23.tar.gz (36.5 kB)

Uploaded Mar 23, 2026 Source

Built Distribution

mutation_motif-2026.3.23-py3-none-any.whl (38.5 kB)

Uploaded Mar 23, 2026 Python 3

File details

Details for the file mutation_motif-2026.3.23.tar.gz.

File metadata

Download URL: mutation_motif-2026.3.23.tar.gz
Upload date: Mar 23, 2026
Size: 36.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mutation_motif-2026.3.23.tar.gz
Algorithm	Hash digest
SHA256	`65592b2baaccc941616f5501dbb17f22191bdeda08e1e2564e2c72271b4d4706`
MD5	`c1050e540bd46dc870291f53c4fff725`
BLAKE2b-256	`101ff28f21c468c761b4340f6037e5b9855de88eef869abfab8d56e9c37e9564`

Provenance

The following attestation bundles were made for mutation_motif-2026.3.23.tar.gz:

Publisher: release.yml on HuttleyLab/MutationMotif

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mutation_motif-2026.3.23.tar.gz
- Subject digest: 65592b2baaccc941616f5501dbb17f22191bdeda08e1e2564e2c72271b4d4706
- Sigstore transparency entry: 1155934378
- Sigstore integration time: Mar 23, 2026
Source repository:
- Permalink: HuttleyLab/MutationMotif@a105af7ac4b5421f0b3068c5c03d37defb9438d6
- Branch / Tag: refs/tags/2026.3.23
- Owner: https://github.com/HuttleyLab
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@a105af7ac4b5421f0b3068c5c03d37defb9438d6
- Trigger Event: workflow_dispatch

File details

Details for the file mutation_motif-2026.3.23-py3-none-any.whl.

File metadata

Download URL: mutation_motif-2026.3.23-py3-none-any.whl
Upload date: Mar 23, 2026
Size: 38.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mutation_motif-2026.3.23-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2383034ad7a6915db646d590cf4f4e0ef560ac66b5c8a5611cc1d1089028a430`
MD5	`4245767495d382ba522c328d3a768aab`
BLAKE2b-256	`6c95e0e0526caaa66f36c186c2d23f74cb9c5c85e520fcb04267cfbd29ed5c12`

Provenance

The following attestation bundles were made for mutation_motif-2026.3.23-py3-none-any.whl:

Publisher: release.yml on HuttleyLab/MutationMotif

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mutation_motif-2026.3.23-py3-none-any.whl
- Subject digest: 2383034ad7a6915db646d590cf4f4e0ef560ac66b5c8a5611cc1d1089028a430
- Sigstore transparency entry: 1155934392
- Sigstore integration time: Mar 23, 2026
Source repository:
- Permalink: HuttleyLab/MutationMotif@a105af7ac4b5421f0b3068c5c03d37defb9438d6
- Branch / Tag: refs/tags/2026.3.23
- Owner: https://github.com/HuttleyLab
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@a105af7ac4b5421f0b3068c5c03d37defb9438d6
- Trigger Event: workflow_dispatch
