Identification of allele-specific events in sequencing experiments.

MixALime: Mixture models for Allelic Imbalance Estimation

If you use Python 3.10+, the datatable package is installed from git rather than from PyPI. This may fail in some conda environments because of an outdated libstdcxx-ng, so make sure you have its latest version by running "conda install -c conda-forge libstdcxx-ng" beforehand.

MixALime is a tool for identifying allele-specific events in high-throughput sequencing data. It models allele counts as a mixture of two Negative Binomial or two Beta Negative Binomial distributions (the latter is better suited to noisy data, at the cost of some sensitivity).
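To make the mixture idea concrete, here is a minimal, generic sketch of a two-component Negative Binomial mixture pmf in plain Python. The parametrisation (a shared size r, mirrored success probabilities p and 1 - p, a mixing weight w) and the function names are illustrative assumptions, not MixALime's internal model, which is described in the paper cited in the "Inner clockworks & Citing" section.

```python
from math import exp, lgamma, log


def nb_logpmf(k: int, r: float, p: float) -> float:
    """Log-pmf of NB(r, p): number of failures k before the r-th success,
    with success probability p (the scipy.stats.nbinom convention)."""
    return (lgamma(k + r) - lgamma(k + 1) - lgamma(r)
            + r * log(p) + k * log(1.0 - p))


def nb_mixture_pmf(k: int, r: float, p: float, w: float) -> float:
    """Hypothetical two-component mixture: with weight w the count follows
    NB(r, p), otherwise NB(r, 1 - p). The mirrored components stand in for
    'which allele is favoured' (an assumption made for illustration)."""
    return (w * exp(nb_logpmf(k, r, p))
            + (1.0 - w) * exp(nb_logpmf(k, r, 1.0 - p)))


# With r = 20, p = 0.5 puts the mass around k = 20, whereas p = 2/3 yields
# two modes: one near 10 (mean of NB(20, 2/3)) and one near 40 (NB(20, 1/3)).
probs = [nb_mixture_pmf(k, r=20, p=2 / 3, w=0.5) for k in range(200)]
print(f"total mass over k < 200: {sum(probs):.6f}")
```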

The package is (almost) easy to use, and we advise everyone to jump straight to installing MixALime and invoking the help command from the command line:

> pip3 install mixalime
> mixalime --help

We believe that the help section of MixALime covers its functionality well enough. Moreover, the package ships with a small demo dataset and easy-to-follow instructions in the above-mentioned help section. Note also that every command available in MixALime's command-line interface has its own help page, e.g.:

> mixalime fit --help

So do not waste your time looking for how-to clues or tutorials here; just use --help.

Yet, to comply with the social norm that README files ought to be useful, the next section provides an excerpt from the --help output as well as some other potentially useful details:

Demo

A typical MixALime session consists of sequential runs of the create, fit, test and combine commands, followed, finally, by export all and plot. As an example, we provide a demo dataset of BED-like files with allele counts at SNVs (for the record, MixALime can work with most VCF and BED-like file formats). To export it into the working directory, run:

> mixalime export demo

A scorefiles folder containing plenty of BED-like files should now appear in the working directory. First, we parse those files into efficient, MixALime-friendly data structures for further use and perform some basic filtering if necessary:

> mixalime create myprojectname scorefiles

Then we fit the model parameters to the data using the Negative Binomial distribution:

> mixalime fit myprojectname NB

Next we obtain raw p-values:

> mixalime test myprojectname

Usually we want to combine p-values across samples and apply an FDR correction:

> mixalime combine myprojectname
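The combine step merges the per-sample p-values of each SNV and then corrects for multiple testing. As a rough illustration only, the sketch below uses Fisher's method and the Benjamini-Hochberg procedure in plain Python; these are stand-ins chosen for clarity, and MixALime's actual combination method and options may differ (see mixalime combine --help). The SNV names and p-values are hypothetical.

```python
from math import exp, log


def fisher_combine(pvalues: list[float]) -> float:
    """Fisher's method: X = -2 * sum(ln p_i) ~ chi-squared with 2k dof under H0.
    For even dof the survival function is exp(-X/2) * sum_{i<k} (X/2)^i / i!."""
    k = len(pvalues)
    half_x = -sum(log(p) for p in pvalues)  # this is X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i  # (X/2)^i / i!
        total += term
    return min(1.0, exp(-half_x) * total)


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """BH-adjusted p-values (FDR), returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-sample p-values for three SNVs seen in several samples.
snv_pvalues = {
    "snv1": [0.01, 0.04, 0.20],
    "snv2": [0.50, 0.70],
    "snv3": [0.001, 0.003, 0.02, 0.10],
}
combined = {snv: fisher_combine(ps) for snv, ps in snv_pvalues.items()}
fdr = dict(zip(combined, benjamini_hochberg(list(combined.values()))))
print(fdr)
```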

Finally, we obtain fancy plots for diagnostic purposes and easy-to-work-with tabular data:

> mixalime export all myprojectname results_folder
> mixalime plot myprojectname results_folder

You'll find everything of interest in results_folder.

Combination of p-values across groups

Note: a popular synonym for "combination" in this context is aggregation.

One important feature that the glorified --help does not cover in a very obvious fashion is combining p-values separately within different groups of samples (e.g., one group can be a treatment and the other a control). With default options, the combine command combines all the p-values. This can be changed by supplying the --group (-g) option followed by either the filenames that make up the group or a file that contains a newline-separated list of those files (probably the most convenient approach), e.g.:

> mixalime combine --subname treatment -g vcfs/file1.vcf.gz -g vcfs/file2.vcf.gz -g vcfs/file3.vcf.gz myproject
> mixalime combine --subname control -g vcfs/file4.vcf.gz -g vcfs/file5.vcf.gz -g vcfs/file6.vcf.gz myproject

> mixalime combine --subname treatment -g group_treatment.tsv myproject
> mixalime combine --subname control -g group_control.tsv myproject

The --subname option is necessary to prevent different combine invocations from overwriting each other's results.
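A group file is just a newline-separated list of sample files. The minimal Python sketch below writes the two group files used in the commands above; the helper name write_group_file is hypothetical, and the file paths are the example ones from this section.

```python
from pathlib import Path


def write_group_file(path: str, sample_files: list[str]) -> Path:
    """Write one sample filename per line (newline-separated list)."""
    out = Path(path)
    out.write_text("\n".join(sample_files))
    return out


treatment = ["vcfs/file1.vcf.gz", "vcfs/file2.vcf.gz", "vcfs/file3.vcf.gz"]
control = ["vcfs/file4.vcf.gz", "vcfs/file5.vcf.gz", "vcfs/file6.vcf.gz"]
write_group_file("group_treatment.tsv", treatment)
write_group_file("group_control.tsv", control)
```

The resulting files can then be passed to mixalime combine via -g, one file per group.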

Scoring models

The package provides a variety of models for datasets of varying dispersion:

Name	Dataset variance	Comments
NB	Low	Fastest parameter estimation; might be too liberal for some datasets
MCNB	Medium-low	Marginalized Compound Negative Binomial (MCNB), the safest compromise between liberal NB and conservative BetaNB
BetaNB	High	Introduces an extra parameter to control for higher variance, fits most datasets perfectly, yet the scoring is often overly conservative
Regularized BetaNB	Depends	Introduces a penalty on the extra parameter, set with the `--regul-a` option, to make the model less prone to overfitting. Requires tuning the regularization hyperparameter alpha, which might not be feasible

The name of the appropriate model is passed to the fit command as an argument. Regularized BetaNB is the exception: it is just fit ProjectName BetaNB with the --regul-a alpha_value option, where alpha_value is your hyperparameter value (e.g., 1.0).
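The NB versus BetaNB trade-off in the table above comes down to tail weight. The plain-Python sketch below compares a Negative Binomial and a Beta Negative Binomial with the same mean; the parameter values are arbitrary, and the parametrisation (NB success probability drawn from Beta(a, b)) is the standard textbook one, assumed here rather than taken from MixALime's code. The heavier BetaNB tail gives a larger upper-tail probability for the same count, which is one way to see why BetaNB scoring tends to be more conservative.

```python
from math import exp, lgamma, log


def nb_logpmf(k: int, r: float, p: float) -> float:
    """NB(r, p): failures before the r-th success, success probability p."""
    return (lgamma(k + r) - lgamma(k + 1) - lgamma(r)
            + r * log(p) + k * log(1.0 - p))


def log_beta(a: float, b: float) -> float:
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def betanb_logpmf(k: int, r: float, a: float, b: float) -> float:
    """Beta Negative Binomial: NB(r, p) with p ~ Beta(a, b) integrated out."""
    return (lgamma(k + r) - lgamma(k + 1) - lgamma(r)
            + log_beta(a + r, b + k) - log_beta(a, b))


def upper_tail(logpmf, k0: int, kmax: int = 5000) -> float:
    """P(K >= k0) by direct summation, truncated at kmax."""
    return sum(exp(logpmf(k)) for k in range(k0, kmax))


r = 20.0
p = 2 / 3          # NB mean = r * (1 - p) / p = 10
a, b = 21.0, 10.0  # BetaNB mean = r * b / (a - 1) = 10

nb_tail = upper_tail(lambda k: nb_logpmf(k, r, p), 25)
bnb_tail = upper_tail(lambda k: betanb_logpmf(k, r, a, b), 25)
print(f"P(K >= 25): NB = {nb_tail:.2e}, BetaNB = {bnb_tail:.2e}")
```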

Binomial and beta-binomial models

MixALime can also run good old-fashioned binomial and beta-binomial tests via the separate test_binom command (add the --beta flag for beta-binomial). Note that with this command you can skip the fit and test steps: no fitting is done here, except for the beta-binomial test, where a single variance parameter is estimated for each BAD (background allelic dosage).
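For reference, the classical exact two-sided binomial test for allelic imbalance at a single SNV fits in a few lines of standard-library Python. This is the textbook test with a hypothetical expected reference-allele fraction p (0.5 assumes balanced allelic copies); it is not MixALime's test_binom implementation, whose handling of BAD, count thresholds and tails may differ.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_test_two_sided(ref: int, alt: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: sum the probabilities of all outcomes
    that are no more likely than the observed reference count."""
    n = ref + alt
    observed = binom_pmf(ref, n, p)
    # A small relative tolerance guards against floating-point ties.
    total = sum(pr for pr in (binom_pmf(k, n, p) for k in range(n + 1))
                if pr <= observed * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical SNV with 30 reference and 10 alternative reads.
print(binom_test_two_sided(ref=30, alt=10))
```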

Inner clockworks & Citing

For the time being, you can cite our technical arXiv paper, which explains MixALime's inner clockworks in great detail:

@misc{meshcheryakov2023mixalime,
    doi={10.48550/arXiv.2306.08287},
    title={MIXALIME: MIXture models for ALlelic IMbalance Estimation in high-throughput sequencing data},
    author={Georgy Meshcheryakov and Sergey Abramov and Aleksandr Boytsov and Andrey I. Buyan and Vsevolod J. Makeev and Ivan V. Kulakovskiy},
    year={2023},
    eprint={2306.08287},
    archivePrefix={arXiv},
    primaryClass={stat.AP}
}

Release history

2.27.2

Sep 21, 2024

2.27.0

Sep 21, 2024

2.26.0

Sep 11, 2024

2.25.2

May 15, 2024

2.25.1

Apr 8, 2024

2.25.0

Apr 8, 2024

2.24.2

Apr 7, 2024

2.24.1

Apr 7, 2024

2.24.0

Apr 7, 2024

2.23.3

Mar 27, 2024

2.23.2

Mar 27, 2024

2.23.1

Mar 27, 2024

2.23.0

Mar 25, 2024

2.22.4

Mar 16, 2024

2.22.3

Feb 12, 2024

2.22.2

Feb 12, 2024

2.22.1

Feb 3, 2024

2.22.0

Feb 2, 2024

2.21.1

Jan 31, 2024

2.21.0

Jan 29, 2024

2.20.0

Jan 27, 2024

2.19.3

Jan 26, 2024

2.19.2

Jan 26, 2024

2.19.1

Jan 26, 2024

2.19.0

Jan 20, 2024

2.18.6

Jan 19, 2024

2.18.5

Jan 13, 2024

2.18.4

Jan 9, 2024

2.18.3

Jan 8, 2024

2.18.2

Jan 8, 2024

2.18.1

Jan 7, 2024

2.18.0

Jan 5, 2024

2.17.0

Dec 31, 2023

2.16.9

Dec 15, 2023

2.16.8

Dec 15, 2023

2.16.7

Dec 8, 2023

2.16.6

Nov 30, 2023

2.16.5

Nov 21, 2023

2.16.4

Nov 21, 2023

2.16.3

Nov 21, 2023

2.16.2

Nov 14, 2023

2.16.1

Nov 13, 2023

2.16.0

Nov 13, 2023

2.15.5

Nov 13, 2023

2.15.4

Nov 13, 2023

2.15.3

Nov 13, 2023

2.15.2

Nov 8, 2023

2.15.1

Oct 25, 2023

2.15.0

Oct 19, 2023

2.14.11

Sep 30, 2023

2.14.10

Sep 12, 2023

2.14.9

Sep 12, 2023

2.14.8

Sep 12, 2023

2.14.7

May 18, 2023

2.14.6

May 8, 2023

2.14.5

May 7, 2023

2.14.4

May 7, 2023

2.14.2

May 7, 2023

2.14.1

May 7, 2023

2.14.0

May 7, 2023

2.13.0

Apr 24, 2023

2.12.10

Apr 22, 2023

2.12.9

Apr 22, 2023

2.12.8

Apr 16, 2023

2.12.6

Apr 15, 2023

2.12.5

Apr 4, 2023

2.12.4

Apr 3, 2023

2.12.3

Apr 2, 2023

2.12.2

Apr 1, 2023

2.12.1

Apr 1, 2023

2.12.0

Mar 31, 2023

2.11.1

Mar 31, 2023

2.11.0

Mar 30, 2023

2.10.0

Mar 30, 2023

2.9.9

Mar 27, 2023

2.9.8

Mar 27, 2023

2.9.7

Mar 25, 2023

2.9.6

Mar 25, 2023

2.9.5

Mar 24, 2023

2.9.4

Mar 23, 2023

2.9.3

Mar 23, 2023

2.9.2

Mar 16, 2023

2.9.1

Mar 16, 2023

2.8.1

Mar 13, 2023

2.8.0

Mar 5, 2023

2.7.0

Mar 4, 2023

2.6.1

Feb 24, 2023

2.6.0

Feb 23, 2023

2.5.2

Feb 20, 2023

2.5.1

Feb 20, 2023

2.5.0

Feb 20, 2023

2.4.10

Feb 19, 2023

2.4.9

Jan 23, 2023

2.4.8

Jan 23, 2023

2.4.7

Jan 23, 2023

2.4.6

Jan 23, 2023

2.4.5

Jan 23, 2023

2.4.4

Jan 12, 2023

2.4.3

Jan 12, 2023

2.4.2

Jan 11, 2023

2.4.1

Dec 30, 2022

2.3.0

Dec 30, 2022

2.2.16

Dec 27, 2022

2.2.15

Dec 27, 2022

2.2.14

Dec 26, 2022

2.2.13

Dec 23, 2022

2.2.12

Dec 23, 2022

2.2.11

Dec 19, 2022

2.2.10

Nov 8, 2022

2.2.9

Nov 8, 2022

2.2.8

Nov 6, 2022

2.2.7

Nov 6, 2022

2.2.6

Nov 5, 2022

2.2.5

Nov 5, 2022

2.2.4

Nov 5, 2022

2.2.3

Nov 5, 2022

2.2.2

Nov 5, 2022

2.2.1

Nov 5, 2022

2.2.0

Nov 5, 2022

2.1.10

Nov 3, 2022

2.1.9

Nov 3, 2022

2.1.8

Nov 3, 2022

2.1.7

Nov 3, 2022

2.1.6

Nov 3, 2022

2.1.5

Nov 3, 2022

2.1.4

Nov 3, 2022

2.1.3

Nov 3, 2022

2.1.2

Nov 3, 2022

2.1.1

Nov 3, 2022

2.1.0

Nov 3, 2022

2.0.12

Nov 1, 2022

2.0.11

Nov 1, 2022

2.0.10

Nov 1, 2022

2.0.9

Oct 31, 2022

2.0.8

Oct 30, 2022

2.0.7

Oct 30, 2022

2.0.6

Oct 30, 2022

2.0.5

Oct 30, 2022

2.0.4

Oct 30, 2022

2.0.3

Oct 30, 2022

2.0.2

Oct 30, 2022

2.0.1

Oct 30, 2022

2.0.0

Oct 30, 2022

Download files

Source Distribution

mixalime-2.16.1.tar.gz (5.5 MB)

Uploaded Nov 13, 2023 Source

Hashes for mixalime-2.16.1.tar.gz
Algorithm	Hash digest
SHA256	`025202947dbb62e7168432187bbfd8952726c947bce9854b6387152bc400ccd9`
MD5	`81c304c595ce695a4cea9e396dede0b1`
BLAKE2b-256	`da439bf99d2a88c46309b7a1dbebb03e6acc689e74a131d0e216ca5ca862627c`