BetaNegBinFit

A very brief manual

The user-facing core of BetaNegBinFit is its model classes, each of which models a particular distribution and does the heavy lifting. At the moment, there are 2 models available:

ModelMixture -- a model that describes counts at a given slice as a mixture of 2 binomial-like distributions;
ModelLine -- this can be thought of as a composition of many ModelMixtures (one per slice), linked by constraining the r parameter to be a linear function of the slice.

Both models can use either the negative binomial or the beta negative binomial distribution (see the model argument of their __init__ methods).
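To make the two distribution families concrete, here is a minimal sketch written with NumPy and SciPy only. It does not use BetaNegBinFit itself. The function names, the textbook parametrizations, and all numeric values (weight, r, probabilities, alpha, beta) are assumptions made for illustration, and the package may parametrize its models differently.

```python
import numpy as np
from scipy.special import betaln, gammaln
from scipy.stats import nbinom


def betanb_logpmf(k, r, alpha, beta):
    """Log-PMF of a beta negative binomial: NB(r, p) with p ~ Beta(alpha, beta)."""
    k = np.asarray(k, dtype=float)
    # log of the negative binomial coefficient Gamma(r + k) / (k! * Gamma(r))
    log_coef = gammaln(r + k) - gammaln(k + 1) - gammaln(r)
    return log_coef + betaln(alpha + r, beta + k) - betaln(alpha, beta)


def nb_mixture_pmf(k, w, r, p1, p2):
    """PMF of a two-component negative binomial mixture; w is the weight of the first component."""
    return w * nbinom.pmf(k, r, p1) + (1 - w) * nbinom.pmf(k, r, p2)


k = np.arange(0, 5000)
nb_mix = nb_mixture_pmf(k, w=0.5, r=10, p1=1 / 3, p2=2 / 3)
bnb = np.exp(betanb_logpmf(k, r=4, alpha=20, beta=5))

# Both PMFs should sum to (almost) one over a wide enough range of counts
print(nb_mix.sum(), bnb.sum())
```

The beta negative binomial has an extra parameter compared to the negative binomial, which gives it a heavier tail at the price of a slower fit.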

Use example: ModelMixture

Running ModelMixture is as simple as:

from betanegbinfit import ModelMixture
m = ModelMixture(bad=2, left=4)
res = m.fit(some_slice)

Then you can inspect the parameters by examining the res variable, which is a fairly self-explanatory dict.

What is some_slice?

Assume that some_slice should be the slice of reference counts with fix_c = 23 for BAD = 3 from our ChIP-seq dataset. We suggest doing it this way:

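import os

import pandas as pd

data_folder = 'Data'
data_file = os.path.join(data_folder, 'chipseq.tsv')

bad = 3
fix_c = 23
# Load the table and keep only the rows with the requested BAD
dfo = pd.read_csv(data_file, sep='\t')
dfo = dfo[dfo.BAD == bad]
refs = dfo.REF_COUNTS
alts = dfo.ALT_COUNTS
# Reference counts at the slice where the alternative count equals fix_c
some_slice = refs[alts == fix_c]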
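To try the slicing without chipseq.tsv at hand, the same steps can be run on a small in-memory table. The sketch below assumes only the column names from the example (BAD, REF_COUNTS, ALT_COUNTS); the counts are made up for illustration.

```python
import pandas as pd

# Toy table with made-up counts; only the column names follow the example
dfo = pd.DataFrame({
    'BAD': [3, 3, 3, 2, 3],
    'REF_COUNTS': [10, 15, 12, 30, 18],
    'ALT_COUNTS': [23, 23, 7, 23, 23],
})

bad = 3
fix_c = 23
dfo = dfo[dfo.BAD == bad]
# Reference counts of the rows whose alternative count equals fix_c
some_slice = dfo.REF_COUNTS[dfo.ALT_COUNTS == fix_c]
print(some_slice.to_list())
```

The row with BAD = 2 is dropped by the first filter even though its alternative count also equals 23.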

Use example: ModelLine

ModelLine is run similarly, but this time we pass the whole dataset to the fit method instead of a single slice:

from betanegbinfit import ModelLine
m = ModelLine(bad=2, left=4)
res = m.fit(data)

We advise that data be an n x 2 numpy array (where the 1st column holds reference allele counts and the 2nd holds alternative allele counts) rather than a pandas DataFrame. If a DataFrame is passed instead, ModelLine will try to guess the ref count, alt count and BAD columns from it.
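A minimal sketch of preparing such an array from a DataFrame follows. The column names are the ones from the slicing example, the counts are made up, and filtering by BAD before the conversion is an assumption about a typical workflow rather than a documented requirement.

```python
import pandas as pd

# Toy table with made-up counts
df = pd.DataFrame({
    'BAD': [2, 2, 3, 2],
    'REF_COUNTS': [10, 15, 12, 30],
    'ALT_COUNTS': [8, 23, 7, 21],
})

bad = 2
# Keep one BAD value; column 0 holds ref counts, column 1 holds alt counts
data = df.loc[df.BAD == bad, ['REF_COUNTS', 'ALT_COUNTS']].to_numpy()
print(data.shape)
```

Selecting the columns by name fixes their order explicitly, so nothing has to be guessed from the DataFrame.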

Statistics

The stats module has a number of functions that may be of interest to a prospective user:

rmsea - calculate the RMSEA goodness-of-fit statistic;
calc_pvalues - calculate a p-value for each SNP;
calc_eff_sizes - calculate "effect sizes" for each SNP;
calc_adjusted_loglik - calculate the adjusted log-likelihood, i.e. the log-likelihood corrected for the geometry of its parameters. This is done by subtracting the logdet of the Fisher information matrix.
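For orientation, two of these quantities can be sketched with NumPy alone. The helpers below are hypothetical and are not the package's functions: rmsea_sketch uses one common textbook form of RMSEA, and adjusted_loglik_sketch subtracts the log-determinant exactly as described, with no extra scaling. The signatures, definitions and scaling of stats.rmsea and stats.calc_adjusted_loglik may differ.

```python
import numpy as np


def rmsea_sketch(chi2, dof, n):
    """Textbook RMSEA: sqrt(max(chi2 - dof, 0) / (dof * (n - 1)))."""
    return np.sqrt(max(chi2 - dof, 0.0) / (dof * (n - 1)))


def adjusted_loglik_sketch(loglik, fisher):
    """Log-likelihood minus the log-determinant of the Fisher information matrix."""
    sign, logdet = np.linalg.slogdet(fisher)
    if sign <= 0:
        raise ValueError('Fisher information matrix must have a positive determinant')
    return loglik - logdet


# Made-up inputs for illustration only
fisher = np.diag([2.0, 8.0])
rmsea_value = rmsea_sketch(chi2=30.0, dof=10, n=101)
adjusted = adjusted_loglik_sketch(-100.0, fisher)
print(rmsea_value, adjusted)
```

Using slogdet instead of log(det(...)) avoids overflow or underflow of the determinant when the matrix is large or badly scaled.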

Automatic everything & multiprocessing

However, instead of manually creating instances of model classes and working through BetaNegBinFit methods, it may be preferable to run a single ready-to-use function. The package has the utils.run function, which is very easy to use and also does parallelization. See test.py for a real (and very short) example. Most importantly, it produces tabular data that can be easily analyzed downstream.

It also has plenty of arguments that can be used for preprocessing, which might be crucial for some datasets.

Please note that all functions have plenty of optional arguments and they are all documented, so please consider reading through help(function of interest).

A note on performance

BetaNegBinFit should run in a manageable amount of time. For instance, when ModelLine with model='BetaNB' is run on the chipseq.tsv dataset, it finishes in 6 minutes on a Ryzen 5600U. With model='NB', it finishes in under 2 minutes.
