Skip to main content

No project description provided

Project description

BetaNegBinFit

A very brief manual

The cornerstones (or rather, to be more precise, parts that are supposed to be used by a user, rather than a developer) of BetaNegBinFit are model classes that do model certain distribution and do some heavy lifting. At the moment, there are 2 models available:

  • ModelMixture -- a model that models counts at a certain slice as a mixture of 2 binomial-alike distributions;
  • ModelLine -- this can be thought of as a composition of a lot of ModelMixtures (their number is equal to a number of slices), but they are linked via constraining r parameter to a linear function of slice.

Both models can use either negative-binomial or beta-negative-binomial distribution (see model argument of their __init__ methods).

Use example: *ModelMixture"

Running ModelMixture is as simple as:

from betanegbinfit import ModelMixture
m = ModelMixture(bad=2, left=4)
res = m.fit(some_slice)

Then, you can inspect parameters through examining the res variable which is a fairly self-explanotory dict.

Some_slice?

Assume that we want to get slice of refs with fix__c = 23 for BAD=3 for our chipseq-dataset, some_slice. We suggest doing it this way:

data_folder = 'Data'
data_file = os.path.join(data_folder, 'chipseq.tsv')

bad = 3
fix_c = 23
dfo = pd.read_csv(data_file, sep='\t')
dfo = dfo[dfo.BAD == bad]
refs = dfo.REF_COUNTS
alts = dfo.ALT_COUNTS
some_slice = refs[alts == c]

Use example: ModelLine

ModelLine is ran similarly, but this time we pass whole data to the fit method instead of a single slice:

from betanegbinfit import ModelLine
m = ModelLine(bad=2, left=4)
res = m.fit(data)

We advise that data is a n x 2 numpy array rather than pandas DataFrame (where the 1st column stands for reference allele counts and the 2nd for alt counts), however if that is not the case, ModelLine will try to guess ref count, alt count and BAD columns from the dataframe.

Statistics

stats module has a number of functions that can be of interest to a prospective user:

  1. rmsea - calculate RMSEA goodness-of-fit statistic;
  2. calc_pvalues - calculate p-value for each of snp;
  3. calc_eff_sizes - calculate "effect sizes" for each of snp;
  4. calc_adjusted_loglik - calcualte adjusted loglikelihood: adjusted loglikelihood is just a likelihood correct for its parameters geometry. It is done vis subtracting logdet of Fisher information matrix.

Automatic everything & multiprocessing

However, instead of manually creating instances of model classes and working through BetaNegBinFit methods, it might be much more preferential to run a single to-use function. The package has utils.run function that is very easy to use and also does parallelization. See test.py for a real (and a very short one) example. Most importantly, it produces tabular data that can be easily analyzed in a downstream analysis.

Also, it has plenty of arguments that can be taked advantage of to do some preprocessing which might be crucial for some datasets.

Please note, that all functions have plenty of optional arguments and they all are documented, so please consider reading through help(function of interest).

A note on performance

As far as we are concerned, BetaNegBinFit should work within a manageable amounts of time. For insance, when ModelLine with model='BetaNB' ran against chipseq.tsv dataset, it finishes in 6 minutes when ran at Ryzen 5600U. It does so under 2 minutes with model='NB'.

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

betanegbinfit-1.10.2.tar.gz (53.8 kB view details)

Uploaded Source

File details

Details for the file betanegbinfit-1.10.2.tar.gz.

File metadata

  • Download URL: betanegbinfit-1.10.2.tar.gz
  • Upload date:
  • Size: 53.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.1.dev0+g94f810c.d20240510 CPython/3.12.3

File hashes

Hashes for betanegbinfit-1.10.2.tar.gz
Algorithm Hash digest
SHA256 89bc388d2cb15976ec030fb4b55e7933526424f1554ea02492e6b7e96d432751
MD5 215dba8e14a08192b9588399c08dc494
BLAKE2b-256 3f3b18dd93a84805e7c89cf249a89478bfbdf59ee1e06171ff728b21a43ae541

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page