Dolvin's Math and Stats Library

Dolvins

This project provides a set of functions and classes for optimization, probability, and statistical analysis, with a focus on handling multi-dimensional data, hyperplanes, and distribution analysis.



Installation

Dolvins is built on the following packages:

  • psutil

  • numpy

  • pandas

  • tqdm

  • scipy

To install Dolvins automatically with all its dependencies, please run:


pip install dolvins


Usage

General Math Functions

next_power_of_two(x: int) -> int

Returns the smallest power of two greater than or equal to x.

Arguments:

  • x (int): The input number.

Returns:

  • int: The next power of two.

Example:


x = 5

next_power = next_power_of_two(x)

print(next_power)



>> 8


round_down_to_nearest_power_of_two(x: int) -> int

Rounds x down to the nearest power of two (the largest power of two less than or equal to x).

Arguments:

  • x (int): The input number.

Returns:

  • int: The largest power of two not exceeding x.

Example:


x = 10

nearest_power = round_down_to_nearest_power_of_two(x)

print(nearest_power)



>> 8
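Both power-of-two helpers above can be expressed with integer bit operations. The following is a minimal sketch of that idea, assuming positive integer inputs; it is not taken from the Dolvins source, and the library's handling of zero or negative inputs is not documented here.

```python
def next_power_of_two(x: int) -> int:
    """Smallest power of two that is >= x (assumes x >= 1)."""
    if x < 1:
        raise ValueError("x must be a positive integer")
    # (x - 1).bit_length() is the number of bits needed for x - 1,
    # so shifting 1 by that amount gives the first power of two >= x.
    return 1 << (x - 1).bit_length()


def round_down_to_nearest_power_of_two(x: int) -> int:
    """Largest power of two that is <= x (assumes x >= 1)."""
    if x < 1:
        raise ValueError("x must be a positive integer")
    # The highest set bit of x is itself the largest power of two <= x.
    return 1 << (x.bit_length() - 1)


print(next_power_of_two(5))                    # 8
print(round_down_to_nearest_power_of_two(10))  # 8
```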


gcd_of_list(numbers: list) -> int

Returns the greatest common divisor (GCD) of a list of integers.

Arguments:

  • numbers (list): A list of integers.

Returns:

  • int: The GCD of the list.

Example:


numbers = [12, 15, 21]

gcd_result = gcd_of_list(numbers)

print(gcd_result)



>> 3
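For reference, the same result can be obtained with the standard library alone by folding math.gcd over the list. This is a sketch of the technique, not the Dolvins implementation; returning 0 for an empty list is an assumption of this sketch.

```python
from functools import reduce
from math import gcd


def gcd_of_list(numbers: list) -> int:
    """Fold math.gcd over the list; an empty list yields 0 by convention."""
    return reduce(gcd, numbers, 0)


print(gcd_of_list([12, 15, 21]))  # 3
```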


Mathematical Objects

Hyperplane

A class representing a hyperplane.

Methods:

  • __init__(self, normal: np.array, coef: float)

    Initializes a Hyperplane object with a normal vector and coefficient.

    Arguments:

    • normal (np.array): The normal vector to the hyperplane.

    • coef (float): The constant term of the hyperplane equation, i.e. the value c in normal · x = c.

  • project_point(self, *point: float) -> np.array

    Projects a point onto the hyperplane.

    Arguments:

    • point (float): The coordinates of the vector/point to project, passed as separate positional arguments.

    Returns:

    • np.array: The projected point.

Example:


normal = np.array([1, 1, 1])

coef = 3

hyperplane = Hyperplane(normal, coef)

projected_point = hyperplane.project_point(2, 4, 0)

print(projected_point)



>> np.array([1, 2, 0])
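Note that the output above, (1, 2, 0), lies on the hyperplane x + y + z = 3 but is not the orthogonal projection of (2, 4, 0), which would be (1, 3, -1). It does match scaling the point along the ray from the origin until it reaches the hyperplane. The sketch below compares the two; which one the library actually implements is an assumption inferred from the example output, and both helper names are hypothetical. Plain lists are used instead of np.array to keep the sketch dependency-free.

```python
def orthogonal_projection(normal, coef, point):
    """Closest point on the hyperplane normal . x = coef."""
    n_dot_p = sum(n * p for n, p in zip(normal, point))
    n_dot_n = sum(n * n for n in normal)
    t = (n_dot_p - coef) / n_dot_n
    return [p - t * n for n, p in zip(normal, point)]


def radial_projection(normal, coef, point):
    """Scale the point along the ray from the origin until it hits the hyperplane."""
    n_dot_p = sum(n * p for n, p in zip(normal, point))
    scale = coef / n_dot_p  # undefined when the ray is parallel to the hyperplane
    return [scale * p for p in point]


normal, coef, point = [1, 1, 1], 3, [2, 4, 0]
print(orthogonal_projection(normal, coef, point))  # [1.0, 3.0, -1.0]
print(radial_projection(normal, coef, point))      # [1.0, 2.0, 0.0]
```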


Probability and Random Variables Functions

sterlings_approximation(n: int) -> float

Returns an approximation of n! using Stirling's approximation.

Arguments:

  • n (int): The input number.

Returns:

  • float: The approximate factorial of n.

Example:


n = 10

approx_factorial = sterlings_approximation(n)

print(approx_factorial)



>> 3598695.6187410373
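The value above agrees with the leading term of Stirling's series, $n! \approx \sqrt{2\pi n}\,(n/e)^n$. A minimal sketch of that formula follows; the function name stirling is chosen for this sketch, and whether the library adds higher-order correction terms is not stated in this README.

```python
import math


def stirling(n: int) -> float:
    """sqrt(2*pi*n) * (n/e)**n, the leading term of Stirling's series."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n


approx = stirling(10)
exact = math.factorial(10)
print(approx)          # roughly 3598695.62
print(approx / exact)  # roughly 0.9917, i.e. about 0.8% below 10!
```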


permutate(n: int, r: int) -> int

Calculates permutations of n objects taken r at a time (using Stirling's approximation if n is too large).

Arguments:

  • n (int): Number of objects.

  • r (int): Number of objects chosen, where order matters.

Returns:

  • int: The number of permutations of n objects taken r at a time.

Example:


n = 5

r = 3

perm_result = permutate(n, r)

print(perm_result)



>> 60


combinate(n: int, r: int) -> int

Calculates combinations of n objects taken r at a time where order does not matter.

Arguments:

  • n (int): Number of objects.

  • r (int): Number of objects chosen, where order does not matter.

Returns:

  • int: The number of combinations of n objects taken r at a time.

Example:


n = 5

r = 3

comb_result = combinate(n, r)

print(comb_result)



>> 10
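Both counts follow from the factorial formulas $P(n, r) = \frac{n!}{(n-r)!}$ and $C(n, r) = \frac{n!}{r!\,(n-r)!}$. The sketch below uses exact integer arithmetic and omits the Stirling fallback for large n mentioned above, so it illustrates the formulas rather than the library's code.

```python
import math


def permutate(n: int, r: int) -> int:
    """n! / (n - r)!: ordered selections of r items from n."""
    return math.factorial(n) // math.factorial(n - r)


def combinate(n: int, r: int) -> int:
    """n! / (r! * (n - r)!): unordered selections of r items from n."""
    return permutate(n, r) // math.factorial(r)


print(permutate(5, 3))  # 60
print(combinate(5, 3))  # 10
# The standard library (Python 3.8+) provides the same results:
print(math.perm(5, 3), math.comb(5, 3))  # 60 10
```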


discrete_distribution_prob(exp: pd.Series, obs: pd.Series) -> float

Calculates the exact probability of observing the observed distribution given the expected distribution. Note: scale does not matter (i.e., the sum of obs need not equal the sum of exp, because exp is normalized to probabilities).

Arguments:

  • exp (pd.Series): The ground truth (expected) distribution.

  • obs (pd.Series): The observed distribution.

Returns:

  • float: The probability of observing the distribution.

Example:


exp = pd.Series([50, 50, 50])

obs = pd.Series([2, 1, 2])

prob = discrete_distribution_prob(exp, obs)

print(prob)



>> 0.1234
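One natural reading of "exact probability" here is the multinomial probability mass function, $\frac{n!}{k_1! \cdots k_m!} \prod_i p_i^{k_i}$ with $p_i$ = exp_i / sum(exp). That reading is an assumption, but it gives 30/243 ≈ 0.12346 for the example above, which is consistent with the value shown. The function name multinomial_prob is hypothetical, and plain lists stand in for pd.Series.

```python
import math


def multinomial_prob(exp, obs):
    """Probability of the counts `obs` under class probabilities exp / sum(exp)."""
    total_exp = sum(exp)
    n = sum(obs)
    # Multinomial coefficient n! / (k_1! * ... * k_m!), computed exactly with ints.
    coefficient = math.factorial(n)
    for k in obs:
        coefficient //= math.factorial(k)
    # Product of p_i ** k_i with p_i = exp_i / sum(exp).
    likelihood = 1.0
    for e, k in zip(exp, obs):
        likelihood *= (e / total_exp) ** k
    return coefficient * likelihood


print(multinomial_prob([50, 50, 50], [2, 1, 2]))  # about 0.12346 (= 30/243)
```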


generate_combinations(num_classes: int, num_obs: int) -> set

Returns the set of all tuples of num_classes non-negative integers that add up to num_obs. Order matters, so (1, 3) and (3, 1) are distinct.

Arguments:

  • num_classes (int): Number of classes to choose from.

  • num_obs (int): The total that the class counts must sum to.

Returns:

  • set: The set of all possible combinations.

Example:


num_classes = 2

num_obs = 4

combinations = generate_combinations(num_classes, num_obs)

print(combinations)



>> {(0, 4), (1, 3), (2, 2), (3, 1), (4, 0)}
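A simple way to enumerate these tuples is recursion on the first class count. The sketch below is one possible approach under that assumption, not the library's code. The number of tuples is C(num_obs + num_classes - 1, num_classes - 1), which grows quickly with either argument.

```python
def generate_combinations(num_classes: int, num_obs: int) -> set:
    """All tuples of `num_classes` non-negative ints that sum to `num_obs`."""
    if num_classes == 1:
        return {(num_obs,)}
    results = set()
    for first in range(num_obs + 1):
        # Fix the first class count, then distribute the remainder over the rest.
        for rest in generate_combinations(num_classes - 1, num_obs - first):
            results.add((first,) + rest)
    return results


print(sorted(generate_combinations(2, 4)))
# [(0, 4), (1, 3), (2, 2), (3, 1), (4, 0)]
```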


generate_normal_exponent(mean: float, std_dev: float) -> Callable

Generates a function representing the exponent of a normal distribution with the specified mean and standard deviation.

Arguments:

  • mean (float): Mean (mu) of the normal distribution.

  • std_dev (float): Standard deviation (sigma) of the normal distribution.

Returns:

  • Callable: A function representing the exponent.

Example:


mean = 0

std_dev = 1

normal_exp = generate_normal_exponent(mean, std_dev)

Here normal_exp is functionally equivalent to $-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2$, where $\mu$ = mean and $\sigma$ = std_dev.
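The returned callable can be pictured as a closure over mean and std_dev. This is a minimal sketch assuming the function takes a single scalar x; whether the library's version also accepts arrays is not stated here.

```python
from typing import Callable


def generate_normal_exponent(mean: float, std_dev: float) -> Callable[[float], float]:
    """Return x -> -0.5 * ((x - mean) / std_dev) ** 2."""
    def exponent(x: float) -> float:
        return -0.5 * ((x - mean) / std_dev) ** 2
    return exponent


normal_exp = generate_normal_exponent(0, 1)
print(normal_exp(1))  # -0.5
print(normal_exp(2))  # -2.0
```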


generate_joint_pdf(exp: pd.Series, num_obs: int) -> Callable

Generates a joint probability density function (PDF) for all possible outcomes based on the expected distribution and the total number of observations.

Arguments:

  • exp (pd.Series): The ground truth (expected) distribution.

  • num_obs (int): The number of observations.

Returns:

  • Callable: The joint PDF function.

Explanation:

  1. Approximates each class's distribution with a Normal PDF.

  2. Multiplies the per-class approximations together to get a joint PDF.

Example:


exp = pd.Series([4, 6])

num_obs = 100

joint_pdf = generate_joint_pdf(exp, num_obs)

Here joint_pdf is functionally equivalent to $\frac{1}{\sqrt{2\pi\cdot40\cdot\frac{6}{10}}\,\sqrt{2\pi\cdot60\cdot\frac{4}{10}}} \cdot e^{-\frac{1}{2}\left(\frac{x - 40}{\sqrt{40\cdot\frac{6}{10}}}\right)^2 - \frac{1}{2}\left(\frac{y - 60}{\sqrt{60\cdot\frac{4}{10}}}\right)^2}$
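The two steps in the explanation can be sketched as follows, assuming each class count is approximated by a normal distribution with mean $n p_i$ and variance $n p_i (1 - p_i)$, where $p_i$ = exp_i / sum(exp). This matches the formula above (mean 40 and 60, variance 24 for both classes), but the code is an illustration rather than the library's implementation, and plain lists stand in for pd.Series.

```python
import math


def generate_joint_pdf(exp, num_obs):
    """Product of per-class normal PDFs approximating binomial counts."""
    total = sum(exp)
    probs = [e / total for e in exp]
    means = [num_obs * p for p in probs]
    variances = [num_obs * p * (1 - p) for p in probs]

    def joint_pdf(*counts):
        density = 1.0
        for x, mu, var in zip(counts, means, variances):
            density *= math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)
        return density

    return joint_pdf


joint_pdf = generate_joint_pdf([4, 6], 100)
print(joint_pdf(40, 60))  # peak density, 1 / (2 * pi * 24), about 0.00663
```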


Calculus Functions

hyperplane_integration(f: Callable, hyperplane: list, max_val: float = None, chunk_size: int = "auto", num_samples: int = "auto", random_state: int = 42, pbar: Callable = None) -> float

Integrates the function f over an N-dimensional hyperplane using quasi-Monte Carlo integration (Sobol sampling). Currently, integration is only supported in the positive quadrant (the region where every coordinate is non-negative).

Arguments:

  • f (Callable): The function to integrate.

  • hyperplane (object): The hyperplane over which to integrate.

  • max_val (float): The maximum value at which to cap integration (defaults to None). Any region in which the function exceeds that value is not counted.

  • chunk_size (int): The number of samples to process at one time (defaults to "auto").

  • num_samples (int): The total number of samples to draw (defaults to "auto").

  • random_state (int): Random state used to make the integration deterministic (defaults to 42).

  • pbar (tqdm): Progress bar to update after every chunk completes (defaults to None).

Returns:

  • float: The result of integration.

Example:


f = lambda x, y, z: x + y + z

hyperplane = Hyperplane(normal=np.array([1, 1, 1]), coef=3)

result = hyperplane_integration(f, hyperplane)

print(result)



>> 13.5
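The result 13.5 equals 3 × 4.5: on the plane x + y + z = 3 the integrand is constantly 3, and 4.5 is the area of the triangle x, y ≥ 0, x + y ≤ 3 in the (x, y) plane. This suggests the integral is taken over the first N-1 coordinates with the last coordinate solved from the plane equation, without a surface-area factor; that reading is an inference from the example, not a documented behavior. The sketch below illustrates quasi-Monte Carlo integration on that basis for the 3-D case with positive normal components. To stay dependency-free it uses a Halton sequence rather than Sobol sampling (scipy.stats.qmc.Sobol would be the Sobol equivalent), and the names halton and integrate_over_plane are hypothetical.

```python
def halton(index: int, base: int) -> float:
    """Radical inverse of `index` in `base`: one coordinate of a Halton point."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def integrate_over_plane(f, normal, coef, num_samples=4096):
    """Quasi-Monte Carlo estimate over the part of a 3-D plane with x, y, z >= 0."""
    a, b, c = normal
    x_max, y_max = coef / a, coef / b   # axis intercepts bound the sampling box
    total = 0.0
    for i in range(1, num_samples + 1):
        x = halton(i, 2) * x_max
        y = halton(i, 3) * y_max
        z = (coef - a * x - b * y) / c  # solve the plane equation for z
        if z >= 0:                      # keep only points in the positive region
            total += f(x, y, z)
    # Average over all samples (rejected ones count as 0) times the box area.
    return total / num_samples * x_max * y_max


result = integrate_over_plane(lambda x, y, z: x + y + z, (1, 1, 1), 3)
print(result)  # close to 13.5
```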


Distribution Analysis Functions

E(exp: pd.Series, obs: pd.Series, approximate: bool, chunk_size: int = "auto", num_samples: int = "auto", random_state: int = None) -> float

Performs an E-test on an expected distribution and observed distribution.

Arguments:

  • exp (pd.Series): The expected (ground-truth) distribution.

  • obs (pd.Series): The observed distribution.

  • approximate (bool): If False, the exact discrete probability is calculated; if True, an approximation based on continuous probability is calculated.

  • chunk_size (int): The number of samples to process simultaneously (defaults to "auto").

  • num_samples (int): The total number of samples to calculate (defaults to "auto"). Lower is faster but less precise.

  • random_state (int): If specified, makes the results deterministic (defaults to None).

Returns:

  • float: The E-value.

Explanation:

  • The E-test seeks to generate a more interpretable and accurate probability value (p-value) for testing the statistical difference between two distributions.

  • The E-test assumes the expected and observed distributions are identical and, under that assumption, calculates an E-value: the probability of receiving a distribution as extreme as or more extreme than the one observed.

  • Thus, the lower the E-value (i.e., the lower the chance of receiving a distribution that extreme if the distributions were in fact identical), the stronger the indication that the distributions are different.

  • The exact E-value can be calculated using discrete probability; however, a continuous probability estimate must be used in cases where there are many observations.

  • Note: time complexity in either case is exponential, so while the continuous estimate can handle larger numbers of observations, it may still take a significant amount of time for massive samples without some method of scaling them down (to be researched).

Example:


exp = pd.Series([50, 50, 50])

obs = pd.Series([300, 300, 300])

e_value = E(exp, obs, approximate=True)

print(e_value)



>> 1.0





exp = pd.Series([50, 0, 0])

obs = pd.Series([100, 0, 0])

e_value = E(exp, obs, approximate=True)

print(e_value)



>> 0





exp = pd.Series([15, 15, 15])

obs = pd.Series([155, 145, 150])

e_value = E(exp, obs, approximate=True)

print(e_value)



>> 0.77743
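For small samples, the exact (discrete) calculation described in the explanation can be sketched by enumerating every possible outcome. The sketch assumes that "as extreme or more extreme" means "having a multinomial probability no greater than that of the observed outcome"; this definition, and the names exact_e_value, compositions, and multinomial_prob, are assumptions for illustration. It is not claimed to reproduce the library's outputs, in particular for edge cases such as zero expected counts.

```python
import math


def compositions(num_classes, num_obs):
    """Yield every tuple of num_classes non-negative ints summing to num_obs."""
    if num_classes == 1:
        yield (num_obs,)
        return
    for first in range(num_obs + 1):
        for rest in compositions(num_classes - 1, num_obs - first):
            yield (first,) + rest


def multinomial_prob(probs, counts):
    """Multinomial probability of `counts` given class probabilities `probs`."""
    coefficient = math.factorial(sum(counts))
    for k in counts:
        coefficient //= math.factorial(k)
    return coefficient * math.prod(p ** k for p, k in zip(probs, counts))


def exact_e_value(exp, obs):
    """Sum the probabilities of all outcomes no more likely than `obs`."""
    total = sum(exp)
    probs = [e / total for e in exp]
    p_obs = multinomial_prob(probs, obs)
    threshold = p_obs * (1 + 1e-12)  # absorbs floating-point noise on ties
    e_value = 0.0
    for outcome in compositions(len(obs), sum(obs)):
        p = multinomial_prob(probs, outcome)
        if p <= threshold:
            e_value += p
    return e_value


print(exact_e_value([1, 1, 1], [1, 1, 1]))  # close to 1.0: obs is the most likely outcome
print(exact_e_value([1, 1, 1], [3, 0, 0]))  # close to 0.111 (= 3/27)
```

The enumeration visits C(n + m - 1, m - 1) outcomes for n observations and m classes, which is why an approximation becomes necessary as the number of observations grows.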



License

This project is licensed under the MIT License.

