Distance statistics for two random events on a network

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Distance between Random Events

This package computes, symbolically and numerically, the moments of arbitrary order, the pdf, the cdf, and their conditional counterparts for the distance between two random events on a given graph. The position of an event in the network is encoded as a tuple (e, p), where e = {u, v} (with u < v) is the edge on which the event occurs and p is the relative location of the event on that edge, that is, the length of the segment from u (the vertex with the smaller index) to the event divided by the length of e. Since both events are random, we denote them by (X, P) and (Y, Q) respectively.
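To make the (e, p) encoding concrete, here is a minimal sketch of the deterministic distance between two fixed locations on a small graph. It is not taken from the package: the function names `all_pairs_lengths` and `event_distance` and the toy edge lengths are assumptions made for illustration, and the all-pairs step uses plain Floyd-Warshall.

```python
from itertools import product

def all_pairs_lengths(edges):
    """Floyd-Warshall shortest path lengths between all pairs of vertices."""
    verts = sorted({v for e in edges for v in e})
    inf = float("inf")
    d = {(a, b): (0.0 if a == b else inf) for a in verts for b in verts}
    for (u, v), length in edges.items():
        d[u, v] = d[v, u] = min(d[u, v], length)
    # product() varies its first component slowest, so k is the outer loop.
    for k, a, b in product(verts, repeat=3):
        if d[a, k] + d[k, b] < d[a, b]:
            d[a, b] = d[a, k] + d[k, b]
    return d

def event_distance(edges, loc1, loc2):
    """Distance between locations (e, p) and (f, q); edges maps (u, v), u < v, to a length."""
    d = all_pairs_lengths(edges)
    (e, p), (f, q) = loc1, loc2
    le, lf = edges[e], edges[f]
    # Travel cost from each event to the two endpoints of its own edge.
    from_e = {e[0]: p * le, e[1]: (1 - p) * le}
    from_f = {f[0]: q * lf, f[1]: (1 - q) * lf}
    best = min(da + d[a, b] + db
               for a, da in from_e.items()
               for b, db in from_f.items())
    if e == f:
        # Both events lie on the same edge: walking along it may be shorter.
        best = min(best, abs(p - q) * le)
    return best

# Hypothetical triangle graph with edge lengths 2, 3 and 4.
edges = {("1", "2"): 2.0, ("2", "3"): 3.0, ("1", "3"): 4.0}
d_cross = event_distance(edges, (("1", "2"), 0.5), (("2", "3"), 0.5))
d_same = event_distance(edges, (("1", "3"), 0.1), (("1", "3"), 0.9))
```

The random distance D studied by the package is this quantity evaluated at the random locations (X, P) and (Y, Q).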

These formulas can be easily embedded in optimization models. If the pmfs of X and Y are treated as decision variables, it can be shown that all these formulas are linear functions of these variables. Use the formula methods X_coeff and Y_coeff to retrieve the corresponding coefficients.
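As an illustration of this linearity, consider the k-th moment when X and Y are independent (the assumption made by the current implementation). Writing x_e = Pr(X = e) and y_f = Pr(Y = f), and using c_e as a symbol introduced here for the coefficient of x_e, the law of total expectation gives:

```latex
\mathbb{E}[D^k]
  = \sum_{e} \sum_{f} x_e \, y_f \, \mathbb{E}\!\left[ D^k \mid X = e,\ Y = f \right]
  = \sum_{e} c_e \, x_e,
\qquad
c_e = \sum_{f} y_f \, \mathbb{E}\!\left[ D^k \mid X = e,\ Y = f \right].
```

The conditional expectations depend only on the graph and on the joint pdf of (P, Q), so for a fixed pmf of Y the moment is linear in the pmf of X, and symmetrically for Y.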

Installation

Use pip to install the randist package.

pip install randist

Inputs

A data file whose rows are the edges of the network with extra properties. There are five columns:
- i: first vertex of the edge
- j: second vertex of the edge
- l: length of the edge
- x: the probability that event 1 happens on the edge
- y: the probability that event 2 happens on the edge
The joint distribution of the relative locations of the two events, phi_pq. This should be provided as a Phi object:

from sympy.abc import p, q  # import symbols
import randist as rt        # import our randist package

phi_pq = 1  # option 1: the uniform distribution
phi_pq = 36 * p * (1 - p) * q * (1 - q)  # option 2: P and Q are independent Beta(2, 2) variables

phi = rt.Phi('betapq', phi_pq=phi_pq)  # create a Phi object with a name

In the current implementation, the random variables X and Y are assumed to be independent, although the formulas developed in our paper do not have this restriction. We also currently assume that the joint pdf phi_pq is the same for every pair of edges (e, f), which the formulas in our paper do not require either. We may relax both restrictions in a future version.
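Under the independence assumption, the joint pmf of the two edges is simply the product of the x and y columns of the input data. The sketch below shows this with hypothetical in-memory rows; the helper name `joint_pmf` and the numbers are assumptions for illustration, and no claim is made about how the package stores these values internally.

```python
import math

# Hypothetical edge rows (i, j, l, x, y), mirroring the five input columns.
rows = [
    ("1", "2", 2.0, 0.5, 0.2),
    ("2", "3", 3.0, 0.3, 0.3),
    ("1", "3", 4.0, 0.2, 0.5),
]

def joint_pmf(rows):
    """Return Pr(X = e, Y = f) = x_e * y_f after checking both marginals."""
    x = {(i, j): xe for i, j, _, xe, _ in rows}
    y = {(i, j): ye for i, j, _, _, ye in rows}
    for name, pmf in (("x", x), ("y", y)):
        if any(v < 0 for v in pmf.values()):
            raise ValueError(f"column {name} contains a negative probability")
        if not math.isclose(sum(pmf.values()), 1.0, abs_tol=1e-9):
            raise ValueError(f"column {name} does not sum to 1")
    return {(e, f): x[e] * y[f] for e in x for f in y}

joint = joint_pmf(rows)
```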

Also note that the input expression phi_pq does not have to be correct outside the unit square [0, 1]^2 in the pq-plane, where the density is zero. The package verifies that the input expression integrates to 1 over this domain, but it does not verify that the expression is non-negative.
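Since non-negativity is left to the user, a quick check before building a Phi object can be useful. The following is a minimal sketch using only SymPy; `check_density` is a hypothetical helper, not part of randist, and the grid test is only a spot check rather than a proof of non-negativity.

```python
import sympy as sp
from sympy.abc import p, q

def check_density(phi_pq, grid=21):
    """Return (integral over the unit square, True if non-negative on a grid)."""
    expr = sp.sympify(phi_pq)  # also accepts plain numbers such as 1
    total = sp.integrate(expr, (p, 0, 1), (q, 0, 1))
    f = sp.lambdify((p, q), expr, "math")
    pts = [i / (grid - 1) for i in range(grid)]
    nonneg = all(f(a, b) >= 0 for a in pts for b in pts)
    return total, nonneg

beta_total, beta_ok = check_density(36 * p * (1 - p) * q * (1 - q))
# This expression integrates to 1 but is negative near p = 1, so it is not a pdf.
bad_total, bad_ok = check_density(3 - 4 * p)
```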

Outputs

Several statistics about the random distance D between two events:

moments of arbitrary order
cdf (point evaluation, or plotting against the distance x)
pdf (point evaluation, or plotting against the distance x)
conditional moments of arbitrary order (point evaluation, or plotting against the relative location p given X=e)
conditional cdf (point evaluation or plotting against x given (X, P) = (e, p) )
conditional pdf (point evaluation or plotting against x given (X, P) = (e, p) )

All these statistics can be computed either symbolically or numerically. We will explain their differences later.
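For reference, the statistics listed above can be written with the standard definitions below. The symbols m_k, F_D and f_D are notation chosen here, not necessarily the notation of the package or the paper, and the pdf is taken as the derivative of the cdf wherever that derivative exists.

```latex
\begin{aligned}
m_k &= \mathbb{E}[D^k], &
F_D(x) &= \Pr(D \le x), &
f_D(x) &= \frac{d}{dx} F_D(x), \\
m_k(e, p) &= \mathbb{E}[D^k \mid X = e,\ P = p], &
F_D(x \mid e, p) &= \Pr(D \le x \mid X = e,\ P = p), &
f_D(x \mid e, p) &= \frac{\partial}{\partial x} F_D(x \mid e, p), \\
\operatorname{Var}(D) &= m_2 - m_1^2 .
\end{aligned}
```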

Main Interfaces

Most tasks can be done through two interfaces, Formulas and data_collector. The former lets you compute statistics individually, whereas the latter collects data in batch.

Interface 1: `Formulas` Object

Example

An example of using a Formulas object to compute statistics:

from sympy.abc import p, q  # import symbols
import randist as rt        # import our package

gname = 'g0'  # data file name in the folder ./data
phi_pq = 36 * p * (1-p) * q * (1 - q)
phi = rt.Phi('betapq', phi_pq=phi_pq)  # create a joint pdf with a name

fls = rt.Formulas(gname, phi)                 # create a formulas object

moment = fls.get_formula(rt.Stats.MOMENT)     # get a moment formula object
cdf = fls.get_formula(rt.Stats.CDF)           # get a cdf formula object
pdf = fls.get_formula(rt.Stats.PDF)           # get a pdf formula object
cmoment = fls.get_formula(rt.Stats.CMOMENT)    # get a conditional moment formula object
ccdf = fls.get_formula(rt.Stats.CCDF)         # get a conditional cdf formula object
cpdf = fls.get_formula(rt.Stats.CPDF)         # get a conditional pdf formula object

moment.eval(3)                        # compute the 3rd order moment
moment.eval(2) - moment.eval(1) ** 2  # compute the variance
cdf.eval(9.5)                         # evaluate the cdf at the point x = 9.5
cdf.plot(show=True)                   # save the plot in the ./results folder and show it
pdf.eval(8.1)                         # evaluate the pdf at the point x = 8.1
pdf.plot()                            # save the plot in the ./results folder without showing
cmoment.eval(1, ('1', '2'), 0.5)      # the conditional expectation given (e, p) = (('1', '2'), 0.5)
cmoment.plot(2, ('1', '2'))           # plot the conditional 2nd moment against the value of p
ccdf.eval(('2', '3'), 0.1, 3.5)       # evaluate the conditional cdf at x = 3.5 given (e, p) = (('2', '3'), 0.1)
ccdf.plot(('2', '3'), 0.1)            # plot the conditional cdf given (e, p) = (('2', '3'), 0.1)
cpdf.eval(('2', '3'), 0.1, 3.5)       # same but with conditional pdf
cpdf.plot(('2', '3'), 0.1)            # same but with conditional pdf

The `Formulas` Class

The Formulas class has the following parameters,

Formulas(gname, phi, fpath='./data/', rational=False, d_jit=False, memorize=True)

each parameter is explained below:

gname: data file name without the extension .dat
phi: a Phi object for input joint distribution
fpath: the folder where you put the input data file
rational: if True, all values are computed in rational form (slow)
d_jit: compute the shortest path lengths between pairs of vertices in a just-in-time fashion. Set this to True if the input graph is very large and only conditional statistics are needed.
memorize: use memoization to speed up the computation. Set this to False only if the input graph is so large that the computer runs out of memory.
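To illustrate the idea behind d_jit and memorize, the sketch below computes shortest path lengths lazily, one source vertex at a time, and caches each result so that repeated queries are free. This is only an analogy built with the standard library (heapq and functools.lru_cache); the helper `make_lazy_distance` is hypothetical and does not describe how randist implements either option.

```python
import heapq
from functools import lru_cache

def make_lazy_distance(edges):
    """Return distance(u, v) that runs Dijkstra from u only when first needed."""
    adj = {}
    for (u, v), length in edges.items():
        adj.setdefault(u, []).append((v, length))
        adj.setdefault(v, []).append((u, length))

    @lru_cache(maxsize=None)  # memoize one distance table per source vertex
    def from_source(source):
        dist = {source: 0.0}
        heap = [(0.0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue  # stale heap entry
            for v, w in adj[u]:
                nd = d + w
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    heapq.heappush(heap, (nd, v))
        return dist

    def distance(u, v):
        return from_source(u)[v]

    distance.cache_info = from_source.cache_info  # expose cache statistics
    return distance

# Hypothetical triangle graph with edge lengths 2, 3 and 4.
distance = make_lazy_distance({("1", "2"): 2.0, ("2", "3"): 3.0, ("1", "3"): 4.0})
d13 = distance("1", "3")  # triggers one Dijkstra run from vertex "1"
d12 = distance("1", "2")  # served from the cached table
```

Computing tables on demand keeps memory proportional to the sources actually queried, which is why a lazy scheme suits the case where only conditional statistics for a few edges are needed.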

The `get_formula` Method

get_formula(stats, symbolic=None)

each parameter is explained below:

stats: specifies which type of formula you want. All types are listed in the enum Stats.
symbolic: whether to calculate values symbolically or numerically. The default value None means auto: moments and conditional moments are calculated numerically, and all the rest are calculated symbolically.

This method will return a formula object.

Comparison between Numeric and Symbolic Formulas

A symbolic formula object is slow to generate, but fast to evaluate once the formula has been generated.
A symbolic formula object has two methods that numeric formula objects do not have: formula(), which shows the closed-form formula for the corresponding statistic, and save_formula(), which saves the generated formula to a file so that users can later load it with the function load_formulas instead of generating it from scratch.
Symbolic formulas are faster in plotting.
One drawback is that symbolic formulas become much slower as the size of the graph increases.
Numeric formulas are fast at evaluating a single value, and they perform much faster than symbolic formulas in both plotting and evaluation when the graph is large.

In short, if the network is large, always use numeric formulas. Otherwise, use the default setting, especially if you want to reuse the formulas in the future.
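The trade-off can be seen on a toy case that does not involve randist at all: a single edge of length l with P and Q independent and uniform, so that D = l * |P - Q|. The sketch below, with hypothetical function names, derives the k-th moment once symbolically with SymPy and compares it with a numeric midpoint-rule approximation that has to be recomputed for every query.

```python
import sympy as sp

p, q, l = sp.symbols("p q l", positive=True)

def symbolic_moment(k):
    """Closed form of E[(l * |P - Q|)^k] for independent uniform P, Q on [0, 1]."""
    # By symmetry, integrate over the triangle q < p and double the result.
    inner = sp.integrate((l * (p - q)) ** k, (q, 0, p))
    return sp.simplify(2 * sp.integrate(inner, (p, 0, 1)))

def numeric_moment(k, length, n=200):
    """Midpoint-rule approximation of the same moment for one edge length."""
    h = 1.0 / n
    total = 0.0
    for a in range(n):
        for b in range(n):
            total += (length * abs((a + 0.5) * h - (b + 0.5) * h)) ** k
    return total * h * h

formula = symbolic_moment(2)                # generated once (the slow step)
evaluate = sp.lambdify(l, formula, "math")  # cheap to call repeatedly afterwards
exact = evaluate(3.0)
approx = numeric_moment(2, 3.0)
```

On a real network the symbolic expression has one piece per pair of edges, which is consistent with the slowdown on large graphs described above.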

The `Formula` Object

The main methods of formula objects are:

eval(*params, save=True): give corresponding parameters to evaluate the value. The required parameters are
- Moments: k, the order of moment.
- CDF: x, the distance.
- PDF: x, the distance.
- Conditional Moments: k; e, the edge conditioning on; p, the relative location conditioning on.
- Conditional CDF: e, p, x.
- Conditional PDF: e, p, x.
plot(*params, step=0.01, save=True, show=False): plotting the formula. The required parameters are
- Moments: cannot plot.
- CDF: no required input.
- PDF: no required input.
- Conditional Moments: k, e. Plotting over p.
- Conditional CDF: e, p.
- Conditional PDF: e, p.
X_coeff(k_val=None, p_val=None, x_val=None): treat the formula as a function of the pmf of X and retrieve the coefficients. Returns a dictionary indexed by the edges.
Y_coeff(k_val=None, p_val=None, x_val=None): treat the formula as a function of the pmf of Y and retrieve the coefficients. Returns a dictionary indexed by the edges.
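A coefficient dictionary of this shape can be plugged directly into a linear objective. The sketch below uses made-up coefficients rather than real output of X_coeff, and assumes the statistic has no constant term; `linear_value` is a hypothetical helper.

```python
def linear_value(coeff, pmf):
    """Evaluate sum_e coeff[e] * pmf[e] for dictionaries indexed by edges."""
    if set(coeff) != set(pmf):
        raise ValueError("coeff and pmf must be indexed by the same edges")
    return sum(coeff[e] * pmf[e] for e in coeff)

# Hypothetical coefficients, e.g. for the expected distance with Y fixed.
coeff = {("1", "2"): 2.5, ("2", "3"): 3.0, ("1", "3"): 4.0}
x_pmf = {("1", "2"): 0.5, ("2", "3"): 0.3, ("1", "3"): 0.2}

value = linear_value(coeff, x_pmf)

# A linear function over the probability simplex attains its minimum at a
# vertex, so the best unconstrained choice puts all mass on one edge.
best_edge = min(coeff, key=coeff.get)
```

With additional constraints on the pmf, the same coefficients would serve as the cost vector of a linear program.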

Unique methods for symbolic formula objects:

formula(self, *params): return the closed-form formula. Takes the same required parameters as the plot method.
save_formula(): save the formulas in a hidden folder under the current working directory. Use the function load_formulas() to reload these formulas next time without reading the original graph.

Interface 2: `data_collector` Function

The function data_collector is essentially a wrapper around the Formulas class. We demonstrate its usage with an example.

Example

from sympy.abc import p, q  # import symbols
import randist as rt        # import our randist package

gname = 'g0'                # input graph name
phi = rt.Phi('uniform', 1)  # create an input joint distribution

ks = [1, 2, 3]              # list of orders for moments
loc1 = (('1', '2'), 0.2)
loc2 = (('1', '3'), 0.5)
loc3 = (('3', '4'), 0)
locs = [loc1, loc2, loc3]   # list of locations

mmtp = {'collect': True, 'symbolic': None, 'valst': ks}          # params for moment
cdfp = {'collect': True, 'symbolic': None}                       # params for cdf

pdfp = {'collect': True, 'symbolic': None}                       # params for pdf
cmmtp = {'collect': True, 'symbolic': None, 'valst': (ks, locs)} # params for conditional moment
ccdfp = {'collect': True, 'symbolic': None, 'valst': locs}       # params for conditional cdf
cpdfp = {'collect': True, 'symbolic': False, 'valst': locs}      # params for conditional pdf

d_jit = False     # whether to compute pairwise shortest distances in a just-in-time fashion
memorize = True   # whether to use memoization to speed up the computation

# collect all specified data and save them in the folder ./results
rt.data_collector(gname, phi, mmtp, cdfp, pdfp, cmmtp, ccdfp, cpdfp, d_jit=d_jit, memorize=memorize)

Future Plan

Remove the current restrictions on X and Y and on phi_pq mentioned above.
Provide an interactive user interface.
The running speed is currently decent for common graphs. Further speed improvements could be made by rewriting the core functions in C.

Release history

This version

1.1.7

Jul 26, 2019

1.1.6

Jul 25, 2019

1.1.5

Jul 25, 2019

1.1.4

Jul 25, 2019

1.1.3

Jul 25, 2019

1.1.2

Jul 25, 2019

1.1.1

Jul 24, 2019

1.1.0

Jul 24, 2019

1.0.8

Feb 14, 2019

1.0.7

Feb 14, 2019

1.0.6

Feb 7, 2019

1.0.5

Feb 6, 2019

1.0.4

Feb 6, 2019

1.0.3

Feb 6, 2019

1.0.2

Feb 4, 2019

1.0.1

Feb 1, 2019

1.0.0

Feb 1, 2019

Download files

Source Distribution

randist-1.1.7.tar.gz (27.3 kB)

Uploaded Jul 26, 2019 Source

Built Distribution

randist-1.1.7-py3-none-any.whl (30.7 kB)

Uploaded Jul 26, 2019 Python 3

File details

Details for the file randist-1.1.7.tar.gz.

File metadata

Download URL: randist-1.1.7.tar.gz
Upload date: Jul 26, 2019
Size: 27.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.12.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.7.2 requests-toolbelt/0.9.1 tqdm/4.30.0 CPython/3.7.3

File hashes

Hashes for randist-1.1.7.tar.gz
Algorithm	Hash digest
SHA256	`8f403ec17c0ba74568655616c1e3f7129aa3065410c6f163dd9f3ffd0e556df0`
MD5	`a08c6991bfc2f83e5263bdf69f1abb69`
BLAKE2b-256	`e5b5e9e48d2ed8cf7b527c052be1efe1e9832447d2dd4a3d6cd9047087b454e9`

File details

Details for the file randist-1.1.7-py3-none-any.whl.

File metadata

Download URL: randist-1.1.7-py3-none-any.whl
Upload date: Jul 26, 2019
Size: 30.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.12.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/40.7.2 requests-toolbelt/0.9.1 tqdm/4.30.0 CPython/3.7.3

File hashes

Hashes for randist-1.1.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c69823c5ad5021e232253ebac3dbaed1d7815c784e585a0144ea286cd8cbc79a`
MD5	`d1912763278c4c1a5dca5e70dd13dc46`
BLAKE2b-256	`07a1cb9473dd046d9f37b231d5fbb6a13e4425c9c801dbf8c7bd3a7f0a3a4f37`
