Create a series of random numbers fitting a specified distribution, and then analyze them.

Randalyze - Random Number Generator And Analyzer

What Does Randalyze Do?

Randalyze is a Python module that you can import or use directly from the command line to generate a set of random numbers. These numbers will fit a specified distribution, rather than just being evenly distributed.

It also allows you to analyze a set of numbers, to see which distribution they match.

At the moment, the only distribution that Randalyze handles is Benford's Law. It's a counter-intuitive distribution that numbers from the real world tend to follow. So, if you want to see whether invoice totals, numbers of incidents, traffic counts, etc. are realistic, analyze them using Randalyze. Checking against Benford's Law is good enough for auditors, so there's definitely something to it!
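For reference, Benford's Law predicts that a leading digit d (1 to 9) occurs with probability log10(1 + 1/d). The short Python sketch below is only an illustration of that formula; it is not part of Randalyze's API, and the function name benford_probability is made up for this example.

```python
import math


def benford_probability(digit: int) -> float:
    """Return the Benford's Law probability of a leading digit (1-9)."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


# Print the expected share of each leading digit.
for d in range(1, 10):
    print(f"{d}: {benford_probability(d):.3f}")
```

A leading 1 is expected about 30.1% of the time and a leading 9 only about 4.6% of the time, which is why the distribution feels counter-intuitive.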

Installation

Randalyze supports Python 3.8 and later, so you can install it on Windows 7 if you want. It doesn't support Python 2.7.

To install Randalyze, create a virtual environment and use pip:

pip install randalyze

Usage

Command Line Usage

Generating Random Numbers

The generate command creates a series of random numbers. The type of generator must be specified (currently the only option is benford), and the count of numbers to generate is set with the -c option.

For example, to generate 100 random numbers with a Benford distribution, the following command line can be used:

randalyze generate -c 100 benford

or, if you're not using an active virtual environment:

python -m randalyze generate -c 100 benford

For detailed parameter descriptions, use:

randalyze --help
randalyze generate --help

One parameter worth noting is -a, or --adjustments. Benford's Law applies when a quantity of individual numbers is combined, for example in votes across a number of regions, an invoice total, or the number of individuals infected with a disease on a particular date. The value of the adjustments parameter corresponds to how many numbers are combined to produce each final result. The higher this value, the more closely the generated numbers are likely to follow Benford's Law, but the longer they take to generate. A lower number of adjustments produces faster results, but they may not adhere as closely to Benford's Law.
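To illustrate why combining more numbers brings the result closer to Benford's Law, here is a small standalone Python sketch. It assumes that "combining" means multiplying independent uniform random values, which is one common way to obtain Benford-like data; Randalyze's actual generator may work differently. The names leading_digit, product_sample and max_deviation are hypothetical and exist only in this example.

```python
import math
import random
from collections import Counter


def leading_digit(value: float) -> int:
    """Return the first significant digit of a positive, non-zero float."""
    # repr() gives a decimal form such as '0.00025' or '6.5e-06';
    # stripping leading zeros and the point leaves the first significant digit.
    return int(repr(value).lstrip("0.")[0])


def product_sample(rng: random.Random, factors: int) -> float:
    """Multiply `factors` uniform values from (0, 1] together (an assumption)."""
    result = 1.0
    for _ in range(factors):
        result *= 1.0 - rng.random()  # 1 - [0, 1) gives (0, 1], never zero
    return result


def max_deviation(factors: int, count: int = 20000, seed: int = 1) -> float:
    """Largest absolute gap between observed and Benford first-digit fractions."""
    rng = random.Random(seed)
    digits = Counter(
        leading_digit(product_sample(rng, factors)) for _ in range(count)
    )
    return max(
        abs(digits[d] / count - math.log10(1 + 1 / d)) for d in range(1, 10)
    )


# More factors per number should give a smaller gap, at the cost of more work.
for factors in (1, 2, 5, 10):
    print(factors, round(max_deviation(factors), 4))
```

With a single factor the leading digits are roughly equally likely, which is far from Benford's Law; as the number of factors grows the gap shrinks, mirroring the speed versus accuracy trade-off described for --adjustments.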

Output Format

Randalyze sends all of its output to stdout, so that it can be piped to other processes. It can output the numbers it generates in any of the following formats:

Plain Text

This is an ideal format to pipe to other processes, including randalyze analyze. Numbers are separated by a newline character, e.g.:

0.02983916017598411
0.013551386905062924
0.00024998215041082837
0.005491804395091004
0.02466273019653459
0.0063846564027225435
0.0016043375532727268
0.0026923403623502775
0.024469531291955015
0.003059618929054221
0.046573520246102486
0.0007443706751732054
6.522095130906779e-06
0.03635226424464967
0.0013066769632407483
0.008161073155635998
0.0382949028445558
0.007312698653941686
0.0028210034630940573
0.022493548691029038

You can also use the -w / --whole-digits and -d / --fraction-digits options to specify the maximum number of digits in the whole and fraction parts of the numbers respectively. For example:

randalyze generate benford -c 10 -w 7 -d 2

will output a list of 10 numbers, with a maximum of 7 digits before the decimal point and 2 afterwards.

JSON

If you're passing the data around between processes and straightforward piping is not the answer, JSON format can be produced instead:

[0.021875574255089396,0.01363138644672022,0.04880603372115697,0.00044507604221987146,0.0276700261238657,0.03214714230191736,0.02915457360778051,0.04708394499413024,0.0001520800546499388,0.0008435362137085937,0.008981908758960934,0.025213372748772594,0.2523350721739971,0.001618197247836204,0.009578408659261814,0.02682977173023309,0.043980297143841836,0.010087407530137727,0.022770430657528977,0.007422951888962673]

CSV

If you want to import the generated data into a process that requires CSV data, generate it in CSV format:

numbers,
0.046614057519653466,
0.03791869719865344,
0.0384657633546525,
2.855767617145996e-05,
0.01146753335163042,
0.03691158797807547,
0.003926152675269877,
0.05987977588577215,
0.03282925392769393,
0.03005755057310207,
2.597019093156996e-05,
0.009355992571839502,
9.107989223269995e-05,
0.0025864757492692577,
0.0008046680877702718,
0.003676954411104498,
0.0023820723499611567,
0.004461646162414915,
0.012232037104772345,
0.08673320229596392
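If the generated output is consumed from Python rather than piped to another command, the three formats shown above can be read with the standard library alone. The following is a minimal sketch based only on the sample outputs in this section; the function names are made up for this example and are not part of Randalyze.

```python
import csv
import io
import json
from typing import List


def parse_plain_text(text: str) -> List[float]:
    """Parse one number per line, skipping blank lines."""
    return [float(line) for line in text.splitlines() if line.strip()]


def parse_json(text: str) -> List[float]:
    """Parse a JSON array of numbers."""
    return [float(value) for value in json.loads(text)]


def parse_csv(text: str) -> List[float]:
    """Parse the CSV form: a 'numbers' header row, then one value per row."""
    rows = csv.reader(io.StringIO(text))
    next(rows, None)  # skip the header row if there is one
    # Rows may end with a trailing comma, so only the first field is used.
    return [float(row[0]) for row in rows if row and row[0].strip()]
```

Scientific notation such as 6.522095130906779e-06 is handled by float() directly, so no special casing is needed for very small values.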

Analyzing Numbers

If you have a set of numbers, randalyze can analyze them and tell you how close they are to a Benford distribution. The numbers may have been generated by randalyze or obtained from somewhere else; it doesn't matter.

So, to analyze an output of 10,000 numbers from the randalyze generate command, and see whether it matches a Benford distribution to within 10%, use:

randalyze generate benford -c 10000 | randalyze analyze -t 10 benford

To check whether the numbers in a text file numbers.txt, with one number per line, fit a Benford distribution:

cat numbers.txt | randalyze analyze -t 10 benford

or use the FILE parameter:

randalyze analyze -t 10 benford numbers.txt

And to produce the output in JSON format:

randalyze analyze -t 10 --format json benford numbers.txt
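The same analysis can be started from a Python script by running the command line shown above as a subprocess. This is a sketch, not an official API: it assumes randalyze is installed in the current environment and that the python -m randalyze form works for analyze as it does for generate. The helper names build_analyze_command and analyze_file are hypothetical.

```python
import json
import subprocess
import sys
from typing import List


def build_analyze_command(path: str, tolerance: float = 10) -> List[str]:
    """Build the documented analyze command line as an argument list."""
    return [
        sys.executable, "-m", "randalyze",
        "analyze", "-t", str(tolerance), "--format", "json", "benford", path,
    ]


def analyze_file(path: str, tolerance: float = 10) -> dict:
    """Run the analyzer and decode its JSON output (needs randalyze installed)."""
    completed = subprocess.run(
        build_analyze_command(path, tolerance),
        capture_output=True,  # collect stdout instead of printing it
        text=True,
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.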

Output Format

Like its generated data, Randalyze sends all of its analysis details to stdout. You can choose from a couple of formats:

Text Format

Plain text format analyses are in the following format:

Benford Analysis Result
First Digits Expected vs Actual:
1        3010  :    3139  ->    4.3%
2        1760  :    1755  ->   -0.3%
3        1249  :    1220  ->   -2.4%
4         969  :     912  ->   -5.9%
5         791  :     773  ->   -2.4%
6         669  :     650  ->   -2.9%
7         579  :     613  ->    5.7%
8         511  :     504  ->   -1.5%
9         457  :     434  ->   -5.2%

Maximum difference: 5.9%
Benford (tolerance 10.0%)? True

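The first column is the leading digit. The expected count for that digit is calculated from Benford's law and displayed as a whole number. The actual number of occurrences of the digit is in the next column, and the percentage difference between actual and expected is in the last column. The result is reported as Benford when the maximum difference is within the given tolerance.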
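The figures in the example report can be reproduced with a few lines of Python. The sketch below assumes the percentage is (actual - expected) / expected, using the exact Benford expectation total * log10(1 + 1/d) before it is shortened to a whole number for display; that assumption matches the sample output, but it is inferred from the example rather than taken from Randalyze's source. The function names are made up for this illustration.

```python
import math
from typing import Dict

# First-digit counts copied from the example report (10,000 numbers in total).
ACTUAL_COUNTS = {1: 3139, 2: 1755, 3: 1220, 4: 912, 5: 773,
                 6: 650, 7: 613, 8: 504, 9: 434}


def percentage_differences(counts: Dict[int, int]) -> Dict[int, float]:
    """Percentage difference between actual and Benford-expected counts."""
    total = sum(counts.values())
    result = {}
    for digit in range(1, 10):
        expected = total * math.log10(1 + 1 / digit)  # exact expected count
        result[digit] = 100 * (counts.get(digit, 0) - expected) / expected
    return result


def within_tolerance(counts: Dict[int, int], tolerance_percent: float) -> bool:
    """True when no digit deviates by more than the tolerance."""
    differences = percentage_differences(counts)
    return max(abs(value) for value in differences.values()) <= tolerance_percent


for digit, difference in percentage_differences(ACTUAL_COUNTS).items():
    print(f"{digit}  {difference:6.1f}%")
print("Benford (tolerance 10%)?", within_tolerance(ACTUAL_COUNTS, 10))
```

For the sample counts the largest deviation is the 5.9% shortfall for digit 4, so the data passes a 10% tolerance but would fail a 5% one.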

JSON

Randalyze's analyzer JSON output is in the following format:

{
    "distributions": [
        {
            "name": "benford",
            "matches": true,
            "first_digit": {
                "distribution": {
                    "0": 0.0,
                    "1": 0.30246,
                    "2": 0.17439,
                    "3": 0.12504,
                    "4": 0.09564,
                    "5": 0.07993,
                    "6": 0.06695,
                    "7": 0.05743,
                    "8": 0.05178,
                    "9": 0.04638
                },
                "counts": {
                    "0": 0,
                    "1": 30246,
                    "2": 17439,
                    "3": 12504,
                    "4": 9564,
                    "5": 7993,
                    "6": 6695,
                    "7": 5743,
                    "8": 5178,
                    "9": 4638
                }
            }
        }
    ]
}

The distribution dictionary contains the fraction of the analyzed numbers that start with each digit. The counts dictionary contains how many of the numbers start with each digit.
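A script that receives this JSON can compare the observed fractions with the Benford expectation itself. The sketch below relies only on the keys visible in the sample output (distributions, name, first_digit, distribution); the function name benford_gaps and the shortened SAMPLE document are made up for this example.

```python
import json
import math
from typing import Dict

# Shortened copy of the sample analyzer output, keeping two digits only.
SAMPLE = """
{"distributions": [{"name": "benford", "matches": true,
  "first_digit": {"distribution": {"0": 0.0, "1": 0.30246, "9": 0.04638},
                  "counts": {"0": 0, "1": 30246, "9": 4638}}}]}
"""


def benford_gaps(report_text: str) -> Dict[int, float]:
    """Observed minus expected first-digit fraction, per digit 1-9."""
    report = json.loads(report_text)
    entry = next(d for d in report["distributions"] if d["name"] == "benford")
    observed = entry["first_digit"]["distribution"]
    return {
        int(digit): fraction - math.log10(1 + 1 / int(digit))
        for digit, fraction in observed.items()
        if digit != "0"  # Benford's Law does not cover a leading zero
    }


for digit, gap in sorted(benford_gaps(SAMPLE).items()):
    print(f"{digit}: {gap:+.5f}")
```

Note that JSON object keys are always strings, so the digits arrive as "1" to "9" and need converting before any arithmetic.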

Self Testing

You can use randalyze to analyze the numbers it creates by piping the output of the generate command into the analyze command. For example, to check whether 10,000 generated numbers are within 10% of a Benford distribution, use:

randalyze generate benford -c 10000 | randalyze analyze -t 10 benford

Release history

0.2.1 - Mar 8, 2023 (this version)
0.2.0 - Feb 21, 2023
0.1.4 - Mar 7, 2022
0.1.3 - Feb 25, 2022
0.1.2 - Feb 24, 2022
0.1.1 - Feb 23, 2022
0.0.1 - Feb 22, 2022
