A digit doctoring detection package

Development Status
- 1 - Planning
License
- OSI Approved :: GNU General Public License v2 or later (GPLv2+)
Operating System
- OS Independent
Programming Language
- Python :: 3
Topic
- Scientific/Engineering

Project description

DrDigit

DrDigit is a digit doctoring detection package at an early stage. Interested in contributing? Please feel free to contact me, e.g. by commenting on the issue "Contributors welcome!" at https://github.com/brezniczky/drdigit/issues/1.

Requirements

DrDigit requires Python 3.5 or later.

Concept

The tests are based on the statistics of digits which are assumed to have a uniform distribution. Near-uniform distributions can be obtained by looking at the last digits of sufficiently large values - such as vote counts (possibly above 100).
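As a minimal sketch of this idea (not DrDigit's own implementation), the last digits of sufficiently large counts can be extracted and tallied as follows. The helper name `last_digits`, the sample values and the fixed threshold of 100 are assumptions made for illustration.

```python
from collections import Counter


def last_digits(values, min_value=100):
    """Return the last digit of every value above min_value.

    Hypothetical helper, not part of DrDigit; the default threshold
    only follows the rule of thumb mentioned above.
    """
    return [v % 10 for v in values if v > min_value]


# Made-up vote counts, for illustration only.
votes = [95, 123, 4560, 101, 7, 238, 1179]
digits = last_digits(votes)
print(digits)           # last digits of the counts above 100
print(Counter(digits))  # how often each digit occurs
```

With real data, the tally would be compared against the roughly equal frequencies expected under a uniform distribution.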

On a smaller scale, you can query the probability of a digit sequence using probability mass functions, which are represented by Python functions.

There are also larger-scale tests for a sequence of digit groups. These support situations where different groups are expected to have been doctored by different people: testing for a single overarching, consistent anomaly could be too strict in such cases.

Based on the current features (entropy, digit repetition, coincident digits in parallel sequences), it is possible to sort a data frame containing digit groups by probability, and then to inspect whether there is any apparent logic behind the doctoring.
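To illustrate the idea of ranking digit groups by how unlikely they look, here is a self-contained sketch that does not use DrDigit's API. It scores each group by the probability of seeing at least the observed number of adjacent repeated digits, assuming independent, uniformly distributed digits (a binomial tail with success probability 0.1), and then sorts the groups so that the least likely ones come first. The group names, the digit data and the helper names are all made up for the example.

```python
from math import factorial


def comb(n, k):
    """Binomial coefficient, written out so it also runs on Python 3.5."""
    return factorial(n) // (factorial(k) * factorial(n - k))


def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def repeat_prob(digits):
    """Probability of at least the observed number of adjacent repeats,
    assuming independent, uniformly distributed digits."""
    repeats = sum(a == b for a, b in zip(digits, digits[1:]))
    return binom_tail(len(digits) - 1, repeats, 0.1)


# Made-up last digits for three hypothetical groups.
groups = {
    "A": [3, 7, 1, 9, 0, 4, 2, 8],
    "B": [5, 5, 5, 2, 2, 7, 7, 1],
    "C": [1, 1, 4, 6, 0, 9, 3, 2],
}

# Least likely group first.
ranked = sorted(groups, key=lambda name: repeat_prob(groups[name]))
for name in ranked:
    print(name, round(repeat_prob(groups[name]), 4))
```

Group B, with four adjacent repeats in eight digits, ends up first, followed by C and A. A real analysis would combine several features rather than rely on repetition alone.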

A couple of hints

- Handle results with care: there is always some uncertainty.
- Try to focus on interesting groups; this should yield much sharper results.
- Before committing Kaggle scripts, switch off the on-disk caching of tests, e.g. via
```
import drdigit as drd
drd.set_option(physical_cache_path="")
```
You can find more about it via help(drd.set_option).

Quick start

DrDigit can be installed using pip:

$ pip install drdigit-brezniczky
$ ipython

Digit entropy is hard to compare across digit sequences of different lengths. For instance, the sequence 1, 2 is as diverse as a two-digit sequence can be, yet it has the same entropy as the more repetitive 1, 1, 2, 2:

Python 3.5.2 (default, Nov 12 2018, 13:43:14)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.7.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import drdigit as drd

In [2]: help(drd)

In [3]: print(drd.get_entropy([1, 2]))
0.6931471805599453

In [4]: print(drd.get_entropy([1, 1, 2, 2]))
0.6931471805599453
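
The two identical values are what the plain Shannon entropy (in nats) of the digit frequencies would give, since both sequences consist of two distinct digits with a relative frequency of 0.5 each. The sketch below reproduces the figure in pure Python. That `drd.get_entropy` is computed this way is an assumption based on the outputs above, and `shannon_entropy` is a hypothetical name.

```python
from collections import Counter
from math import log


def shannon_entropy(digits):
    """Shannon entropy (natural logarithm) of the relative digit frequencies."""
    n = len(digits)
    return -sum(c / n * log(c / n) for c in Counter(digits).values())


print(shannon_entropy([1, 2]))
print(shannon_entropy([1, 1, 2, 2]))
print(log(2))  # entropy of a 50/50 split between two digits
```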

Probabilities are often more suited for a comparison:

In [6]: drd.prob_of_entr(2, drd.get_entropy([1, 2]))
cdf for 2 was generated
Out[6]: 1.0

In [7]: drd.prob_of_entr(4, drd.get_entropy([1, 1, 2, 2]))
cdf for 4 was generated
Out[7]: 0.0624

Indeed, the latter sequence is unusually repetitive.
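
These figures can be cross-checked by brute force. Assuming that `prob_of_entr(n, e)` estimates the probability that n independent, uniformly distributed digits have an entropy of at most e (an interpretation consistent with the outputs above, not a documented fact), the sketch below enumerates all `10**n` sequences and computes that probability exactly. The function names are hypothetical and not part of DrDigit.

```python
from collections import Counter
from itertools import product
from math import log


def shannon_entropy(digits):
    """Shannon entropy (natural logarithm) of the relative digit frequencies."""
    n = len(digits)
    return -sum(c / n * log(c / n) for c in Counter(digits).values())


def exact_prob_of_entropy_at_most(n, entropy, tol=1e-12):
    """Share of all 10**n digit sequences whose entropy is <= entropy.

    The small tolerance guards against floating point noise in the
    comparison; only feasible for short sequences.
    """
    hits = sum(
        shannon_entropy(seq) <= entropy + tol
        for seq in product(range(10), repeat=n)
    )
    return hits / 10 ** n


print(exact_prob_of_entropy_at_most(2, log(2)))
print(exact_prob_of_entropy_at_most(4, log(2)))
```

Under this interpretation the exact values are 1.0 and 0.064. The latter is close to the 0.0624 reported above; the small gap would be consistent with the CDF being estimated by simulation.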

More examples are to follow. For now, you can have a look at the Kaggle notebook at https://www.kaggle.com/brezniczky/poland-2019-ep-elections-doctoring-quick-check or browse https://github.com/brezniczky/ep_elections_2019_hun/blob/master/PL/ (for instance the process_data.py file).

Some complicated (and, sorry, sometimes unreliable or slightly outdated) details about the considerations, methodology and future ideas can be found in the Hungarian elections document.

Tests

The few tests that exist can be run with pytest.

For this, I would just use virtualenvwrapper and do something akin to

$ mkvirtualenv drdigit_test
$ pip install -r requirements/requirements_test.txt
$ pytest

from the directory of the drdigit clone.

Release history

- 0.0.17 (this version): Sep 16, 2019
- 0.0.16: Sep 16, 2019
- 0.0.15: Sep 15, 2019
- 0.0.14: Sep 14, 2019
- 0.0.13: Sep 13, 2019
- 0.0.12: Aug 31, 2019
- 0.0.11: Aug 29, 2019
- 0.0.10: Aug 29, 2019
- 0.0.9: Aug 29, 2019
- 0.0.8: Aug 26, 2019
- 0.0.7: Aug 23, 2019
- 0.0.6: Aug 23, 2019
- 0.0.5: Aug 22, 2019
- 0.0.4: Aug 22, 2019
- 0.0.3: Aug 21, 2019
- 0.0.2: Aug 21, 2019
- 0.0.1: Aug 21, 2019

Download files

Source Distribution

drdigit-brezniczky-0.0.17.tar.gz (26.2 kB)

Uploaded Sep 16, 2019 Source

Built Distribution

drdigit_brezniczky-0.0.17-py3-none-any.whl (41.9 kB)

Uploaded Sep 16, 2019 Python 3

File details

Details for the file drdigit-brezniczky-0.0.17.tar.gz.

File metadata

Download URL: drdigit-brezniczky-0.0.17.tar.gz
Upload date: Sep 16, 2019
Size: 26.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.1.0 requests-toolbelt/0.9.1 tqdm/4.34.0 CPython/3.5.2

File hashes

Hashes for drdigit-brezniczky-0.0.17.tar.gz
Algorithm	Hash digest
SHA256	`cb0f01994a0e67f816658fc4e1a497ecee8a704d0d5b90519f828f9b3160d771`
MD5	`67bd6f1a98bb83518b6080f72234926e`
BLAKE2b-256	`9d35d1d043148b75334a158c7f8ade825883007bb8c499957d16dd3fb853ece0`

File details

Details for the file drdigit_brezniczky-0.0.17-py3-none-any.whl.

File metadata

Download URL: drdigit_brezniczky-0.0.17-py3-none-any.whl
Upload date: Sep 16, 2019
Size: 41.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.1.0 requests-toolbelt/0.9.1 tqdm/4.34.0 CPython/3.5.2

File hashes

Hashes for drdigit_brezniczky-0.0.17-py3-none-any.whl
Algorithm	Hash digest
SHA256	`db2801c1863cc4a9eb7ab7125e5e841c4c3542efec69714b2a8b3127659d6a16`
MD5	`56a06cfa53788e5359b45fa14b62f36a`
BLAKE2b-256	`bd9d600e72e76cdd7ef3d7185f026669ad46f05a5eb00b872c492e0ebeb6b0b0`
