
A toolset for detecting inconsistencies in summary data.

Project description

scrutiPy v0.1.12: Scientific error detection in Python

A library for scientific error checking and fraud detection, based on the R Scrutiny library by Lukas Jung. Frontend API in Python 3, backend in Rust with PyO3 bindings.

Currently in early development. Available functions include:

grim_scalar(): Implements the GRIM test on single observations.

from scrutipy import grim_scalar

grim_scalar("5.19", 40)
# False
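The following is a minimal pure-Python sketch of what the GRIM test checks. It is not scrutipy's implementation. A mean reported to d decimals from n integer responses must equal some integer total divided by n, rounded to d decimals. The function name grim_sketch is hypothetical, and the sketch assumes single-item integer data and round-half-up rounding. scrutipy may handle rounding differently.

```python
import math
from decimal import ROUND_HALF_UP, Decimal


def grim_sketch(x: str, n: int) -> bool:
    """Hypothetical GRIM check for a mean `x` (kept as a string) from `n` integer values."""
    mean = Decimal(x)
    decimals = -mean.as_tuple().exponent        # reported decimal places, e.g. 2 for "5.19"
    quantum = Decimal(1).scaleb(-decimals)      # one unit in the last place, e.g. 0.01
    total = mean * n                            # implied (unrounded) sum of the values
    # Only the integer sums nearest to mean * n can round back to the reported mean.
    for k in {math.floor(total), math.ceil(total)}:
        # Assumption: round-half-up; scrutipy's rounding rules may differ.
        if (Decimal(k) / n).quantize(quantum, rounding=ROUND_HALF_UP) == mean:
            return True
    return False


print(grim_sketch("5.19", 40))  # False: 207/40 = 5.175 -> 5.18, 208/40 = 5.2 -> 5.20
print(grim_sketch("5.18", 40))  # True
```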

grim_map(): Implements the GRIM test on Pandas dataframes. Use the variant grim_map_pl() for Polars dataframes. Both functions require Polars, which can be installed with pip install scrutipy[polars] or pip install polars.

import pandas as pd
from scrutipy import grim_map 

# Read the x column as strings so trailing zeros (e.g. 5.20) are kept; calling
# df["x"].astype(str) after reading cannot restore zeros already dropped by float parsing.
# If trailing zeros may have been lost, the function issues a warning.
df = pd.read_csv("data/pigs1.csv", dtype={"x": str})
bools, errors = grim_map(df, 1, 2)

print(bools)
# [True, False, False, False, False, True, False, True, False, False, True, False]

print(errors)
# None
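The string conversion matters because pd.read_csv parses a numeric-looking column as floats. As a result, a value written as 5.20 is already stored as 5.2 before any astype(str) call runs. The following self-contained demonstration uses an in-memory CSV with made-up values, not the contents of data/pigs1.csv:

```python
import io

import pandas as pd

csv_text = "x,n\n5.20,40\n5.19,40\n"  # illustrative data only

# Default parsing: x becomes float64, and the trailing zero in 5.20 is gone.
parsed_as_float = pd.read_csv(io.StringIO(csv_text))
print(parsed_as_float["x"].astype(str).tolist())  # ['5.2', '5.19']

# Reading x as strings keeps the values exactly as written.
parsed_as_str = pd.read_csv(io.StringIO(csv_text), dtype={"x": str})
print(parsed_as_str["x"].tolist())  # ['5.20', '5.19']
```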

grimmer(): Implements the GRIMMER test, which extends GRIM to standard deviations, on 1d iterables.

from scrutipy import grimmer
results = grimmer(["1.03", "52.13", "9.42375"], ["0.41", "2.26", "3.86"], [40, 30, 59], items=[1, 1, 1])

print(results)
# [False, True, False]
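The core GRIMMER idea can be expressed in code as follows. This minimal sketch follows a parity-based formulation. For integer data, the sum of squares has the same parity as the sum, because x^2 and x are always both even or both odd. It is not scrutipy's implementation. The name grimmer_sketch is hypothetical, and the sketch makes several assumptions: it handles only single-item integer data with n >= 2, uses the sample SD, and assumes round-half-up rounding. On the three cases above, it returns the same results as the output shown.

```python
import math
from decimal import ROUND_HALF_UP, Decimal


def _unit(value: Decimal) -> Decimal:
    # One unit in the last reported decimal place, e.g. Decimal("0.01") for "2.26".
    return Decimal(1).scaleb(value.as_tuple().exponent)


def grimmer_sketch(x: str, sd: str, n: int) -> bool:
    """Hypothetical GRIMMER-style check for single-item integer data (n >= 2)."""
    mean, s = Decimal(x), Decimal(sd)
    qm, qs = _unit(mean), _unit(s)
    for total in {math.floor(mean * n), math.ceil(mean * n)}:
        # Step 1 (GRIM): the integer sum must round back to the reported mean.
        if (Decimal(total) / n).quantize(qm, rounding=ROUND_HALF_UP) != mean:
            continue
        # Step 2: bounds on the sum of squares implied by the unrounded SD range,
        # using sum(x^2) = (n - 1) * sd^2 + total^2 / n.
        sd_lo, sd_hi = max(s - qs / 2, Decimal(0)), s + qs / 2
        base = Decimal(total) ** 2 / n
        ss_lo = (n - 1) * sd_lo ** 2 + base
        ss_hi = (n - 1) * sd_hi ** 2 + base
        for ss in range(math.ceil(ss_lo), math.floor(ss_hi) + 1):
            # Step 3: the sum of squares and the sum must have the same parity.
            if ss % 2 != total % 2:
                continue
            # Step 4: the SD implied by this sum of squares must round to the report.
            variance = (ss - base) / (n - 1)
            if variance >= 0 and variance.sqrt().quantize(qs, rounding=ROUND_HALF_UP) == s:
                return True
    return False


cases = [("1.03", "0.41", 40), ("52.13", "2.26", 30), ("9.42375", "3.86", 59)]
print([grimmer_sketch(x, sd, n) for x, sd, n in cases])  # [False, True, False]
```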

debit(): Implements the DEBIT test, which checks means and SDs of binary (0/1) data, on 1d iterables (lists and arrays).

from scrutipy import debit

results = debit(["0.36", "0.11", "0.118974"], ["0.11", "0.31", "0.6784"], [20, 40, 100])
print(results)
# [False, True, False]
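For binary (0/1) data, the SD is fully determined by the mean and n: SD = sqrt(p(1 - p) * n / (n - 1)) for a sample proportion p. The following is a minimal sketch under stated assumptions, not scrutipy's implementation. The hypothetical debit_sketch treats the reported mean and SD as intervals of half a unit in their last decimal place. It then checks whether the SDs achievable over the mean interval overlap the SDs that round to the reported value. scrutipy's exact rounding rules may differ. On the three cases above, the sketch agrees with the output shown.

```python
import math


def _half_unit(reported: str) -> float:
    # Half of one unit in the last reported decimal place, e.g. "0.31" -> 0.005.
    decimals = len(reported.split(".")[1]) if "." in reported else 0
    return 0.5 * 10 ** -decimals


def _binary_sd(p: float, n: int) -> float:
    # Sample SD of n binary (0/1) values whose mean is p.
    return math.sqrt(p * (1 - p) * n / (n - 1))


def debit_sketch(x: str, sd: str, n: int) -> bool:
    """Hypothetical DEBIT-style consistency check for a binary variable."""
    hx, hs = _half_unit(x), _half_unit(sd)
    p_lo = max(0.0, float(x) - hx)
    p_hi = min(1.0, float(x) + hx)
    # p * (1 - p) is concave: the minimum is at an endpoint, the maximum at 0.5 if covered.
    candidates = [p_lo, p_hi] + ([0.5] if p_lo <= 0.5 <= p_hi else [])
    sds = [_binary_sd(p, n) for p in candidates]
    sd_min, sd_max = min(sds), max(sds)
    # Consistent if the achievable SD range overlaps the SDs that round to the report.
    return sd_min <= float(sd) + hs and sd_max >= float(sd) - hs


cases = [("0.36", "0.11", 20), ("0.11", "0.31", 40), ("0.118974", "0.6784", 100)]
print([debit_sketch(x, sd, n) for x, sd, n in cases])  # [False, True, False]
```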

debit_map(): Implements the DEBIT test on Pandas dataframes. Use the variant debit_map_pl() for Polars dataframes. Both functions require Polars, which can be installed with pip install scrutipy[polars] or pip install polars.

import pandas as pd
from scrutipy import debit_map

# Read xs and sds as strings so trailing zeros are kept and the numeric-type warning is not raised;
# the warning can also be silenced with silence_numeric_warning=True.
df = pd.read_csv("data/debit_data.csv", dtype={"xs": str, "sds": str})
bools, errors = debit_map(df, 1, 2, 3)

print(bools)
# [True, True, True, False, True, True, True]

print(errors)
# None

closure(): Implements the CLOSURE algorithm for recovering integer data from summary statistics. For any data that can be represented as integers on a bounded range, such as Likert scores, CLOSURE enumerates every dataset consistent with the reported mean, standard deviation, count, and range, so the original data is provably among the results. It replaces the CORVIDS algorithm, which relied on more advanced mathematics packages, with a simpler and faster algorithm. Even with these performance gains, the time and compute required grow rapidly as the range and count increase.

# Reconstruct possible datasets with a mean of 3.5, SD of 1.2, n = 50,
# and an inclusive range from 0 to 7.
# The rounding error is set to 0.05 for the mean and 0.005 for the SD.

from scrutipy import closure
results = closure(3.5, 1.2, 50, 0, 7, 0.05, 0.005) 

len(results)
# 7980 
# indicates there are 7980 possible datasets with these characteristics.
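For very small problems, the question CLOSURE answers can be checked by brute force. This approach enumerates every multiset of n integers in the range and keeps those whose mean and sample SD fall within the given tolerances. The sketch below does exactly that. It is not the CLOSURE algorithm. It also treats the last two closure() arguments as absolute tolerances on the mean and SD, which is an assumption. The number of candidates grows combinatorially with n and the range, which is why a dedicated algorithm is needed for realistic inputs.

```python
from itertools import combinations_with_replacement
from statistics import mean, stdev


def brute_force_datasets(target_mean, target_sd, n, lo, hi, mean_tol, sd_tol):
    """Hypothetical brute-force search; feasible only for tiny n and ranges."""
    matches = []
    # Each sorted tuple of n values in [lo, hi] stands for one candidate dataset.
    for combo in combinations_with_replacement(range(lo, hi + 1), n):
        if (abs(mean(combo) - target_mean) <= mean_tol
                and abs(stdev(combo) - target_sd) <= sd_tol):
            matches.append(combo)
    return matches


# Five responses on a 1-5 scale with mean 3.0 and SD 1.58 (illustrative values).
results = brute_force_datasets(3.0, 1.58, 5, 1, 5, 0.05, 0.005)
print(results)  # [(1, 2, 3, 4, 5)]
```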

calculate_snspn(): Calculates all possible confusion matrices that could be produced from a sample size, and compares the calculated sensitivity and specificity to the input values. It returns a list of dictionaries, one per candidate matrix. Each dictionary records the counts, the calculated values and their errors, the total error, and whether the total error is below a tolerance (Exact_Match). The dictionaries are ordered from least to greatest total error. For larger sample sizes, it is recommended to pass top_n to limit the number of returned values. The return value can be turned directly into a pandas or Polars dataframe, as shown below. This is based on an application by Rod Whitely.

import pandas as pd
import scrutipy as s
vals = s.calculate_snspn(0.8, 0.70588, 20, top_n=5)
df = pd.DataFrame(vals)
df
   TP  TN  FP  FN  Calculated_Sensitivity  Calculated_Specificity  Sensitivity_Error  Specificity_Error  Total_Error  Exact_Match
0   8   7   3   2                0.800000                0.700000           0.000000           0.005880     0.005880        False
1   4  11   4   1                0.800000                0.733333           0.000000           0.027453     0.027453        False
2  10   5   2   3                0.769231                0.714286           0.030769           0.008406     0.039175        False
3   4  10   5   1                0.800000                0.666667           0.000000           0.039213     0.039213        False
4   5  10   4   1                0.833333                0.714286           0.033333           0.008406     0.041739        False

It is also recommended to use the n_positive argument (previously called n_pathology) when this information is available. It restricts the search to matrices in which the number of true positives plus false negatives equals the given value.

vals = s.calculate_snspn(0.8, 0.70588, 20, n_positive=10, top_n=5)
df = pd.DataFrame(vals)
df
   TP  TN  FP  FN  Calculated_Sensitivity  Calculated_Specificity  Sensitivity_Error  Specificity_Error  Total_Error  Exact_Match
0   8   7   3   2                     0.8                     0.7                0.0            0.00588      0.00588        False
1   8   8   2   2                     0.8                     0.8                0.0            0.09412      0.09412        False
2   8   6   4   2                     0.8                     0.6                0.0            0.10588      0.10588        False
3   9   7   3   1                     0.9                     0.7                0.1            0.00588      0.10588        False
4   7   7   3   3                     0.7                     0.7                0.1            0.00588      0.10588        False
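The enumeration behind calculate_snspn() can be sketched in a few lines of pure Python. The sketch assumes, as the tables suggest, that the total error is the sum of the absolute sensitivity and specificity errors. This is an illustrative reimplementation, not scrutipy's code. snspn_sketch is a hypothetical name, and the sketch omits the Exact_Match tolerance. The optional n_positive filter mirrors the argument described above.

```python
from itertools import product


def snspn_sketch(sensitivity, specificity, n, n_positive=None, top_n=5):
    """Hypothetical enumeration of confusion matrices with TP + TN + FP + FN = n."""
    rows = []
    for tp, fn, tn in product(range(n + 1), repeat=3):
        fp = n - tp - fn - tn
        if fp < 0 or tp + fn == 0 or tn + fp == 0:
            continue  # counts must sum to n, and both rates need a nonzero denominator
        if n_positive is not None and tp + fn != n_positive:
            continue  # restrict to matrices with the known number of positives
        sens_err = abs(tp / (tp + fn) - sensitivity)
        spec_err = abs(tn / (tn + fp) - specificity)
        rows.append({"TP": tp, "TN": tn, "FP": fp, "FN": fn,
                     "Total_Error": sens_err + spec_err})
    rows.sort(key=lambda row: row["Total_Error"])  # least total error first
    return rows[:top_n]


best = snspn_sketch(0.8, 0.70588, 20, n_positive=10)[0]
print(best["TP"], best["TN"], best["FP"], best["FN"], round(best["Total_Error"], 5))
# 8 7 3 2 0.00588
```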

calculate_ppvnpv(): Calculates all possible confusion matrices that could be produced from a sample size, and compares the calculated PPV and NPV to the input values. See calculate_snspn() above for other details of recommended use for this family of functions.

import pandas as pd
import scrutipy as s
vals = s.calculate_ppvnpv(0.8, 0.70588, 20, top_n=5)
df = pd.DataFrame(vals)
df
   TP  TN  FP  FN  Calculated_PPV  Calculated_NPV  PPV_Error  NPV_Error  Total_Error  Exact_Match
0   8   7   2   3        0.800000        0.700000   0.000000   0.005880     0.005880        False
1   4  11   1   4        0.800000        0.733333   0.000000   0.027453     0.027453        False
2  10   5   3   2        0.769231        0.714286   0.030769   0.008406     0.039175        False
3   4  10   1   5        0.800000        0.666667   0.000000   0.039213     0.039213        False
4   5  10   1   4        0.833333        0.714286   0.033333   0.008406     0.041739        False

calculate_likelihoodratios(): Calculates all possible confusion matrices that could be produced from a sample size, and compares the calculated likelihood ratios to the input values. See calculate_snspn() above for other details of recommended use for this family of functions.

vals = s.calculate_likelihoodratios(0.234, 0.687, 56, top_n=5)
df = pd.DataFrame(vals)
df
   TP  TN  FP  FN  Calculated_PLR  Calculated_NLR  PLR_Error  NLR_Error  Total_Error  Exact_Match
0  20   9  22   5        0.232258        0.688889   0.001742   0.001889     0.003631        False
1  12  12  29   3        0.234146        0.683333   0.000146   0.003667     0.003813        False
2   4  15  36   1        0.235294        0.680000   0.001294   0.007000     0.008294        False
3  31   5  12   8        0.233786        0.697436   0.000214   0.010436     0.010650        False
4  23   8  19   6        0.234994        0.698276   0.000994   0.011276     0.012269        False
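The following cross-check of the likelihood-ratio columns uses the textbook definitions PLR = sensitivity / (1 - specificity) and NLR = (1 - sensitivity) / specificity. Row 0 above has TP = 20, TN = 9, FP = 22, FN = 5. For this row, the sketch gives an NLR of 0.688889, which matches the table. However, it gives a PLR of about 1.127273, whereas the table's Calculated_PLR of 0.232258 equals sensitivity × specificity (0.8 × 9/31). It may be worth confirming which PLR definition your installed version uses. Under the textbook definitions, PLR < 1 requires sensitivity + specificity < 1, while NLR < 1 requires sensitivity + specificity > 1. Consequently, no confusion matrix can match both input values in this example exactly.

```python
def likelihood_ratios(tp, tn, fp, fn):
    """Textbook positive and negative likelihood ratios from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    plr = sensitivity / (1 - specificity)  # undefined when specificity == 1
    nlr = (1 - sensitivity) / specificity  # undefined when specificity == 0
    return plr, nlr


plr, nlr = likelihood_ratios(tp=20, tn=9, fp=22, fn=5)  # row 0 of the table above
print(round(plr, 6), round(nlr, 6))  # 1.127273 0.688889
```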

calculate_metrics_from_counts(): Calculates sensitivity, specificity, PPV, NPV, positive likelihood ratio and negative likelihood ratio from counts of true positives, true negatives, false positives and false negatives, passed in that order.

import pandas as pd
import scrutipy as s
metrics = s.calculate_metrics_from_counts(34, 88, 94, 234)
df = pd.DataFrame([metrics])
df
   Sensitivity  Specificity       PPV       NPV       +LR       -LR
0     0.126866     0.483516  0.265625  0.273292  0.245634  1.805801
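The six values follow the standard formulas. The following pure-Python sketch reproduces the row above from the same counts, taken in the order TP, TN, FP, FN. The function name is hypothetical, and this is not scrutipy's code.

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Standard diagnostic-accuracy metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return {
        "Sensitivity": sensitivity,
        "Specificity": specificity,
        "PPV": tp / (tp + fp),  # positive predictive value
        "NPV": tn / (tn + fn),  # negative predictive value
        "+LR": sensitivity / (1 - specificity),
        "-LR": (1 - sensitivity) / specificity,
    }


metrics = metrics_from_counts(tp=34, tn=88, fp=94, fn=234)
print({name: round(value, 6) for name, value in metrics.items()})
# {'Sensitivity': 0.126866, 'Specificity': 0.483516, 'PPV': 0.265625,
#  'NPV': 0.273292, '+LR': 0.245634, '-LR': 1.805801}
```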

simrank() and simrank_parallel(): Output sampled rank groups and U values. Implementation by David Robert Grimes; cf. Heathers & Grimes (2026).

import scrutipy as s
res = s.simrank(10, 12, 7)
print("Group 1 ranks: ", res[0])
print("Group 2 ranks: ", res[1])
print("U-value: ", res[2])
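The following sketch assumes that the U value is the Mann-Whitney U statistic. The section above does not say so, nor which of the two U statistics is reported. Under that assumption, U can be recomputed from two rank groups as U1 = R1 - n1(n1 + 1)/2, where R1 is the rank sum of group 1, and U2 = n1 * n2 - U1. The sketch uses made-up ranks, and mann_whitney_u is a hypothetical helper:

```python
def mann_whitney_u(ranks_1, ranks_2):
    """Mann-Whitney U statistics for two groups given their ranks in the pooled sample."""
    n1, n2 = len(ranks_1), len(ranks_2)
    u1 = sum(ranks_1) - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1  # U1 + U2 always equals n1 * n2
    return u1, u2


# Illustrative ranks: group 1 holds the three smallest values.
print(mann_whitney_u([1, 2, 3], [4, 5, 6, 7]))  # (0.0, 12.0)
```

With real output, mann_whitney_u(res[0], res[1]) could be compared against res[2] to check which convention simrank() follows.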

Roadmap

Expand documentation

Test and document user-side GRIMMER function

Tidy up return types as dataframes

Implicitly maintain x_col as str when appropriate

Implement SPRITE

Acknowledgements

Lukas Jung

Nick Brown

James Heathers

Jordan Anaya

Aurelien Allard

Rod Whitely

David Robert Grimes


