
Shared module for validating the ruleset on the SSDA903 census using DfE rules.

Project description

Quality LAC data beta: Python validator


We want to build a tool that improves the quality of data on Looked After Children so that Children’s Services Departments have all the information needed to enhance their services.

We believe that a tool that highlights data errors and helps fix them would be valuable for:

  1. Reducing the time analysts, business support and social workers spend cleaning data.
  2. Enabling leadership to better use evidence in supporting Looked After Children.

About this project

The aim of this project is to deliver a tool to relieve some of the pain-points of reporting and quality in children's services data. This project focuses, in particular, on data on looked after children (LAC) and the SSDA903 return.

The project consists of a number of related pieces of work. The core parts are a Python validator engine and a set of rules built on pandas, with Poetry for dependency management. The tool is designed to run either standalone or in the browser via Pyodide, giving a zero-install deployment with offline capabilities.

It provides methods for finding the validation errors defined by the DfE in 903 data. The validator needs to be provided with a set of input files for the current year and, optionally, the previous year. These files are coerced into a common format and sent to each of the validator rules in turn. The validators report on rows that do not meet the rules, and a report is produced highlighting the errors for each row and the fields that were included in the checks.

Data pipeline

  • Loading of files
  • Identification of tables - currently matched on exact filename
  • Conversion of CSV to tabular format - no type checking
  • Enrichment of provided data with Postcode distances
  • Evaluation of rules
  • Report
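The steps above can be sketched roughly as follows. This is a minimal illustration, not the package's actual code: the file names, the `TABLE_BY_FILENAME` mapping, the `load_tables` and `run_rules` helpers and the example rule are all hypothetical, and the postcode enrichment step is left out.

```python
import io

import pandas as pd

# Hypothetical mapping from exact file name to table name (an assumption for
# illustration; the real names are defined by the package).
TABLE_BY_FILENAME = {
    "header.csv": "Header",
    "episodes.csv": "Episodes",
}


def load_tables(files):
    """Turn {filename: csv_text} into {table_name: DataFrame}.

    Files whose names do not match exactly are skipped. Every column is read
    as a string, so no type checking happens at this stage.
    """
    tables = {}
    for filename, text in files.items():
        table_name = TABLE_BY_FILENAME.get(filename)
        if table_name is None:
            continue
        tables[table_name] = pd.read_csv(io.StringIO(text), dtype=str)
    return tables


def run_rules(tables, rules):
    """Apply each rule in turn and collect its findings under the rule's name."""
    return {name: rule(tables) for name, rule in rules.items()}


def child_id_present(tables):
    # Hypothetical rule: flag Header rows with a missing CHILD identifier.
    if "Header" not in tables:
        return {}
    header = tables["Header"]
    failing = header.index[header["CHILD"].isna()]
    return {"Header": failing.tolist()}


files = {
    "header.csv": "CHILD,SEX\n101,1\n,2\n",
    "notes.txt": "not a recognised table, so it is ignored",
}
tables = load_tables(files)
report = run_rules(tables, {"child_id_present": child_id_present})
```

Reading everything as strings keeps the loading step free of type assumptions, which leaves each rule responsible for any conversion it needs.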

Project Structure

These are the key files:

project
├─── pyproject.toml           - Project details and dependencies
├─── validator903
│    ├─── config.py           - High-level configuration
│    ├─── ingress.py          - Data ingress (handling CSV and XML files)
│    ├─── types.py            - Classes used across the work
│    ├─── validator.py        - The core validator process
│    └─── validators.py       - All individual validator codes
└─── tests                    - Unit tests

Most of the work from contributors will be in validators.py and the associated test files under tests. Please do not submit a pull request without a comprehensive test.

Development

To install the code and dependencies, from the main project directory run:

poetry install

If this does not work, you may be running the wrong version of Python: the version of NumPy used by the 903 validator locks the project to Python 3.9. The devcontainer and Dockerfile should ensure you are running 3.9, so you may simply need a rebuild. If not, make sure you are working in an environment or venv with Python 3.9 as the interpreter.

Adding validators

Validators are simple functions, usually called validate_XXX(), which take no arguments and return a tuple of an ErrorDefinition and a test function. The test function itself takes a single argument, the datastore, which is a Mapping (a dict-like object) following the structure below.

The following is the expected structure for the input data that is given to each validator (the dfs object). You should assume that not all of these keys are present and handle that appropriately.

Any XML uploads are converted into CSV form to give the same inputs.

{
    # This year's data
    'Header':   # header dataframe
    'Episodes': # episodes dataframe
    'Reviews':  # reviews dataframe
    'UASC':     # UASC dataframe
    'OC2':      # OC2 dataframe
    'OC3':      # OC3 dataframe
    'AD1':      # AD1 dataframe
    'PlacedAdoption':  # Placed for adoption dataframe
    'PrevPerm': # Previous permanence dataframe
    'Missing':  # Missing dataframe
    # Last year's data
    'Header_last':   # header dataframe
    'Episodes_last': # episodes dataframe
    'Reviews_last':  # reviews dataframe
    'UASC_last':     # UASC dataframe
    'OC2_last':      # OC2 dataframe
    'OC3_last':      # OC3 dataframe
    'AD1_last':      # AD1 dataframe
    'PlacedAdoption_last':  # Placed for adoption dataframe
    'PrevPerm_last': # Previous permanence dataframe
    'Missing_last':  # Missing dataframe
    # Metadata
    'metadata': {
        'collection_start': # A datetime with the collection start date (year/4/1)
        'collection_end':   # A datetime with the collection end date (year + 1/4/1)
        'postcodes':        # Postcodes dataframe, columns laua, oseast1m, osnrth1m, pcd
        'localAuthority':   # The local authority code entered (long form, e.g. E07000026)
        'collectionYear':   # The raw collection year string - unlikely to need this (e.g. '2019/20')
    }
}

Releases

To build and release a new version, make sure all your unit tests pass.

We use semantic versioning, so update the project version in pyproject.toml accordingly, commit, and create a PR. Once the release version is on GitHub, create a GitHub release. Name the release after the current version, e.g. 1.0, and name the tag after the release with a v prefix, i.e. v1.0. Alpha and beta releases can be flagged by appending -alpha.<number> and -beta.<number>.
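
The naming convention can be illustrated with a small helper. `release_names` is hypothetical and not part of the project; it assumes the alpha or beta suffix is appended to the release name before the v prefix is added for the tag.

```python
from typing import Optional, Tuple


def release_names(
    version: str, stage: Optional[str] = None, number: Optional[int] = None
) -> Tuple[str, str]:
    """Return (release_name, tag) for a version such as '1.0'.

    stage is None for a full release, or 'alpha' / 'beta' with a number.
    """
    if stage is None:
        release = version
    elif stage in ("alpha", "beta") and number is not None:
        release = f"{version}-{stage}.{number}"
    else:
        raise ValueError("stage must be 'alpha' or 'beta' and needs a number")
    return release, f"v{release}"


full = release_names("1.0")             # release "1.0", tag "v1.0"
beta = release_names("1.1", "beta", 2)  # release "1.1-beta.2", tag "v1.1-beta.2"
```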
