Summary

Sample data generation is a common step in testing and verifying new and existing features that make use of the data commons dictionary. Without validation tools, this step is difficult and error-prone. This project aims to provide tooling that helps with generating and visualizing sample data. It is dictionary agnostic, so it should work with any GDC-compatible dictionary.

Sample data graphs are represented using a customized GraphML format, which can be stored in either JSON or YAML files. This project provides tools for creating this schema from a selected dictionary and for validating data that targets it.

Goals

psqlgml aims to provide the following for projects that make use of psqlgraph:

  1. test data validation and visualization

  2. test data schema that can be integrated with IDEs for easier test data generation

  3. randomized test data generation based on user requirements

  4. provide data structures and functions for use in external projects

  5. provide alternate implementation for loading dictionary with better type checking

Requirements

  • Python 3.6+

  • graphviz (used for visualization)

Installation

From PyPI

$ pip install psqlgml

Quick Start

Command Line

# install
$ pip install psqlgml

# validate install
$ psqlgml --help

# generate internal schema to aid validation
$ psqlgml generate -v 2.4.0 -n test_dictionary

# validation
$ psqlgml validate --help

# visualize
$ psqlgml visualize --help

API

import psqlgml

# load the default dictionary
dictionary: psqlgml.Dictionary = psqlgml.load(version="2.3.0")

GML Schema

This is a customized GraphML format based on JSON Schema. It allows graphs to be represented as a set of nodes and edges, and the schema makes it possible to validate sample data.

unique_field: node_id
nodes:
  - label: program
    node_id: p_1
    name: SM-KD
  - label: project
    node_id: pr_1
edges:
  - src: p_1
    dst: pr_1
    label: programs

This example creates two nodes, Program and Project, that are linked together using the node_id property. The name of the edge connecting them is programs.
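To make the structure concrete, the sketch below writes the same sample as a plain Python dictionary (the JSON form of the YAML above) and resolves each edge's src and dst against the field named by unique_field. The helper name resolve_edges is hypothetical and is not part of psqlgml; it only illustrates how the pieces refer to one another.

```python
from typing import Any, Dict, List, Tuple

# The sample from above, written as the equivalent Python/JSON structure.
SAMPLE: Dict[str, Any] = {
    "unique_field": "node_id",
    "nodes": [
        {"label": "program", "node_id": "p_1", "name": "SM-KD"},
        {"label": "project", "node_id": "pr_1"},
    ],
    "edges": [{"src": "p_1", "dst": "pr_1", "label": "programs"}],
}


def resolve_edges(sample: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Return (source label, edge label, destination label) for each edge.

    Hypothetical helper for illustration only. It raises KeyError when an
    edge refers to an identifier that no node defines.
    """
    key = sample["unique_field"]
    by_id = {node[key]: node for node in sample["nodes"]}
    return [
        (by_id[edge["src"]]["label"], edge["label"], by_id[edge["dst"]]["label"])
        for edge in sample["edges"]
    ]


if __name__ == "__main__":
    for src, label, dst in resolve_edges(SAMPLE):
        print(f"{src} -[{label}]-> {dst}")
```

Because edges carry only identifiers, every src and dst has to match the unique_field value of some node in the same file.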

Schema Generation

psqlgml can be used to generate dictionary-specific schemas using the exposed command line scripts. By default, gdcdictionary is assumed, but the parameters can be changed to work with a different project.

Generate a schema using version 2.4.0 of gdcdictionary:

$ psqlgml generate -v 2.4.0 -n gdcdictionary

The generated schema can be used for validating sample data. It can also be added to IDEs like PyCharm for intellisense while creating sample data.

Sample Data Validation

$ psqlgml validate -f sample.yaml --data-dir <resource dir> -d <dictionary name> -v <dictionary version>

The following validations are currently supported:

  • JSON Schema Validation

  • Duplicate Definition Validation

  • Undefined Link Validation

  • Association Validation

JSON Schema Validation

Checks that the sample data is compliant with the dictionary. It validates things like:

  • properties that are not allowed on a node

  • property values that are not allowed on a property

  • invalid enum values

  • invalid or unsupported node types
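The kinds of problems listed above can be illustrated with a small hand-rolled check. This is a simplified stand-in and not JSON Schema itself, nor psqlgml's implementation: NODE_RULES, its labels, the state property and its enum values, and the check_node function are all made up for the example.

```python
from typing import Any, Dict, List

# Hypothetical, heavily simplified stand-in for a dictionary: for each node
# label, the allowed properties and, where relevant, the allowed enum values
# (None means any value is accepted).
NODE_RULES: Dict[str, Dict[str, Any]] = {
    "program": {"node_id": None, "name": None},
    "project": {"node_id": None, "state": ["open", "closed"]},
}


def check_node(node: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems found on one node."""
    label = node.get("label")
    if label not in NODE_RULES:
        return [f"unsupported node type: {label!r}"]
    rules = NODE_RULES[label]
    problems = []
    for prop, value in node.items():
        if prop == "label":
            continue
        if prop not in rules:
            problems.append(f"{label}: property {prop!r} is not allowed")
        elif rules[prop] is not None and value not in rules[prop]:
            problems.append(f"{label}: {value!r} is not a valid value for {prop!r}")
    return problems


if __name__ == "__main__":
    for problem in check_node({"label": "project", "node_id": "pr_1", "state": "archived"}):
        print(problem)
```

In practice the generated schema plays the role of NODE_RULES, so checks like these come from the dictionary rather than from hand-written tables.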

Duplicate Definition Validation

Raises an error whenever a unique id is used for more than one node.
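A minimal sketch of this kind of check, assuming the sample has already been loaded into a Python dictionary shaped like the GML example. The function name find_duplicate_ids is hypothetical and does not come from psqlgml.

```python
from collections import Counter
from typing import Any, Dict, List


def find_duplicate_ids(sample: Dict[str, Any]) -> List[str]:
    """Return every unique-field value that appears on more than one node."""
    key = sample["unique_field"]
    counts = Counter(node[key] for node in sample["nodes"])
    return sorted(node_id for node_id, seen in counts.items() if seen > 1)


sample = {
    "unique_field": "node_id",
    "nodes": [
        {"label": "program", "node_id": "p_1"},
        {"label": "project", "node_id": "p_1"},  # reuses p_1 on purpose
        {"label": "project", "node_id": "pr_1"},
    ],
    "edges": [],
}

duplicates = find_duplicate_ids(sample)
if duplicates:
    print("duplicate ids:", ", ".join(duplicates))
```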

Association Validation

Raises an error whenever an edge exists between nodes that the dictionary does not define an edge for.
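The idea can be sketched as a lookup against the set of links the dictionary defines. Everything named here is an assumption made for illustration: ALLOWED_LINKS is a made-up stand-in for a real dictionary, and find_undefined_associations is not a psqlgml function.

```python
from typing import Any, Dict, List, Set, Tuple

# Hypothetical stand-in for the links a dictionary defines, written as
# (source label, edge label, destination label).
ALLOWED_LINKS: Set[Tuple[str, str, str]] = {("program", "programs", "project")}


def find_undefined_associations(
    sample: Dict[str, Any], allowed: Set[Tuple[str, str, str]]
) -> List[Dict[str, str]]:
    """Return the edges whose (src label, edge label, dst label) is not allowed."""
    key = sample["unique_field"]
    labels = {node[key]: node["label"] for node in sample["nodes"]}
    rejected = []
    for edge in sample["edges"]:
        triple = (labels.get(edge["src"]), edge["label"], labels.get(edge["dst"]))
        if triple not in allowed:
            rejected.append(edge)
    return rejected


sample = {
    "unique_field": "node_id",
    "nodes": [
        {"label": "program", "node_id": "p_1"},
        {"label": "program", "node_id": "p_2"},
        {"label": "project", "node_id": "pr_1"},
    ],
    "edges": [
        {"src": "p_1", "dst": "pr_1", "label": "programs"},  # defined link
        {"src": "p_1", "dst": "p_2", "label": "programs"},   # program to program
    ],
}

for edge in find_undefined_associations(sample, ALLOWED_LINKS):
    print("no such association:", edge)
```

Only the second edge is reported, since the made-up link set has no program-to-program entry.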
