Summary
Sample data generation is a common step when testing and verifying new and existing features that rely on a data commons dictionary. Without validation tools, this step is tedious and error-prone. This project provides tooling that helps with generating and visualizing sample data. It is dictionary agnostic, so it should work with any GDC-compatible dictionary.
Sample data graphs are represented in a customized GraphML format, which can be stored in either JSON or YAML files. This project provides tools for creating the schema from a selected dictionary and for validating data that targets this schema.
Goals
psqlgml aims to provide the following for projects that make use of psqlgraph:
test data validation and visualization
a test data schema that can be integrated with IDEs for easier test data generation
randomized test data generation based on user requirements
data structures and functions for use in external projects
an alternate implementation for loading dictionaries with better type checking
Requirements
Python 3.6+
graphviz (used for visualization)
Installation
From PyPI
$ pip install psqlgml
Quick Start
Command Line
# install
$ pip install psqlgml
# validate install
$ psqlgml --help
# generate internal schema to aid validation
$ psqlgml generate -v 2.4.0 -n test_dictionary
# validation
$ psqlgml validate --help
# visualize
$ psqlgml visualize --help
API
import psqlgml
# load the default dictionary
dictionary: psqlgml.Dictionary = psqlgml.load(version="2.3.0")
GML Schema
This is a customized GraphML format based on JSON Schema. It represents a graph as a set of nodes and edges, and the schema makes it possible to validate sample data.
unique_field: node_id
nodes:
  - label: program
    node_id: p_1
    name: SM-KD
  - label: project
    node_id: pr_1
edges:
  - src: p_1
    dst: pr_1
    label: programs
This example creates two nodes, a program and a project, and links them with an edge named programs. The edge refers to each node by its node_id value, the property named in unique_field.
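To make the structure concrete, the same graph can be written as plain Python data and walked edge by edge. This is only an illustrative sketch: `describe_edges` is a hypothetical helper and not part of the psqlgml API, and it assumes nothing beyond the field names shown in the example above.

```python
from typing import Dict, List

# The sample graph from the example above, written as plain Python data.
graph = {
    "unique_field": "node_id",
    "nodes": [
        {"label": "program", "node_id": "p_1", "name": "SM-KD"},
        {"label": "project", "node_id": "pr_1"},
    ],
    "edges": [
        {"src": "p_1", "dst": "pr_1", "label": "programs"},
    ],
}


def describe_edges(data: dict) -> List[str]:
    """Return one readable line per edge (hypothetical helper, not psqlgml API)."""
    # The unique_field entry names the property that edges use to refer to nodes.
    key = data.get("unique_field", "node_id")
    labels: Dict[str, str] = {n[key]: n["label"] for n in data.get("nodes", [])}
    lines = []
    for edge in data.get("edges", []):
        # Nodes that are not defined in this file are shown as "?".
        src = labels.get(edge["src"], "?")
        dst = labels.get(edge["dst"], "?")
        lines.append(
            f"{src}({edge['src']}) -[{edge['label']}]-> {dst}({edge['dst']})"
        )
    return lines


for line in describe_edges(graph):
    print(line)
```

Loading the YAML or JSON file into such a dictionary is left out here, since it depends on the parser in use.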
Schema Generation
psqlgml can generate dictionary-specific schemas through its command line scripts. By default, gdcdictionary is assumed, but the parameters can be changed to work with a different project.
Generate a schema using version 2.4.0 of the gdcdictionary:
$ psqlgml generate -v 2.4.0 -n gdcdictionary
The generated schema can be used for validating sample data. It can also be added to IDEs like PyCharm for intellisense while creating sample data.
Sample Data Validation
$ psqlgml validate -f sample.yaml --data-dir <resource dir> -d <dictionary name> -v <dictionary version>
The following validations are currently supported:
JSON Schema Validation
Duplicate Definition Validation
Undefined Link Validation
Association Validation
JSON Schema Validation
Checks that the sample data complies with the dictionary. It catches things like:
* properties that are not allowed on a node
* values that are not allowed for a property
* invalid enum values
* invalid or unsupported node types
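The kinds of checks listed here can be sketched with a small hand-written rule table. This is a simplified illustration using only the standard library, not psqlgml's implementation, which relies on a JSON schema generated from the dictionary. `SCHEMA`, `check_node`, and the `state` property with its values are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical, heavily simplified stand-in for a dictionary-derived schema:
# for each node label, the allowed properties, each with an optional enum.
SCHEMA: Dict[str, Dict[str, Optional[List[str]]]] = {
    "program": {"node_id": None, "name": None},
    "project": {"node_id": None, "state": ["open", "closed"]},
}


def check_node(node: dict) -> List[str]:
    """Return a list of problems found on a single node (empty if none)."""
    label = node.get("label")
    if label not in SCHEMA:
        return [f"unsupported node type: {label!r}"]
    allowed = SCHEMA[label]
    errors = []
    for prop, value in node.items():
        if prop == "label":
            continue
        if prop not in allowed:
            errors.append(f"{label}: property {prop!r} is not allowed")
            continue
        enum = allowed[prop]
        if enum is not None and value not in enum:
            errors.append(f"{label}.{prop}: invalid enum value {value!r}")
    return errors


print(check_node({"label": "project", "node_id": "pr_1", "state": "archived"}))
```

A real dictionary also constrains types, required properties, and formats, which this sketch does not attempt.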
Duplicate Definition Validation
Raises an error whenever the same unique ID is used for more than one node.
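As a rough sketch of what such a check involves, the unique field of every node can be counted and any value seen more than once reported. `find_duplicate_ids` is a hypothetical name used for illustration, not a psqlgml function.

```python
from collections import Counter
from typing import List


def find_duplicate_ids(data: dict) -> List[str]:
    """Return the unique-field values that appear on more than one node."""
    key = data.get("unique_field", "node_id")
    counts = Counter(node[key] for node in data.get("nodes", []))
    return sorted(node_id for node_id, seen in counts.items() if seen > 1)


sample = {
    "unique_field": "node_id",
    "nodes": [
        {"label": "program", "node_id": "p_1"},
        {"label": "project", "node_id": "pr_1"},
        {"label": "project", "node_id": "p_1"},  # reuses the program's id
    ],
}
print(find_duplicate_ids(sample))
```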
Undefined Link Validation
Reports edges that reference nodes not defined in the sample data. This is raised as a warning rather than an error, since linking to such nodes can be legitimate, for example when appending data to an existing database.
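One way to picture this check is to collect the defined node ids and report every edge endpoint that is not among them. The helper below is a hypothetical sketch, not psqlgml's code; it returns the missing ids so that the caller can decide to log them as warnings.

```python
from typing import List


def find_undefined_links(data: dict) -> List[str]:
    """Return edge endpoints that do not match any node defined in the data."""
    key = data.get("unique_field", "node_id")
    defined = {node[key] for node in data.get("nodes", [])}
    missing: List[str] = []
    for edge in data.get("edges", []):
        for end in ("src", "dst"):
            node_id = edge[end]
            # Report each undefined id once, in order of first appearance.
            if node_id not in defined and node_id not in missing:
                missing.append(node_id)
    return missing


sample = {
    "unique_field": "node_id",
    "nodes": [{"label": "project", "node_id": "pr_1"}],
    "edges": [{"src": "p_1", "dst": "pr_1", "label": "programs"}],
}
for node_id in find_undefined_links(sample):
    print(f"warning: edge refers to undefined node {node_id}")
```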
Association Validation
Raises an error whenever an edge exists between nodes that the dictionary does not define an edge for.
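A minimal sketch of this idea follows, assuming the dictionary's edge definitions have already been reduced to a lookup keyed by source and destination label. `ALLOWED_EDGES` and `find_invalid_associations` are hypothetical, and the single entry simply mirrors the program and project example in the GML Schema section rather than any real dictionary.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical lookup: (source label, destination label) -> allowed edge names.
ALLOWED_EDGES: Dict[Tuple[str, str], Set[str]] = {
    ("program", "project"): {"programs"},
}


def find_invalid_associations(
    data: dict, allowed: Dict[Tuple[str, str], Set[str]]
) -> List[str]:
    """Return a message for every edge that the lookup does not permit."""
    key = data.get("unique_field", "node_id")
    labels = {node[key]: node["label"] for node in data.get("nodes", [])}
    errors = []
    for edge in data.get("edges", []):
        src, dst = labels.get(edge["src"]), labels.get(edge["dst"])
        if src is None or dst is None:
            # Endpoints missing from the data are a separate concern
            # (undefined links), so they are skipped here.
            continue
        if edge["label"] not in allowed.get((src, dst), set()):
            errors.append(f"no edge {edge['label']!r} defined from {src} to {dst}")
    return errors


sample = {
    "unique_field": "node_id",
    "nodes": [
        {"label": "program", "node_id": "p_1"},
        {"label": "project", "node_id": "pr_1"},
    ],
    "edges": [
        {"src": "p_1", "dst": "pr_1", "label": "programs"},
        {"src": "pr_1", "dst": "p_1", "label": "members"},
    ],
}
print(find_invalid_associations(sample, ALLOWED_EDGES))
```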