
Python wrapper and metaschema for datadictionary.


dictionaryutils

A Python wrapper and metaschema for datadictionary. It can be used to:

  • load a local dictionary into a Python object.
  • dump schemas to a file that can be uploaded to S3 as an artifact.
  • load a schema file from a URL into a Python object that services can use.

Test for dictionary validity with Docker

Say you have a dictionary you are building locally and you want to see if it will pass the tests.

You can add a simple shell function to your .bash_profile to enable a quick test command:

testdict() { docker run --rm -v $(pwd):/dictionary quay.io/cdis/dictionaryutils:master; }

Then, from the directory that contains the gdcdictionary directory, run testdict.
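
For scripted use (for example in CI), the same container invocation can be assembled from Python. This is a minimal sketch that mirrors the testdict function above; the helper names build_testdict_command and run_testdict are hypothetical and not part of dictionaryutils, and running the command assumes Docker is installed.

```python
import os
import subprocess


def build_testdict_command(dictionary_dir="."):
    """Return the docker argv equivalent to the testdict shell function."""
    # Use an absolute host path, as $(pwd) does in the shell function.
    host_path = os.path.abspath(dictionary_dir)
    return [
        "docker", "run", "--rm",
        "-v", f"{host_path}:/dictionary",
        "quay.io/cdis/dictionaryutils:master",
    ]


def run_testdict(dictionary_dir="."):
    """Run the dictionary tests and return the exit code of docker run."""
    return subprocess.run(build_testdict_command(dictionary_dir)).returncode
```

A caller could treat a non-zero return value from run_testdict as a failed validation.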

Generate simulated data with Docker

You can also generate simulated data with dictionaryutils and the data-simulator.

simdata() { docker run --rm -v $(pwd):/dictionary -v $(pwd)/simdata:/simdata quay.io/cdis/dictionaryutils:master /bin/sh -c "cd /dictionary && python setup.py install --force; python /src/datasimulator/bin/data-simulator simulate --path /simdata/ $*; export SUCCESS=\$?; rm -rf build dictionaryutils dist gdcdictionary.egg-info; chmod -R a+rwX /simdata; exit \$SUCCESS"; }
simdataurl() { docker run --rm -v $(pwd):/dictionary -v $(pwd)/simdata:/simdata quay.io/cdis/dictionaryutils:master /bin/sh -c "python /src/datasimulator/bin/data-simulator simulate --path /simdata/ $*; chmod -R a+rwX /simdata"; }

Then, from the directory that contains the gdcdictionary directory, run simdata. A folder called simdata will be created with the results of the simulator run. You can also pass additional arguments to the data-simulator script, for example simdata --max_samples 10.

The --max_samples argument sets the default number of instances to simulate for each node, but you can override it for individual nodes with the --node_num_instances_file argument. For example, if you create the following instances.json:

{
    "case": 100,
    "demographic": 100
}

Then run the following:

docker run --rm -v $(pwd):/dictionary -v $(pwd)/simdata:/simdata quay.io/cdis/dictionaryutils:master /bin/sh -c "cd /dictionary && python setup.py install --force; python /src/datasimulator/bin/data-simulator simulate --path /simdata/ --program workshop --project project1 --max_samples 10 --node_num_instances_file instances.json; export SUCCESS=\$?; rm -rf build dictionaryutils dist gdcdictionary.egg-info; chmod -R a+rwX /simdata; exit \$SUCCESS"

Then you'll get 100 each of case and demographic nodes and 10 each of everything else. Note that the above example also defines program and project names.
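
The override rule described above can be modelled in a few lines of Python. This is only a sketch of the documented behaviour, not the data-simulator's own code; the function names and the "sample" node are hypothetical and used for illustration.

```python
import json


def write_instances_file(path, overrides):
    """Write per-node instance counts in the instances.json format shown above."""
    with open(path, "w") as f:
        json.dump(overrides, f, indent=4)


def expected_counts(node_names, max_samples, overrides):
    """Model the described rule: overrides win, every other node gets max_samples."""
    return {name: overrides.get(name, max_samples) for name in node_names}


overrides = {"case": 100, "demographic": 100}
# "sample" is a hypothetical third node, included only to show the default.
counts = expected_counts(["case", "demographic", "sample"], 10, overrides)
print(counts)  # {'case': 100, 'demographic': 100, 'sample': 10}
```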

You can also run the simulator against an arbitrary JSON URL with simdataurl --url https://datacommons.example.com/schema.json.

Use dictionaryutils to load a dictionary

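from dictionaryutils import DataDictionary

# Fetch the dictionary from a remote JSON schema file.
dict_fetch_from_remote = DataDictionary(url=URL_FOR_THE_JSON)

# Load the dictionary from a local directory of schema files.
dict_loaded_locally = DataDictionary(root_dir=PATH_TO_SCHEMA_DIR)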
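
When the dictionary location comes from configuration, a small helper can choose between the two keyword arguments shown above. This is a sketch; datadictionary_kwargs is a hypothetical name, not part of dictionaryutils, and it assumes that only http and https sources should be treated as URLs.

```python
from urllib.parse import urlparse


def datadictionary_kwargs(source):
    """Pick the DataDictionary keyword argument for a URL or a local path."""
    scheme = urlparse(source).scheme
    if scheme in ("http", "https"):
        return {"url": source}
    # Anything else is assumed to be a local schema directory.
    return {"root_dir": source}


# Usage, assuming dictionaryutils is installed:
# from dictionaryutils import DataDictionary
# dictionary = DataDictionary(**datadictionary_kwargs(source))
print(datadictionary_kwargs("https://datacommons.example.com/schema.json"))
print(datadictionary_kwargs("gdcdictionary/schemas"))
```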

Use dictionaryutils to dump a dictionary

import json
from dictionaryutils import dump_schemas_from_dir

with open('dump.json', 'w') as f:
    json.dump(dump_schemas_from_dir('../datadictionary/gdcdictionary/schemas/'), f)
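
After dumping, it can be useful to check what ended up in the file. The sketch below assumes the dump is a JSON object keyed by schema file name, with helper files prefixed by an underscore (for example _definitions.yaml); that layout is an assumption rather than something stated here, and list_node_schemas is a hypothetical helper.

```python
import json


def list_node_schemas(dump):
    """Return sorted schema names, skipping underscore-prefixed helper files."""
    return sorted(name for name in dump if not name.startswith("_"))


# Stand-in for the contents of dump.json; keys and values are illustrative only.
example_dump = {
    "_definitions.yaml": {},
    "case.yaml": {"id": "case"},
    "demographic.yaml": {"id": "demographic"},
}
print(list_node_schemas(example_dump))  # ['case.yaml', 'demographic.yaml']
```

With a real dump, replace example_dump with the result of json.load on the opened dump.json.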

