Command-line program that scans the NMDC MongoDB database for referential integrity violations

refscan

refscan is a command-line tool that scans the NMDC MongoDB database for referential integrity violations.

%% This is the source code of a Mermaid diagram, which GitHub will render as a diagram.
%% Note: PyPI does not render Mermaid diagrams, and instead displays their source code.
%%       Reference: https://github.com/pypi/warehouse/issues/13083
graph LR
    schema[LinkML<br>schema]
    database[(MongoDB<br>database)]
    script[["refscan.py"]]
    violations["List of<br>violations"]
    references["List of<br>references"]:::dashed_border
    schema --> script
    database --> script
    script -.-> references
    script --> violations
    
    classDef dashed_border stroke-dasharray: 5 5

Assumptions

refscan was designed under some assumptions about the schema and database, including:

  1. Each source document (i.e. document containing references) has a field named type, whose value (a string) is the class_uri of the schema class of which the document represents an instance. For example, the type field of each document in the study_set collection has the value "nmdc:Study".
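To make this assumption concrete, here is a minimal sketch of how one might check it against a batch of documents before scanning. It is not part of refscan: the function name `find_untyped_documents` and the `known_class_uris` parameter are hypothetical, and the documents are plain dictionaries rather than records fetched from MongoDB.

```python
from typing import Any, Iterable


def find_untyped_documents(
    documents: Iterable[dict[str, Any]],
    known_class_uris: set[str],
) -> list[int]:
    """Return the positions of documents that break the `type` assumption.

    A document breaks the assumption when its `type` field is missing,
    is not a string, or is not one of the known class_uri values.
    """
    bad_positions = []
    for position, document in enumerate(documents):
        type_value = document.get("type")
        # The isinstance check runs first, so unhashable values never reach the set lookup.
        if not isinstance(type_value, str) or type_value not in known_class_uris:
            bad_positions.append(position)
    return bad_positions


# Hypothetical sample data; only the first document satisfies the assumption.
sample_documents = [
    {"type": "nmdc:Study"},
    {"type": "Study"},  # not a class_uri known to the schema
    {},                 # no `type` field at all
    {"type": 42},       # `type` is not a string
]
print(find_untyped_documents(sample_documents, {"nmdc:Study"}))
```

In practice, the set of known class_uri values would come from the schema, and the documents would come from the database.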

Development status

refscan is in early development. Its author recommends reviewing the source code before using it for anything.

Tips

refscan requires the user to specify the path to a schema in YAML format. If you have curl installed, you can download a YAML file from GitHub by running the following command (after replacing the {...} placeholders and customizing the path):

# Download the raw content of https://github.com/{user_or_org}/{repo}/blob/{branch}/path/to/schema.yaml
curl -o schema.yaml https://raw.githubusercontent.com/{user_or_org}/{repo}/{branch}/path/to/schema.yaml

For example:

# Download the raw content of https://github.com/microbiomedata/berkeley-schema-fy24/blob/main/nmdc_schema/nmdc_materialized_patterns.yaml
curl -o schema.yaml https://raw.githubusercontent.com/microbiomedata/berkeley-schema-fy24/main/nmdc_schema/nmdc_materialized_patterns.yaml
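The mapping from a GitHub "blob" URL to its raw-content URL, as used in the commands above, can also be sketched in Python. The helper name `github_blob_to_raw_url` is hypothetical and not part of refscan. The sketch assumes the branch name contains no slashes, since the URL alone cannot tell a branch such as `feature/x` apart from a directory.

```python
from urllib.parse import urlsplit


def github_blob_to_raw_url(blob_url: str) -> str:
    """Convert https://github.com/{user_or_org}/{repo}/blob/{branch}/{path}
    into the matching https://raw.githubusercontent.com/... URL.

    Assumption: the branch name is a single path segment (no slashes).
    """
    parts = urlsplit(blob_url)
    if parts.netloc != "github.com":
        raise ValueError(f"Not a github.com URL: {blob_url}")

    segments = parts.path.strip("/").split("/")
    # Expected shape: user_or_org / repo / "blob" / branch / at least one path segment
    if len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"Not a GitHub blob URL: {blob_url}")

    user_or_org, repo, _blob, branch, *file_path = segments
    raw_path = "/".join([user_or_org, repo, branch, *file_path])
    return f"https://raw.githubusercontent.com/{raw_path}"


print(
    github_blob_to_raw_url(
        "https://github.com/microbiomedata/berkeley-schema-fy24/blob/main/nmdc_schema/nmdc_materialized_patterns.yaml"
    )
)
```

The printed URL can then be passed to curl, or to any other download tool, to save the schema as a local YAML file.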


