OCSF Schema Validation
OCSF Schema Validator
A utility that validates contributions to the OCSF schema. It is intended to catch human error in contributions so that the schema stays machine-readable.
OCSF provides several include mechanisms to facilitate reuse, but this means individual schema files may be incomplete. This complicates using off-the-shelf schema definition tools for validation.
Query is a federated search solution that normalizes disparate security data to OCSF. This validator is adapted from active code and documentation generation tools written by the Query team.
Supported Validations
The validator can currently perform the following validations:
- All required keys are present
- There are no unrecognized keys
- Dependency targets are resolvable and exist
- All attributes in dictionary.json are used
- There are no redundant profiles and $include targets
- There are no name collisions within record types
- All attributes are defined in dictionary.json
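The two dictionary.json checks above come down to set differences between the attributes that are defined and the attributes that are used. The following is a minimal sketch, assuming a simplified in-memory layout; the `dictionary` and `records` shapes and the function names are hypothetical and are not the validator's actual API.

```python
from typing import Dict, Set, Tuple

# Hypothetical, simplified shapes; the real schema files carry more structure.
Dictionary = Dict[str, dict]          # attribute name -> definition
Records = Dict[str, Dict[str, dict]]  # file path -> {"attributes": {...}}


def used_attributes(records: Records) -> Set[str]:
    """Collect every attribute name referenced by any record."""
    used: Set[str] = set()
    for record in records.values():
        used.update(record.get("attributes", {}).keys())
    return used


def check_dictionary(dictionary: Dictionary, records: Records) -> Tuple[Set[str], Set[str]]:
    """Return (unused, undefined) attribute names."""
    defined = set(dictionary)
    used = used_attributes(records)
    return defined - used, used - defined


dictionary = {"name": {}, "uid": {}, "legacy_field": {}}
records = {
    "objects/user.json": {"attributes": {"name": {}, "uid": {}}},
    "events/login.json": {"attributes": {"uid": {}, "session": {}}},
}

unused, undefined = check_dictionary(dictionary, records)
print("unused:", sorted(unused))
print("undefined:", sorted(undefined))
```

Here `legacy_field` is defined but never used, and `session` is used but never defined, so each check reports one problem.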
Planned Validations
In the future, this validation should also ensure the following:
- The contents of categories.json match the directory structure of /events
- There are no unused enums
- There are no unused profiles
- There are no unused imports
- There are no name collisions between extensions
- There are no name collisions between objects and events
Running the validator
- Install the validator using pip or poetry. (well, once we're publishing it...)
- Clone a copy of the OCSF schema, if you don't already have one.
- Invoke the validator with the location of your copy of the OCSF schema:
poetry run python -m ocsf_validator <schema_path>
Technical Overview
The OCSF metaschema is represented as record types by filepath, achieved as follows:
- Record types are represented using Python's type system by defining them as Python TypedDicts in types.py. This allows the validator to take advantage of Python's reflection capabilities.
- Files and record types are associated by pattern matching the file paths. These patterns are named in matchers.py to allow mistakes to be caught by a type checker.
- Types are mapped to filepath patterns in type_mapping.py.
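To illustrate how TypedDict reflection and path patterns can work together, here is a minimal sketch. It assumes a single hypothetical record type (`OcsfObject`) and a glob-style mapping; the names `TYPE_MAPPING`, `type_for`, and `key_problems` are invented for this example and do not reflect the actual contents of types.py, matchers.py, or type_mapping.py.

```python
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple, TypedDict


class _ObjectRequired(TypedDict):
    # Keys declared here are required.
    name: str
    caption: str


class OcsfObject(_ObjectRequired, total=False):
    # Keys declared with total=False are optional.
    description: str
    attributes: dict


# Hypothetical mapping of file path patterns to record types.
TYPE_MAPPING: List[Tuple[str, type]] = [
    ("objects/*.json", OcsfObject),
]


def type_for(path: str) -> Optional[type]:
    """Return the first record type whose pattern matches the path."""
    for pattern, record_type in TYPE_MAPPING:
        if fnmatchcase(path, pattern):
            return record_type
    return None


def key_problems(path: str, data: dict) -> List[str]:
    """Compare a record's keys with the keys its TypedDict declares."""
    record_type = type_for(path)
    if record_type is None:
        return [f"{path}: no record type matches this path"]
    # TypedDict classes expose their declared keys at runtime (Python 3.9+).
    required = record_type.__required_keys__
    allowed = required | record_type.__optional_keys__
    present = set(data)
    problems = [f"{path}: missing required key '{k}'" for k in sorted(required - present)]
    problems += [f"{path}: unrecognized key '{k}'" for k in sorted(present - allowed)]
    return problems


for problem in key_problems("objects/user.json", {"name": "user", "colour": "red"}):
    print(problem)
```

Because the required and optional keys are read from the type itself, adding a key to the TypedDict updates the check without any further bookkeeping.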
The contents of the OCSF schema to be validated are primarily represented as a Reader, defined in reader.py. Readers load the schema definitions to be validated from a source (usually a filesystem) and contain them without judgement. The process_includes function and other contents of processor.py mutate the contents of a Reader by applying OCSF's various include mechanisms.
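The idea of a reader whose contents are mutated by include processing can be sketched as follows. This is a simplified illustration only: `DictReader`, `merge`, and `apply_includes` are hypothetical names, the sketch handles a single top-level `$include` string per file, and it assumes the including file's own values take precedence. OCSF's real include mechanisms are richer than this.

```python
from typing import Any, Dict


class DictReader:
    """Hypothetical in-memory stand-in for a Reader: path -> parsed JSON."""

    def __init__(self, contents: Dict[str, Dict[str, Any]]) -> None:
        self.contents = contents


def merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge two dicts; values in `override` win."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def apply_includes(reader: DictReader) -> None:
    """Replace each file that has an $include with the merged result."""
    for path, data in list(reader.contents.items()):
        if "$include" not in data:
            continue
        # Copy without the directive so the original record is not modified.
        own = {k: v for k, v in data.items() if k != "$include"}
        target = data["$include"]
        if target not in reader.contents:
            raise KeyError(f"{path}: unresolved $include target '{target}'")
        reader.contents[path] = merge(reader.contents[target], own)


reader = DictReader({
    "includes/base.json": {"attributes": {"uid": {"requirement": "required"}}},
    "objects/user.json": {
        "name": "user",
        "$include": "includes/base.json",
        "attributes": {"name": {"requirement": "optional"}},
    },
})
apply_includes(reader)
print(reader.contents["objects/user.json"])
```

After processing, the user object carries both its own `name` attribute and the included `uid` attribute, which is why validations that inspect attributes need to run on the processed contents rather than on the raw files.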
Validators are defined in validators.py and test the schema contents for various problematic conditions. Validators should pass Exceptions to a special error Collector defined in errors.py. This module also defines a number of custom exception types that represent problematic schema states. The Collector raises errors by default, but can also hold them until they're aggregated by a larger validation process (e.g., the ValidationRunner).
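The raise-by-default, hold-on-request behaviour described above can be sketched like this. `SimpleCollector`, its `throw` flag, its `handle` method, and `MissingKeyError` are assumptions made for the example; the actual Collector in errors.py may be shaped differently.

```python
from typing import List


class MissingKeyError(Exception):
    """Hypothetical custom exception for a problematic schema state."""


class SimpleCollector:
    """Raise errors immediately, or hold them for later aggregation."""

    def __init__(self, throw: bool = True) -> None:
        self.throw = throw
        self._held: List[Exception] = []

    def handle(self, err: Exception) -> None:
        if self.throw:
            raise err
        self._held.append(err)

    def exceptions(self) -> List[Exception]:
        # Return a copy so callers cannot alter the held list.
        return list(self._held)


# Holding mode: every problem is kept so a runner can report them together.
holding = SimpleCollector(throw=False)
holding.handle(MissingKeyError("objects/user.json: missing 'caption'"))
holding.handle(MissingKeyError("events/login.json: missing 'uid'"))
print(len(holding.exceptions()), "errors held")

# Default mode: the first problem is raised straight away.
strict = SimpleCollector()
try:
    strict.handle(MissingKeyError("objects/file.json: missing 'name'"))
except MissingKeyError as err:
    print("raised:", err)
```

Passing exceptions to a collector rather than raising them directly lets the same validator code serve both a fail-fast unit test and a full run that reports every problem at once.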
The ValidationRunner combines all of the building blocks above to read a proposed schema from a filesystem, validate it, and provide useful output. It exits with a non-zero code if any errors were encountered.
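As a rough illustration of the runner's final step, the sketch below runs a list of checks, prints each failure, and returns an exit code. The names `run_checks` and `check_has_caption` and the dict-based schema are hypothetical; this is not how ValidationRunner itself is implemented.

```python
from typing import Callable, Dict, Iterable, List

Schema = Dict[str, dict]
Check = Callable[[Schema], Iterable[str]]


def check_has_caption(schema: Schema) -> List[str]:
    """Hypothetical check: every record should have a 'caption' key."""
    return [
        f"{path}: missing 'caption'"
        for path, data in sorted(schema.items())
        if "caption" not in data
    ]


def run_checks(schema: Schema, checks: List[Check]) -> int:
    """Run every check, print each failure, and return an exit code."""
    failures: List[str] = []
    for check in checks:
        failures.extend(check(schema))
    for failure in failures:
        print(f"ERROR: {failure}")
    return 1 if failures else 0


schema = {
    "objects/user.json": {"name": "user", "caption": "User"},
    "objects/file.json": {"name": "file"},
}
exit_code = run_checks(schema, [check_has_caption])
print("exit code:", exit_code)
```

A command-line entry point would typically pass the returned value to `sys.exit`, so that CI jobs fail when the schema has errors.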
Contributing
After checking out, you'll want to install dependencies:
poetry install
Before committing, run the formatters and tests:
poetry run isort
poetry run black
poetry run pyright
poetry run pytest
If you're adding a validator, do the following:
- Write your validate_ function in validators.py. It should apply a function that runs your desired validation to the relevant keys in a reader. See the existing functions in validators.py for examples.
- Add any custom errors in errors.py.
- Create an option to change its severity level in ValidatorOptions and map it in the constructor of ValidationRunner in runner.py.
- Invoke the new validator in ValidationRunner.validate.
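The steps above might look roughly like the following. Everything here is a hypothetical stand-in written for illustration: `Severity`, `Options` (in place of ValidatorOptions), `EmptyCaptionError`, `validate_captions`, and `run` do not come from the project, and the real signatures in validators.py and runner.py may differ.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List


class Severity(Enum):
    INFO = "info"
    WARN = "warn"
    ERROR = "error"


class EmptyCaptionError(Exception):
    """Hypothetical custom error: a record has no caption text."""


@dataclass
class Options:
    """Hypothetical stand-in for ValidatorOptions."""

    empty_caption: Severity = Severity.WARN


def validate_captions(
    contents: Dict[str, dict], report: Callable[[Exception], None]
) -> None:
    """Report every record whose caption is missing or empty."""
    for path, data in sorted(contents.items()):
        if not data.get("caption"):
            report(EmptyCaptionError(f"{path}: caption is missing or empty"))


def run(contents: Dict[str, dict], options: Options) -> Dict[Severity, List[Exception]]:
    """Invoke the validator, filing its errors under the configured severity."""
    results: Dict[Severity, List[Exception]] = {s: [] for s in Severity}
    validate_captions(contents, results[options.empty_caption].append)
    return results


contents = {
    "objects/user.json": {"caption": "User"},
    "objects/file.json": {"caption": ""},
}
results = run(contents, Options(empty_caption=Severity.ERROR))
for err in results[Severity.ERROR]:
    print("ERROR:", err)
```

Keeping the severity in an options object means a new validation can start out as a warning and be promoted to an error later without touching the validator function.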
TODO
There is still plenty to be done!
General
- Add CLI arguments for everything in ValidatorOptions
- Add more validators.
- Are things named consistently across (and within) modules?
- Inline documentation could be better.
- This README could be better.
- Shell script to run tests and formatters.
- Clean up * imports, especially in __init__.py.
- Consider any imports in __init__.py that could be package-protected.
Pipeline
- Action for this repository to run formatters and tests on PRs.
- Add a coverage report.
- Action for this repository to publish to PyPi.
- Action for the OCSF Schema repository to run the validation runner on PRs.
Testing
- Unit tests for TypeMapping
- Test coverage could be a lot better in general