I/O schema loader + validator for analytics notebooks

Analytic Schema

Analytic Schema is a lightweight Python package for loading, validating, and building standardized I/O documents for analytics notebooks based on a single, versioned JSON contract that drives both input parsing and output construction.

Description
Dependencies
Installation
Usage
Project structure
Background and Motivation
Contributing
Contributors
License

Description

Analytic Schema centralizes your notebook or script I/O definitions into one authoritative JSON file. From that contract it automatically:

Generates a complete command-line interface (via argparse) for every input field.
Parses parameters provided as CLI flags, JSON dictionaries, or JSON files.
Injects sensible defaults for all optional inputs.
Performs deep schema validation (type checks, enums, date-time formats, oneOf branches, and no extra fields).
Builds structured output documents with embedded metadata, SHA-256 hashes for auditability, and a built-in logging mechanism.
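
As a rough illustration of the first item above, the sketch below derives an argparse parser from a small contract dictionary. The contract layout, the field names other than those used in the Usage example, the "api" enum value, and the `build_parser` helper are all assumptions made for this sketch; the real analytic_schema.json and the package internals may look different.

```python
import argparse

# Hypothetical contract fragment; the real analytic_schema.json may differ.
CONTRACT = {
    "start_dtg": {"type": "string", "required": True},
    "data_source_type": {"type": "string", "enum": ["file", "api"], "default": "file"},
    "verbosity": {"type": "integer", "default": 1},
}

# Map contract type names onto Python callables that argparse can apply.
TYPE_MAP = {"string": str, "integer": int, "number": float}


def build_parser(contract):
    """Create one --flag per contract field (illustrative helper)."""
    parser = argparse.ArgumentParser(description="Generated from a JSON contract")
    for name, spec in contract.items():
        parser.add_argument(
            "--" + name.replace("_", "-"),   # start_dtg -> --start-dtg
            dest=name,
            type=TYPE_MAP[spec["type"]],
            choices=spec.get("enum"),        # None means "no restriction"
            default=spec.get("default"),
            required=spec.get("required", False),
        )
    return parser


parser = build_parser(CONTRACT)
args = vars(parser.parse_args(["--start-dtg", "2025-06-01T00:00:00Z", "--verbosity", "2"]))
print(args)
```

Because the flags are generated in a loop over the contract, adding a field to the JSON file is enough to expose it on the command line in a design like this.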

With zero dependencies beyond the standard library and pandas, Analytic Schema is ideal for air-gapped notebooks, CI pipelines, or any environment where you need a robust, self-contained I/O layer for cybersecurity analytics.
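
To make the validation and default-injection steps described above more concrete, here is a minimal standard-library sketch. It is not the package's validator: the contract format, the `validate` function, and the "api" enum value are hypothetical, and oneOf branches are left out for brevity.

```python
from datetime import datetime

# Hypothetical contract fragment, for illustration only.
CONTRACT = {
    "start_dtg": {"type": str, "format": "date-time", "required": True},
    "data_source_type": {"type": str, "enum": ["file", "api"], "default": "file"},
}


def validate(params, contract):
    """Return a copy of params with defaults filled in, or raise ValueError."""
    # Reject fields the contract does not know about ("no extra fields").
    extra = set(params) - set(contract)
    if extra:
        raise ValueError(f"unexpected fields: {sorted(extra)}")

    result = {}
    for name, spec in contract.items():
        if name not in params:
            if spec.get("required"):
                raise ValueError(f"missing required field: {name}")
            result[name] = spec.get("default")  # default injection
            continue
        value = params[name]
        if not isinstance(value, spec["type"]):
            raise ValueError(f"{name}: expected {spec['type'].__name__}")
        if "enum" in spec and value not in spec["enum"]:
            raise ValueError(f"{name}: {value!r} not in {spec['enum']}")
        if spec.get("format") == "date-time":
            # fromisoformat on Python < 3.11 rejects a trailing "Z", so normalise it.
            datetime.fromisoformat(value.replace("Z", "+00:00"))
        result[name] = value
    return result


print(validate({"start_dtg": "2025-06-01T00:00:00Z"}, CONTRACT))
```

Raising on the first problem gives the fail-fast behaviour the README describes; a fuller validator might collect all errors before raising.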

Dependencies

This project requires Python >= 3.8 and depends only on the Python standard library and pandas (>= 1.0).

Installation

pip install analytic-schema

In your code:

import analytic_schema

Usage

Below is a minimal end-to-end example showing how to go from raw inputs to a validated output file:

from analytic_schema import parse_input, validate_input, OutputDoc
import time

# 1) Parse input parameters (CLI string, list, JSON, or file)
raw = parse_input(
    "--input-schema-version 1.0.0 "
    "--start-dtg 2025-06-01T00:00:00Z "
    "--end-dtg   2025-06-02T00:00:00Z "
    "--data-source-type file "
    "--data-source /tmp/log.csv"
)

# 2) Validate against the JSON contract and fill defaults
params = validate_input(raw)

# 3) Run your analytic logic…
start = time.perf_counter()

# ... your detection code here ...

findings = [
    {
        "finding_id": "uuid-v4",
        "title": "Suspicious pattern",
        "description": "Detected anomalous traffic...",
        "event_dtg": "2025-06-07T12:34:56Z",
        "severity": "high",
        "confidence": "0.92",
        "observables": ["1.2.3.4", "bad.example.com"],
        "mitre_attack_tactics": ["TA0001"],
        "mitre_attack_techniques": ["T1001"],
        "recommended_actions": "Block IP and review logs",
        "recommended_pivots": "Check DNS logs",
        "classification": "U"
    }
]
elapsed_ms = (time.perf_counter() - start) * 1000

# 4) Build the structured output document
out = OutputDoc(
    input_data_hash=params["input_data_hash"],
    inputs=params,
    findings=findings,
    records_processed=len(findings)
)
out.add_message("INFO", "Analysis completed in %.2f ms" % elapsed_ms)
out.finalise()

# 5) Serialize to JSON
out.save("analysis_output.json")

This example demonstrates how Analytic Schema handles all the I/O boilerplate—CLI parsing, default injection, validation, metadata, logging, hashing, and final serialization—so you can focus on the core analytic logic.
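
Step 1 of the example accepts a CLI string, a list, JSON, or a file. The sketch below shows one way such a multi-form entry point could dispatch on its argument using only the standard library. It is an assumption about the general technique, not the implementation of `parse_input`; the `normalise_input` name and the ".json" path heuristic are invented for this illustration.

```python
import json
import shlex
from pathlib import Path


def normalise_input(raw):
    """Return either a dict of parameters or a list of CLI tokens."""
    if isinstance(raw, dict):
        return dict(raw)                      # already structured
    if isinstance(raw, (list, tuple)):
        return list(raw)                      # already tokenised, e.g. sys.argv[1:]
    if isinstance(raw, str):
        text = raw.strip()
        if text.startswith("{"):
            return json.loads(text)           # inline JSON document
        if text.endswith(".json") and Path(text).is_file():
            return json.loads(Path(text).read_text(encoding="utf-8"))
        return shlex.split(text)              # shell-style CLI string
    raise TypeError(f"unsupported input type: {type(raw).__name__}")


print(normalise_input("--start-dtg 2025-06-01T00:00:00Z --data-source-type file"))
print(normalise_input('{"data_source_type": "file"}'))
```

Using `shlex.split` rather than `str.split` keeps quoted values with spaces intact, which matters for paths and free-text parameters.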

Project structure

analytic-schema/ # Project repository
├── analytic_schema/ # Package
│   ├── __init__.py
│   ├── loader.py
│   ├── parser.py
│   ├── validator.py
│   ├── output.py
│   └── analytic_schema.json
│
├── tests/
│   └── test_analytic_schema.py
│
├── example_usage.py
│
├── README.md      # This file
├── LICENSE.md     # Project license
├── Makefile       # Project makefile
└── pyproject.toml

Background and Motivation

In cybersecurity analytics, consistency and auditability are paramount. Analysts and automation pipelines often spin up dozens of scripts and notebooks, each rolling its own argument parsing, validation, and output formatting. This fragmentation leads to subtle bugs, schema drift, and integration headaches.

Analytic Schema addresses these challenges by consolidating your I/O contract into a single JSON schema. This contract drives:

Uniformity: All analytics share the same field names, types, and defaults.
Reliability: Fail-fast validation prevents runtime surprises from missing or mistyped parameters.
Traceability: Inputs and findings are hashed, and logs are captured inline, enabling full audit trails.
Simplicity: With only the standard library plus pandas, it works in air-gapped environments and keeps your dependencies minimal.
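
The traceability point above relies on hashing inputs and findings. A minimal sketch of that idea, assuming the hashed object is JSON-serializable, is to serialize it in a canonical form and take the SHA-256 digest. The `sha256_of` helper is hypothetical and the package may canonicalize its documents differently.

```python
import hashlib
import json


def sha256_of(obj):
    """Hash a JSON-serializable object in a key-order-independent way."""
    # sort_keys and fixed separators give the same bytes for equal content.
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


finding = {"title": "Suspicious pattern", "severity": "high"}
reordered = {"severity": "high", "title": "Suspicious pattern"}

print(sha256_of(finding) == sha256_of(reordered))
```

Canonical serialization matters here: without sorted keys, two documents with identical content could produce different digests and break an audit comparison.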

By abstracting away boilerplate, you can focus on detecting and investigating threats, while ensuring your pipelines remain robust, maintainable, and easily integrated.

Contributing

Contributions are welcome from all, regardless of rank or position.

There are no system requirements for contributing to this project. To contribute via the web:

Click GitLab’s “Web IDE” button to open the online editor.
Make your changes. Note: limit your changes to one part of one file per commit; for example, edit only the “Description” section here in the first commit, then the “Background and Motivation” section in a separate commit.
Once finished, click the blue “Commit...” button.
Write a detailed description of the changes you made in the “Commit Message” box.
Select the “Create a new branch” radio button if you do not already have your own branch; otherwise, select your branch. The recommended naming convention for new branches is first.middle.last.
Click the green “Commit” button.

You may also contribute to this project using your local machine by cloning this repository to your workstation, creating a new branch, committing and pushing your changes, and creating a merge request.

Contributors

This section lists project contributors. When you submit a merge request, remember to append your name to the bottom of the list below. You may also include a brief list of the sections to which you contributed.

Creator: Zachary Szewczyk

License

This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. You can view the full text of the license in LICENSE.md. Read more about the license at the original author’s website. Generally speaking, this license allows individuals to remix this work provided they release their adaptation under the same license and cite this project as the original, and prevents anyone from turning this work or its derivatives into a commercial product.

Release history

1.0.7: Aug 3, 2025
1.0.6: Jun 10, 2025
1.0.5: Jun 8, 2025
1.0.4: Jun 8, 2025
1.0.3: Jun 8, 2025 (this version)
1.0.2: Jun 8, 2025
1.0.1: Jun 8, 2025
1.0.0: Jun 8, 2025

Download files

Source Distribution: analytic_schema-1.0.3.tar.gz (27.0 kB), uploaded Jun 8, 2025
Built Distribution: analytic_schema-1.0.3-py3-none-any.whl (22.9 kB), uploaded Jun 8, 2025, Python 3

File details

Details for the file analytic_schema-1.0.3.tar.gz.

File metadata

Download URL: analytic_schema-1.0.3.tar.gz
Upload date: Jun 8, 2025
Size: 27.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.12

File hashes

Hashes for analytic_schema-1.0.3.tar.gz
Algorithm	Hash digest
SHA256	`aea11ba0d92ef43ca9edf0e81afe65a886e4a90bc6ce56197b69ae6139241fe8`
MD5	`6bbca85d371b2159e389bd1f722e619a`
BLAKE2b-256	`51b55a2d4a3972d1a4a741b6f912e47e42fbe745fd0eb3e4a8388cc1c297af76`

File details

Details for the file analytic_schema-1.0.3-py3-none-any.whl.

File metadata

Download URL: analytic_schema-1.0.3-py3-none-any.whl
Upload date: Jun 8, 2025
Size: 22.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.12

File hashes

Hashes for analytic_schema-1.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a7a211e36f045c419509069846b31b598b0c0c4c16ed4f9f89c03c5cacae2371`
MD5	`95703bc6ca60b06eb0f18b48266a4dc7`
BLAKE2b-256	`99a33da3ed82e87547da4cbb0212146a77280517e69b68e10a57ee527dcf32be`
