Infer schemas from JSON, YAML, XML, TOML, CBOR, and BSON

schema_analysis

Universal-ish Schema Analysis

Ever wished you could figure out what was in that JSON file? Or maybe it was XML... Er, YAML? It was definitely TOML.

Alas, many great tools only work with one of those formats, and the internet is not so nice a place as to finally understand that no, XML is not an acceptable data format.

Enter this neat little tool, a single interface to any self-describing format supported by our gymnast friend, serde.

Features

  • Works with any self-describing format with a Serde implementation.
  • Suitable for large files.
  • Keeps track of some useful info for each type (opt out with --minimal).
  • Keeps track of null/missing/duplicate values separately.
  • Integrates with Schemars and json_typegen to produce types and a JSON Schema if needed.
  • There's a demo website here.
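
To make the null/missing/duplicate distinction concrete, here is a toy sketch in Python. It is not the tool's algorithm or output format: the function name and the flag names (may_be_null, may_be_missing, may_be_duplicate) are made up for this illustration, and it only handles a flat JSON array of objects.

```python
import json
from collections import Counter


def field_status(text):
    """Summarise how each key behaves across a JSON array of flat objects.

    All names here are hypothetical and exist only for this sketch.
    """
    # object_pairs_hook=list keeps every (key, value) pair, so repeated
    # keys inside one object are not silently collapsed by the parser.
    records = json.loads(text, object_pairs_hook=list)
    summary = {}
    for pairs in records:
        counts = Counter(key for key, _ in pairs)
        for key, value in pairs:
            entry = summary.setdefault(
                key,
                {"types": set(), "present": 0,
                 "may_be_null": False, "may_be_duplicate": False},
            )
            if value is None:
                entry["may_be_null"] = True
            else:
                entry["types"].add(type(value).__name__)
        for key, n in counts.items():
            summary[key]["present"] += 1  # count each record once per key
            if n > 1:
                summary[key]["may_be_duplicate"] = True
    for entry in summary.values():
        # A key is "missing" if at least one record does not contain it.
        entry["may_be_missing"] = entry.pop("present") < len(records)
        entry["types"] = sorted(entry["types"])
    return summary


if __name__ == "__main__":
    sample = '[{"id": 1, "tag": null}, {"id": 2}, {"id": 3, "tag": "x", "tag": "y"}]'
    for key, info in field_status(sample).items():
        print(key, info)
```

In the sample, "tag" is null in one record, absent from another, and repeated in the third, so all three flags end up set for it while "id" has none of them set.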

Installation

# Run without installing
npx schema_analysis data.json
# or
uvx schema_analysis data.json
# or
pipx run schema_analysis data.json

# Install
npm install -g schema_analysis
# or
pip install schema_analysis
# or
uv tool install schema_analysis
# or
cargo install schema_analysis --features cli --locked

CLI Usage

schema_analysis can infer schemas and generate types from data directly from the command line.

schema_analysis [OPTIONS] [FILES]...

It auto-detects the input format from file extensions (.json, .yaml/.yml, .xml, .toml, .cbor, .bson) and reads from stdin if no files are provided.
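
As a rough illustration of extension-based detection, the Python sketch below maps the extensions listed above to format names. It is not taken from the tool's source: the function name detect_format, the case-insensitive matching, and the error raised for unknown extensions are all assumptions made for this example.

```python
from pathlib import Path

# Extensions and format names as listed in the paragraph above.
EXTENSION_TO_FORMAT = {
    ".json": "json",
    ".yaml": "yaml",
    ".yml": "yaml",
    ".xml": "xml",
    ".toml": "toml",
    ".cbor": "cbor",
    ".bson": "bson",
}


def detect_format(path, override=None):
    """Return a format name for `path` (hypothetical helper).

    `override` plays the role of an explicit --format value and wins
    over the file extension when given.
    """
    if override is not None:
        return override
    # Assumption: matching is case-insensitive; the real tool may differ.
    suffix = Path(path).suffix.lower()
    if suffix not in EXTENSION_TO_FORMAT:
        # Assumption: unknown extensions are an error in this sketch.
        raise ValueError(f"cannot detect format of {path!r}; pass one explicitly")
    return EXTENSION_TO_FORMAT[suffix]


if __name__ == "__main__":
    for name in ["data.json", "config.yml", "feed.xml"]:
        print(name, "->", detect_format(name))
```

When reading from stdin there is no extension to inspect, which is why the stdin example in this README passes --format explicitly.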

Options:

Option            | Description                                                                                          | Default
--format <FORMAT> | Override input format (json, yaml, xml, toml, cbor, bson)                                            | auto-detected
--output <OUTPUT> | Output mode (schema, rust, typescript, typescript-alias, kotlin, kotlin-kotlinx, json-schema, shape) | schema
--name <NAME>     | Root type name for code generation                                                                   | Root
--compact         | Compact JSON output (no pretty printing)                                                             |
--minimal         | Skip analysis info (counts, samples, min/max, etc.), outputting only the schema structure            |

Examples:

# Infer a schema from a JSON file
schema_analysis data.json

# Generate Rust types
schema_analysis data.json --output rust --name MyData

# Generate TypeScript interfaces
schema_analysis api.json --output typescript --name ApiResponse

# Generate JSON Schema
schema_analysis data.json --output json-schema

# Merge multiple files into a single schema
schema_analysis file1.json file2.json file3.json

# Read from stdin
cat data.json | schema_analysis --format json

Library Usage

For use as a library, see the Rust crate or the repo.

Performance

These are not proper benchmarks, but they should give a vague idea of the performance on an i7-7700HQ laptop (2017) with the raw data already loaded into memory.

Size      | wasm time (MB/s) | native time (MB/s) | Format | File #
~180MB    | ~20s (9)         | ~5s (36)           | json   | 1
~650MB    | ~150s (4.3)      | ~50s (13)          | json   | 1
~1.7GB    | ~470s (3.6)      | ~145s (11.7)       | json   | 1
~2.1GB    | a                | ~182s (11.5)       | json   | 1
~13.3GB b | -                | ~810s (16.4)       | xml    | ~200k

a This one seems to go over some kind of browser limit when fetching the data in the Web Worker. I believe I would have to split large files to handle it.

b ~2.7GB compressed. This one is probably close to a worst-case scenario: it includes decompression overhead, and the files had a section of formatted text that resulted in crazy schemas. (The pretty-printed JSON schema was almost 0.5GB!)
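
The MB/s figures in parentheses are simply size divided by elapsed time. The short Python check below recomputes them from the approximate sizes and times in the table, assuming decimal units (1 GB = 1000 MB); the helper name is made up for this sketch.

```python
def throughput_mb_per_s(size_mb, seconds):
    """Throughput in MB/s for a file of `size_mb` processed in `seconds`."""
    return size_mb / seconds


# (size in MB, wasm seconds or None, native seconds), copied from the table.
# Assumption: 1 GB is treated as 1000 MB.
ROWS = [
    (180, 20, 5),
    (650, 150, 50),
    (1700, 470, 145),
    (2100, None, 182),
    (13300, None, 810),
]

if __name__ == "__main__":
    for size_mb, wasm_s, native_s in ROWS:
        native = round(throughput_mb_per_s(size_mb, native_s), 1)
        wasm = None if wasm_s is None else round(throughput_mb_per_s(size_mb, wasm_s), 1)
        print(f"{size_mb} MB: wasm={wasm} MB/s, native={native} MB/s")
```

On the three files that ran in both environments, the native build comes out roughly three to four times faster than wasm.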

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • schema_analysis-0.7.0.tar.gz (17.0 MB): Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

  • schema_analysis-0.7.0-py3-none-win_amd64.whl (1.9 MB): Python 3, Windows x86-64
  • schema_analysis-0.7.0-py3-none-manylinux_2_34_x86_64.whl (2.0 MB): Python 3, manylinux: glibc 2.34+ x86-64
  • schema_analysis-0.7.0-py3-none-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.9 MB): Python 3, manylinux: glibc 2.17+ ARM64
  • schema_analysis-0.7.0-py3-none-macosx_11_0_arm64.whl (1.8 MB): Python 3, macOS 11.0+ ARM64
  • schema_analysis-0.7.0-py3-none-macosx_10_12_x86_64.whl (1.9 MB): Python 3, macOS 10.12+ x86-64
