Structured data format with modern metadata system, Dublin Core and JSON-LD support

sdata_core

Features

  • Type-safe Metadata: Dataclass-based Attribute and Metadata classes
  • Dublin Core Support: Built-in vocabulary mapping for scientific data
  • JSON-LD Export: Semantic web compatible output
  • DataFrame Integration: Round-trip conversion to and from pandas DataFrames
  • SUUID: Semantic UUIDs for reproducible identification
  • JSON Schema: Auto-generated validation schemas

Installation

pip install sdata_core

Or with uv:

uv add sdata_core

Quick Start

Basic Metadata

from sdata_core import Attribute, Metadata, DType

# Create attributes
attr = Attribute(
    name="temperature",
    value=293.15,
    dtype=DType.FLOAT,
    unit="K",
    description="Sample temperature"
)

# Create metadata container
meta = Metadata(name="Experiment 001")
meta.set_attr("force", 5000.0, unit="N", dtype=DType.FLOAT)
meta.set_attr("material", "DP800 Steel")
meta.set_attr("valid", True, dtype=DType.BOOL)

# Access attributes
print(meta["force"].value)  # 5000.0
print(meta.keys())  # ['force', 'material', 'valid']
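
To illustrate what a dataclass-based, type-checked attribute can look like, here is a minimal standard-library sketch. It is not sdata_core's implementation: `SketchDType`, `SketchAttribute`, and the strict validation rule are hypothetical names and choices made for illustration only.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any


class SketchDType(Enum):
    """Hypothetical stand-in for a dtype enum (not sdata_core's DType)."""
    FLOAT = "float"
    INT = "int"
    STR = "str"
    BOOL = "bool"


# Python type expected for each hypothetical dtype.
_PYTHON_TYPES = {
    SketchDType.FLOAT: float,
    SketchDType.INT: int,
    SketchDType.STR: str,
    SketchDType.BOOL: bool,
}


@dataclass
class SketchAttribute:
    """Hypothetical attribute that checks its value against its dtype."""
    name: str
    value: Any
    dtype: SketchDType = SketchDType.STR
    unit: str = "-"
    description: str = ""

    def __post_init__(self) -> None:
        expected = _PYTHON_TYPES[self.dtype]
        # bool is a subclass of int, so reject it explicitly for non-bool dtypes.
        is_stray_bool = isinstance(self.value, bool) and expected is not bool
        if is_stray_bool or not isinstance(self.value, expected):
            raise TypeError(
                f"{self.name}: expected {expected.__name__}, "
                f"got {type(self.value).__name__}"
            )


temperature = SketchAttribute("temperature", 293.15, SketchDType.FLOAT, unit="K")
print(temperature.value, temperature.unit)  # 293.15 K
```

The check in this sketch is deliberately strict (an `int` is rejected for a `FLOAT` attribute). A real library may prefer to coerce values instead.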

Serialization

# JSON export (NaN-safe)
json_str = meta.to_json()

# DataFrame export
df = meta.to_dataframe()
print(df)
#              name        value unit   dtype
# key
# force        force       5000.0    N   float
# material  material  DP800 Steel    -     str
# valid        valid         True    -    bool

# Round-trip
meta2 = Metadata.from_json(json_str)
meta3 = Metadata.from_dataframe(df)
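
The README calls the JSON export "NaN-safe" without saying how. One common way to get that property with the standard library is sketched below: non-finite floats are replaced by `None` before encoding, so the output is valid JSON. The helper `nan_to_none` is a hypothetical name, and whether sdata_core works this way is an assumption, not something the source confirms.

```python
import json
import math
from typing import Any


def nan_to_none(obj: Any) -> Any:
    """Recursively replace NaN and infinite floats with None."""
    if isinstance(obj, float) and not math.isfinite(obj):
        return None
    if isinstance(obj, dict):
        return {key: nan_to_none(val) for key, val in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [nan_to_none(item) for item in obj]
    return obj


record = {"force": 5000.0, "strain": float("nan"), "tags": ["a", float("inf")]}

# allow_nan=False makes json.dumps raise ValueError if a non-finite
# float slips through, instead of emitting the non-standard token NaN.
json_str = json.dumps(nan_to_none(record), allow_nan=False)
print(json_str)  # {"force": 5000.0, "strain": null, "tags": ["a", null]}

restored = json.loads(json_str)
```

Note that the round trip is lossy in this sketch: a NaN comes back as `None`, not as a float.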

Dublin Core Integration

from sdata_core import Metadata, DublinCore, add_dc_attribute

meta = Metadata(name="Research Dataset")

# Add Dublin Core metadata
add_dc_attribute(meta, "title", "Tensile Test Results")
add_dc_attribute(meta, "creator", "Dr. Jane Smith")
add_dc_attribute(meta, "identifier", "doi:10.1234/example")

# Get Dublin Core representation
dc_dict = DublinCore.to_dc_dict(meta)
print(dc_dict)
# {'dc:title': 'Tensile Test Results', 'dc:creator': 'Dr. Jane Smith', ...}
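
The mapping idea behind a `dc:`-prefixed dictionary can be sketched without the library. The example below keeps only keys that belong to the 15-element Dublin Core Metadata Element Set and prefixes them with `dc:`. The function `to_dc_dict_sketch` is hypothetical and does not reproduce `DublinCore.to_dc_dict`.

```python
# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"

# The 15 elements defined by that element set.
DC_ELEMENTS = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})


def to_dc_dict_sketch(attributes: dict[str, str]) -> dict[str, str]:
    """Keep only Dublin Core elements and prefix their keys with 'dc:'."""
    return {
        f"dc:{key}": value
        for key, value in attributes.items()
        if key in DC_ELEMENTS
    }


attributes = {
    "title": "Tensile Test Results",
    "creator": "Dr. Jane Smith",
    "identifier": "doi:10.1234/example",
    "force": "5000.0",  # not a Dublin Core element, so it is dropped
}
dc_sketch = to_dc_dict_sketch(attributes)
print(dc_sketch)
```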

JSON-LD Export

import json

# `meta` is the Metadata object built in the Dublin Core example above
jsonld = meta.to_jsonld()
print(json.dumps(jsonld, indent=2))
# {
#   "@context": {
#     "@vocab": "https://schema.org/",
#     "dc": "http://purl.org/dc/elements/1.1/",
#     ...
#   },
#   "@type": "sdata_core:Metadata",
#   ...
# }
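
For readers new to JSON-LD, the sketch below builds a comparable document by hand. The `@context` maps the `dc` prefix to the Dublin Core namespace, so a key such as `dc:title` expands to a full IRI. The function name and the choice of `"Dataset"` as `@type` are assumptions for illustration; the library's actual output uses its own type and a larger context.

```python
import json


def to_jsonld_sketch(name: str, dc_fields: dict[str, str]) -> dict:
    """Build a small JSON-LD document from a name and 'dc:'-prefixed fields."""
    document = {
        "@context": {
            "@vocab": "https://schema.org/",
            "dc": "http://purl.org/dc/elements/1.1/",
        },
        # Assumption: schema.org's Dataset type stands in for the library's own type.
        "@type": "Dataset",
        "name": name,
    }
    document.update(dc_fields)
    return document


jsonld_sketch = to_jsonld_sketch(
    "Research Dataset",
    {"dc:title": "Tensile Test Results", "dc:creator": "Dr. Jane Smith"},
)
print(json.dumps(jsonld_sketch, indent=2))
```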

Type-Annotated Fields

from typing import Annotated
from sdata_core import FieldMeta, create_attribute_from_annotated

# Define typed field with metadata
Temperature = Annotated[float, FieldMeta(
    unit="K",
    description="Temperature measurement",
    ontology="http://purl.obolibrary.org/obo/PATO_0000146"
)]

# Create attribute from annotated type
attr = create_attribute_from_annotated("sample_temp", 293.15, Temperature)
print(attr.unit)      # "K"
print(attr.ontology)  # "http://purl.obolibrary.org/obo/PATO_0000146"
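
The mechanism that makes this possible is part of the standard library: `typing.get_args` returns the base type and the extra metadata objects attached to an `Annotated` type. The sketch below shows that lookup with a hypothetical `FieldInfoSketch` class; it is not how `create_attribute_from_annotated` is necessarily implemented.

```python
from dataclasses import dataclass
from typing import Annotated, Optional, get_args, get_origin


@dataclass(frozen=True)
class FieldInfoSketch:
    """Hypothetical metadata carrier (stand-in for the library's FieldMeta)."""
    unit: str = "-"
    description: str = ""


Temperature = Annotated[
    float,
    FieldInfoSketch(unit="K", description="Temperature measurement"),
]


def extract_field_info(annotated_type) -> tuple[type, Optional[FieldInfoSketch]]:
    """Return (base type, first FieldInfoSketch found or None)."""
    if get_origin(annotated_type) is not Annotated:
        # A plain type such as int carries no metadata.
        return annotated_type, None
    base_type, *extras = get_args(annotated_type)
    info = next((e for e in extras if isinstance(e, FieldInfoSketch)), None)
    return base_type, info


base_type, info = extract_field_info(Temperature)
print(base_type.__name__, info.unit)  # float K
```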

Semantic UUIDs (SUUID)

from sdata_core import SUUID

# Create deterministic SUUID from name
sid = SUUID.from_name(class_name="Experiment", name="Test 001")
print(sid.sname)  # "Experiment__test_001__<uuid>"
print(sid.did)    # "did:sdata_core-suuid:Experiment__test_001__<uuid>"

# Random SUUID
sid2 = SUUID(class_name="Data", name="sample")
print(sid2.huuid)  # Random 32-char hex string
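
The difference between a deterministic and a random identifier can be shown with the standard `uuid` module: `uuid5` hashes a namespace and a name, so the same input always gives the same ID, while `uuid4` is random. The namespace, the slug rule, and the function names below are assumptions for this sketch; the source does not document how SUUID derives its values.

```python
import uuid

# Hypothetical namespace for this sketch only.
SKETCH_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.org")


def slugify(name: str) -> str:
    """Lowercase the name and replace spaces with underscores."""
    return name.strip().lower().replace(" ", "_")


def deterministic_id(class_name: str, name: str) -> str:
    """Same class_name and name always yield the same identifier."""
    slug = slugify(name)
    hex_part = uuid.uuid5(SKETCH_NAMESPACE, f"{class_name}/{slug}").hex
    return f"{class_name}__{slug}__{hex_part}"


def random_id(class_name: str, name: str) -> str:
    """Each call yields a new identifier."""
    return f"{class_name}__{slugify(name)}__{uuid.uuid4().hex}"


first = deterministic_id("Experiment", "Test 001")
second = deterministic_id("Experiment", "Test 001")
print(first == second)  # True
print(random_id("Data", "sample") == random_id("Data", "sample"))  # False
```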

JSON Schema Generation

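import jsonschema  # third-party package, installed separately
from sdata_core import Metadata

schema = Metadata.get_schema()
print(schema["title"])  # "sdata_core Metadata Schema"

# Validate a serialized Metadata object against the schema.
# `meta` is a Metadata instance such as the one built in Quick Start.
data = meta.to_dict()
jsonschema.validate(instance=data, schema=schema)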
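
As a rough picture of what "auto-generated" can mean, the sketch below derives a JSON Schema from the fields of a dataclass. The class `AttributeSketch`, the helper `schema_for`, and the type mapping are hypothetical; the schema that `Metadata.get_schema()` returns is not shown in the source and will differ.

```python
from dataclasses import MISSING, dataclass, fields

# Mapping from Python types to JSON Schema type names.
_JSON_TYPES = {float: "number", int: "integer", str: "string", bool: "boolean"}


@dataclass
class AttributeSketch:
    """Hypothetical record type used only to demonstrate schema generation."""
    name: str
    value: float
    unit: str = "-"


def schema_for(cls) -> dict:
    """Build a JSON Schema dict from a dataclass with simple field types."""
    properties = {f.name: {"type": _JSON_TYPES[f.type]} for f in fields(cls)}
    # Fields without a default value are required.
    required = [
        f.name
        for f in fields(cls)
        if f.default is MISSING and f.default_factory is MISSING
    ]
    return {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "title": cls.__name__,
        "type": "object",
        "properties": properties,
        "required": required,
    }


sketch_schema = schema_for(AttributeSketch)
print(sketch_schema["required"])  # ['name', 'value']
```

A dict produced this way can be passed to `jsonschema.validate` in the same manner as the library's schema.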

Supported Data Types

| DType | Python Type | Description |
|---|---|---|
| DType.FLOAT | float | Floating point numbers |
| DType.INT | int | Integers |
| DType.STR | str | Strings |
| DType.BOOL | bool | Booleans |
| DType.TIMESTAMP | datetime | ISO 8601 timestamps |
| DType.LIST | list[str] | List of strings |
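
When values arrive as text (for example from a CSV file), a type tag like the ones above tells the reader how to parse them. The sketch below shows one way to do that with plain Python. The lowercase tag names, the accepted boolean spellings, and the comma-separated list format are assumptions, not sdata_core behavior.

```python
from datetime import datetime

# Hypothetical parsers keyed by a lowercase type tag.
PARSERS = {
    "float": float,
    "int": int,
    "str": str,
    # bool("False") is True in Python, so compare against known spellings.
    "bool": lambda text: text.strip().lower() in {"true", "1", "yes"},
    "timestamp": datetime.fromisoformat,
    "list": lambda text: [item.strip() for item in text.split(",") if item.strip()],
}


def parse_value(text: str, type_tag: str):
    """Convert text to the Python type named by type_tag."""
    return PARSERS[type_tag](text)


print(parse_value("293.15", "float"))      # 293.15
print(parse_value("False", "bool"))        # False
print(parse_value("a, b,c", "list"))       # ['a', 'b', 'c']
print(parse_value("2024-01-31T12:00:00", "timestamp").year)  # 2024
```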

Export Formats

  • JSON: to_json() / from_json()
  • DataFrame: to_dataframe() / from_dataframe()
  • CSV: to_csv() / from_csv()
  • JSON-LD: to_jsonld() / from_jsonld()
  • Dict: to_dict() / from_dict()
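
Among these formats, CSV is the one that stores every value as text, which is why a type column matters for a faithful round trip. The sketch below demonstrates this with the standard `csv` module and hand-written rows. The column names mirror the DataFrame shown earlier, but the file layout used by `to_csv()` is an assumption.

```python
import csv
import io

FIELDNAMES = ["name", "value", "unit", "dtype"]

rows = [
    {"name": "force", "value": "5000.0", "unit": "N", "dtype": "float"},
    {"name": "material", "value": "DP800 Steel", "unit": "-", "dtype": "str"},
]

# Write the rows to an in-memory CSV document.
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()

# Read them back: every value is a string again.
restored_rows = list(csv.DictReader(io.StringIO(csv_text, newline="")))
print(restored_rows == rows)  # True

# The dtype column says how to recover the original Python type.
force = restored_rows[0]
force_value = float(force["value"]) if force["dtype"] == "float" else force["value"]
print(force_value)  # 5000.0
```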

License

MIT License
