sdata_core

Structured data format with modern metadata system, Dublin Core and JSON-LD support.

Features

  • Type-safe Metadata: Dataclass-based Attribute and Metadata classes
  • Dublin Core Support: Built-in vocabulary mapping for scientific data
  • JSON-LD Export: Semantic web compatible output
  • DataFrame Integration: Seamless pandas DataFrame conversion
  • SUUID: Semantic UUIDs for reproducible identification
  • JSON Schema: Auto-generated validation schemas

Installation

pip install sdata_core

Or with uv:

uv add sdata_core

Quick Start

Basic Metadata

from sdata_core import Attribute, Metadata, DType

# Create attributes
attr = Attribute(
    name="temperature",
    value=293.15,
    dtype=DType.FLOAT,
    unit="K",
    description="Sample temperature"
)

# Create metadata container
meta = Metadata(name="Experiment 001")
meta.set_attr("force", 5000.0, unit="N", dtype=DType.FLOAT)
meta.set_attr("material", "DP800 Steel")
meta.set_attr("valid", True, dtype=DType.BOOL)

# Access attributes
print(meta["force"].value)  # 5000.0
print(meta.keys())  # ['force', 'material', 'valid']

Serialization

# JSON export (NaN-safe)
json_str = meta.to_json()

# DataFrame export
df = meta.to_dataframe()
print(df)
#              name        value unit   dtype
# key
# force        force       5000.0    N   float
# material  material  DP800 Steel    -     str
# valid        valid         True    -    bool

# Round-trip
meta2 = Metadata.from_json(json_str)
meta3 = Metadata.from_dataframe(df)
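
The "NaN-safe" note matters because Python's json module writes float("nan") as the bare token NaN, which strict JSON parsers reject. How to_json() handles this internally is not documented here; the following is only a minimal standard-library sketch of one common approach (mapping non-finite floats to null). The helper name sanitize is hypothetical and not part of sdata_core.

```python
import json
import math


def sanitize(obj):
    """Recursively replace NaN/Infinity floats with None (hypothetical helper)."""
    if isinstance(obj, float) and not math.isfinite(obj):
        return None
    if isinstance(obj, dict):
        return {key: sanitize(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [sanitize(value) for value in obj]
    return obj


record = {"force": 5000.0, "strain": float("nan"), "series": [1.0, float("inf")]}

# allow_nan=False makes json.dumps raise ValueError if a non-finite float slips through.
json_text = json.dumps(sanitize(record), allow_nan=False)
print(json_text)  # {"force": 5000.0, "strain": null, "series": [1.0, null]}
```

Passing allow_nan=False is a useful guard in its own right: it turns a silently invalid document into an immediate error.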

Dublin Core Integration

from sdata_core import Metadata, DublinCore, add_dc_attribute

meta = Metadata(name="Research Dataset")

# Add Dublin Core metadata
add_dc_attribute(meta, "title", "Tensile Test Results")
add_dc_attribute(meta, "creator", "Dr. Jane Smith")
add_dc_attribute(meta, "identifier", "doi:10.1234/example")

# Get Dublin Core representation
dc_dict = DublinCore.to_dc_dict(meta)
print(dc_dict)
# {'dc:title': 'Tensile Test Results', 'dc:creator': 'Dr. Jane Smith', ...}
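
To illustrate the kind of mapping shown in the output above, here is a minimal sketch that keeps only keys belonging to the Dublin Core Metadata Element Set 1.1 and prefixes them with "dc:". It uses plain dictionaries rather than sdata_core objects, and the function name prefix_dc_terms is hypothetical; the library's own mapping may work differently.

```python
# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})


def prefix_dc_terms(attributes):
    """Keep Dublin Core elements only and prefix their keys with 'dc:' (illustrative)."""
    return {f"dc:{key}": value for key, value in attributes.items() if key in DC_ELEMENTS}


attributes = {
    "title": "Tensile Test Results",
    "creator": "Dr. Jane Smith",
    "identifier": "doi:10.1234/example",
    "force": 5000.0,  # not a Dublin Core element, so it is dropped
}

dc = prefix_dc_terms(attributes)
print(dc)
```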

JSON-LD Export

import json

jsonld = meta.to_jsonld()
print(json.dumps(jsonld, indent=2))
# {
#   "@context": {
#     "@vocab": "https://schema.org/",
#     "dc": "http://purl.org/dc/elements/1.1/",
#     ...
#   },
#   "@type": "sdata_core:Metadata",
#   ...
# }
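
The "@context" block is what lets a consumer turn compact keys into full IRIs. The sketch below shows that idea with the two context entries from the output above. It is a deliberately simplified illustration, not a conforming JSON-LD processor, and expand_term is a hypothetical helper rather than an sdata_core function.

```python
import json

context = {
    "@vocab": "https://schema.org/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def expand_term(term, context):
    """Expand a compact term such as 'dc:title' to a full IRI (simplified)."""
    prefix, sep, suffix = term.partition(":")
    if sep:
        # Known prefix: concatenate. Unknown prefix: assume an absolute IRI already.
        return context[prefix] + suffix if prefix in context else term
    # Unprefixed terms fall back to the default vocabulary.
    return context["@vocab"] + term


document = {
    "@context": context,
    "name": "Research Dataset",
    "dc:title": "Tensile Test Results",
}

print(json.dumps(document, indent=2))
print(expand_term("dc:title", context))  # http://purl.org/dc/elements/1.1/title
print(expand_term("name", context))      # https://schema.org/name
```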

Type-Annotated Fields

from typing import Annotated
from sdata_core import FieldMeta, create_attribute_from_annotated

# Define typed field with metadata
Temperature = Annotated[float, FieldMeta(
    unit="K",
    description="Temperature measurement",
    ontology="http://purl.obolibrary.org/obo/PATO_0000146"
)]

# Create attribute from annotated type
attr = create_attribute_from_annotated("sample_temp", 293.15, Temperature)
print(attr.unit)      # "K"
print(attr.ontology)  # "http://purl.obolibrary.org/obo/PATO_0000146"
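
For readers unfamiliar with typing.Annotated, the sketch below shows how metadata attached to a type can be read back with the standard library alone. UnitInfo is a hypothetical stand-in defined here for illustration; it is not the library's FieldMeta, and create_attribute_from_annotated may be implemented differently.

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_origin


@dataclass(frozen=True)
class UnitInfo:
    """Hypothetical stand-in for a field-metadata class (not sdata_core's FieldMeta)."""
    unit: str
    description: str = ""


Temperature = Annotated[float, UnitInfo(unit="K", description="Temperature measurement")]


def read_annotation(annotated_type):
    """Return (base_type, metadata) for an Annotated type, or (type, None) otherwise."""
    if get_origin(annotated_type) is Annotated:
        base_type, *extras = get_args(annotated_type)
        # Pick the first extra that is our metadata class; ignore unrelated extras.
        info = next((extra for extra in extras if isinstance(extra, UnitInfo)), None)
        return base_type, info
    return annotated_type, None


base_type, info = read_annotation(Temperature)
print(base_type.__name__, info.unit)  # float K
```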

Semantic UUIDs (SUUID)

from sdata_core import SUUID

# Create deterministic SUUID from name
sid = SUUID.from_name(class_name="Experiment", name="Test 001")
print(sid.sname)  # "Experiment__test_001__<uuid>"
print(sid.did)    # "did:sdata_core-suuid:Experiment__test_001__<uuid>"

# Random SUUID
sid2 = SUUID(class_name="Data", name="sample")
print(sid2.huuid)  # Random 32-char hex string
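
The difference between a deterministic and a random identifier can be shown with the standard uuid module. This is only a sketch of the general technique: the namespace (uuid.NAMESPACE_URL) and the "class_name/name" key format are assumptions made for this example, and the scheme SUUID actually uses is not described here.

```python
import uuid


def deterministic_hex(class_name, name):
    """Derive a repeatable 32-character hex ID from a class name and a name (illustrative)."""
    # uuid5 hashes the namespace and the string with SHA-1, so equal inputs give equal IDs.
    return uuid.uuid5(uuid.NAMESPACE_URL, f"{class_name}/{name}").hex


first = deterministic_hex("Experiment", "Test 001")
second = deterministic_hex("Experiment", "Test 001")
random_id = uuid.uuid4().hex  # differs on every call

print(first == second)             # True
print(len(first), len(random_id))  # 32 32
```

Deterministic IDs suit records that must resolve to the same identifier across runs or machines; random IDs suit records with no stable natural key.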

JSON Schema Generation

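import jsonschema  # third-party package: pip install jsonschema

# Generate the schema
schema = Metadata.get_schema()
print(schema["title"])  # "sdata_core Metadata Schema"

# Validate with the jsonschema library (raises jsonschema.ValidationError on failure)
data = meta.to_dict()
jsonschema.validate(instance=data, schema=schema)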

Supported Data Types

DType            Python Type  Description
DType.FLOAT      float        Floating-point numbers
DType.INT        int          Integers
DType.STR        str          Strings
DType.BOOL       bool         Booleans
DType.TIMESTAMP  datetime     ISO 8601 timestamps
DType.LIST       list[str]    List of strings

Export Formats

  • JSON: to_json() / from_json()
  • DataFrame: to_dataframe() / from_dataframe()
  • CSV: to_csv() / from_csv()
  • JSON-LD: to_jsonld() / from_jsonld()
  • Dict: to_dict() / from_dict()

License

MIT License

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

sdata_core-0.1.8-cp312-cp312-manylinux_2_35_x86_64.whl (728.6 kB)

Uploaded for CPython 3.12, manylinux: glibc 2.35+, x86-64

File details

Details for the file sdata_core-0.1.8-cp312-cp312-manylinux_2_35_x86_64.whl.

File hashes

Hashes for sdata_core-0.1.8-cp312-cp312-manylinux_2_35_x86_64.whl:

Algorithm    Hash digest
SHA256       f0bd27eafc6d0ee511c8375514f7f77142e753b94e9fa83403743413ef0c0871
MD5          ba6bde11b993338928fe48b689428188
BLAKE2b-256  a56b1fbf4ca3d520417d13043592eebaddafa9495e2e7b250bf9ea25cc636e47
