
SciDatS is a Python package for storing and retrieving scientific data stored in JSON-LD (semantically annotated JSON for Linked Data).

Project description

SciDatS



SciDatS is a Python package for storing and retrieving scientific data as Parquet files with JSON-LD metadata (semantically annotated JSON for Linked Data).

This Scientific Data Standard is designed as a data exchange format: it lets different laboratories exchange and synchronise scientific data while preserving all metadata.

Features

  • efficient storage and retrieval of scientific data and metadata in a single file (Parquet with JSON-LD metadata)
  • convenient functions for retrieving data and metadata
  • improved tooling based on pydantic and rdflib
  • reading and writing of SciDatS files
  • coupling to the LabDataReader framework, which transforms proprietary lab data into the semantically annotated SciDatS format
  • recommended metadata formats are DCAT application profiles, e.g. DCAT-AP-PLUS or Chem-DCAT-AP
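
As an illustration of the last point, here is a minimal sketch of what a DCAT-style JSON-LD metadata record might look like. It uses only the Python standard library. The helper name `build_dataset_record` and the example values are hypothetical, and the record is neither a complete DCAT-AP-PLUS or Chem-DCAT-AP profile nor the schema that SciDatS itself uses.

```python
import json


def build_dataset_record(title: str, description: str, keywords: list[str]) -> dict:
    """Return a small JSON-LD document describing a dataset with DCAT terms.

    Hypothetical helper for illustration only; not part of the SciDatS API.
    """
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dcat:keyword": keywords,
    }


record = build_dataset_record(
    title="UV/Vis absorbance series",
    description="Example measurement used for illustration only.",
    keywords=["spectroscopy", "absorbance"],
)

# JSON-LD is plain JSON, so the record serialises with the standard json module.
serialised = json.dumps(record, indent=2)
print(serialised)
```

Because the record is an ordinary dictionary, it can be validated with pydantic or parsed into an RDF graph with rdflib before it is attached to a data file.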

Design criteria

Here are some of the criteria the data / metadata standard has to fulfil (the selected technology is given in brackets):

  • data and metadata storage for scientific / machine learning needs (semantic annotation, based on ontologies, derivatives of owlready2)

    • proper nullable data / missing data handling (pyarrow / parquet)

    • data modalities, such as range / limits, type (continuous / categorical), and variable treatment in case of a range violation (parquet metadata)

    • cardinality (parquet metadata)

  • efficient storage (parquet)

  • metadata and data stored at one place (parquet)

  • metadata conservation when saving / loading / processing (parquet -> arrow)

  • fast data exchange (arrow flight, MinIO active replication)

  • fast loading (fastparquet, pyarrow)

  • fast data processing without in-memory re-writing after loading (pandas with pyarrow backend, arrow flight, polars)

  • "modalities" for the machine learning models

  • semantic annotations / metadata in RDF compliant format - for creating instances of ontology classes and SPARQL reasoning (JSON-LD, rdflib, owlready2)

  • fast data processing (direct loading into a pyarrow-driven dataframe)

  • programming language agnostic / independent (parquet)

  • easy to use (SciDatS / LabDataReader framework, currently being implemented)

  • commonly used in ETL pipelines (Apache Spark, Prefect, ...)

  • suitable for S3 file storage systems (MinIO)

Installation

You can install SciDatS via pip (or your favourite package manager):

# using pip
pip install scidats
# using uv
uv add scidats
# for development, inside a clone of the repository
uv sync --group dev --group test
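
To confirm that the installation succeeded, one option is to query the installed distribution's metadata from Python. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of SciDatS.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


scidats_version = installed_version("scidats")
if scidats_version is None:
    print("scidats is not installed in this environment")
else:
    print(f"scidats {scidats_version} is installed")
```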

Tutorials

A tutorial on how to use SciDatS can be found in the scidat_demo_tutorial.ipynb Jupyter notebook in the jupyter folder.

Documentation

The documentation can be found here: https://opensourcelab/scientificdata.gitlab.io/scidats

ReadTheDocs: https://scidats.readthedocs.io

Source Code: https://gitlab.com/opensourcelab/scientificdata/scidats



Contributors ✨

Thanks goes to these wonderful people (emoji key):

This project follows the all-contributors specification. Contributions of any kind welcome!

Credits

This package was created with Copier and the pypackage-template project template.
