SciDatS is a python package for storing and retrieving scientific data stored in JSON-LD (semantically annotated JSON - Linked Data).
SciDatS
SciDatS is a Python package for storing and retrieving scientific data as Parquet files with JSON-LD metadata (semantically annotated JSON for Linked Data).
This scientific data standard is designed as an exchange format for exchanging and synchronising scientific data between laboratories while preserving all metadata.
Features
- efficient storage and retrieval of scientific data and metadata in a single file (parquet with JSON-LD metadata)
- convenient functions for retrieving data and metadata
- improved tooling based on pydantic and rdflib
- reading and writing of SciDatS files
- coupling to the LabDataReader framework, for transforming proprietary lab data into the semantically annotated SciDatS format
- recommended metadata formats are DCAT application profiles, e.g. DCAT-AP-PLUS or Chem-DCAT-AP
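As an illustration of the kind of JSON-LD metadata meant here, the following minimal sketch builds a dataset record using the DCAT and Dublin Core Terms vocabularies with only the Python standard library. It does not use the SciDatS API and is not a complete DCAT-AP-PLUS or Chem-DCAT-AP record; the helper name `make_dataset_record`, the example identifier, and all field values are assumptions made for this example.

```python
import json

# Namespace prefixes for the two vocabularies used in this sketch.
DCAT_CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dcterms": "http://purl.org/dc/terms/",
}


def make_dataset_record(identifier, title, description, keywords=()):
    """Build a minimal JSON-LD description of a dataset (hypothetical helper)."""
    record = {
        "@context": DCAT_CONTEXT,
        "@id": identifier,
        "@type": "dcat:Dataset",
        "dcterms:title": title,
        "dcterms:description": description,
    }
    if keywords:
        record["dcat:keyword"] = list(keywords)
    return record


# All values below are invented for illustration.
record = make_dataset_record(
    "https://example.org/datasets/uvvis-001",
    "UV/Vis absorbance series",
    "Absorbance readings from a hypothetical plate reader run.",
    keywords=["absorbance", "uv-vis"],
)

# Serialise to a JSON-LD string that could travel alongside the data.
serialized = json.dumps(record, indent=2, sort_keys=True)
print(serialized)
```

A real application profile prescribes more properties (publisher, licence, distributions and so on); the point here is only the shape of a semantically annotated record.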
Design criteria
Here are some of the criteria the data / metadata standard has to fulfil (selected technology in brackets):
- data and metadata storage for scientific / machine learning needs (semantic annotation, based on ontologies, derivatives of owlready2)
- proper nullable data / missing data handling (pyarrow / parquet)
- data modalities, like range / limits, type / continuous / categorical / variable treatment in case of range violation (parquet metadata)
- cardinality (parquet metadata)
- efficient storage (parquet)
- metadata and data stored at one place (parquet)
- metadata conservation when saving / loading / processing (parquet -> arrow)
- fast data exchange (arrow flight, MinIO active replication)
- fast loading (fastparquet, pyarrow)
- fast data processing without in-memory re-writing after loading (pandas with pyarrow backend, arrow flight, polars)
- "modalities" for the machine learning models
- semantic annotations / metadata in RDF compliant format, for creating instances of ontology classes and SPARQL reasoning (JSON-LD, rdflib, owlready2)
- fast data processing (direct loading into pyarrow driven dataframe)
- programming language agnostic / independent (parquet)
- easy to use (SciDatS / labDataReader framework, currently being implemented)
- commonly used in ETL pipelines (Apache Spark, prefect, ...)
- suitable for S3 file storage systems (MinIO)
Installation
You can install SciDatS via pip (or your favourite package manager):
# using pip
pip install scidats
# using uv
uv add scidats
uv sync --group dev --group test
Tutorials
A tutorial on how to use SciDatS can be found in the scidat_demo_tutorial.ipynb Jupyter notebook in the jupyter folder.
Documentation
The Documentation can be found here: https://opensourcelab/scientificdata.gitlab.io/scidats
ReadTheDocs: https://scidats.readthedocs.io
Source Code: https://gitlab.com/opensourcelab/scientificdata/scidats
Contributors ✨
Thanks goes to these wonderful people (emoji key):
This project follows the all-contributors specification. Contributions of any kind welcome!
Credits
This package was created with Copier and the pypackage-template project template.
File details
Details for the file scidats-0.0.17.tar.gz.
File metadata
- Download URL: scidats-0.0.17.tar.gz
- Upload date:
- Size: 8.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.2.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 179e3ef038efa653030385456d1b8da96c5dde3afbba975c6b9d7a28a6b3867f |
| MD5 | 3f3884d53149267f9c191c00c67dae60 |
| BLAKE2b-256 | f325c11f21dd6e874df25314d049a0cb5b3cc099af2dbaa90ce4a34efafcb16e |
File details
Details for the file scidats-0.0.17-py3-none-any.whl.
File metadata
- Download URL: scidats-0.0.17-py3-none-any.whl
- Upload date:
- Size: 9.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.2.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 75365970d221b74212b2a1e43ba719964ba5d947edf5a112f5529cf0204007a4 |
| MD5 | 0de31f6e5272f80a34e5a3b8dec21e00 |
| BLAKE2b-256 | 7981f65fb4707484b0742e9b3cf6be659efe3cef74b7b1f39082d461eb8ad2e7 |