A Data Science Infrastructure Metadatabase

DSI

The goal of the Data Science Infrastructure Project (DSI) is to provide a flexible, AI-ready metadata query capability that returns data subject to strict, POSIX-enforced file security. The data lifecycle for AI/ML requires seamless transitions from data-intensive AI/ML research activity to long-term archiving and shared data repositories. DSI enables flexible, data-intensive scientific workflows that meet researcher needs.
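
As a rough illustration of what permission-aware data return can look like, the sketch below filters candidate paths down to those the current user may read under POSIX permissions. It is not taken from DSI: `readable_paths` is a hypothetical helper, and DSI's actual enforcement mechanism may differ.

```python
import os
import sys


def readable_paths(paths):
    """Return only the regular files the current process is allowed to read.

    Hypothetical helper for illustration; not part of DSI.
    """
    # os.access checks the file's permission bits against the real uid/gid.
    return [p for p in paths if os.path.isfile(p) and os.access(p, os.R_OK)]


# The missing path is dropped rather than raising an error.
print(readable_paths([sys.executable, "/no/such/file"]))
```

Filtering at query time like this means a result set never names a file the caller could not open anyway.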

DSI is implemented in three parts:

  • Plugins (Readers and Writers)

  • Backends

  • Core middleware

Plugins curate metadata for query and data return. Plugins can have read or write functionality, acting as Readers and Writers for DSI. Readers harvest data from files and streams. Writers execute containerized or bare-metal applications to supplement queryable metadata and data. Plugins may be user contributed, and a default set of plugins is available with usage examples in our Core documentation.

Backends are the interface between the Core middleware and a data store, and consist mostly of back-end (storage) functionality. Backends may also provide some front-end functionality, interfacing between a DSI user and the Core middleware. Backends may be user contributed, and a default set of backends is available with usage examples in our Core documentation.

DSI Core middleware provides the user/machine interface. The Core middleware defines a Terminal object. An instantiated Core Terminal can load zero or more plugins and backends. A Terminal object can be used in scripting workflows and program loops.
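
The three-part design described above can be sketched with toy classes. Everything in this example is a hypothetical stand-in (`CsvReader`, `SqliteBackend`, `ToyTerminal`, and their methods are invented for illustration and are not DSI's API); it only shows how a middleware object might hold zero or more plugins and backends and move harvested metadata between them.

```python
# Illustrative only: these classes are hypothetical stand-ins, not DSI's API.
import csv
import io
import sqlite3


class CsvReader:
    """Toy reader plugin: harvests rows of metadata from CSV text."""

    def __init__(self, text):
        self.text = text

    def read(self):
        return list(csv.DictReader(io.StringIO(self.text)))


class SqliteBackend:
    """Toy backend: the interface between the terminal and a data store."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute("CREATE TABLE IF NOT EXISTS meta (run TEXT, value REAL)")

    def put(self, rows):
        self.conn.executemany(
            "INSERT INTO meta (run, value) VALUES (?, ?)",
            [(r["run"], float(r["value"])) for r in rows],
        )
        self.conn.commit()

    def query(self, sql, params=()):
        return self.conn.execute(sql, params).fetchall()


class ToyTerminal:
    """Toy middleware: holds zero or more plugins and backends."""

    def __init__(self):
        self.plugins = []
        self.backends = []

    def load_plugin(self, plugin):
        self.plugins.append(plugin)

    def load_backend(self, backend):
        self.backends.append(backend)

    def ingest(self):
        # Pass everything each reader harvested to every loaded backend.
        for plugin in self.plugins:
            rows = plugin.read()
            for backend in self.backends:
                backend.put(rows)


terminal = ToyTerminal()
terminal.load_plugin(CsvReader("run,value\na,1.5\nb,2.5\n"))
store = SqliteBackend()
terminal.load_backend(store)
terminal.ingest()
print(store.query("SELECT run FROM meta WHERE value > ?", (2.0,)))  # prints [('b',)]
```

Because the terminal is an ordinary object, the same load-and-ingest calls can sit inside a script or a program loop. See the Core documentation for the real plugin and backend interfaces.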

DSI Core Requirements

  • Python 3 (3.11 tested)

  • Linux OS (RHEL- and Debian-based distributions tested)

  • Git

  • Plugins and Backends introduce further requirements

Getting Started

DSI does not yet have a versioned release and should be considered pre-alpha. At this time, project contributors are encouraged to prototype only solutions that do not contain sensitive data. A PyPI release is planned but not yet complete; in the meantime, DSI can be installed locally.

We recommend Miniconda3 for managing virtual environments for DSI:

. ~/miniconda3/bin/activate
conda create -n dsi python=3.11
conda activate dsi

Python virtual environments can also be used for DSI:

python3 -m venv dsienv
source dsienv/bin/activate
pip install --upgrade pip

After activating your environment:

git clone https://github.com/lanl/dsi.git
cd dsi/
python3 -m pip install .
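
After installing, a quick sanity check of the environment can be useful. The sketch below is a hypothetical helper, not part of DSI. It assumes the installed package is importable under the name `dsi`, which this document does not confirm, and compares the interpreter against the tested 3.11 version.

```python
import importlib.util
import sys


def check_environment(module_name="dsi", tested=(3, 11)):
    """Report the interpreter version and whether a module can be found.

    The default module name "dsi" is an assumption, not confirmed by the docs.
    """
    return {
        "python": sys.version_info[:2],
        "matches_tested": sys.version_info[:2] == tested,
        "module_found": importlib.util.find_spec(module_name) is not None,
    }


print(check_environment())
```

If `module_found` is false, confirm that the intended environment is active before re-running the install step.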
