A Data Science Infrastructure Metadatabase
DSI
The goal of the Data Science Infrastructure Project (DSI) is to provide a flexible, AI-ready metadata query capability which returns data subject to strict, POSIX-enforced file security. The data lifecycle for AI/ML requires seamless transitions from data-intensive/AI/ML research activity to long-term archiving and shared data repositories. DSI enables flexible, data-intensive scientific workflows that meet researcher needs.
DSI is implemented in three parts:
- Plugins (Readers and Writers)
- Backends
- Core middleware
Plugins curate metadata for query and data return. A plugin can have read or write functionality, acting as a Reader or a Writer for DSI. Readers harvest data from files and streams. Writers execute containerized or bare-metal applications to supplement queryable metadata and data. Plugins may be user-contributed, and a default set of plugins is available with usage examples in our Core documentation.
Backends are interfaces for the Core middleware. Backends consist mostly of back-end/storage functionality and are the interface between the Core middleware and a data store. Backends may also have some front-end functionality, interfacing between a DSI user and the Core middleware. Backends may be user-contributed, and a default set of backends is available with usage examples in our Core documentation.
DSI Core middleware provides the user/machine interface. The Core middleware defines a Terminal object. An instantiated Core Terminal can load zero or more plugins and backends. A Terminal object can be used in scripting workflows and program loops.
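To make the three-part split concrete, here is a minimal, self-contained sketch of how readers, a backend, and a terminal-like object could fit together. Every name in it (`ToyTerminal`, `DictReader`, `MemoryBackend`, `load_reader`, `load_backend`, `ingest`) is hypothetical and chosen for illustration only; this is not the DSI API, so consult the Core documentation for the real classes and method names.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Reader(Protocol):
    """Hypothetical reader plugin: harvests metadata rows from some source."""

    def read(self) -> list[dict]: ...


class Backend(Protocol):
    """Hypothetical backend: sits between the middleware and a data store."""

    def put(self, rows: list[dict]) -> None: ...

    def find(self, key: str, value: object) -> list[dict]: ...


@dataclass
class DictReader:
    """Toy reader that 'harvests' one metadata row from an in-memory mapping."""

    source: dict

    def read(self) -> list[dict]:
        return [dict(self.source)]


@dataclass
class MemoryBackend:
    """Toy backend that keeps rows in a Python list instead of a database."""

    rows: list = field(default_factory=list)

    def put(self, rows):
        self.rows.extend(rows)

    def find(self, key, value):
        return [row for row in self.rows if row.get(key) == value]


@dataclass
class ToyTerminal:
    """Toy stand-in for the middleware object (NOT the DSI Terminal class)."""

    readers: list = field(default_factory=list)
    backends: list = field(default_factory=list)

    def load_reader(self, reader):
        self.readers.append(reader)

    def load_backend(self, backend):
        self.backends.append(backend)

    def ingest(self) -> int:
        """Pass every reader's rows to every loaded backend; return the row count."""
        rows = [row for reader in self.readers for row in reader.read()]
        for backend in self.backends:
            backend.put(rows)
        return len(rows)


# Use the terminal inside an ordinary program loop.
terminal = ToyTerminal()
terminal.load_backend(MemoryBackend())
for run_id in range(3):
    terminal.load_reader(DictReader({"run": run_id, "host": "node01"}))
count = terminal.ingest()
```

The point of the sketch is the separation of concerns: readers know nothing about storage, backends know nothing about where rows came from, and the terminal only wires the two together.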
DSI Core Requirements
- python3 (3.11 tested)
- Linux OS (RHEL- and Debian-based distributions tested)
- Git
- Plugins and Backends introduce further requirements
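The core requirements can be checked before installing with a short script like the sketch below. It uses only the Python standard library. The function name `check_core_requirements` is hypothetical, and treating 3.11 as a minimum version is an assumption: the list above only says that 3.11 was tested.

```python
import platform
import shutil
import sys


def check_core_requirements(min_python=(3, 11)):
    """Return a dict mapping each core requirement to True or False."""
    return {
        # Assumption: the tested version (3.11) is used as a lower bound.
        "python": sys.version_info[:2] >= min_python,
        # platform.system() returns "Linux" on Linux hosts.
        "linux": platform.system() == "Linux",
        # shutil.which returns None when git is not on PATH.
        "git": shutil.which("git") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_core_requirements().items():
        print(f"{name}: {'ok' if ok else 'missing or untested'}")
```

A `False` entry does not necessarily mean DSI will fail, only that the environment differs from the configurations listed as tested.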
Getting Started
DSI does not yet have a versioned release and should be considered pre-alpha. At this time, project contributors are encouraged to prototype only solutions that do not contain sensitive data. A PyPI release is planned but incomplete; in the meantime, DSI can be installed locally.
We recommend Miniconda3 for managing virtual environments for DSI:
    . ~/miniconda3/bin/activate
    conda create -n dsi python=3.11
    conda activate dsi
Python virtual environments can also be used for DSI:
    python3 -m venv dsienv
    source dsienv/bin/activate
    pip install --upgrade pip
After activating your environment:
    git clone https://github.com/lanl/dsi.git
    cd dsi/
    python3 -m pip install .
Copyright and License
This program is open source under the BSD-3 License.
© 2023. Triad National Security, LLC. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
File details
Details for the file pydsinf-0.5.tar.gz.
File metadata
- Download URL: pydsinf-0.5.tar.gz
- Upload date:
- Size: 20.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 97d61c5294b4f7f2ae90e45bcbb2c1aedf1edf7c543b7ddcea7de425de62979c |
| MD5 | 5375bac5ce83de4d063dcda6e386acd8 |
| BLAKE2b-256 | 50dcdfb940926251c4b1f9e037ad91c82472ea00a6d5ddb8d3fc761be2d054cf |
File details
Details for the file pydsinf-0.5-py3-none-any.whl.
File metadata
- Download URL: pydsinf-0.5-py3-none-any.whl
- Upload date:
- Size: 22.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0c75b83504d7956c13c68f700d7e78f659c761b8eb7a37188fb74d40cbc2e0e3 |
| MD5 | 3c44e68d483a198b164dcbcbdbb55956 |
| BLAKE2b-256 | 34e3a44db875440e1af748ab70f2f891dadc6c4b3ca4f2ac578330cb596afe4a |
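A downloaded file can be compared against the SHA256 digests listed above with Python's standard `hashlib` module. The sketch below is one way to do that; the helper names `sha256_of` and `matches_published_digest` are hypothetical, and the local file path is whatever location you downloaded the archive to.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, assuming the source distribution was saved to the current directory, `matches_published_digest("pydsinf-0.5.tar.gz", "97d61c5294b4f7f2ae90e45bcbb2c1aedf1edf7c543b7ddcea7de425de62979c")` should return `True` for an intact download.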