ETL system for interpreting laboratory instrument data files and loading them into a standardized format while enforcing schema and retaining all metadata.
lab-etl
This repository contains the ETL scripts for loading laboratory instrument data files into our database. Data files in a variety of formats are converted to Apache Parquet files, which provide a standardized interface for access and enforce a schema. Metadata is a key part of these files: it is extracted from the original test files and stored as JSON-like metadata within the Parquet files, either file-wide or column-specific as appropriate. For each type of file (that is, each type of instrument), the keys for common fields are standardized. Additional instrument-specific metadata is also stored, but it is not guaranteed to be standardized in any meaningful way. The names of these fields may be slightly altered to make clearer to the user what they represent.
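As a rough illustration of the metadata handling described above, the sketch below splits a raw instrument header into standardized and instrument-specific fields and JSON-encodes the result. It is not the lab-etl implementation: the key mapping, the function names, and the sample header are all hypothetical, and only the Python standard library is used.

```python
import json

# Hypothetical mapping from instrument header names to standardized keys.
# The real lab-etl key names may differ.
STANDARD_KEYS = {
    "Operator": "operator",
    "Sample Name": "sample_name",
    "Sample Mass": "sample_mass",
}


def split_metadata(raw_header):
    """Split a raw header dict into standardized and instrument-specific parts."""
    standardized = {}
    extra = {}
    for key, value in raw_header.items():
        if key in STANDARD_KEYS:
            standardized[STANDARD_KEYS[key]] = value
        else:
            # Unrecognized fields are kept, with only light cleanup of the name.
            extra[key.strip().lower().replace(" ", "_")] = value
    return standardized, extra


def encode_metadata(metadata):
    """JSON-encode each value, producing a bytes-to-bytes key/value mapping."""
    return {
        key.encode(): json.dumps(value).encode()
        for key, value in metadata.items()
    }


# Made-up header values, for illustration only.
raw_header = {"Operator": "jdoe", "Sample Mass": 5.2, "Furnace Type ": "steel"}
standardized, extra = split_metadata(raw_header)

# File-wide metadata: standardized keys plus one entry holding the extras.
file_metadata = encode_metadata({**standardized, "instrument_specific": extra})

# Column-specific metadata: one mapping per column name.
column_metadata = {"temperature": encode_metadata({"unit": "degC"})}
```

Mappings of this shape can be attached to a Parquet schema with pyarrow, for example through `Schema.with_metadata` for file-wide metadata and the `metadata` argument of `pyarrow.field` for column-specific metadata.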
Development currently focuses on files and instruments of interest to FSRI's Materials Properties Laboratory, but additional instruments and file types will be added as we integrate with external stakeholders or as time allows. Feel free to reach out if you need a particular capability, or submit a PR.
Installation
pip install lab-etl
License
lab-etl is distributed under the terms of the MIT license.
File details
Details for the file labetl-0.0.2.tar.gz.
File metadata
- Download URL: labetl-0.0.2.tar.gz
- Upload date:
- Size: 3.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: python-httpx/0.27.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1b4b88d46933d0a269a506b6a871367e3742df99ee34a1e78b08e9d93d14cae7
MD5 | 9deee3390d58a5d928e77418cece4173
BLAKE2b-256 | 4fab9bc01b7f5c32df9785f6ae816f54402629a52b8938c6b8a43cdd68ae8377
File details
Details for the file labetl-0.0.2-py3-none-any.whl.
File metadata
- Download URL: labetl-0.0.2-py3-none-any.whl
- Upload date:
- Size: 21.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: python-httpx/0.27.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | b806dbb1158c03c426f97d5504ba31003c115f51b96ef0914ebfce1687a18169
MD5 | 9e06363a3d25a8e4a8ca66935046e2b0
BLAKE2b-256 | 79b01fc3a830efa76651650a1eea2ccc2b7a0f66235fb8c12dc03310aeedb185