
ETL system for interpreting laboratory instrument data files and loading them into a standardized format while enforcing schema and retaining all metadata.

Project description

lab-etl


This repository contains the ETL scripts for loading laboratory instrument data files into our database. Data files in a variety of formats are converted to Apache Parquet files, which provide a standardized interface for access and enforce a schema. An important feature is that metadata is retained: it is extracted from the original test files and stored as JSON-like metadata within the Parquet files, either file-wide or column-specific, as appropriate. For common fields, the metadata keys are standardized according to the type of file (that is, the type of instrument it came from). Additional instrument-specific metadata is also stored, but its keys are not guaranteed to be standardized in any meaningful way. The names of these fields may, however, be slightly altered to make clearer to the user what they represent.
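The split between standardized and instrument-specific metadata described above can be sketched as follows. This is a minimal illustration and not lab-etl's actual implementation: the key mapping, the function names (`split_metadata`, `encode_metadata`), and the sample header values are all hypothetical, and only the Python standard library is used.

```python
import json

# Hypothetical mapping from instrument-specific header names to standardized keys.
STANDARD_KEYS = {
    "Sample Name": "sample_name",
    "Operator": "operator",
    "Test Date": "test_date",
}


def split_metadata(raw):
    """Separate standardized fields from instrument-specific extras."""
    standardized, additional = {}, {}
    for key, value in raw.items():
        if key in STANDARD_KEYS:
            standardized[STANDARD_KEYS[key]] = value
        else:
            # Lightly clean the name for readability; no standardization is implied.
            additional[key.strip().lower().replace(" ", "_")] = value
    return standardized, additional


def encode_metadata(metadata):
    """JSON-encode each value into a flat bytes-to-bytes key/value mapping."""
    return {k.encode(): json.dumps(v).encode() for k, v in metadata.items()}


# Hypothetical header values, as they might be read from an instrument file.
raw_header = {"Sample Name": "PMMA-01", "Operator": "A. Tech", "Purge Gas": "N2"}
standardized, additional = split_metadata(raw_header)

# File-wide metadata: standardized keys plus a nested block of extras.
file_metadata = encode_metadata({**standardized, "additional": additional})

# Column-specific metadata, keyed by a hypothetical column name.
column_metadata = {"temperature": encode_metadata({"unit": "degC"})}
```

Flat byte mappings like these are the form in which pyarrow exposes Parquet key/value metadata. Assuming pyarrow is the Parquet library in use, they could be attached with `Schema.with_metadata` for file-wide metadata and `Field.with_metadata` for column-specific metadata.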

Development currently focuses on files and instruments of interest to FSRI's Materials Properties Laboratory, but additional instruments and file types will be added as we integrate with external stakeholders or as time allows. If you need a particular capability, feel free to reach out or submit a PR.



Installation

pip install lab-etl

License

lab-etl is distributed under the terms of the MIT license.



Download files


Source Distribution

labetl-0.0.2.tar.gz (3.2 MB)

Uploaded Source

Built Distribution

labetl-0.0.2-py3-none-any.whl (21.5 kB)

Uploaded Python 3

File details

Details for the file labetl-0.0.2.tar.gz.

File metadata

  • Download URL: labetl-0.0.2.tar.gz
  • Size: 3.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-httpx/0.27.0

File hashes

Hashes for labetl-0.0.2.tar.gz
Algorithm Hash digest
SHA256 1b4b88d46933d0a269a506b6a871367e3742df99ee34a1e78b08e9d93d14cae7
MD5 9deee3390d58a5d928e77418cece4173
BLAKE2b-256 4fab9bc01b7f5c32df9785f6ae816f54402629a52b8938c6b8a43cdd68ae8377


File details

Details for the file labetl-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: labetl-0.0.2-py3-none-any.whl
  • Size: 21.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-httpx/0.27.0

File hashes

Hashes for labetl-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 b806dbb1158c03c426f97d5504ba31003c115f51b96ef0914ebfce1687a18169
MD5 9e06363a3d25a8e4a8ca66935046e2b0
BLAKE2b-256 79b01fc3a830efa76651650a1eea2ccc2b7a0f66235fb8c12dc03310aeedb185

