TA Data Kit (tadatakit)
TA Data Kit is a Python library developed by TA Instruments™, designed for easy parsing and handling of data exported by TRIOS™ JSON Export Feature.
Examples
If you would like to jump straight to some usage examples, head over to our collection of Jupyter Notebooks, which show everything from data reading, plotting, and analysis to data conversion and more.
Installation
Prerequisites
Before installing tadatakit, ensure that you have Python 3.9 or later installed on your system. You can download Python from the official Python website.
Installing via Pip
Open your terminal or command prompt and run the following command:
pip install tadatakit
Features
The tadatakit library offers a robust suite of features designed to simplify and enhance the way you handle data from the TRIOS JSON Export Feature.
- Dynamic Class Generation: Automatically generates Python classes from the TRIOS JSON Schema. This ensures that the data models are always in sync with the latest schema definitions.
- Pandas Integration: Seamlessly converts data into pandas DataFrames, making it easier to perform complex data analysis, visualization, and manipulation directly from experiment results.
- Extensible Architecture: Designed with flexibility in mind, allowing users to easily extend or customize the generated classes to suit specific needs. Whether adding new methods, integrating with other libraries, or modifying property behaviors, tadatakit supports it all.
- Type-Safe Operations: Employs Python type hints throughout the dynamically generated classes, which enhances code quality and reliability through static type checking.
- Serialization and Deserialization: Includes built-in methods for JSON serialization and deserialization, facilitating easy data storage and retrieval, and ensuring data integrity across different stages of data handling.
- Schema-Driven Validation: Automatically validates data against the schema upon loading, ensuring that the data conforms to the expected structure and types defined by TA Instruments.
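To illustrate the idea behind schema-driven validation (this is a simplified, hand-rolled sketch, not tadatakit's actual validation code; the schema fragment and field names are invented for the example):

```python
# Minimal sketch of validating data against a JSON-Schema-like definition:
# check required fields and primitive types. Illustrative only.
SCHEMA = {
    "required": ["Name", "StartTime"],
    "properties": {
        "Name": {"type": "string"},
        "StartTime": {"type": "string"},
        "StepCount": {"type": "integer"},
    },
}

PYTHON_TYPES = {"string": str, "integer": int, "number": float, "boolean": bool}

def validate(data: dict, schema: dict) -> list:
    """Return a list of validation error messages (empty if valid)."""
    errors = []
    for field in schema.get("required", []):
        if field not in data:
            errors.append(f"missing required field: {field}")
    for field, rules in schema.get("properties", {}).items():
        expected = PYTHON_TYPES.get(rules.get("type"))
        if field in data and expected and not isinstance(data[field], expected):
            errors.append(f"{field}: expected {rules['type']}")
    return errors

print(validate({"Name": "Run 1", "StartTime": "2024-01-01T00:00:00"}, SCHEMA))
print(validate({"Name": 42}, SCHEMA))
```

A real JSON Schema validator also handles nested objects, arrays, and formats; this sketch only conveys the "fail early on malformed data" principle.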
Quick Start
Classes
To utilize classes like Experiment, import them directly from the tadatakit.classes module. These classes are dynamically generated based on the data schema, with helper functions added.
Explore the Experiment class in a REPL environment (IPython or Jupyter Notebook):
from tadatakit.classes import Experiment
Experiment?
File Parsing
Easily parse files using the from_json method on the Experiment class, as demonstrated below:
from tadatakit.classes import Experiment
experiment = Experiment.from_json("<path/to/json_file.json>")
As exported files can be large, be aware that parsing them can consume a significant amount of memory.
Using The Data
Experiment includes a convenience function to return the results data as a pandas DataFrame. The example below demonstrates parsing a file and utilizing the DataFrame:
from tadatakit.classes import Experiment
experiment = Experiment.from_json("<path/to/json_file.json>")
df = experiment.get_dataframe()
Utilizing and Extending Classes in TA Data Kit
The tadatakit library offers a versatile framework for handling and manipulating data through dynamically generated classes. These classes can be used directly (1), extended with additional functionality (2), or fully customized (3).
1. Using Auto-Generated Classes
Classes such as Experiment are dynamically created from a JSON schema by the class_generator module. They come equipped with all necessary properties and methods for basic data handling:
from tadatakit.class_generator import Experiment
experiment = Experiment.from_json('experiment_data.json')
print(experiment.start_time)
2. Using Auto-Generated Classes Extended With Helper Functions
Use Experiment imported from the classes module to take advantage of helper functions like:
- get_dataframe: Transforms Experiment results into a pandas DataFrame.
- get_dataframes_by_step: Divides results into multiple DataFrames, one per procedure step.
Usage Example:
from tadatakit.classes import Experiment
experiment = Experiment.from_json('path_to_data.json')
df = experiment.get_dataframe()
print(df.head(5))
steps, dfs = experiment.get_dataframes_by_step()
for step, df in zip(steps, dfs):
    print(step)
    print(df.head(5))
3. Building Custom Extensions
Create custom functionality by adding new methods or altering existing behaviors, perhaps to add polars support, an analysis pipeline, or methods for injection into databases or LIMS systems:
Steps to Extend:
- Define New Functions: Craft functions that fulfill specific requirements.
- Attach to Classes: Dynamically bind these functions to the classes.
- Implement in Workflow: Integrate these enhanced objects into your application.
Custom Method Example:
from tadatakit.class_generator import Experiment
import datetime
def time_since_experiment(self):
    return datetime.datetime.now() - self.start_time
setattr(Experiment, "time_since_experiment", time_since_experiment)
experiment = Experiment.from_json('data.json')
print(experiment.time_since_experiment())
Note: we provide no guarantee that your functions will not conflict with future additions to the schema. For example, if you add a dynamic property Experiment.end_time, it may conflict in the future with an EndTime property in the schema.
Explanation Of Approach
The tadatakit.class_generator module within the TA Data Kit automates the creation of Python classes directly from the TA Instruments TRIOS JSON Export Schema. This process allows for dynamic and efficient handling of data that conforms to defined standards, enhancing both development speed and data integrity. Here's how the library achieves this:
Overview
The library converts a JSON schema provided in a specification file into fully functional Python classes. These classes include type hints, serialization methods, and custom behaviors, closely mirroring the structure and requirements laid out in the schema.
Steps for Class Generation
1. Schema Loading
The process begins with loading the JSON schema. This schema defines the structure, types, required fields, and additional validation rules for the data.
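This step can be sketched as a plain JSON read (a minimal illustration, assuming the schema ships as a JSON file; the actual file name and location inside tadatakit may differ):

```python
# Minimal sketch of schema loading: read a JSON schema file into a dict.
import json
import tempfile

def load_schema(path):
    """Load a JSON schema from disk into a plain dictionary."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

# Demonstrate with a tiny inline schema written to a temporary file:
schema = {"definitions": {"Experiment": {"type": "object"}}}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(schema, f)

loaded = load_schema(f.name)
print(list(loaded["definitions"]))
```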
2. Schema Parsing
The loaded schema is parsed to identify different data structures and types. This includes simple types like strings and numbers, complex objects, arrays, and special formats like dates and UUIDs.
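A hedged sketch of this classification step (the definition names below are invented examples, and the real parser inspects far more of the schema):

```python
# Classify each schema definition as a primitive, array, object, or a
# special string format such as date-time or UUID. Illustrative only.
JSON_TO_PY = {"string": str, "integer": int, "number": float, "boolean": bool}

def classify(definition: dict) -> str:
    if definition.get("format") in ("date-time", "uuid"):
        return "special"          # special formats take priority over "string"
    t = definition.get("type")
    if t in JSON_TO_PY:
        return "primitive"
    if t == "array":
        return "array"
    if t == "object" or "properties" in definition:
        return "object"
    return "unknown"

defs = {
    "Experiment": {"type": "object", "properties": {"Name": {"type": "string"}}},
    "StepIndex": {"type": "integer"},
    "StartTime": {"type": "string", "format": "date-time"},
    "Results": {"type": "array", "items": {"type": "object"}},
}
for name, d in defs.items():
    print(name, "->", classify(d))
```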
3. Class Creation
For each definition in the schema (representing a potential data model), a Python class is dynamically generated. The library maps JSON types to Python types (e.g., integer to int, object to custom classes) and integrates any constraints and nested structures as class attributes.
4. Property Handling
Each class is equipped with properties based on the schema's definitions. Properties are added dynamically with appropriate getters, setters, and deleters to manage data access and ensure type safety. In some places, for example results data, the schema allows for additionalProperties, which are treated as kwargs in Python.
5. Method Integration
Serialization and deserialization methods such as from_json, to_json, from_dict, and to_dict are integrated into each class. These methods handle conversion between JSON strings, dictionaries, and class instances, facilitating easy data exchange and storage operations.
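The shape of these methods can be sketched as a small mix-in (a simplified illustration; real tadatakit classes also handle nested objects and validation, and the field names below are invented):

```python
# Simplified serialization mix-in showing the kind of methods the
# generator attaches to every class.
import json

class SerializableMixin:
    _fields = ()  # subclasses list their schema properties here

    @classmethod
    def from_dict(cls, data: dict):
        obj = cls()
        for field in cls._fields:
            setattr(obj, field, data.get(field))
        return obj

    def to_dict(self) -> dict:
        return {f: getattr(self, f, None) for f in self._fields}

    @classmethod
    def from_json_str(cls, text: str):
        return cls.from_dict(json.loads(text))

    def to_json(self) -> str:
        return json.dumps(self.to_dict())

class Experiment(SerializableMixin):
    _fields = ("name", "operator")

e = Experiment.from_dict({"name": "Run 1", "operator": "SB"})
print(e.to_json())
```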
6. Inheritance and Composition
If the schema specifies inheritance (using allOf) or composition (using anyOf or oneOf), the library constructs classes that inherit from multiple bases or handle multiple data types, ensuring that the generated classes faithfully represent the schema's intended data structures.
7. Registration and Accessibility
Generated classes are registered in a global class registry within the library. This registry allows for easy retrieval and instantiation of classes based on schema names, supporting dynamic access and manipulation of data in a type-safe manner.
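At its simplest, such a registry is a module-level dictionary keyed by definition name (a sketch of the idea only; tadatakit's internal registry structure may differ):

```python
# Minimal global class registry: register generated classes by schema name
# and look them up later for dynamic instantiation.
CLASS_REGISTRY: dict[str, type] = {}

def register_class(name: str, cls: type) -> None:
    CLASS_REGISTRY[name] = cls

def get_class(name: str) -> type:
    return CLASS_REGISTRY[name]

class Experiment:
    pass

register_class("Experiment", Experiment)
print(get_class("Experiment") is Experiment)  # True
```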
Contributing
We welcome contributions from the community and are pleased to have you join us in improving tadatakit. Whether you are fixing bugs, adding new features, improving documentation, or suggesting new functionality, your input is valuable!
If you are interested in contributing to the tadatakit library, please read our contributing guidelines for detailed information on how to get started, coding conventions, and the pull request process.
Notes
TA Instruments, TA, and TRIOS are trademarks of Waters Technologies Corporation.