Folder-based backend for eTiKeT sync agent
eTiKeT Sync Agent - FolderBase Backend
Backend for synchronizing folder-based datasets with the eTiKeT platform. This backend scans directories for datasets marked with a _QH_dataset_info.yaml file and syncs their contents to the cloud.
How It Works
The FolderBase backend continuously watches a specified folder, automatically detects new and existing datasets, and uploads them to QHarbor. Synchronization is one-way: local data is uploaded to the server, but nothing is downloaded from it.
A folder is recognized as a dataset when it contains a _QH_dataset_info.yaml file. This file holds the minimum information needed to create a dataset. Every other file in the folder and its subdirectories is treated as a data file and added to the dataset, unless it matches one of the file's skip patterns.
Example Folder Structure
```
main_folder/
├── 20240101/
│   └── 20240101-211245-165-731d85-experiment_1/
│       ├── _QH_dataset_info.yaml
│       ├── 01-01-2024_01-01-01.json
│       └── 01-01-2024_01-01-01.hdf5
├── 20240102/
│   └── 20240102-220655-268-455d85-experiment_2/
│       ├── _QH_dataset_info.yaml
│       ├── 02-01-2024_02-02-02.json
│       ├── 02-01-2024_02-02-02.hdf5
│       └── analysis/
│           ├── 02-01-2024_02-02-02_analysis.json
│           └── 02-01-2024_02-02-02_analysis.hdf5
└── some_other_folder/
    ├── _QH_dataset_info.yaml
    └── 01-01-2024_01-01-01.json
```
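In this layout, three datasets are recognized: the two experiment folders (for experiment_2, including the files in its analysis/ subfolder) and some_other_folder. The date folders 20240101/ and 20240102/ are not datasets themselves because they contain no _QH_dataset_info.yaml. If a file is added to any of these dataset folders or a new dataset folder is created, the sync agent automatically detects and uploads it.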
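The discovery rule above can be illustrated with pathlib alone. This is a minimal sketch, not the backend's implementation: it performs a one-off scan instead of continuously watching the folder, and `discover_datasets` is a hypothetical name. How the real agent handles a dataset folder nested inside another one is not documented here; this sketch simply reports every marker it finds.

```python
from __future__ import annotations

from pathlib import Path

MARKER = "_QH_dataset_info.yaml"


def discover_datasets(root: str | Path) -> dict[Path, list[Path]]:
    """Map each dataset folder under `root` to its data files (relative paths)."""
    root = Path(root).expanduser()
    datasets: dict[Path, list[Path]] = {}
    # Any directory that contains the marker file counts as a dataset folder.
    for marker in sorted(root.rglob(MARKER)):
        folder = marker.parent
        # Every other file in the folder, including files in subdirectories, is a data file.
        datasets[folder] = sorted(
            p.relative_to(folder)
            for p in folder.rglob("*")
            if p.is_file() and p.name != MARKER
        )
    return datasets
```

Files that sit outside any dataset folder (for example directly in main_folder/) are ignored by this scan.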
Installation
```bash
pip install etiket_sync_agent_folderbase
```
The package is automatically discovered by etiket_sync_agent through the entry-point system.
Configuration
The FolderBase backend requires a FolderBaseConfigData configuration:
| Field | Type | Required | Description |
|---|---|---|---|
| root_directory | Path or str | Yes | Root directory to watch for datasets. Supports ~ expansion. |
| is_server_folder | bool | Yes | Whether this is a network/server folder (e.g., on a university network drive). |
Please use our Flutter GUI or the etiket_sdk to add this sync source.
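The two configuration fields can be mirrored in a small dataclass to show the expected types and the ~ expansion of root_directory. `FolderConfigSketch` is a hypothetical stand-in, not FolderBaseConfigData itself; the real class's import path and validation logic are not covered in this document.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class FolderConfigSketch:
    """Hypothetical stand-in mirroring the two FolderBaseConfigData fields."""

    root_directory: Path | str
    is_server_folder: bool

    def __post_init__(self) -> None:
        # Accept str or Path and expand a leading "~" to the user's home directory.
        self.root_directory = Path(self.root_directory).expanduser()


cfg = FolderConfigSketch(root_directory="~/measurements", is_server_folder=False)
print(cfg.root_directory)  # e.g. /home/<user>/measurements
```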
The _QH_dataset_info.yaml File
When performing measurements, we recommend programmatically creating the _QH_dataset_info.yaml file in the dataset folder.
Minimal Example
```yaml
version: 0.1
```
Full Field Reference
| Field | Required | Type | Description |
|---|---|---|---|
| version | Yes | str | File format version (currently 0.1) |
| dataset_name | No | str | Name of the dataset. Default: folder name |
| created | No | str | Creation date in format YYYY-MM-DDTHH:MM:SS. Default: earliest file modification time |
| collected | No | str | Collection date (alternative to created) |
| description | No | str | Description of the dataset |
| attributes | No | dict | Key-value pairs (values must be str or number) |
| tags | No | list | Tags for the dataset |
| skip | No | list | Glob patterns for files/folders to exclude (e.g., ["*.json", "raw_data/*"]) |
| converters | No | dict | File converters to apply (see below) |
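Before handing a file to the agent, a quick sanity check against the field reference can catch mistakes early. The sketch below works on the dictionary a YAML loader such as PyYAML's `yaml.safe_load` returns; `check_dataset_info` and its rules are assumptions derived from the table, not the agent's own validation. YAML loaders may turn an unquoted timestamp into a `datetime` object, which the sketch accepts as-is.

```python
from datetime import datetime

DOCUMENTED_FIELDS = {
    "version", "dataset_name", "created", "collected", "description",
    "attributes", "tags", "skip", "converters",
}


def check_dataset_info(info: dict) -> list[str]:
    """Return human-readable problems found in a parsed _QH_dataset_info.yaml mapping."""
    problems: list[str] = []
    if "version" not in info:
        problems.append("missing required field 'version'")
    extra = sorted(set(info) - DOCUMENTED_FIELDS)
    if extra:
        problems.append(f"undocumented fields: {extra}")
    for key in ("created", "collected"):
        value = info.get(key)
        if value is None or isinstance(value, datetime):
            continue  # absent, or already parsed into a datetime by the YAML loader
        try:
            datetime.strptime(str(value), "%Y-%m-%dT%H:%M:%S")
        except ValueError:
            problems.append(f"'{key}' is not in YYYY-MM-DDTHH:MM:SS format: {value!r}")
    attributes = info.get("attributes") or {}
    if not isinstance(attributes, dict):
        problems.append("'attributes' must be a mapping")
    else:
        for name, value in attributes.items():
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (str, int, float)):
                problems.append(f"attribute {name!r} must be a str or number")
    for key in ("tags", "skip"):
        if key in info and not isinstance(info[key], list):
            problems.append(f"'{key}' must be a list")
    if "converters" in info and not isinstance(info["converters"], dict):
        problems.append("'converters' must be a mapping")
    return problems
```

An empty list means no problems were found by these checks.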
Complete Example
```yaml
version: 0.1
dataset_name: 'my_dataset_name'
description: "Description of the experiment I want to do."
attributes:
  initials: 'QH'
  set_up: 'XLD001'
  sample: 'my_sample'
tags: ['rabi', 'test']
skip: ['*.json', 'raw_data/*']
converters:
  csv_to_hdf5_converter:
    module: etiket_sync_agent_qh_converters
    class: CSVToHDF5Converter
```
⚠️ Note: The YAML file must use spaces for indentation, not tabs. Using tabs will cause parsing errors and synchronization will fail.
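Because tab indentation is easy to introduce by accident, a small pre-flight check can point at the offending lines before the file is written. This helper is a hypothetical convenience, not part of the package; it only inspects leading whitespace and does not parse YAML.

```python
def tab_indented_lines(yaml_text: str) -> list[int]:
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    bad = []
    for lineno, line in enumerate(yaml_text.splitlines(), start=1):
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            bad.append(lineno)
    return bad


text = "version: 0.1\nattributes:\n\tinitials: 'QH'\n"
print(tab_indented_lines(text))  # [3]
```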
File Converters
You can specify converters to automatically transform files during sync. Each converter is declared under a key that follows the naming convention {input}_to_{output}_converter (e.g., csv_to_hdf5_converter).
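The naming convention can be checked with a regular expression that splits a key into its input and output formats. Whether the agent itself parses or enforces the key name is not stated here, so treat `parse_converter_key` as a hypothetical helper for catching typos in your own YAML.

```python
import re

# Matches keys such as "csv_to_hdf5_converter" and captures the two formats.
_KEY_PATTERN = re.compile(r"^(?P<src>[A-Za-z0-9]+)_to_(?P<dst>[A-Za-z0-9]+)_converter$")


def parse_converter_key(key: str) -> tuple[str, str]:
    """Split a key following {input}_to_{output}_converter into (input, output)."""
    match = _KEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"{key!r} does not follow the {{input}}_to_{{output}}_converter convention")
    return match["src"], match["dst"]


print(parse_converter_key("csv_to_hdf5_converter"))  # ('csv', 'hdf5')
```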
Converter Syntax
```yaml
converters:
  txt_to_csv_converter:
    module: my_library.location.to.module
    class: MyConverterClass
```
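A plausible way for a loader to turn the module and class entries into a Python class is `importlib.import_module` followed by `getattr`. This is an assumption about the mechanism, not a description of the agent's code; `resolve_converter` is a hypothetical name, and the demo uses a standard-library class so it runs without any converter package installed.

```python
import importlib


def resolve_converter(spec: dict) -> type:
    """Import spec['module'] and return the attribute named spec['class']."""
    module = importlib.import_module(spec["module"])
    try:
        return getattr(module, spec["class"])
    except AttributeError:
        raise ImportError(f"{spec['module']} has no attribute {spec['class']!r}") from None


# Demonstrated with a standard-library class so the sketch runs anywhere.
cls = resolve_converter({"module": "collections", "class": "OrderedDict"})
print(cls.__name__)  # OrderedDict
```

A typo in either entry therefore surfaces as an ImportError (ModuleNotFoundError is a subclass) rather than failing silently.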
Available Converters
The etiket_sync_agent_qh_converters package provides built-in converters:
- zarr → HDF5
- CSV → HDF5
- And more...
To create custom converters, implement a class that inherits from FileConverter and provides the convert method. The converter can be installed with the etiket_sdk package. For more information on creating converters, see the etiket_sync_agent package documentation.
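The shape of a custom converter can be sketched as follows. The real FileConverter base class and the exact signature of its convert method are not shown in this document, so `FileConverterStandIn` and the `convert(input_path, output_path)` signature are hypothetical placeholders; check the etiket_sync_agent documentation for the actual interface. The example turns whitespace-separated text columns into CSV.

```python
import csv
from abc import ABC, abstractmethod
from pathlib import Path


class FileConverterStandIn(ABC):
    """Hypothetical stand-in for FileConverter; the real base class may differ."""

    @abstractmethod
    def convert(self, input_path: Path, output_path: Path) -> None: ...


class TxtToCsvConverter(FileConverterStandIn):
    """Turn whitespace-separated columns in a .txt file into a .csv file."""

    def convert(self, input_path: Path, output_path: Path) -> None:
        with open(input_path, encoding="utf-8") as src, \
                open(output_path, "w", newline="", encoding="utf-8") as dst:
            writer = csv.writer(dst)
            for line in src:
                fields = line.split()
                if fields:  # skip blank lines
                    writer.writerow(fields)
```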
Programmatic Dataset Creation
You can programmatically create the _QH_dataset_info.yaml file using the generate_dataset_info function:
```python
from datetime import datetime

from etiket_sync_agent_folderbase import generate_dataset_info
from etiket_sync_agent_qh_converters import CSVToHDF5Converter

# Dataset folder for which _QH_dataset_info.yaml is generated
path = "my_path/test/"
generate_dataset_info(
    path,
    dataset_name="my_dataset_name",
    creation=datetime.now(),
    description="Description of the experiment I want to do.",
    attributes={"sample": "my_sample"},
    tags=["rabi", "test"],
    converters=[CSVToHDF5Converter],
    skip=["*.json", "raw_data/*"],
)
```
Note: This function is also re-exported by the qdrive package as qdrive.dataset.generate_dataset_info.
See dataset_info.py for the full function signature and documentation.
What Gets Synchronized
| Source | eTiKeT Field | Description |
|---|---|---|
| dataset_name or folder name | name | Name of the dataset |
| description | description | Dataset description (appended with source path) |
| created/collected or earliest file mtime | collected | Dataset creation time |
| tags | tags | Searchable tags |
| attributes | attributes | Key-value metadata |
| All files (except skipped) | Data files | Uploaded with detected file type |
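The metadata rows of the table above can be sketched as a single function. Several details are assumptions, not documented behavior: `created` is checked before `collected`, the mtime fallback considers every file in the folder (including the YAML file), and the "Source path:" wording used when appending the path to the description is hypothetical.

```python
from datetime import datetime
from pathlib import Path


def resolve_metadata(folder: Path, info: dict) -> dict:
    """Map a dataset folder plus its parsed YAML onto the eTiKeT metadata fields."""
    # Name: dataset_name if given, otherwise the folder name.
    name = info.get("dataset_name") or folder.name

    # Collection time: 'created' first, then 'collected' (precedence is an assumption).
    collected = info.get("created") or info.get("collected")
    if isinstance(collected, str):
        collected = datetime.strptime(collected, "%Y-%m-%dT%H:%M:%S")
    elif collected is None:
        # Fall back to the earliest modification time of any file in the folder.
        mtimes = [p.stat().st_mtime for p in folder.rglob("*") if p.is_file()]
        collected = datetime.fromtimestamp(min(mtimes, default=folder.stat().st_mtime))

    # Description with the source path appended (the exact format is hypothetical).
    parts = [info.get("description"), f"Source path: {folder}"]
    description = "\n".join(p for p in parts if p)

    return {
        "name": name,
        "description": description,
        "collected": collected,
        "tags": list(info.get("tags") or []),
        "attributes": dict(info.get("attributes") or {}),
    }
```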
Supported File Types
Any file type is supported. For zarr files (which are actually folders), use a converter from etiket_sync_agent_qh_converters to convert them to HDF5.
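File type detection of the kind listed under Features can be approximated from a file's leading bytes and content. This heuristic is an illustration, not the agent's detector: it only checks the HDF5 signature at offset 0 (HDF5 also allows it at 512, 1024, ... when a user block is present), and it reads the whole file to test for JSON, which is fine for a sketch but slow for large files.

```python
import json
from pathlib import Path

# HDF5 (and therefore NetCDF-4) files normally start with this 8-byte signature.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
# Classic NetCDF files start with "CDF" followed by a version byte (1, 2 or 5).
NETCDF3_PREFIXES = (b"CDF\x01", b"CDF\x02", b"CDF\x05")


def guess_file_type(path: Path) -> str:
    """Heuristically classify a file as 'hdf5', 'netcdf3', 'json', 'text' or 'unknown'."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head.startswith(HDF5_SIGNATURE):
        return "hdf5"
    if head.startswith(NETCDF3_PREFIXES):
        return "netcdf3"
    try:
        text = path.read_text(encoding="utf-8")
    except UnicodeDecodeError:
        return "unknown"  # binary data in some other format
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return "text"
    return "json"
```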
Features
- Directory-based dataset discovery: Automatically finds datasets by the presence of _QH_dataset_info.yaml
- YAML-based configuration: Simple declarative dataset metadata
- File converter support: Transform files during sync (e.g., zarr → HDF5, CSV → HDF5)
- Skip patterns: Exclude files/folders using glob patterns
- Automatic file type detection: Detects JSON, text, HDF5/NetCDF files
- Subdirectory support: Syncs all files recursively within dataset folders
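Skip patterns can be sketched with the standard library's fnmatch, applied to paths relative to the dataset folder with "/" separators. The exact matching semantics of the agent are not documented here, so this is an assumption: with `fnmatchcase`, `*` also matches across "/", which means "*.json" excludes JSON files in subfolders too. `fnmatchcase` is used so the result does not depend on the operating system's case rules.

```python
from fnmatch import fnmatchcase


def is_skipped(relative_path: str, patterns: list[str]) -> bool:
    """True if a '/'-separated path relative to the dataset folder matches any skip pattern."""
    return any(fnmatchcase(relative_path, pattern) for pattern in patterns)


files = ["01.json", "01.hdf5", "raw_data/trace.csv", "analysis/fit.hdf5"]
kept = [f for f in files if not is_skipped(f, ["*.json", "raw_data/*"])]
print(kept)  # ['01.hdf5', 'analysis/fit.hdf5']
```

For a pathlib.Path, pass `path.relative_to(folder).as_posix()` to get the "/"-separated form on every platform.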
Requirements
- Python >= 3.10
- xarray
- h5netcdf
- PyYAML
License
Copyright © 2025 QHarbor. All Rights Reserved. See LICENCE for details.
File details
Details for the file etiket_sync_agent_folderbase-0.3.0b1-py3-none-any.whl.
File metadata
- Download URL: etiket_sync_agent_folderbase-0.3.0b1-py3-none-any.whl
- Upload date:
- Size: 14.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
1fcf2de2614fdf544504441af97df7470461b9e818d00ec10eabe3578a00252e
|
|
| MD5 |
d702f95457479ac51e7e039ff19457d1
|
|
| BLAKE2b-256 |
b9e5045f50946253aab28abc5e740ca38c569d79f6566733a0d50e5ff5a7648e
|