
Folder-based backend for eTiKeT sync agent


eTiKeT Sync Agent - FolderBase Backend

Backend for synchronizing folder-based datasets with the eTiKeT platform. This backend scans directories for datasets marked with a _QH_dataset_info.yaml file and syncs their contents to the cloud.

How It Works

The FolderBase backend continuously watches a specified folder, detects new and existing datasets, and uploads them to QHarbor. Synchronization is one-way: local datasets are uploaded to the server, but nothing is downloaded from it.

A folder is recognized as a dataset when it contains a _QH_dataset_info.yaml file, which holds the minimum information needed to create the dataset. Every other file in that folder, including files in its subdirectories, is treated as a data file and added to the dataset.
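The discovery rule can be sketched in a few lines of Python. This is a minimal illustration, not the agent's implementation: find_datasets is a hypothetical helper, and it assumes datasets are not nested inside one another (the README does not say how nested datasets are handled).

```python
from __future__ import annotations

from pathlib import Path

MARKER = "_QH_dataset_info.yaml"


def find_datasets(root: str | Path) -> dict[Path, list[Path]]:
    """Hypothetical helper: map each dataset folder under root to its data files."""
    root = Path(root).expanduser()
    datasets: dict[Path, list[Path]] = {}
    # Every folder that holds the marker file counts as a dataset.
    for marker in sorted(root.rglob(MARKER)):
        folder = marker.parent
        # All other files in the folder and its subdirectories are data files.
        datasets[folder] = sorted(
            p for p in folder.rglob("*") if p.is_file() and p != marker
        )
    return datasets
```

Folders without the marker file, such as plain date folders, are simply traversed and never reported as datasets themselves.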

Example Folder Structure

main_folder/
├── 20240101/
│   ├── 20240101-211245-165-731d85-experiment_1/
│   │   ├── _QH_dataset_info.yaml
│   │   ├── 01-01-2024_01-01-01.json
│   │   └── 01-01-2024_01-01-01.hdf5
├── 20240102/
│   ├── 20240102-220655-268-455d85-experiment_2/
│   │   ├── _QH_dataset_info.yaml
│   │   ├── 02-01-2024_02-02-02.json
│   │   ├── 02-01-2024_02-02-02.hdf5
│   │   └── analysis/
│   │       ├── 02-01-2024_02-02-02_analysis.json
│   │       └── 02-01-2024_02-02-02_analysis.hdf5
└── some_other_folder/
    ├── _QH_dataset_info.yaml
    └── 01-01-2024_01-01-01.json

If a file is added to any of these folders or a new dataset folder is created, the sync agent will automatically detect and upload it.
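The README does not say whether the agent uses filesystem events or periodic scanning. As one possible approach, here is a polling sketch that compares two snapshots of file modification times; snapshot and changed_files are hypothetical helpers.

```python
from __future__ import annotations

from pathlib import Path


def snapshot(folder: str | Path) -> dict[str, float]:
    """Record the modification time of every file below folder."""
    folder = Path(folder)
    return {
        p.relative_to(folder).as_posix(): p.stat().st_mtime
        for p in folder.rglob("*")
        if p.is_file()
    }


def changed_files(before: dict[str, float], after: dict[str, float]) -> list[str]:
    """Files that are new, or whose mtime changed, between two snapshots."""
    return sorted(name for name, mtime in after.items() if before.get(name) != mtime)
```

Calling snapshot at regular intervals and passing consecutive results to changed_files yields the files that would need uploading.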


Installation

pip install etiket_sync_agent_folderbase

The package is discovered automatically by etiket_sync_agent through Python's package entry points.


Configuration

The FolderBase backend requires a FolderBaseConfigData configuration:

| Field | Type | Required | Description |
|---|---|---|---|
| root_directory | Path or str | Yes | Root directory to watch for datasets. Supports ~ expansion. |
| is_server_folder | bool | Yes | Whether this is a network/server folder (e.g., on a university network drive) |

Please use our Flutter GUI or the etiket_sdk to add this folder as a sync source.
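To make the two fields concrete, here is a hypothetical stand-in dataclass that mirrors them, including the ~ expansion. It is not FolderBaseConfigData itself; use the GUI or etiket_sdk to configure the real sync source.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class FolderConfigSketch:
    """Hypothetical stand-in mirroring the two FolderBaseConfigData fields."""

    root_directory: Path | str
    is_server_folder: bool

    def __post_init__(self) -> None:
        # Accept str or Path, and expand a leading ~ to the home directory.
        self.root_directory = Path(self.root_directory).expanduser()


config = FolderConfigSketch(root_directory="~/measurements", is_server_folder=False)
print(config.root_directory)
```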


The _QH_dataset_info.yaml File

When performing measurements, we recommend programmatically creating the _QH_dataset_info.yaml file in the dataset folder.

Minimal Example

version: 0.1

Full Field Reference

| Field | Required | Type | Description |
|---|---|---|---|
| version | Yes | str | File format version (currently 0.1) |
| dataset_name | No | str | Name of the dataset. Default: folder name |
| created | No | str | Creation date in the format YYYY-MM-DDTHH:MM:SS. Default: earliest file modification time |
| collected | No | str | Collection date (alternative to created) |
| description | No | str | Description of the dataset |
| attributes | No | dict | Key-value pairs (values must be str or number) |
| tags | No | list | Tags for the dataset |
| skip | No | list | Glob patterns for files/folders to exclude (e.g., ["*.json", "raw_data/*"]) |
| converters | No | dict | File converters to apply (see below) |
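If you generate these files yourself, a quick check against the field reference can catch mistakes before syncing. The sketch below validates an already-parsed mapping; validate_dataset_info is a hypothetical helper whose rules are read off the table, and the agent's own validation may be stricter or looser. Note that YAML reads an unquoted `version: 0.1` as a float, and loaders such as PyYAML may turn an unquoted timestamp into a datetime, so both are accepted.

```python
from __future__ import annotations

from datetime import datetime

# Accepted Python types per optional field, following the field reference table.
OPTIONAL_FIELDS = {
    "dataset_name": (str,),
    "created": (str, datetime),
    "collected": (str, datetime),
    "description": (str,),
    "attributes": (dict,),
    "tags": (list,),
    "skip": (list,),
    "converters": (dict,),
}


def validate_dataset_info(info: dict) -> list[str]:
    """Return a list of problems found in a parsed _QH_dataset_info.yaml mapping."""
    problems = []
    # An unquoted `version: 0.1` is parsed by YAML as a float, so accept both.
    if "version" not in info:
        problems.append("missing required field 'version'")
    elif not isinstance(info["version"], (str, float)):
        problems.append("'version' must be a string such as '0.1'")
    for field, types in OPTIONAL_FIELDS.items():
        if field in info and not isinstance(info[field], types):
            problems.append(f"'{field}' has unexpected type {type(info[field]).__name__}")
    for field in ("created", "collected"):
        value = info.get(field)
        if isinstance(value, str):
            try:
                datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")
            except ValueError:
                problems.append(f"'{field}' must use the format YYYY-MM-DDTHH:MM:SS")
    attributes = info.get("attributes")
    if isinstance(attributes, dict):
        for key, value in attributes.items():
            if not isinstance(value, (str, int, float)):
                problems.append(f"attribute '{key}' must be a string or number")
    return problems
```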

Complete Example

version: 0.1
dataset_name: 'my_dataset_name'
description: "Description of the experiment I want to do."
attributes:
  initials: 'QH'
  set_up: 'XLD001'
  sample: 'my_sample'
tags: ['rabi', 'test']
skip: ['*.json', 'raw_data/*']
converters:
  csv_to_hdf5_converter:
    module: etiket_sync_agent_qh_converters
    class: CSVToHDF5Converter
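The skip entries in this example are glob patterns. The exact matching rules are not documented here; a minimal sketch, assuming Python's fnmatch semantics applied to paths relative to the dataset folder, could look like this (is_skipped is a hypothetical helper):

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_skipped(relative_path: str, patterns: list[str]) -> bool:
    """True if a path (relative to the dataset folder) matches any skip pattern."""
    path = PurePosixPath(relative_path).as_posix()
    # In fnmatch, "*" also matches "/", so "*.json" catches JSON files in
    # subfolders and "raw_data/*" catches everything below raw_data/.
    return any(fnmatchcase(path, pattern) for pattern in patterns)
```

With the patterns above, analysis/x.json and raw_data/run1/x.hdf5 would be excluded, while data.hdf5 would be uploaded.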

⚠️ Note: The YAML file must use spaces for indentation, not tabs. Using tabs will cause parsing errors and synchronization will fail.
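Because a tab in the indentation makes the whole file fail, it can help to check for tabs before the agent picks the file up. This is a hypothetical pre-check, not part of the agent:

```python
def find_tab_indentation(yaml_text: str) -> list[int]:
    """Return 1-based line numbers whose leading indentation contains a tab."""
    bad_lines = []
    for number, line in enumerate(yaml_text.splitlines(), start=1):
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            bad_lines.append(number)
    return bad_lines


text = "version: 0.1\nattributes:\n\tsample: 'my_sample'\n"
print(find_tab_indentation(text))  # [3]
```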


File Converters

You can specify converters to automatically transform files during sync. The naming convention is {input}_to_{output}_converter.

Converter Syntax

converters:
  txt_to_csv_converter:
    module: my_library.location.to.module
    class: MyConverterClass
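A converter entry combines a name following the convention with a module and class to import. One way to interpret such an entry is sketched below; parse_converter_name and load_converter are hypothetical helpers, and the regex assumes format names made of lowercase letters and digits only.

```python
import importlib
import re

NAME_PATTERN = re.compile(r"^(?P<input>[a-z0-9]+)_to_(?P<output>[a-z0-9]+)_converter$")


def parse_converter_name(name: str) -> tuple[str, str]:
    """Split '{input}_to_{output}_converter' into its input and output formats."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"{name!r} does not follow the {{input}}_to_{{output}}_converter convention")
    return match["input"], match["output"]


def load_converter(spec: dict) -> type:
    """Import the class named by a converter entry's 'module' and 'class' keys."""
    module = importlib.import_module(spec["module"])
    return getattr(module, spec["class"])
```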

Available Converters

The etiket_sync_agent_qh_converters package provides built-in converters:

  • zarr → HDF5
  • CSV → HDF5
  • And more...

To create a custom converter, implement a class that inherits from FileConverter and implements the convert method. The converter can be installed with the etiket_sdk package. For more information on creating converters, see the etiket_sync_agent package documentation.
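The shape of a converter can be illustrated with a local stand-in base class. FileConverterSketch, its attributes, and the convert(source, destination) signature are hypothetical; check the etiket_sync_agent documentation for the real FileConverter interface before writing your own.

```python
from __future__ import annotations

import csv
from abc import ABC, abstractmethod
from pathlib import Path


class FileConverterSketch(ABC):
    """Hypothetical stand-in for FileConverter; the real base class may differ."""

    input_type: str
    output_type: str

    @abstractmethod
    def convert(self, source: Path, destination: Path) -> None:
        """Read `source` and write the converted result to `destination`."""


class TxtToCsvConverter(FileConverterSketch):
    """Turn whitespace-separated text columns into a CSV file."""

    input_type = "txt"
    output_type = "csv"

    def convert(self, source: Path, destination: Path) -> None:
        with open(source) as infile, open(destination, "w", newline="") as outfile:
            writer = csv.writer(outfile)
            for line in infile:
                if line.strip():  # skip blank lines
                    writer.writerow(line.split())
```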


Programmatic Dataset Creation

You can programmatically create the _QH_dataset_info.yaml file using the generate_dataset_info function:

from datetime import datetime
from etiket_sync_agent_folderbase import generate_dataset_info
from etiket_sync_agent_qh_converters import CSVToHDF5Converter

path = "my_path/test/"
generate_dataset_info(
    path,
    dataset_name="my_dataset_name",
    creation=datetime.now(),
    description="Description of the experiment I want to do.",
    attributes={"sample": "my_sample"},
    tags=["rabi", "test"],
    converters=[CSVToHDF5Converter],
    skip=["*.json", "raw_data/*"]
)

Note: This function is also re-exported by the qdrive package as qdrive.dataset.generate_dataset_info.

See dataset_info.py for the full function signature and documentation.
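To show how the keyword arguments relate to the YAML fields (for example, creation becomes created in the YYYY-MM-DDTHH:MM:SS format), here is a hypothetical helper that builds the mapping such a file would contain. It is not generate_dataset_info and does not write any file; converters are left out because the way generate_dataset_info encodes converter classes is not described here.

```python
from __future__ import annotations

from datetime import datetime


def build_dataset_info(
    dataset_name: str | None = None,
    creation: datetime | None = None,
    description: str | None = None,
    attributes: dict | None = None,
    tags: list[str] | None = None,
    skip: list[str] | None = None,
) -> dict:
    """Hypothetical helper: collect keyword arguments into YAML field names."""
    info: dict = {"version": "0.1"}
    if dataset_name is not None:
        info["dataset_name"] = dataset_name
    if creation is not None:
        # The file format expects YYYY-MM-DDTHH:MM:SS, without fractional seconds.
        info["created"] = creation.strftime("%Y-%m-%dT%H:%M:%S")
    optional = (("description", description), ("attributes", attributes), ("tags", tags), ("skip", skip))
    for key, value in optional:
        if value is not None:
            info[key] = value
    return info
```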


What Gets Synchronized

| Source | eTiKeT field | Description |
|---|---|---|
| dataset_name, or the folder name | name | Name of the dataset |
| description | description | Dataset description, with the source path appended |
| created/collected, or the earliest file mtime | collected | Dataset creation time |
| tags | tags | Searchable tags |
| attributes | attributes | Key-value metadata |
| All files (except skipped ones) | Data files | Uploaded with the detected file type |
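The mapping in this table can be sketched as a function. to_etiket_fields is hypothetical, the "Source:" formatting of the appended path is an assumption (the README only says the path is appended), and it assumes the earliest modification time is taken over the data files only.

```python
from __future__ import annotations

from datetime import datetime
from pathlib import Path


def to_etiket_fields(folder: Path, info: dict, data_files: list[Path]) -> dict:
    """Derive the synchronized fields from a dataset folder and its parsed info."""
    # Creation time: 'created', then 'collected', then the earliest file mtime.
    when = info.get("created") or info.get("collected")
    if isinstance(when, str):
        when = datetime.strptime(when, "%Y-%m-%dT%H:%M:%S")
    if when is None and data_files:
        when = datetime.fromtimestamp(min(p.stat().st_mtime for p in data_files))
    description = info.get("description", "")
    return {
        "name": info.get("dataset_name") or folder.name,
        # Hypothetical formatting of the appended source path.
        "description": f"{description}\n\nSource: {folder}".strip(),
        "collected": when,
        "tags": info.get("tags", []),
        "attributes": info.get("attributes", {}),
    }
```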

Supported File Types

Any file type is supported. Zarr stores are directories rather than single files, so convert them to HDF5 with a converter from etiket_sync_agent_qh_converters.


Features

  • Directory-based dataset discovery: Automatically finds datasets by _QH_dataset_info.yaml presence
  • YAML-based configuration: Simple declarative dataset metadata
  • File converter support: Transform files during sync (e.g., zarr → HDF5, CSV → HDF5)
  • Skip patterns: Exclude files/folders using glob patterns
  • Automatic file type detection: Detects JSON, text, HDF5/NetCDF files
  • Subdirectory support: Syncs all files recursively within dataset folders
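How the agent detects file types is not specified. A minimal extension-based sketch is shown below; the extension table and detect_file_type are assumptions, and the real detection may inspect file contents instead.

```python
from __future__ import annotations

from pathlib import Path

# Assumed extension-to-type table; the agent's real detection rules may differ.
FILE_TYPES = {
    ".json": "JSON",
    ".txt": "TEXT",
    ".hdf5": "HDF5/NetCDF",
    ".h5": "HDF5/NetCDF",
    ".nc": "HDF5/NetCDF",
}


def detect_file_type(path: str | Path) -> str:
    """Guess a file's type from its extension, falling back to 'UNKNOWN'."""
    return FILE_TYPES.get(Path(path).suffix.lower(), "UNKNOWN")
```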

Requirements

  • Python >= 3.10
  • xarray
  • h5netcdf
  • PyYAML

License

Copyright © 2025 QHarbor. All Rights Reserved. See LICENCE for details.



Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution


etiket_sync_agent_folderbase-0.3.0b1-py3-none-any.whl (14.4 kB, Python 3)

File hashes for etiket_sync_agent_folderbase-0.3.0b1-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | 1fcf2de2614fdf544504441af97df7470461b9e818d00ec10eabe3578a00252e |
| MD5 | d702f95457479ac51e7e039ff19457d1 |
| BLAKE2b-256 | b9e5045f50946253aab28abc5e740ca38c569d79f6566733a0d50e5ff5a7648e |
