nomad-ml-workflows

A NOMAD plugin for managing ML workflows. Currently, it provides an action to export a large number of entries from the NOMAD database as tabular data files. Other ML-workflow-related actions and schemas will be added in the future.

📦 Installation

You can install the plugin using pip:

pip install "nomad-ml-workflows @ git+https://github.com/FAIRmat-NFDI/nomad-ml-workflows.git"

However, to fully utilize the plugin, you need to add it to your NOMAD instance as described below.
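To confirm that the pip installation succeeded, you can query the installed distribution from Python. This is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and not part of the plugin, and the distribution name is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; it is not provided by the plugin.
    """
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("nomad-ml-workflows")
    print(found if found else "nomad-ml-workflows is not installed")
```

A result of None means pip did not install the distribution into the active environment. It says nothing about whether the plugin is registered in a NOMAD instance.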

✨ Features

  • Export a large number of NOMAD entries as tabular data files (CSV, Parquet) using NOMAD Actions. Once the action is triggered, it will:

    • Search entries based on user-defined criteria.
    • Optionally include or exclude data fields from the entries.
    • Package the entries into tabular data files such as CSV or Parquet (or as JSON).
    • Export the files to a specified Project (previously known as an Upload) in NOMAD.

    These can then be downloaded from the NOMAD web interface for local use.
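The filtering and packaging steps above can be pictured with a small standard-library sketch. It is an illustration only, not the plugin's implementation: the function names (flatten, select_fields, entries_to_csv), the dotted column naming, and the sample entries are all assumptions made for this example.

```python
import csv
import io
from typing import Iterable, Optional, Set


def flatten(entry: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted column names (assumed naming scheme)."""
    flat = {}
    for key, value in entry.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def select_fields(
    row: dict, include: Optional[Set[str]] = None, exclude: Optional[Set[str]] = None
) -> dict:
    """Keep only `include` columns (if given), then drop `exclude` columns."""
    kept = {k: v for k, v in row.items() if include is None or k in include}
    return {k: v for k, v in kept.items() if not exclude or k not in exclude}


def entries_to_csv(
    entries: Iterable[dict],
    include: Optional[Set[str]] = None,
    exclude: Optional[Set[str]] = None,
) -> str:
    """Serialize entries as CSV text; missing fields become empty cells."""
    rows = [select_fields(flatten(e), include, exclude) for e in entries]
    # Union of columns in first-seen order, so sparse entries still fit.
    columns = list(dict.fromkeys(col for row in rows for col in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up sample entries, not real NOMAD data.
    sample = [
        {"entry_id": "a1", "results": {"material": {"chemical_formula": "H2O"}}},
        {
            "entry_id": "b2",
            "results": {"material": {"chemical_formula": "NaCl", "n_elements": 2}},
        },
    ]
    print(entries_to_csv(sample, exclude={"results.material.n_elements"}))
```

Taking the union of columns across all entries keeps the table rectangular even when entries carry different fields, which is the usual difficulty when turning nested records into a flat file.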

⚙️ Configuration

The Export Entries action can be configured using the following parameters in the nomad.yaml configuration file of your NOMAD Oasis instance:

plugins:
  entry_points:
    options:
      nomad_ml_workflows.actions:export_entries:
        search_batch_timeout: 7200
        # Timeout (in seconds) for each search batch in the Export Entries
        # action. Set this accordingly to time out longer searches.
        max_entries_export_limit: 100000
        # Maximum number of entries that can be exported in a single
        # Export Entries action.
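To make the role of max_entries_export_limit more concrete, the sketch below collects entries in batches and stops once a cap is reached. It is a conceptual illustration under stated assumptions, not the plugin's code: collect_entries, make_fake_search, and the cursor-based paging are hypothetical, the search is simulated in memory, and the per-batch timeout (search_batch_timeout) is not modeled. Whether the real action truncates or rejects an export that exceeds the limit is not stated here; this sketch simply truncates.

```python
from typing import Callable, List, Optional, Tuple

# A page fetcher takes a cursor (None for the first page) and returns
# (entries, next_cursor); next_cursor is None when no pages remain.
FetchPage = Callable[[Optional[int]], Tuple[List[dict], Optional[int]]]


def collect_entries(fetch_page: FetchPage, max_entries: int) -> List[dict]:
    """Gather entries batch by batch, never returning more than max_entries."""
    collected: List[dict] = []
    cursor: Optional[int] = None
    while len(collected) < max_entries:
        batch, cursor = fetch_page(cursor)
        # Take only as many entries as still fit under the cap.
        collected.extend(batch[: max_entries - len(collected)])
        if cursor is None or not batch:
            break
    return collected


def make_fake_search(total: int, page_size: int) -> FetchPage:
    """Build an in-memory stand-in for a paginated search (assumption)."""

    def fetch_page(cursor: Optional[int]) -> Tuple[List[dict], Optional[int]]:
        start = cursor or 0
        end = min(start + page_size, total)
        batch = [{"entry_id": f"entry-{i}"} for i in range(start, end)]
        return batch, (end if end < total else None)

    return fetch_page


if __name__ == "__main__":
    entries = collect_entries(make_fake_search(total=25, page_size=10), max_entries=12)
    print(len(entries), entries[-1]["entry_id"])
```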

🚀 Adding this plugin to NOMAD

Currently, NOMAD has two distinct flavors that are relevant depending on your role as a user:

  1. A NOMAD Oasis: any user with a NOMAD Oasis instance.
  2. Local NOMAD installation and the source code of NOMAD: internal developers.

Adding this plugin in your NOMAD Oasis

Read the NOMAD plugin documentation for all details on how to deploy the plugin on your NOMAD instance.

Adding this plugin in your local NOMAD installation and the source code of NOMAD

We now recommend using the dedicated nomad-distro-dev repository to simplify the process. Please refer to that repository for detailed instructions.

🛠️ Development

If you want to develop this plugin locally, clone the project and, in the plugin folder, create a virtual environment (you can use Python 3.10, 3.11, or 3.12):

git clone https://github.com/FAIRmat-NFDI/nomad-ml-workflows.git
cd nomad-ml-workflows
python3.11 -m venv .pyenv
. .pyenv/bin/activate

Make sure to have pip upgraded:

pip install --upgrade pip

We recommend installing uv for fast pip installation of the packages:

pip install uv

Install the plugin in editable mode together with its development dependencies:

uv pip install -e '.[dev]'

Run linting and auto-formatting

We use Ruff for linting and formatting the code. Ruff auto-formatting is also part of the GitHub workflow actions. You can run the same checks locally:

ruff check .
ruff format . --check

Debugging

For interactive debugging of the tests, use pytest with the --pdb flag. We recommend using an IDE for debugging, e.g., VSCode. If that is the case, add the following snippet to your .vscode/launch.json:

{
  "configurations": [
    {
      "name": "<descriptive tag>",
      "type": "debugpy",
      "request": "launch",
      "cwd": "${workspaceFolder}",
      "program": "${workspaceFolder}/.pyenv/bin/pytest",
      "justMyCode": true,
      "env": {
        "_PYTEST_RAISE": "1"
      },
      "args": [
        "-sv",
        "--pdb",
        "<path-to-plugin-tests>"
      ]
    }
  ]
}

where <path-to-plugin-tests> must be changed to the local path to the test module to be debugged.

The settings configuration file .vscode/settings.json automatically applies the linting and formatting upon saving the modified file.

Documentation on GitHub Pages

To view the documentation locally, install the related packages using:

uv pip install -r requirements_docs.txt

Run the documentation server:

mkdocs serve

👥 Main contributors

Name | E-mail
Sarthak Kapoor | sarthak.kapoor@physik.hu-berlin.de

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
