
A NOMAD plugin for managing ML workflows.


nomad-ml-workflows

A NOMAD plugin for managing ML workflows. Currently, it provides an action to export a large number of entries from the NOMAD database as tabular data files. Other actions and schemas related to ML workflows will be added in the future.

📦 Installation

You can install the plugin using pip:

pip install "nomad-ml-workflows @ git+https://github.com/FAIRmat-NFDI/nomad-ml-workflows.git"

However, to fully utilize the plugin, you need to add it to your NOMAD instance as described below.
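
To confirm that the pip installation succeeded before wiring the plugin into a NOMAD instance, a small check like the following can help. It is a minimal sketch that uses only the standard library; the helper name installed_version is made up for this example, and the distribution name is assumed to match the one in the install command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints a version string if the plugin is installed, otherwise None.
print(installed_version("nomad-ml-workflows"))
```

This only tells you that the package is importable in the current environment; it does not check that a NOMAD instance has loaded the plugin.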

✨ Features

  • Export a large number of NOMAD entries as tabular data files (CSV, Parquet) using NOMAD Actions. Once the action is triggered, it will:

    • Search entries based on user-defined criteria.
    • Optionally include or exclude data fields from the entries.
    • Package the entries into tabular data files such as CSV or Parquet (or as JSON).
    • Export the files to a specified Project (previously known as an Upload) in NOMAD.

    These can then be downloaded from the NOMAD web interface for local use.
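
To make the packaging step more concrete, here is a rough sketch of how nested entries could be flattened into rows, filtered by field, and written as CSV. It is not the plugin's implementation: the functions flatten, select_fields and entries_to_csv and the sample entries are all invented for illustration, and only the Python standard library is used.

```python
import csv
import io


def flatten(entry, parent_key="", sep="."):
    """Flatten nested dicts into a single level with dotted keys."""
    flat = {}
    for key, value in entry.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def select_fields(row, include=None, exclude=None):
    """Keep only `include` columns (if given), then drop `exclude` columns."""
    if include is not None:
        row = {k: v for k, v in row.items() if k in include}
    if exclude is not None:
        row = {k: v for k, v in row.items() if k not in exclude}
    return row


def entries_to_csv(entries, include=None, exclude=None):
    """Render a list of entry dicts as CSV text."""
    rows = [select_fields(flatten(e), include, exclude) for e in entries]
    # Union of all columns in first-seen order, so sparse entries still fit.
    columns = list(dict.fromkeys(k for row in rows for k in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up entries for illustration; real NOMAD entries have many more fields.
entries = [
    {"entry_id": "a1", "results": {"material": {"formula": "H2O"}}},
    {"entry_id": "b2", "results": {"material": {"formula": "NaCl"}}, "upload_id": "u9"},
]
csv_text = entries_to_csv(entries, exclude={"upload_id"})
print(csv_text)
```

Dotted column names such as results.material.formula are one possible convention for nested data; the column naming used by the actual action may differ.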

⚙️ Configuration

The Export Entries action can be configured using the following parameters in the nomad.yaml configuration file of your NOMAD Oasis instance:

plugins:
  entry_points:
    options:
      nomad_ml_workflows.actions:export_entries:
        search_batch_timeout: 7200
        # Timeout (in seconds) for each search batch in the Export Entries
        # action. Set this accordingly to time out longer searches.
        max_entries_export_limit: 100000
        # Maximum number of entries that can be exported in a single
        # Export Entries action.
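
To illustrate how a per-batch timeout and an overall entry cap could interact, the sketch below pages through a search until either the results run out or the cap is reached. It is an assumption-laden illustration, not the plugin's code: export_in_batches, fake_fetch_page and the page_size parameter are hypothetical, and the timeout is only checked after each batch returns rather than interrupting it.

```python
import time


def export_in_batches(fetch_page, max_entries, batch_timeout, page_size=1000):
    """Collect entries page by page until exhausted or `max_entries` is reached.

    `fetch_page(after, size)` is a hypothetical callable returning
    `(entries, next_cursor)`; `next_cursor` is None on the last page.
    """
    collected, cursor = [], None
    while len(collected) < max_entries:
        size = min(page_size, max_entries - len(collected))
        started = time.monotonic()
        entries, cursor = fetch_page(cursor, size)
        elapsed = time.monotonic() - started
        # A real action would abort the call itself; here we only check afterwards.
        if elapsed > batch_timeout:
            raise TimeoutError(f"search batch took {elapsed:.1f}s")
        collected.extend(entries)
        if cursor is None or not entries:
            break
    return collected


def fake_fetch_page(after, size):
    """Stand-in search over 2500 fake entry ids."""
    start = after or 0
    end = min(start + size, 2500)
    ids = [f"entry-{i}" for i in range(start, end)]
    return ids, (end if end < 2500 else None)


# Values mirror the two options above: a cap on entries and a per-batch timeout.
capped = export_in_batches(fake_fetch_page, max_entries=1200, batch_timeout=7200)
everything = export_in_batches(fake_fetch_page, max_entries=100000, batch_timeout=7200)
print(len(capped), len(everything))
```

With the fake search of 2500 entries, a cap of 1200 stops the export early, while the cap of 100000 lets it run to the end.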

🚀 Adding this plugin to NOMAD

Currently, NOMAD has two distinct flavors that are relevant depending on your role as a user:

  1. A NOMAD Oasis: any user with a NOMAD Oasis instance.
  2. Local NOMAD installation and the source code of NOMAD: internal developers.

Adding this plugin in your NOMAD Oasis

Read the NOMAD plugin documentation for all details on how to deploy the plugin on your NOMAD instance.

Adding this plugin in your local NOMAD installation and the source code of NOMAD

We now recommend using the dedicated nomad-distro-dev repository to simplify the process. Please refer to that repository for detailed instructions.

🛠️ Development

To develop this plugin locally, clone the project and create a virtual environment in the plugin folder (you can use Python 3.10, 3.11 or 3.12):

git clone https://github.com/FAIRmat-NFDI/nomad-ml-workflows.git
cd nomad-ml-workflows
python3.11 -m venv .pyenv
. .pyenv/bin/activate
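
If you are unsure whether the activated interpreter is one of the versions listed above, a quick check along these lines can be run inside the environment. This is only a convenience sketch; the names SUPPORTED and is_supported are made up for the example.

```python
import sys

# Minor versions named in this README as usable for development.
SUPPORTED = {(3, 10), (3, 11), (3, 12)}


def is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) pair is one of the listed versions."""
    return tuple(version_info[:2]) in SUPPORTED


status = "supported" if is_supported() else "not supported"
print(sys.version.split()[0], status)
```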

Make sure to have pip upgraded:

pip install --upgrade pip

We recommend installing uv for fast pip installation of the packages:

pip install uv

Install the plugin in editable mode with its development dependencies, which also installs the nomad-lab package:

uv pip install -e '.[dev]'

Run linting and auto-formatting

We use Ruff for linting and formatting the code. Ruff auto-formatting is also part of the GitHub workflow actions. You can run the same checks locally:

ruff check .
ruff format . --check

Debugging

For interactive debugging of the tests, use pytest with the --pdb flag. We recommend using an IDE for debugging, e.g., VSCode. If you use VSCode, add the following snippet to your .vscode/launch.json:

{
  "configurations": [
    {
      "name": "<descriptive tag>",
      "type": "debugpy",
      "request": "launch",
      "cwd": "${workspaceFolder}",
      "program": "${workspaceFolder}/.pyenv/bin/pytest",
      "justMyCode": true,
      "env": {
        "_PYTEST_RAISE": "1"
      },
      "args": [
        "-sv",
        "--pdb",
        "<path-to-plugin-tests>"
      ]
    }
  ]
}

where <path-to-plugin-tests> must be changed to the local path to the test module to be debugged.

The settings configuration file .vscode/settings.json automatically applies the linting and formatting upon saving the modified file.

Documentation on GitHub Pages

To view the documentation locally, install the related packages using:

uv pip install -r requirements_docs.txt

Run the documentation server:

mkdocs serve

👥 Main contributors

Name E-mail
Sarthak Kapoor sarthak.kapoor@physik.hu-berlin.de

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
