Skip to main content

A Model Context Protocol (MCP) server to access data about training materials.

Project description

Project 18: Mining the potential of knowledge graphs for metadata on training

Abstract

Knowledge graphs (KGs) can greatly increase the potential of data by revealing hidden relationships and turning it into useful information. A KG is a graph-based representation of data that stores relations between subjects, predicates and objects in triplestores. These entities are typically described in pre-defined ontologies, which increase interoperability and connect data that would otherwise remain isolated in siloed databases. This structured data representation can greatly facilitate complex querying and applications to deep learning approaches like generative AI.

ELIXIR and its Nodes are making a major effort to make the wealth of open training materials on the computational life sciences reusable, amongst others by guidelines and support for annotating training materials with standardized metadata. One major step in standardizing metadata is the use of the Bioschemas training profile, which became a standard for representing training metadata. Despite being standardized and interoperable, there is still a lot of potential to turn these resources into valuable information, linking training data across various databases.

In this project, we aim to create queryable KGs derived from training metadata in the Bioschemas format available from platforms like TeSS and glittr.org. In a subsequent step, we will investigate the potential of such KGs for several use cases, including construction of custom learning paths, creation of detailed trainer profiles, and connection of training metadata to other databases. These use-cases will also shed light on the limits on the currently available metadata, and will help to make future choices on richer metadata and standards.

Leads

Geert van Geest, Harshita Gupta, Vincent Emonet

💬 MCP server

A Model Context Protocol (MCP) server to access and search through the training materials of multiple Elixir repositories, such as TeSS and Glittr.

⚡️ Usage

[!IMPORTANT]

Requirement: uv, to easily handle python scripts and virtual environments

Use with STDIO transport:

uv run elixir-training-mcp

Use with Deploy as Streamable HTTP server:

uv run elixir-training-mcp --http

🧰 Available MCP tools

Once the server is running you can call the following tools from your MCP-compatible client:

Tool Description
search_training_materials Proxies the live TeSS API and returns raw JSON results.
keyword_search Searches the harvested TTL datasets (TeSS + GTN) by free-text keyword and returns enriched metadata.
provider_search Filters harvested resources by provider name (case-insensitive).
location_search Returns TeSS course instances in a given country (optionally city).
date_search Finds TeSS course instances starting within a provided ISO date range.
topic_search Matches harvested resources by EDAM identifier or topic label.
dataset_stats Summarises dataset diagnostics (resource counts, type distribution, access modes).

[!NOTE] The local tools read from data/tess_harvest.ttl and data/gtn_harvest.ttl. Regenerate these files with the harvest scripts if you need fresher data.

🔌 Connect client to MCP server

Follow the instructions of your favorite chat client.

To add a new MCP server to VSCode GitHub Copilot:

  • Install the GitHub.copilot extension
  • Open the Command Palette (ctrl+shift+p or cmd+shift+p)
  • Search for MCP: Add Server...
    • Choose STDIO, and provide the command: uvx elixir-training-mcp
    • Or choose HTTP, and provide the MCP server URL, e.g. http://localhost:8000/mcp

To use it with STDIO transport, your VSCode mcp.json should look like:

{
   "servers": {
      "elixir-training-mcp": {
         "type": "stdio",
         "command": "uvx",
         "args": ["elixir-training-mcp"]
      }
   }
}

[!TIP]

You can use a local folder for development:

{
   "servers": {
      "elixir-training-mcp": {
         "type": "stdio",
         "cwd": "~/dev/ELIXIR-TrP-KG-training-metadata",
         "command": "uv",
         "args": ["run", "elixir-training-mcp"]
      }
   }
}

You can also connect to a running server using Streamable HTTP:

{
    "servers": {
        "elixir-training-mcp-http": {
            "url": "http://localhost:8000/mcp",
            "type": "http"
        }
    }
}

Harvesting

The data/tess_harvest.ttl files is included in the repository (7MB), you can run the script to harvest JSON-LD data and build this ttl file, but it takes ~30min due to parsing JSON-LD being expensive:

uv run src/elixir_training_mcp/harvest/harvest_tess.py

Harvest GTN:

uv run src/elixir_training_mcp/harvest/harvest_gtn.py

[!NOTE]

TeSS contains training materials and courses from various providers, such as GTN (Galaxy Training Network) training materials. Metadata about material in TeSS and GTN can be matched on schema:url

Deploy a SPARQL endpoint on http://localhost:8000:

uv run rdflib-endpoint serve src/elixir_training_mcp/data/*_harvest.ttl

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

elixir_training_mcp-0.0.3.tar.gz (1.7 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

elixir_training_mcp-0.0.3-py3-none-any.whl (1.7 MB view details)

Uploaded Python 3

File details

Details for the file elixir_training_mcp-0.0.3.tar.gz.

File metadata

  • Download URL: elixir_training_mcp-0.0.3.tar.gz
  • Upload date:
  • Size: 1.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.23

File hashes

Hashes for elixir_training_mcp-0.0.3.tar.gz
Algorithm Hash digest
SHA256 731d98a5c63eeda2559c1d65751facfc5aa723babe0ba6dd8acf41505c3269ee
MD5 5b254bd1b36297eb52c1e8fcd53bb4ba
BLAKE2b-256 9b9f4e20f82a0f06fc5880c585f678eea053d56eb1c699f1b26ec95bd651e500

See more details on using hashes here.

File details

Details for the file elixir_training_mcp-0.0.3-py3-none-any.whl.

File metadata

File hashes

Hashes for elixir_training_mcp-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 7246785a38a86e024f8f95444b6d5eaaf51bf85740e5e1a66011962795fee318
MD5 082049fc2ac96f7e366c6eea1043bbc0
BLAKE2b-256 f93620b9ca3e503a37e910b912994b90f876c6a098976fec17ee99967f5fc3ba

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page