
Datmat is a tool for data materialisation: it gets your data from where it is to where you need it to be.


Data Materialisation

Getting started

Install datmat from PyPI:

pip install datmat

In datmat you can interface with multiple data sources and storage solutions through a plugin system. By linking different plugins together you can move data from one place to another. A set of plugins ships with the package, and the program is set up to support the development of custom plugins. A plugin is selected by prefixing the path or URL of your data with a URL scheme. For example, file:///home/user/file.txt accesses the local file /home/user/file.txt, and xnat+https://xnat.health-ri.nl/projects/sandbox accesses the XNAT project sandbox on xnat.health-ri.nl over HTTPS.
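
To make the anatomy of these URLs concrete, the sketch below splits the two example URLs with Python's standard urllib.parse module. It only illustrates which part is the scheme and which part is the location; it is not how datmat itself resolves plugins, and the helper name describe_url is made up for this example.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a datmat-style URL into scheme, host and path (illustration only)."""
    parts = urlsplit(url)
    return {"scheme": parts.scheme, "host": parts.netloc, "path": parts.path}


for url in ("file:///home/user/file.txt",
            "xnat+https://xnat.health-ri.nl/projects/sandbox"):
    print(describe_url(url))
```

The scheme (file or xnat+https) is the part that names the plugin; everything after it is interpreted by that plugin.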

The examples below cover various use cases.

Downloading from XNAT into EUCAIM directory structure

The xnat+https:// plugin downloads files from an XNAT instance. The eucaimdir:// plugin stores the files in the destination folder using the following nested folder structure:

/dest_folder/project_name/subject_label/experiment_label/{scan_id}_{scan_type}/file

The path /dest_folder must be supplied with its leading /, so the URL becomes eucaimdir:///dest_folder (two slashes from the scheme separator plus one from the path).
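
As a rough illustration of the layout above, the following sketch assembles the nested path from individual metadata values with pathlib. It is not the eucaimdir plugin's code; the function name eucaim_path and the sample values (EXP01, scan 3, T1w, image.dcm) are invented for the example.

```python
from pathlib import PurePosixPath


def eucaim_path(dest_folder, project_name, subject_label,
                experiment_label, scan_id, scan_type, filename):
    """Build /dest/project/subject/experiment/{scan_id}_{scan_type}/file."""
    scan_folder = f"{scan_id}_{scan_type}"
    return (PurePosixPath(dest_folder) / project_name / subject_label
            / experiment_label / scan_folder / filename)


# Hypothetical values, chosen only to show the resulting shape.
print(eucaim_path("/dest_folder", "sandbox", "TEST01",
                  "EXP01", "3", "T1w", "image.dcm"))
# /dest_folder/sandbox/TEST01/EXP01/3_T1w/image.dcm
```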

A complete project

import datmat

datmat.materialize('xnat+https://xnat.health-ri.nl/projects/sandbox',
                   'eucaimdir:///dest_folder',
                   tempdir='/temp_directory')

Note: By default, only the 'DICOM' resource is downloaded for each scan. To download all resources, add a query to the input URL:

import datmat

datmat.materialize('xnat+https://xnat.health-ri.nl/projects/sandbox?resources=*',
                   'eucaimresdir:///dest_folder',
                   tempdir='/temp_directory')

By using the eucaimresdir:/// output URL scheme, a folder will be created for each of the resources, like this:

/dest_folder/project_name/subject_label/experiment_label/{scan_id}_{scan_type}/resource_name/files/file

A single subject

import datmat

datmat.materialize('xnat+https://xnat.health-ri.nl/search?projects=sandbox&subjects=TEST01&resources=DICOM',
                   'eucaimdir:///dest_folder',
                   tempdir='/temp_directory')
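
If the search filters come from variables rather than a hand-written string, the query can be assembled with the standard library. The sketch below is a convenience helper, not part of datmat; the name xnat_search_url is hypothetical, and only the filter names shown in the example above (projects, subjects, resources) are taken from this document.

```python
from urllib.parse import urlencode


def xnat_search_url(host, **filters):
    """Build an xnat+https search URL from keyword filters (illustrative helper)."""
    query = urlencode(filters)  # percent-encodes values and joins them with '&'
    return f"xnat+https://{host}/search?{query}"


url = xnat_search_url("xnat.health-ri.nl",
                      projects="sandbox", subjects="TEST01", resources="DICOM")
print(url)
```

The resulting string can be passed as the first argument to datmat.materialize in place of the literal URL.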

The datmat package is based on the IOPlugin system of Fastr. See the documentation for the XNATStorage IOPlugin for more information on querying XNAT.

Other use cases

Copy file to file

import datmat

datmat.materialize('file:///input_file',
                   'file:///dest_file',
                   tempdir='/temp_directory')
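
When the paths are held in variables, the file:// URLs can be derived with pathlib instead of string concatenation. This is a small sketch assuming POSIX-style absolute paths; the helper name to_file_url is made up, and the document does not state whether datmat has its own helper for this.

```python
from pathlib import Path


def to_file_url(path):
    """Turn an absolute path into a file:// URL (the path must be absolute)."""
    return Path(path).as_uri()


print(to_file_url("/input_file"))
print(to_file_url("/dest_file"))
```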

Developing your own plugin

You can connect your own data repository or define your own data structure by developing a custom plugin. Each plugin is a subclass of IOPlugin and uses a URL scheme (like file:// or xnat+https://) to identify the data source or destination.

Plugin Architecture Overview

Plugins in datmat serve two primary functions:

  1. Source plugins - Pull data from external sources (e.g., XNAT)
  2. Sink plugins - Push data to destinations in specific structures (e.g., EUCAIM directory)

Data is passed between plugins using two key data classes:

  • URLSample - Contains source URLs and metadata
  • PathSample - Contains file paths and metadata

Creating a Basic Plugin

To create a custom plugin:

  1. Subclass IOPlugin and define a unique URL scheme:
class MyPlugin(IOPlugin):
    scheme = 'myplugin'  # URL scheme for your plugin
  2. Override the necessary methods depending on whether your plugin is a source, sink, or both:
def setup(self):
    """Optional initialization (e.g., connect to repository)"""
    pass
    
def cleanup(self):
    """Optional cleanup (e.g., disconnect from repository)"""
    pass

Creating a Source Plugin

For a plugin that pulls data from a source, implement these methods:

def expand_url(self, urlsample):
    """Convert a single URL entry point into multiple downloadable parts"""
    # Return either a single URLSample or a tuple of (id, URLSample) pairs
    
def fetch_url(self, inurlsample, outpath):
    """Download data based on URLSample to the specified path"""
    # Return a PathSample containing the downloaded data and metadata
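
To show how the two methods fit together, here is a toy, self-contained source that serves files from an in-memory dictionary. Everything in it is an assumption made for illustration: URLSample and PathSample are simplified stand-ins (the real datmat classes may have different constructors and attributes), and InMemorySource does not subclass the real IOPlugin so that the sketch runs without datmat installed.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class URLSample:  # stand-in, not datmat's URLSample
    url: str
    metadata: dict = field(default_factory=dict)


@dataclass
class PathSample:  # stand-in, not datmat's PathSample
    data_path: Path
    metadata: dict = field(default_factory=dict)


class InMemorySource:
    """Hypothetical source; a real plugin would subclass IOPlugin."""
    scheme = "memory"

    # Fake repository: entry point -> {file name: content}
    REPOSITORY = {"memory://demo": {"a.txt": "alpha", "b.txt": "beta"}}

    def expand_url(self, urlsample):
        """Turn one entry point into a tuple of (id, URLSample) pairs."""
        files = self.REPOSITORY[urlsample.url]
        return tuple(
            (name, URLSample(f"{urlsample.url}/{name}", {"filename": name}))
            for name in sorted(files)
        )

    def fetch_url(self, inurlsample, outpath):
        """Write the content behind one URLSample to outpath."""
        base, _, name = inurlsample.url.rpartition("/")
        outpath = Path(outpath)
        outpath.write_text(self.REPOSITORY[base][name])
        return PathSample(outpath, dict(inurlsample.metadata))


source = InMemorySource()
with tempfile.TemporaryDirectory() as tmp:
    for sample_id, part in source.expand_url(URLSample("memory://demo")):
        result = source.fetch_url(part, Path(tmp) / sample_id)
        print(sample_id, result.data_path.read_text())
```

The split mirrors the description above: expand_url only enumerates what can be downloaded, while fetch_url does the actual transfer for one part at a time.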

Creating a Sink Plugin

For a plugin that stores data in a specific structure, implement these methods:

def put_url(self, sample, outurl):
    """Copy data from temporary location to final destination"""
    # Return True if successful, False otherwise
    
def url_to_path(self, url):
    """Convert plugin URL to filesystem path"""
    # Return the path as a string
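
A minimal sink along these lines might look like the sketch below, which copies each staged file into a single flat directory. It is an illustration under stated assumptions, not datmat code: PathSample is a simplified stand-in with only two fields, FlatDirectorySink and its flatdir scheme are hypothetical, the class does not subclass the real IOPlugin, and POSIX-style paths are assumed.

```python
import shutil
import tempfile
from dataclasses import dataclass
from pathlib import Path
from urllib.parse import urlsplit


@dataclass
class PathSample:  # stand-in, not datmat's PathSample
    data_path: Path
    filename: str


class FlatDirectorySink:
    """Hypothetical sink; a real plugin would subclass IOPlugin."""
    scheme = "flatdir"

    def url_to_path(self, url):
        """Return the filesystem path encoded in a flatdir:// URL as a string."""
        return urlsplit(url).path

    def put_url(self, sample, outurl):
        """Copy the staged file to its destination; True on success."""
        target = Path(self.url_to_path(outurl)) / sample.filename
        try:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(sample.data_path, target)
        except OSError:
            return False
        return True


with tempfile.TemporaryDirectory() as tmp:
    staged = Path(tmp) / "staged.txt"
    staged.write_text("payload")
    dest = Path(tmp) / "dest"
    ok = FlatDirectorySink().put_url(PathSample(staged, "copy.txt"),
                                     f"flatdir://{dest.as_posix()}")
    print(ok, (dest / "copy.txt").read_text())
```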

Creating a Custom Directory Structure

The easiest way to create a custom directory structure is to subclass StructuredDirectory and implement only the _sample_to_outpath method:

class MyStructure(StructuredDirectory):
    scheme = 'mystructure'
    
    def _sample_to_outpath(self, url, sample):
        """Define your custom directory structure here"""
        return self.url_to_path(url / sample.project_name / f'{sample.subject_label}')

Available Metadata Properties

The following properties are available in the PathSample object (if populated by your source plugin):

  • project_name - Name of the project
  • subject_label - Label of the subject
  • experiment_label - Label of the experiment
  • experiment_date - Date the experiment was acquired
  • scan_id - ID of the scan
  • scan_type - Type of the scan (e.g., T1w)
  • filename - Filename (can be a partial path for subdirectories)
  • timepoint - Label of the timepoint the data is from
  • data_path - Path to the data on disk
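
Because these properties are only available when the source plugin populates them, a sink may want to check for the ones its directory layout depends on before building a path. The sketch below uses a stand-in dataclass with the property names listed above; it is not datmat's PathSample, and the assumption that unpopulated properties are None is not confirmed by this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SampleMetadata:
    """Stand-in carrying the listed properties; not datmat's PathSample."""
    project_name: Optional[str] = None
    subject_label: Optional[str] = None
    experiment_label: Optional[str] = None
    experiment_date: Optional[str] = None
    scan_id: Optional[str] = None
    scan_type: Optional[str] = None
    filename: Optional[str] = None
    timepoint: Optional[str] = None
    data_path: Optional[str] = None


def missing_properties(sample, required):
    """Return the required property names that were not populated."""
    return [name for name in required if getattr(sample, name) is None]


# Hypothetical sample in which the source did not set experiment_label.
sample = SampleMetadata(project_name="sandbox", subject_label="TEST01", scan_id="3")
print(missing_properties(sample, ["project_name", "subject_label", "experiment_label"]))
# ['experiment_label']
```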

