Datmat is a tool for data materialisation: it gets your data from where it is to where you need it to be.
Data Materialisation
Getting started
Install datmat from PyPI:
pip install datmat
In datmat you can interface with multiple data sources and storage solutions through a plugin system.
By linking together different plugins you can move data from one place to another.
A set of plugins is installed together with the package, and the program is set up to support the development
of custom plugins. A plugin is selected by prefacing the path or URL of your file with the plugin's URL scheme. For example,
file:///home/user/file.txt refers to the local file /home/user/file.txt, and
xnat+https://xnat.health-ri.nl/projects/sandbox refers to the XNAT project sandbox on xnat.health-ri.nl over HTTPS.
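As an illustration of how a scheme prefix separates the plugin name from the location, the sketch below splits the two example URLs with Python's standard urllib.parse. It only shows the anatomy of the URLs; how datmat parses URLs internally is not covered here, and the helper name describe_url is hypothetical.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into the parts a scheme-based plugin lookup would care about.

    Hypothetical helper for illustration; not part of datmat.
    """
    parts = urlsplit(url)
    return {
        'scheme': parts.scheme,  # selects the plugin, e.g. 'file' or 'xnat+https'
        'host': parts.netloc,    # empty for local files
        'path': parts.path,      # location within the source
    }


for example in ('file:///home/user/file.txt',
                'xnat+https://xnat.health-ri.nl/projects/sandbox'):
    print(describe_url(example))
```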
Examples of various use cases are given below.
Downloading from XNAT into EUCAIM directory structure
The xnat+https:// plugin downloads files from an XNAT instance.
The eucaimdir:// plugin will store the files in the destination folder in the following nested folder structure:
/dest_folder/project_name/subject_label/experiment_label/{scan_id}_{scan_type}/file
The path /dest_folder must be supplied with its leading /, so the URL becomes eucaimdir:///dest_folder (three slashes in total).
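The layout above can be sketched with pathlib. This is only an illustration of the folder pattern: eucaim_path is a hypothetical helper, not a datmat function, and the metadata values (sandbox, TEST01, and so on) are made up.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def eucaim_path(dest_url, project_name, subject_label, experiment_label,
                scan_id, scan_type, filename):
    """Build the nested destination path described above (illustrative only)."""
    # 'eucaimdir:///dest_folder' has an empty host part, so the path
    # component keeps its leading '/'
    dest_folder = PurePosixPath(urlsplit(dest_url).path)
    return (dest_folder / project_name / subject_label / experiment_label
            / f'{scan_id}_{scan_type}' / filename)


# Made-up metadata values, purely for demonstration
print(eucaim_path('eucaimdir:///dest_folder', 'sandbox', 'TEST01',
                  'TEST01_MR1', '3', 'T1w', 'image_0001.dcm'))
```

Under standard URL parsing, writing only two slashes (eucaimdir://dest_folder) would make dest_folder the host part and leave the path empty, which is one way to see why the leading / matters.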
A complete project
import datmat

datmat.materialize('xnat+https://xnat.health-ri.nl/projects/sandbox',
                   'eucaimdir:///dest_folder',
                   tempdir='/temp_directory')
Note: By default only the 'DICOM' resource is downloaded per scan. To download all resources a query can be added to the input URL:
import datmat

datmat.materialize('xnat+https://xnat.health-ri.nl/projects/sandbox?resources=*',
                   'eucaimresdir:///dest_folder',
                   tempdir='/temp_directory')
By using the eucaimresdir:/// output URL scheme, a folder will be created for
each of the resources, like this:
/dest_folder/project_name/subject_label/experiment_label/{scan_id}_{scan_type}/resource_name/files/file
A single subject
import datmat

datmat.materialize('xnat+https://xnat.health-ri.nl/search?projects=sandbox&subjects=TEST01&resources=DICOM',
                   'eucaimdir:///dest_folder',
                   tempdir='/temp_directory')
The datmat package is based on the IOPlugin system of Fastr. See the documentation for the XNATStorage IOPlugin
for more information on querying XNAT.
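When the project and subject come from variables, the search URL can be assembled rather than typed by hand. The sketch below uses only the standard library and the query keys that appear in the single-subject example (projects, subjects, resources). build_search_url is a hypothetical helper, and any other query keys should be checked against the XNATStorage documentation.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(host, project, subject, resource='DICOM'):
    """Assemble an xnat+https search URL like the one in the example above."""
    query = urlencode({'projects': project,
                       'subjects': subject,
                       'resources': resource})
    return f'xnat+https://{host}/search?{query}'


url = build_search_url('xnat.health-ri.nl', 'sandbox', 'TEST01')
print(url)

# Reading the query back shows which filters the URL carries
print(parse_qs(urlsplit(url).query))
```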
Other use cases
Copy file to file
import datmat

datmat.materialize('file:///input_file',
                   'file:///dest_file',
                   tempdir='/temp_directory')
Developing your own plugin
You can connect your own data repository or define your own data structure by developing a custom plugin. Each plugin is a subclass of IOPlugin and uses a URL scheme (like file:// or xnat+https://) to identify the data source or destination.
Plugin Architecture Overview
Plugins in datmat serve two primary functions:
- Source plugins - Pull data from external sources (e.g., XNAT)
- Sink plugins - Push data to destinations in specific structures (e.g., EUCAIM directory)
Data is passed between plugins using two key data classes:
- URLSample - Contains source URLs and metadata
- PathSample - Contains file paths and metadata
Creating a Basic Plugin
To create a custom plugin:
- Subclass IOPlugin and define a unique URL scheme:
class MyPlugin(IOPlugin):
    scheme = 'myplugin'  # URL scheme for your plugin
- Override the necessary methods depending on whether your plugin is a source, sink, or both:
def setup(self):
    """Optional initialization (e.g., connect to repository)"""
    pass

def cleanup(self):
    """Optional cleanup (e.g., disconnect from repository)"""
    pass
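The setup and cleanup hooks suggest a simple lifecycle: prepare a connection, do the work, and release the connection afterwards. The sketch below illustrates that pattern with a stand-in base class so that it runs without datmat installed. BaseIOPlugin, run_with_plugin, and the events list are all hypothetical, and datmat's real IOPlugin may call these hooks differently.

```python
class BaseIOPlugin:
    """Stand-in for datmat's IOPlugin, for illustration only."""
    scheme = None

    def setup(self):
        pass

    def cleanup(self):
        pass


class MyPlugin(BaseIOPlugin):
    scheme = 'myplugin'

    def __init__(self):
        self.events = []  # records the lifecycle so it can be inspected

    def setup(self):
        self.events.append('connected')

    def cleanup(self):
        self.events.append('disconnected')


def run_with_plugin(plugin, work):
    """Call setup, run the work, and guarantee cleanup even on errors."""
    plugin.setup()
    try:
        return work(plugin)
    finally:
        plugin.cleanup()


plugin = MyPlugin()
run_with_plugin(plugin, lambda p: p.events.append(f'transfer via {p.scheme}://'))
print(plugin.events)
```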
Creating a Source Plugin
For a plugin that pulls data from a source, implement these methods:
def expand_url(self, urlsample):
    """Convert a single URL entry point into multiple downloadable parts"""
    # Return either a single URLSample or a tuple of (id, URLSample) pairs
    raise NotImplementedError

def fetch_url(self, inurlsample, outpath):
    """Download data based on URLSample to the specified path"""
    # Return a PathSample containing the downloaded data and metadata
    raise NotImplementedError
Creating a Sink Plugin
For a plugin that stores data in a specific structure, implement these methods:
def put_url(self, sample, outurl):
    """Copy data from temporary location to final destination"""
    # Return True if successful, False otherwise
    raise NotImplementedError

def url_to_path(self, url):
    """Convert plugin URL to filesystem path"""
    # Return the path as a string
    raise NotImplementedError
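For a sink that writes to the local filesystem, the two methods could look roughly like the functions below. This is a sketch under several assumptions: the functions are written standalone rather than as IOPlugin methods, the sample is reduced to a plain source path (in datmat it would be a PathSample), the mysink scheme is made up, and POSIX-style paths are assumed.

```python
import shutil
import tempfile
from pathlib import Path
from urllib.parse import urlsplit


def url_to_path(url):
    """Convert a 'mysink:///some/path' URL to a filesystem path string."""
    return urlsplit(url).path


def put_url(source_path, outurl):
    """Copy a file from its temporary location to the destination URL."""
    destination = Path(url_to_path(outurl))
    try:
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source_path, destination)
    except OSError:
        return False  # signal failure instead of raising
    return True


# Demonstration inside a throwaway directory
with tempfile.TemporaryDirectory() as workdir:
    source = Path(workdir) / 'tmp' / 'scan.dcm'
    source.parent.mkdir()
    source.write_text('dummy data')
    # workdir starts with '/', so this yields 'mysink:///...'
    target_url = f'mysink://{workdir}/final/scan.dcm'
    ok = put_url(source, target_url)
    copied = (Path(workdir) / 'final' / 'scan.dcm').read_text()
    print(ok, copied)
```

Returning False on an OSError mirrors the "True if successful, False otherwise" contract described above; a real plugin might also want to log the reason for the failure.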
Creating a Custom Directory Structure
The easiest way to create a custom directory structure is to subclass StructuredDirectory and implement only the _sample_to_outpath method:
class MyStructure(StructuredDirectory):
    scheme = 'mystructure'

    def _sample_to_outpath(self, url, sample):
        """Define your custom directory structure here"""
        return self.url_to_path(url / sample.project_name / f'{sample.subject_label}')
Available Metadata Properties
The following properties are available in the PathSample object (if populated by your source plugin):
- project_name - Name of the project
- subject_label - Label of the subject
- experiment_label - Label of the experiment
- experiment_date - Date the experiment was acquired
- scan_id - ID of the scan
- scan_type - Type of the scan (e.g., T1w)
- filename - Filename (can be a partial path for subdirectories)
- timepoint - Label of the timepoint the data is from
- data_path - Path to the data on disk
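Because a property is only available when the source plugin populates it, a custom structure may want to skip a folder level when a value is missing. The sketch below shows one way to do that, using PurePosixPath as a stand-in for the destination and SimpleNamespace as a stand-in for PathSample. It assumes an unpopulated property is None, which should be checked against the actual PathSample class; the function name and metadata values are made up.

```python
from pathlib import PurePosixPath
from types import SimpleNamespace


def sample_to_outpath(url, sample):
    """Group files per subject and, when known, per timepoint."""
    outpath = url / sample.project_name / sample.subject_label
    # Assumption: an unpopulated property is None; skip that level if so
    if getattr(sample, 'timepoint', None) is not None:
        outpath = outpath / sample.timepoint
    # filename may be a partial path, which adds subdirectories
    return outpath / sample.filename


destination = PurePosixPath('/dest_folder')
with_timepoint = SimpleNamespace(project_name='sandbox', subject_label='TEST01',
                                 timepoint='baseline', filename='dicom/image.dcm')
without_timepoint = SimpleNamespace(project_name='sandbox', subject_label='TEST02',
                                    timepoint=None, filename='image.dcm')

print(sample_to_outpath(destination, with_timepoint))
print(sample_to_outpath(destination, without_timepoint))
```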