PYNIDM: a Python NIDM library and tools

1 PyNIDM: Neuroimaging Data Model in Python

A Python library to manipulate the Neuroimaging Data Model.

1.1 Dependencies

  • Git-annex

  • Graphviz (native package):

    • Fedora: dnf install graphviz

    • OS-X: brew install graphviz

1.2 Installation

$ pip install pynidm

1.3 Contributing to the Software

This software is open source and community developed. As such, we encourage anyone and everyone interested in semantic web and neuroimaging to contribute. To begin contributing code to the repository, please fork the main repo into your user space and use the pull request GitHub feature to submit code for review. Please provide a reasonably detailed description of what was changed and why in the pull request.

To set up a development environment, we recommend installing your clone of this repository in development mode, with the development tools included:

$ pip install -e .[devel]

We also recommend using pre-commit to ensure that your contributions conform to our code-quality conventions. Enable it by running the following once in your clone:

$ pre-commit install

After that, every commit is checked by the configured hooks, including black code reformatting.

1.4 Reporting Issues or Problems

If you encounter a bug, you can directly report it in the issues section. Please describe how to reproduce the issue and include as much information as possible that can be helpful for fixing it. If you would like to suggest a fix, please open a new pull request or include your suggested fix in the issue.

1.5 Support and Feedback

We would love to hear your thoughts on our Python toolbox. Feedback, questions, or feature requests can also be submitted as issues. Note, we are a small band of researchers who mostly volunteer our time to this project. We will respond as quickly as possible.

1.6 NIDM Model Details

NIDM files (typically nidm.ttl) are RDF Turtle documents that represent neuroimaging study data using the W3C PROV provenance data model. Every entity, activity, and agent is identified by a URI and connected by typed RDF triples, making NIDM data machine-readable, semantically rich, and interoperable across sites and tools.

The terms and classes used in NIDM documents are formally defined in the NIDM-Experiment ontology. Community-based management of the controlled vocabulary used to annotate data elements is described in Keator et al., Frontiers in Neuroinformatics 2023 and maintained in the NIDM-Terms repository.

A formal LinkML schema documenting the complete graph structure is provided at src/nidm/experiment/schema/nidm_schema.yaml.

1.6.1 Graph Hierarchy

A NIDM graph is organized as a hierarchy of W3C PROV objects. Each node carries one or more rdf:type assertions — one NIDM-specific type giving its scientific role, and one PROV type giving its provenance role:

Project  (nidm:Project + prov:Activity)
│
├── Session  (nidm:Session + prov:Activity)          [dct:isPartOf → Project]
│    │
│    └── Acquisition  (nidm:Acquisition + prov:Activity)
│         │                                          [dct:isPartOf → Session]
│         └── AcquisitionObject  (nidm:AcquisitionObject + prov:Entity)
│                                    [prov:wasGeneratedBy → Acquisition]
│                                    [variable values stored as RDF properties]
│
├── DataElement  (nidm:DataElement / nidm:PersonalDataElement + prov:Entity)
│
└── Derivative  (nidm:Derivative + prov:Activity)   [dct:isPartOf → Project]
     │
     └── DerivativeObject  (prov:Entity)            [prov:wasGeneratedBy → Derivative]
                                [derived values stored as RDF properties]

Project is the top-level container for a study or dataset, holding title, license, funding, and versioning metadata.

Session groups the acquisitions for one participant visit.

Acquisition represents a single data-collection event — an MRI scan, a questionnaire, or a demographic entry. Imaging acquisitions carry nidm:hadAcquisitionModality, nidm:hadImageContrastType, and nidm:hadImageUsageType.

AcquisitionObject is the entity produced by an Acquisition. For imaging data it stores the filename and checksum; for assessments and demographics it stores measured values as RDF properties, using DataElement URIs as predicates.

Derivative / DerivativeObject represent post-processing pipelines (FreeSurfer, FSL, ANTs, etc.) and the analysis results they produce.
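
The containment chain in the diagram can be followed programmatically. The sketch below is a plain-Python illustration, not PyNIDM's API: RDF triples are modeled as tuples, the niiri: identifiers are invented, and path_to_project is a hypothetical helper.

```python
# Toy triples using the predicates from the hierarchy diagram.
# All niiri: identifiers below are invented for illustration.
TRIPLES = [
    ("niiri:acqobj_1", "prov:wasGeneratedBy", "niiri:acq_1"),
    ("niiri:acq_1", "dct:isPartOf", "niiri:session_1"),
    ("niiri:session_1", "dct:isPartOf", "niiri:project_1"),
]

# Predicates that point from a node to its parent in the hierarchy.
PARENT_PREDICATES = ("prov:wasGeneratedBy", "dct:isPartOf")


def path_to_project(node, triples):
    """Return the chain of nodes from `node` up to the top-level container."""
    path = [node]
    seen = {node}
    while True:
        parents = [o for s, p, o in triples
                   if s == path[-1] and p in PARENT_PREDICATES]
        # Stop at the top of the chain, or if the data contains a cycle.
        if not parents or parents[0] in seen:
            return path
        path.append(parents[0])
        seen.add(parents[0])


print(path_to_project("niiri:acqobj_1", TRIPLES))
```

With a real NIDM file the same walk would normally be done with an RDF library or a SPARQL property path; the tuple list only keeps the example dependency-free.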

1.6.2 Participant Linkage

Participants are prov:Person agents linked to Acquisitions through PROV’s qualified-association pattern:

Acquisition
  └── prov:qualifiedAssociation
        └── prov:Association  (blank node)
              ├── prov:agent    ──►  Person
              │                       └── ndar:src_subject_id  "sub-001"
              └── prov:hadRole  ──►  sio:Subject

ndar:src_subject_id on the Person node is the primary human-readable participant identifier across all PyNIDM query operations.
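
To make the qualified-association pattern concrete, here is a minimal sketch that resolves an Acquisition to its participant identifier. It assumes triples held as plain tuples, with "_:assoc1" standing in for the blank node; the helper names objects and subject_ids are hypothetical and not part of PyNIDM.

```python
# Toy triples following the qualified-association pattern shown above.
# "_:assoc1" stands in for the blank node; identifiers are invented.
TRIPLES = [
    ("niiri:acq_456", "prov:qualifiedAssociation", "_:assoc1"),
    ("_:assoc1", "prov:agent", "niiri:person_1"),
    ("_:assoc1", "prov:hadRole", "sio:Subject"),
    ("niiri:person_1", "ndar:src_subject_id", "sub-001"),
]


def objects(triples, subject, predicate):
    """Return every object for the given subject/predicate pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subject_ids(triples, acquisition):
    """Follow Acquisition -> Association -> Person -> src_subject_id."""
    ids = []
    for assoc in objects(triples, acquisition, "prov:qualifiedAssociation"):
        # Only keep associations whose role is the study subject.
        if "sio:Subject" not in objects(triples, assoc, "prov:hadRole"):
            continue
        for person in objects(triples, assoc, "prov:agent"):
            ids.extend(objects(triples, person, "ndar:src_subject_id"))
    return ids


print(subject_ids(TRIPLES, "niiri:acq_456"))  # ['sub-001']
```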

1.6.3 DataElements and Measurement Values

DataElements define the semantics of every measured variable — its label, data type, units, valid range, and linkage to a shared ontology concept via nidm:isAbout. Linking variables to concepts from the NIDM-Experiment ontology or community registries such as InterLex enables federated queries across datasets that use different local variable names for the same underlying concept.

DataElement URIs serve a dual role in the graph:

  1. As subjects — the DataElement URI carries all metadata about the variable (label, units, ontology mapping, etc.).

  2. As predicates — the same URI is used as the RDF predicate on AcquisitionObjects and DerivativeObjects to store actual measured values.

A PersonalDataElement (demographic or assessment variable) in Turtle:

niiri:gender_hrg8rh  a nidm:PersonalDataElement, prov:Entity ;
    rdfs:label              "gender" ;
    dct:description         "Gender of participant" ;
    nidm:sourceVariable     "gender" ;
    nidm:isAbout            ilx:ilx_0101292 ;
    nidm:valueType          xsd:complexType ;
    nidm:minValue           "NA" ;
    nidm:maxValue           "NA" ;
    reproschema:choices     [ rdfs:label "male"   ; reproschema:value "1" ],
                            [ rdfs:label "female" ; reproschema:value "2" ] ;
    ilx:ilx_0739289         "NIDM" .

# Same DataElement URI used as a predicate to store a subject's value:
niiri:acqobj_abc123  prov:wasGeneratedBy niiri:acq_456 ;
                     niiri:gender_hrg8rh  "1"^^xsd:string .
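
The dual role can also be seen in a few lines of Python. This is an illustrative sketch only: triples are plain tuples mirroring the Turtle above, and values_for_label is a hypothetical helper, not a PyNIDM function.

```python
# The DataElement URI appears as a subject (metadata) and as a predicate (values).
# Values mirror the Turtle example above; tuples stand in for RDF triples.
TRIPLES = [
    ("niiri:gender_hrg8rh", "rdf:type", "nidm:PersonalDataElement"),
    ("niiri:gender_hrg8rh", "rdfs:label", "gender"),
    ("niiri:acqobj_abc123", "prov:wasGeneratedBy", "niiri:acq_456"),
    ("niiri:acqobj_abc123", "niiri:gender_hrg8rh", "1"),
]


def values_for_label(triples, label):
    """Look up DataElements by rdfs:label, then collect values stored under them."""
    # Step 1: the DataElement as a subject carrying its label.
    data_elements = {s for s, p, o in triples if p == "rdfs:label" and o == label}
    # Step 2: the same URI used as a predicate on acquisition objects.
    return [(s, o) for s, p, o in triples if p in data_elements]


print(values_for_label(TRIPLES, "gender"))  # [('niiri:acqobj_abc123', '1')]
```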

An imaging pipeline DataElement (e.g. from FreeSurfer):

fs:fs_000003  a nidm:DataElement ;
    rdfs:label           "Brain Segmentation Volume (mm^3)" ;
    nidm:isAbout         obo:UBERON_0000955 ;
    nidm:measureOf       ilx:ilx_0112559 ;
    nidm:datumType       ilx:ilx_0738276 ;
    nidm:unitCode        "mm^3" ;
    nidm:hasLaterality   "Bilateral" .

1.6.3.1 DataElement Property Reference

RDF Predicate
    Description

rdf:type
    nidm:PersonalDataElement (demographic / assessment) or nidm:DataElement (imaging pipeline CDE), always combined with prov:Entity.

rdfs:label
    Human-readable variable name.

dct:description
    Free-text description of the variable.

rdfs:comment
    Longer formal definition (used when importing terms from external registries).

nidm:sourceVariable
    Original column / variable name in the source dataset.

nidm:isAbout
    URI of the ontology concept this variable represents (e.g. ilx:ilx_0100400 for age). The key property enabling cross-dataset concept-based federated queries. See the NIDM-Experiment ontology and InterLex.

nidm:valueType
    XSD datatype URI for the variable’s values: xsd:float, xsd:integer, xsd:string, xsd:boolean, or xsd:complexType for categorical variables.

nidm:minValue
    Minimum allowed value ("NA" if not applicable).

nidm:maxValue
    Maximum allowed value ("NA" if not applicable).

nidm:unitCode
    Unit of measurement string (e.g. "mm^3", "years", "vertex").

reproschema:choices
    Categorical response options. Each choice is a blank node with rdfs:label (display text) and reproschema:value (stored code), or a plain literal string for simple enumerations.

nidm:measureOf
    URI of the physical / biological property being measured (e.g. ilx:ilx_0112559 for volume, obo:PATO_0001323 for surface area). Used primarily in imaging pipeline CDEs.

nidm:datumType
    URI of the measurement datum type (e.g. ilx:ilx_0738276 for scalar, ilx:ilx_0102597 for count). Used primarily in imaging pipeline CDEs.

nidm:hasLaterality
    Brain laterality: "Left", "Right", or "Bilateral". Used in imaging pipeline CDEs.

nidm:url
    URL linking to this variable’s entry in a terminology registry (e.g. InterLex / SciCrunch).

nidm:sameAs
    URI of an equivalent term in another vocabulary.

bids:allowableValues
    Allowable values for BIDS-sourced variables.

ilx:ilx_0739289
    Terminology provenance tag (e.g. "NIDM") indicating which controlled vocabulary sourced this term.

1.6.4 Key Namespaces

nidm:          http://purl.org/nidash/nidm#
prov:          http://www.w3.org/ns/prov#
niiri:         http://iri.nidash.org/              (instance identifiers)
ndar:          https://ndar.nih.gov/api/datadictionary/v2/dataelement/
dct:           http://purl.org/dc/terms/
dctypes:       http://purl.org/dc/dcmitype/
sio:           http://semanticscience.org/ontology/sio.owl#
obo:           http://purl.obolibrary.org/obo/
onli:          http://neurolog.unice.fr/ontoneurolog/v3.0/instrument.owl#
reproschema:   http://schema.repronim.org/
ilx:           http://uri.interlex.org/
freesurfer:    https://surfer.nmr.mgh.harvard.edu/
fsl:           http://purl.org/nidash/fsl#
ants:          http://stnava.github.io/ANTs/
bids:          http://bids.neuroimaging.io/
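
Prefixed names such as nidm:Project are shorthand for full URIs built from this table. The following sketch shows the expansion for a subset of the prefixes; expand is a hypothetical helper written for illustration (RDF libraries provide their own namespace handling).

```python
# A subset of the prefix table above.
NAMESPACES = {
    "nidm": "http://purl.org/nidash/nidm#",
    "prov": "http://www.w3.org/ns/prov#",
    "niiri": "http://iri.nidash.org/",
    "dct": "http://purl.org/dc/terms/",
    "ilx": "http://uri.interlex.org/",
    "fsl": "http://purl.org/nidash/fsl#",
}


def expand(curie, namespaces=NAMESPACES):
    """Expand a prefixed name such as 'nidm:Project' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in namespaces:
        raise KeyError(f"unknown prefix in {curie!r}")
    return namespaces[prefix] + local


print(expand("nidm:Project"))     # http://purl.org/nidash/nidm#Project
print(expand("ilx:ilx_0100400"))  # http://uri.interlex.org/ilx_0100400
```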

1.6.5 Example SPARQL Queries

List all projects and their titles:

PREFIX nidm:    <http://purl.org/nidash/nidm#>
PREFIX dctypes: <http://purl.org/dc/dcmitype/>

SELECT ?project ?title WHERE {
  ?project a nidm:Project .
  OPTIONAL { ?project dctypes:title ?title }
}

List all subjects and their source IDs:

PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX ndar: <https://ndar.nih.gov/api/datadictionary/v2/dataelement/>

SELECT ?person ?subject_id WHERE {
  ?person a prov:Person ;
          ndar:src_subject_id ?subject_id .
}

Retrieve values for a variable (e.g. AGE_AT_SCAN) across all subjects:

PREFIX prov:  <http://www.w3.org/ns/prov#>
PREFIX ndar:  <https://ndar.nih.gov/api/datadictionary/v2/dataelement/>
PREFIX rdfs:  <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?subject_id ?value WHERE {
  ?de rdfs:label "AGE_AT_SCAN" .
  ?acq_obj ?de ?value ;
           prov:wasGeneratedBy ?acq .
  ?acq prov:qualifiedAssociation ?assoc .
  ?assoc prov:agent ?person .
  ?person ndar:src_subject_id ?subject_id .
}

Find all DataElements about a given concept using nidm:isAbout (enables cross-dataset federated queries):

PREFIX nidm:  <http://purl.org/nidash/nidm#>
PREFIX rdfs:  <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?de ?label ?sourceVar WHERE {
  { ?de a nidm:DataElement } UNION { ?de a nidm:PersonalDataElement }
  ?de nidm:isAbout <http://uri.interlex.org/ilx_0100400> ;
      rdfs:label ?label .
  OPTIONAL { ?de nidm:sourceVariable ?sourceVar }
}
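
When the same concept query is needed for several concepts, the query text can be generated rather than edited by hand. The sketch below fills the nidm:isAbout query with a concept URI; is_about_query and QUERY_TEMPLATE are hypothetical names, and the IRI check is a simple guard rather than a full validator.

```python
QUERY_TEMPLATE = """\
PREFIX nidm:  <http://purl.org/nidash/nidm#>
PREFIX rdfs:  <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT ?de ?label ?sourceVar WHERE {{
  {{ ?de a nidm:DataElement }} UNION {{ ?de a nidm:PersonalDataElement }}
  ?de nidm:isAbout <{concept}> ;
      rdfs:label ?label .
  OPTIONAL {{ ?de nidm:sourceVariable ?sourceVar }}
}}
"""


def is_about_query(concept_uri):
    """Return the isAbout query text for one concept URI."""
    # Reject characters that are not allowed inside a SPARQL IRI reference.
    if any(ch in concept_uri for ch in '<>"{}|^`\\ '):
        raise ValueError(f"not a valid IRI: {concept_uri!r}")
    return QUERY_TEMPLATE.format(concept=concept_uri)


print(is_about_query("http://uri.interlex.org/ilx_0100400"))
```

The returned text can be saved to a file and passed to pynidm query with -q, which expects a text file containing a SPARQL query.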

1.7 NIDM-Experiment Tools

1.7.1 BIDS MRI Conversion to NIDM

This program converts a BIDS MRI dataset to a NIDM-Experiment RDF document. It parses phenotype information, stores the variables and values, and links to the associated JSON data dictionary file. To use this tool, set the INTERLEX_API_KEY environment variable to your unique API key. To get an InterLex API key, visit SciCrunch, register for an account, then click “MyAccount” and “API Keys” to add a new API key for your account.

$ bidsmri2nidm -d [ROOT BIDS DIRECT] -bidsignore

# Write one NIDM file per subject (sub-<id>_nidm.ttl) into the BIDS directory:
$ bidsmri2nidm -d [ROOT BIDS DIRECT] --per_subject

# Or direct the per-subject files to a different output directory:
$ bidsmri2nidm -d [ROOT BIDS DIRECT] --per_subject -o [OUTPUT DIRECTORY]

usage: bidsmri2nidm [-h] -d DIRECTORY [-jsonld] [-bidsignore] [-no_concepts]
                 [-json_map JSON_MAP] [-log LOGFILE] [-o OUTPUTFILE]
                 [-per_subject]

This program will represent a BIDS MRI dataset as a NIDM RDF document and provide user with opportunity to annotate
the dataset (i.e. create sidecar files) and associate selected variables with broader concepts to make datasets more
FAIR.

Note, you must obtain an API key to Interlex by signing up for an account at scicrunch.org then going to My Account
and API Keys.  Then set the environment variable INTERLEX_API_KEY with your key.

optional arguments:
  -h, --help            show this help message and exit
  -d DIRECTORY          Full path to BIDS dataset directory
  -jsonld, --jsonld     If flag set, output is json-ld not TURTLE
  -bidsignore, --bidsignore
                     If flag set, tool will add NIDM-related files to .bidsignore file
  -no_concepts, --no_concepts
                     If flag set, tool will not do concept mapping
  -log LOGFILE, --log LOGFILE
                     Full path to directory to save log file. Log file name is bidsmri2nidm_[basename(args.directory)].log
  -o OUTPUTFILE         Outputs turtle file called nidm.ttl in BIDS directory by default, or to whatever path/filename is set here.
                        In ``--per_subject`` mode this argument is interpreted as an output **directory** (created if missing)
                        into which one ``sub-<id>_nidm.ttl`` file is written per subject.
  -per_subject, --per_subject
                     If flag set, a separate NIDM turtle file will be written for each subject in the BIDS directory,
                     named ``sub-<id>_nidm.ttl``.  By default these are placed in the BIDS directory; use ``-o`` to
                     specify a different output directory.  When combined with ``-bidsignore``, each per-subject file
                     is appended to the BIDS dataset's ``.bidsignore`` file (only when the output directory lies
                     inside the BIDS tree).

map variables to terms arguments:
  -json_map JSON_MAP, --json_map JSON_MAP
                     Optional full path to user-supplied JSON file containing data element definitions.
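
When driving the converter from a script, the command line can be assembled from the flags listed above. This is a sketch under assumptions: the helper names and the example paths are hypothetical, and only flags that appear in the usage text are used.

```python
import os
import shlex


def bidsmri2nidm_argv(bids_dir, per_subject=False, output=None, bidsignore=False):
    """Assemble an argv list for bidsmri2nidm from the documented flags."""
    argv = ["bidsmri2nidm", "-d", bids_dir]
    if per_subject:
        argv.append("-per_subject")
    if bidsignore:
        argv.append("-bidsignore")
    if output is not None:
        # With -per_subject this is a directory, otherwise an output file.
        argv += ["-o", output]
    return argv


def has_interlex_key(env=os.environ):
    """Report whether INTERLEX_API_KEY is set to a non-empty value."""
    return bool(env.get("INTERLEX_API_KEY"))


# Hypothetical paths, for illustration only.
argv = bidsmri2nidm_argv("/data/bids", per_subject=True, output="/data/nidm")
print(shlex.join(argv))
```

To run the assembled command, pass argv to subprocess.run(argv, check=True) after confirming that has_interlex_key() returns True.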

1.7.2 CSV File to NIDM Conversion

This program loads a CSV file and iterates over the header variable names, performing an elastic search of https://scicrunch.org/nidm-terms for NIDM-ReproNim tagged terms that fuzzy-match the variable names. The user then interactively picks a term to associate with each variable name, and the resulting annotated CSV data is written to a NIDM data file. To use this tool, set the INTERLEX_API_KEY environment variable to your unique API key. To get an InterLex API key, visit SciCrunch, register for an account, then click “MyAccount” and “API Keys” to add a new API key for your account.

usage: csv2nidm [-h] -csv CSV_FILE [-json_map JSON_MAP | -csv_map CSV_MAP | -redcap REDCAP]
                [-nidm NIDM_FILE] [-no_concepts] [-log LOGFILE]
                [-dataset_id DATASET_ID] [-derivative DERIVATIVE_METADATA]
                [-out OUTPUT_FILE]

This program will load in a CSV file and iterate over the header variable
names performing an elastic search of https://scicrunch.org/ for NIDM-ReproNim
tagged terms that fuzzy match the variable names. The user will then
interactively pick a term to associate with the variable name. The resulting
annotated CSV data will then be written to a NIDM data file. Note, you must
obtain an API key to Interlex by signing up for an account at scicrunch.org
then going to My Account and API Keys. Then set the environment variable
INTERLEX_API_KEY with your key.  The tool supports import of RedCap data
dictionaries and will convert relevant information into a json-formatted
annotation file used to annotate the data elements in the resulting NIDM file.

optional arguments:
  -h, --help            show this help message and exit
  -csv CSV_FILE         Full path to CSV file to convert
  -json_map JSON_MAP    Full path to user-supplied JSON file containing
                        variable-term mappings.
  -csv_map CSV_MAP      Full path to a user-supplied CSV data dictionary with
                        columns: source_variable, label, description,
                        valueType, measureOf, isAbout, unitCode, minValue,
                        maxValue. Mutually exclusive with -json_map/-redcap.
  -redcap REDCAP        Full path to a user-supplied RedCap formatted data
                        dictionary for csv file.
  -nidm NIDM_FILE       Optional full path of NIDM file to add CSV->NIDM
                        converted graph to
  -no_concepts          If this flag is set then no concept associations will
                        be asked of the user. This is useful if you already
                        have a -json_map specified without concepts and want to
                        simply run this program to get a NIDM file without
                        user interaction to associate concepts.
  -log LOGFILE, --log LOGFILE
                        Full path to directory to save log file. Log file name
                        is csv2nidm_[arg.csv_file].log
  -dataset_id DATASET_ID
                        Optional dataset identifier (e.g. a DOI). When
                        provided, unique data element IDs incorporate this
                        value as part of their hash, ensuring CDE URIs are
                        globally unique across datasets.
  -derivative DERIVATIVE_METADATA
                        If set, indicates the CSV contains derivative data.
                        The value must be the path to a software metadata CSV
                        with columns: title, description, version, url,
                        cmdline, platform, ID. The CSV must also include
                        columns ses, task, run, and source_url.
  -out OUTPUT_FILE      Full path with filename to save NIDM file
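
A data dictionary for -csv_map can be produced with Python's csv module. The column names below come from the help text; the row content and the exact cell formats that csv2nidm accepts (for example, whether valueType takes a prefixed name or a full URI) are assumptions made for illustration.

```python
import csv
import io

# Column names as listed in the -csv_map help text.
COLUMNS = ["source_variable", "label", "description", "valueType", "measureOf",
           "isAbout", "unitCode", "minValue", "maxValue"]

# One illustrative row; the values are assumptions, not taken from a real dataset.
rows = [{
    "source_variable": "AGE_AT_SCAN",
    "label": "age at scan",
    "description": "Age of participant at time of scan",
    "valueType": "xsd:float",
    "isAbout": "http://uri.interlex.org/ilx_0100400",
    "unitCode": "years",
    "minValue": "NA",
    "maxValue": "NA",
}]


def write_data_dictionary(rows, stream):
    """Write rows with the expected header; missing columns are left empty."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_data_dictionary(rows, buffer)
print(buffer.getvalue())
```

To write a file instead of an in-memory buffer, open it with open(path, "w", newline="") and pass the file object as stream.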

1.7.3 convert

This function converts NIDM files to various RDF serialization formats. By default, each converted file is named after its input file and saved in the same directory.

Usage: pynidm convert [OPTIONS]

Options:
  -nl, --nidm_file_list TEXT      A comma separated list of NIDM files with
                                  full path  [required]
  -t, --type [turtle|jsonld|xml-rdf|n3|trig]
                                  Output RDF serialization format  [required]
  -out, --outdir TEXT             Optional directory to save converted file.
                                  Defaults to the same directory as the input.
  --help                          Show this message and exit.

1.7.4 concatenate

This function concatenates NIDM files. Warning: no merging is done, so you may end up with multiple prov:agents with the same subject ID if you are concatenating NIDM files from multiple visits of the same study. To merge NIDM files on subject ID, see pynidm merge.

Usage: pynidm concat [OPTIONS]

Options:
  -nl, --nidm_file_list TEXT  A comma separated list of NIDM files with full
                              path  [required]
  -o, --out_file TEXT         File to write concatenated NIDM files
                              [required]
  --help                      Show this message and exit.

1.7.5 visualize

This command produces a visualization of the supplied NIDM files as a directed provenance graph, written to the same directory as each input file.

Usage: pynidm visualize [OPTIONS]

Options:
  -nl, --nidm_file_list TEXT    A comma-separated list of NIDM files with
                                full path  [required]
  -fmt, --format [svg|png|pdf]  Output format (default: svg). SVG opens in
                                any web browser with unlimited scroll and
                                zoom. PNG produces a high-resolution raster.
                                PDF is vector but may clip very large graphs.
  --help                        Show this message and exit.

1.7.6 merge

This function will merge NIDM files. See command line parameters for supported merge operations.

Usage: pynidm merge [OPTIONS]

Options:
  -nl, --nidm_file_list TEXT  A comma separated list of NIDM files with full
                              path  [required]
  -s, --s                     If parameter set then files will be merged by
                              ndar:src_subject_id of prov:agents
  -o, --out_file TEXT         File to write concatenated NIDM files
                              [required]
  --help                      Show this message and exit.

1.7.7 Query

This function provides query support for NIDM graphs. Exactly one query-type option is required (the group is mutually exclusive).

Usage: pynidm query [OPTIONS]

Options:
  -nl, --nidm_file_list TEXT      A comma separated list of NIDM files with
                                  full path  [required]
  -nc, --cde_file_list TEXT       A comma separated list of NIDM CDE files
                                  with full path. Can also be set in the
                                  CDE_DIR environment variable

  Query Type (pick exactly one):
  -q, --query_file FILENAME       Text file containing a SPARQL query to
                                  execute
  -p, --get_participants          Return participant IDs and prov:agent
                                  entity IDs
  -i, --get_instruments           Return list of
                                  onli:assessment-instrument entries
  -iv, --get_instrument_vars      Return variables for all
                                  onli:assessment-instrument entries
  -de, --get_dataelements         Return all DataElements in NIDM file
  -debv, --get_dataelements_brainvols
                                  Return all brain volume DataElements with
                                  details
  -bv, --get_brainvols            Return all brain volume data elements and
                                  values with participant IDs
  -gf, --get_fields TEXT          Return data for a comma-separated list of
                                  field names across all NIDM files
                                  (e.g. -gf age,fs_000003)
  -u, --uri TEXT                  A REST API URI query

  -o, --output_file TEXT          Optional output file (CSV) to store
                                  results of query
  -j / -no_j                      Return result of a uri query as JSON
  -bg, --blaze TEXT               Base URL of a Blazegraph SPARQL endpoint
                                  (e.g. http://localhost:9999/blazegraph/sparql)
  -v, --verbosity TEXT            Verbosity level 0-5, 0 is default
  --help                          Show this message and exit.

Details on the REST API URI format and usage can be found below.

1.7.8 queryai — AI-Assisted Natural Language Query

This tool translates natural-language questions about your NIDM data into SPARQL queries using an LLM (Anthropic Claude or OpenAI GPT). It uses a two-phase approach:

  1. Phase 1 — Concept Resolution: The AI extracts variable concepts (e.g. “age”, “left hippocampus volume”) from your question. The tool then resolves each concept to the exact DataElement URI in your NIDM files by matching on nidm:isAbout (preferred) or nidm:sourceVariable. If multiple DataElements match, you are prompted to select the correct one(s).

  2. Phase 2 — SPARQL Generation: The resolved URIs, together with the NIDM graph structure from the bundled nidm_schema.json, are sent to the LLM which generates a SPARQL query. The query is executed locally against your NIDM files via rdflib — no subject data leaves your machine.

Usage: pynidm queryai [OPTIONS]

Options:
  -nl, --nidm_file_list TEXT  A comma separated list of NIDM files with
                              full path  [required]
  -q, --question TEXT         Natural-language question to ask about the
                              NIDM data. If not provided, enters
                              interactive mode.
  -o, --output_file PATH      Optional output file for results (TSV format)
  -s, --show_query            Show the generated SPARQL query before
                              executing it
  --help                      Show this message and exit.

Prerequisites — an API key for either Anthropic or OpenAI:

export ANTHROPIC_API_KEY=sk-ant-...   # or
export OPENAI_API_KEY=sk-...

Or create a config file at ~/.pynidm/config.json:

{"provider": "anthropic", "api_key": "sk-ant-..."}
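
The config file can also be created from Python. The sketch below writes the two keys shown above to a given path and restricts the file permissions; write_config is a hypothetical helper, and the demo writes into a temporary directory rather than your home directory.

```python
import json
import tempfile
from pathlib import Path


def write_config(path, provider, api_key):
    """Write a queryai config file with the two keys shown above."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"provider": provider, "api_key": api_key}) + "\n")
    path.chmod(0o600)  # keep the key readable by the owner only
    return path


# Demo in a temporary directory; a real call would target
# Path.home() / ".pynidm" / "config.json".
with tempfile.TemporaryDirectory() as tmp:
    cfg = write_config(Path(tmp) / ".pynidm" / "config.json",
                       "anthropic", "sk-ant-...")
    loaded = json.loads(cfg.read_text())

print(loaded["provider"])  # anthropic
```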

Example — count subjects:

pynidm queryai -nl data/nidm.ttl -q "How many subjects are there?" -s

Example — average age:

pynidm queryai -nl data/nidm.ttl -q "What is the average age of all subjects?" -s

Example — interactive mode:

pynidm queryai -nl data/nidm.ttl

A demo script that downloads sample NIDM data and runs several example queries is available at src/nidm/experiment/tools/examples/queryai_demo.sh.

1.7.9 linear_regression

This function provides linear regression support for NIDM graphs.

Usage: pynidm linear-regression [OPTIONS]

Options:
  -nl, --nidm_file_list TEXT      A comma-separated list of NIDM files with
                                  full path  [required]
  -model, --ml TEXT               An equation representing the linear
                                  regression. The dependent variable comes
                                  first, followed by "=" or "~", followed by
                                  the independent variables separated by "+"
                                  (Ex: -model "fs_003343 = age*sex + sex +
                                  age + group + age*group + bmi") [required]
  -contrast, --ctr TEXT           Parameter, if set, will return differences
                                  in variable relationships by group. One or
                                  multiple parameters can be used (separate
                                  with commas) (Ex: -contrast group,age)
  -r, --regularization TEXT       If set, applies L1 or L2 regularization
                                  and returns the maximum likelihood weight.
                                  Prevents overfitting. (Ex: -r L1)
  -o, --output_file TEXT          Optional output file (TXT) to store results
  --help                          Show this message and exit.

To use the linear regression algorithm successfully, structure, syntax, and querying are all important. Here is how to get the most out of the tool:

First, use pynidm query to discover the variables to use. PyNIDM allows for the use of either data elements (PIQ_tca9ck), specific URLs (http://uri.interlex.org/ilx_0100400), or source variables (DX_GROUP).

An example of a potential query is:

pynidm query -nl /simple2_NIDM_examples/datasets.datalad.org/abide/RawDataBIDS/CMU_a/nidm.ttl,/simple2_NIDM_examples/datasets.datalad.org/abide/RawDataBIDS/CMU_b/nidm.ttl -u /projects?fields=fs_000008,DX_GROUP,PIQ_tca9ck,http://uri.interlex.org/ilx_0100400

You can also do:

pynidm query -nl /simple2_NIDM_examples/datasets.datalad.org/abide/RawDataBIDS/CMU_a/nidm.ttl,/simple2_NIDM_examples/datasets.datalad.org/abide/RawDataBIDS/CMU_b/nidm.ttl -gf fs_000008,DX_GROUP,PIQ_tca9ck,http://uri.interlex.org/ilx_0100400

The query looks in the two files specified in the -nl parameter for the variables specified. In this case, we use fs_000008 and DX_GROUP (source variables), a URL (http://uri.interlex.org/ilx_0100400), and a data element (PIQ_tca9ck). The output differs slightly depending on whether you use -gf or -u: -gf returns the variables from each file separately, while -u combines them.

Now that we have selected the variables, we can perform a linear regression. In this example, we will look at the effect of DX_GROUP, age at scan, and PIQ on supratentorial brain volume.

The command to use for this particular data is:

pynidm linear-regression -nl /simple2_NIDM_examples/datasets.datalad.org/abide/RawDataBIDS/CMU_a/nidm.ttl,/simple2_NIDM_examples/datasets.datalad.org/abide/RawDataBIDS/CMU_b/nidm.ttl -model "fs_000008 = DX_GROUP + PIQ_tca9ck + http://uri.interlex.org/ilx_0100400" -contrast "DX_GROUP" -r L1

-nl specifies the file(s) to pull data from, while -model specifies the regression model to fit. In this case, the variables are fs_000008 (the dependent variable, supratentorial brain volume), DX_GROUP (diagnostic group), PIQ_tca9ck (PIQ), and http://uri.interlex.org/ilx_0100400 (age at scan). The -contrast parameter contrasts the data by DX_GROUP, and -r L1 applies L1 regularization to prevent overfitting.
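
The shape of a -model string can be checked before running the regression. The sketch below splits a model into its dependent variable and its terms, following the syntax described in the option help ("=" or "~", then terms separated by "+"). parse_model is a hypothetical helper and does not reproduce PyNIDM's own parsing.

```python
import re


def parse_model(model):
    """Split 'y = a + b*c' into the dependent variable and a list of terms."""
    # Split on the first "=" or "~" only.
    parts = re.split(r"[=~]", model, maxsplit=1)
    if len(parts) != 2 or not parts[0].strip() or not parts[1].strip():
        raise ValueError(f"expected 'dependent = terms', got {model!r}")
    dependent = parts[0].strip()
    terms = [t.strip() for t in parts[1].split("+") if t.strip()]
    return dependent, terms


dep, terms = parse_model(
    "fs_000008 = DX_GROUP + PIQ_tca9ck + http://uri.interlex.org/ilx_0100400"
)
print(dep)    # fs_000008
print(terms)  # ['DX_GROUP', 'PIQ_tca9ck', 'http://uri.interlex.org/ilx_0100400']
```

Interaction terms such as age*sex are kept as single terms; this sketch does not expand them.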

Details on the REST API URI format and usage can be found below.

2 PyNIDM: REST API and Command Line Usage

2.1 Introduction

There are two main ways to interact with NIDM data using the PyNIDM REST API. First, the pynidm query command line utility accepts queries formatted as REST API URIs. Second, the rest-server.py script can be used to run an HTTP server that accepts and processes requests. This script can be run either directly or in a Docker container defined in the docker directory of the project.

Example usage:

$ pynidm query -nl "cmu_a.ttl,cmu_b.ttl" -u /projects

dc1bf9be-10a3-11ea-8779-003ee1ce9545
ebe112da-10a3-11ea-af83-003ee1ce9545

2.2 Installation

To use the REST API query syntax on the command line, follow the PyNIDM installation instructions.

The simplest way to deploy an HTTP REST API server is with the provided Docker container. You can find instructions for that process in the README.md file in the docker directory of the GitHub repository.

2.3 URI formats

You can find details on the REST API at the SwaggerHub API Documentation. The OpenAPI specification file is part of the GitHub repository at docs/REST_API_definition.openapi.yaml.

Here is a list of the current operations. See the SwaggerHub page for more details and return formats.

- /projects
- /projects/{project_id}
- /projects/{project_id}/subjects
- /projects/{project_id}/subjects?filter=[filter expression]
- /projects/{project_id}/subjects/{subject_id}
- /projects/{project_id}/subjects/{subject_id}/instruments/{instrument_id}
- /projects/{project_id}/subjects/{subject_id}/derivatives/{derivative_id}
- /statistics/projects/{project_id}

You can append the following query parameters to many of the operations:

- filter
- fields
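
Filter expressions contain spaces, so they need URL-encoding when a request URL is built by hand. The sketch below joins a base URL, an operation path, and arbitrary query parameters using the standard library. rest_url is a hypothetical helper, and the project ID abc123 is a placeholder; the base URL http://localhost:5000 is the one used in the examples in this document.

```python
from urllib.parse import quote, urlencode


def rest_url(base, path, **params):
    """Join a server base URL, an operation path, and optional query parameters."""
    url = base.rstrip("/") + quote(path)
    if params:
        # quote_via=quote encodes spaces as %20 rather than "+".
        url += "?" + urlencode(params, quote_via=quote)
    return url


print(rest_url("http://localhost:5000", "/projects"))
print(rest_url("http://localhost:5000", "/projects/abc123/subjects",
               filter="instruments.AGE_AT_SCAN gt 30"))
```

The resulting string can be passed to curl or to urllib.request.urlopen once a server is running.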

2.3.1 Operations

/projects

Get a list of all project IDs available.

Supported query parameters: none

/projects/{project_id}

See some details for a project. This will include the list of subject IDs and data elements used in the project

Supported query parameters: filter

/projects/{project_id}/subjects

Get the list of subjects in a project

Supported query parameters: filter

/projects/{project_id}/subjects/{subject_id}

Get the details for a particular subject. This will include the results of any instruments or derivatives associated with the subject, as well as a list of the related activities.

Supported query parameters: none

/projects/{project_id}/subjects/{subject_id}/instruments/{instrument_id}

Get the values for a particular instrument

Supported query parameters: none

/projects/{project_id}/subjects/{subject_id}/derivatives/{derivative_id}

Get the values for a particular derivative

Supported query parameters: none

/statistics/projects/{project_id}

See project statistics. You can also use this operation to get statistics on a particular instrument or derivative entry by using the fields query parameter.

Supported query parameters: filter, fields

/statistics/projects/{project_id}/subjects/{subject_id}

See some details for a project. This will include the list of subject IDs and data elements used in the project

Supported query parameters: none

2.3.2 Query Parameters

filter

The filter query parameter is used when you want to receive data only on subjects that match some criteria. The format for the filter value should be of the form:

identifier op value [ and identifier op value and ... ]

Identifiers should be formatted as “instrument.ID” or “derivatives.ID”. You can use any value for the instrument ID that is shown for an instrument or in the data_elements section of the project details. For the derivative ID, you can use the last component of a derivative field URI (e.g., for the URI http://purl.org/nidash/fsl#fsl_000007, the ID would be “fsl_000007”) or the exact label shown when viewing derivative data (e.g., “Left-Caudate (mm^3)”).

The op can be one of “eq”, “gt”, “lt”.
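
To make the grammar concrete, here is a minimal sketch of how a filter value could be split into clauses. It is an illustration only and is not PyNIDM's own parser: the parse_filter function and the CLAUSE pattern are hypothetical, and the sketch assumes that clauses are joined by the word "and" surrounded by whitespace and that identifiers may contain spaces (as derivative labels do).

```python
import re

# One clause: identifier, operator, value.
# The identifier is matched lazily so that labels containing spaces,
# such as "Left-Caudate (mm^3)", stay in one piece.
CLAUSE = re.compile(r"(?P<identifier>.+?)\s+(?P<op>eq|gt|lt)\s+(?P<value>.+)")


def parse_filter(expression):
    """Split 'identifier op value [and ...]' into (identifier, op, value) tuples.

    Hypothetical helper for illustration. A label that itself contains
    the word 'and' would be split incorrectly by this simple approach.
    """
    clauses = []
    for text in re.split(r"\s+and\s+", expression.strip()):
        match = CLAUSE.fullmatch(text)
        if match is None:
            raise ValueError(f"expected 'identifier op value', got {text!r}")
        clauses.append(
            (match.group("identifier"), match.group("op"), match.group("value"))
        )
    return clauses


for clause in parse_filter("instruments.AGE_AT_SCAN gt 30"):
    print(clause)
```

Values are left as strings here; whether a value is compared numerically or as text is up to the server.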

Example filters:

?filter=instruments.AGE_AT_SCAN gt 30
?filter=instrument.AGE_AT_SCAN eq 21 and derivative.fsl_000007 lt 3500
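
Filter values contain spaces, which are not valid in a raw URL, so an HTTP client generally has to percent-encode them first. The sketch below uses only the Python standard library. The subjects_url helper is a hypothetical name, and the sketch assumes the server decodes percent-encoded query strings in the usual way.

```python
from urllib.parse import quote, urlencode


def subjects_url(base_url, project_id, filter_expression=None):
    """Build the subjects URL for a project, optionally with a filter.

    Hypothetical helper for illustration. Spaces in the filter are
    encoded as %20 (quote_via=quote) rather than '+'.
    """
    url = f"{base_url}/projects/{project_id}/subjects"
    if filter_expression:
        url += "?" + urlencode({"filter": filter_expression}, quote_via=quote)
    return url


print(
    subjects_url(
        "http://localhost:5000",
        "dc1bf9be-10a3-11ea-8779-003ee1ce9545",
        "instruments.AGE_AT_SCAN gt 30",
    )
)
```

When calling the server from a shell instead, quoting the whole URL serves a similar purpose by keeping the spaces and the "?" away from the shell.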

fields

The fields query parameter specifies which fields should be detailed in a statistics operation. For each field specified, the result shows the minimum, maximum, average, median, and standard deviation of that field's values across all subjects matching the operation and filter. Separate multiple fields with commas.

Fields should be formatted in the same way as identifiers are specified in the filter parameter.

Example field query:

http://localhost:5000/statistics/projects/abc123?field=instruments.AGE_AT_SCAN,derivatives.fsl_000020
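
For reference, the five summary values reported per field can be reproduced with the Python standard library. This is a sketch of the arithmetic only, not PyNIDM's implementation: the summarize function is hypothetical, the input ages are made-up sample values, and the use of the population standard deviation is an assumption, since this document does not say which variant the server reports.

```python
import statistics


def summarize(values):
    """Return the five summary statistics listed for a statistics field.

    Hypothetical helper. Assumption: population standard deviation
    (statistics.pstdev); the server may use the sample variant instead.
    """
    values = list(values)
    return {
        "max": max(values),
        "min": min(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "standard_deviation": statistics.pstdev(values),
    }


# Made-up ages for illustration; not taken from any dataset.
ages = [21.0, 24.0, 26.0, 33.0]
for name, value in summarize(ages).items():
    print(name, value)
```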

2.4 Return Formatting

By default, the HTTP REST API server returns JSON-formatted objects or arrays. The pynidm query command-line utility returns text by default (when possible); use the -j option to format the output as JSON.

2.4.1 Examples

2.4.1.1 Get the UUID for all the projects at this location

curl http://localhost:5000/projects

Example response:

[
    "dc1bf9be-10a3-11ea-8779-003ee1ce9545"
]
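
The same request can be made from Python with the standard library. The following is a minimal sketch under stated assumptions: get_json is a hypothetical helper, the server is assumed to be running at http://localhost:5000 as in the curl example, and the opener argument exists only so the function can be exercised without a live server.

```python
import json
from urllib.request import urlopen


def get_json(url, opener=urlopen):
    """GET a URL and decode the JSON body of the response.

    Hypothetical helper. `opener` defaults to urllib's urlopen and can
    be replaced by any callable that returns a readable context manager.
    """
    with opener(url) as response:
        return json.loads(response.read().decode("utf-8"))


# With a server running locally, as assumed in the curl example:
#     project_ids = get_json("http://localhost:5000/projects")
#     for project_id in project_ids:
#         print(project_id)
```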

2.4.1.2 Get the project summary details

curl http://localhost:5000/projects/dc1bf9be-10a3-11ea-8779-003ee1ce9545

Example response:

{
 "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": "http://purl.org/nidash/nidm#Project",
 "dctypes:title": "ABIDE CMU_a Site",
 "http://www.w3.org/ns/prov#Location": "/datasets.datalad.org/abide/RawDataBIDS/CMU_a",
 "sio:Identifier": "1.0.1",
 "nidm:NIDM_0000171": 14,
 "age_max": 33.0,
 "age_min": 21.0,
 "ndar:gender": [
     "1",
     "2"
 ],
 "obo:handedness": [
     "R",
     "L",
     "Ambi"
 ]
}

2.4.1.3 Get the subjects in a project

pynidm query -nl "cmu_a.nidm.ttl" -u http://localhost:5000/projects/dc1bf9be-10a3-11ea-8779-003ee1ce9545/subjects

Example response:

deef8eb2-10a3-11ea-8779-003ee1ce9545
df533e6c-10a3-11ea-8779-003ee1ce9545
ddbfb454-10a3-11ea-8779-003ee1ce9545
df21cada-10a3-11ea-8779-003ee1ce9545
dcfa35b2-10a3-11ea-8779-003ee1ce9545
de89ce4c-10a3-11ea-8779-003ee1ce9545
dd2ce75a-10a3-11ea-8779-003ee1ce9545
ddf21020-10a3-11ea-8779-003ee1ce9545
debc0f74-10a3-11ea-8779-003ee1ce9545
de245134-10a3-11ea-8779-003ee1ce9545
dd5f2f30-10a3-11ea-8779-003ee1ce9545
dd8d4faa-10a3-11ea-8779-003ee1ce9545
df87cbaa-10a3-11ea-8779-003ee1ce9545
de55285e-10a3-11ea-8779-003ee1ce9545

2.4.1.4 Use the command line to get statistics on a project for AGE_AT_SCAN and an FSL data element

pynidm query -nl ttl/cmu_a.nidm.ttl -u /statistics/projects/dc1bf9be-10a3-11ea-8779-003ee1ce9545?fields=instruments.AGE_AT_SCAN,derivatives.fsl_000001

Example response:

-------------------------------------------------  ---------------------------------------------
"http://www.w3.org/1999/02/22-rdf-syntax-ns#type"  http://www.w3.org/ns/prov#Activity
"title"                                            ABIDE CMU_a Site
"Identifier"                                       1.0.1
"prov:Location"                                    /datasets.datalad.org/abide/RawDataBIDS/CMU_a
"NIDM_0000171"                                     14
"age_max"                                          33.0
"age_min"                                          21.0

  gender
--------
       1
       2

handedness
------------
R
L
Ambi

subjects
------------------------------------
de89ce4c-10a3-11ea-8779-003ee1ce9545
deef8eb2-10a3-11ea-8779-003ee1ce9545
dd8d4faa-10a3-11ea-8779-003ee1ce9545
ddbfb454-10a3-11ea-8779-003ee1ce9545
de245134-10a3-11ea-8779-003ee1ce9545
debc0f74-10a3-11ea-8779-003ee1ce9545
dd5f2f30-10a3-11ea-8779-003ee1ce9545
ddf21020-10a3-11ea-8779-003ee1ce9545
dcfa35b2-10a3-11ea-8779-003ee1ce9545
df21cada-10a3-11ea-8779-003ee1ce9545
df533e6c-10a3-11ea-8779-003ee1ce9545
de55285e-10a3-11ea-8779-003ee1ce9545
df87cbaa-10a3-11ea-8779-003ee1ce9545
dd2ce75a-10a3-11ea-8779-003ee1ce9545

-----------  ------------------  --------
AGE_AT_SCAN  max                 33
AGE_AT_SCAN  min                 21
AGE_AT_SCAN  median              26
AGE_AT_SCAN  mean                26.2857
AGE_AT_SCAN  standard_deviation   4.14778
-----------  ------------------  --------

----------  ------------------  -----------
fsl_000001  max                 1.14899e+07
fsl_000001  min                 5.5193e+06
fsl_000001  median              7.66115e+06
fsl_000001  mean                8.97177e+06
fsl_000001  standard_deviation  2.22465e+06
----------  ------------------  -----------

2.4.1.5 Get details on a subject

Use -j for a JSON-formatted response:

pynidm query -j -nl "cmu_a.nidm.ttl" -u http://localhost:5000/projects/dc1bf9be-10a3-11ea-8779-003ee1ce9545/subjects/df21cada-10a3-11ea-8779-003ee1ce9545

Example response:

 {
"uuid": "df21cada-10a3-11ea-8779-003ee1ce9545",
"id": "0050665",
"activity": [
  "e28dc764-10a3-11ea-a7d3-003ee1ce9545",
  "df28e95a-10a3-11ea-8779-003ee1ce9545",
  "df21c76a-10a3-11ea-8779-003ee1ce9545"
],
"instruments": {
  "e28dd218-10a3-11ea-a7d3-003ee1ce9545": {
    "SRS_VERSION": "nan",
    "ADOS_MODULE": "nan",
    "WISC_IV_VCI": "nan",
    "WISC_IV_PSI": "nan",
    "ADOS_GOTHAM_SOCAFFECT": "nan",
    "VINELAND_PLAY_V_SCALED": "nan",
    "null": "http://www.w3.org/ns/prov#Entity",
    "VINELAND_EXPRESSIVE_V_SCALED": "nan",
    "SCQ_TOTAL": "nan",
    "SRS_MOTIVATION": "nan",
    "PIQ": "104.0",
    "FIQ": "109.0",
    "WISC_IV_PRI": "nan",
    "FILE_ID": "CMU_a_0050665",
    "VIQ": "111.0",
    "WISC_IV_VOCAB_SCALED": "nan",
    "VINELAND_DAILYLVNG_STANDARD": "nan",
    "WISC_IV_SIM_SCALED": "nan",
    "WISC_IV_DIGIT_SPAN_SCALED": "nan",
    "AGE_AT_SCAN": "33.0"
    }
 },
"derivatives": {
    "b9fe0398-16cc-11ea-8729-003ee1ce9545": {
       "URI": "http://iri.nidash.org/b9fe0398-16cc-11ea-8729-003ee1ce9545",
       "values": {
         "http://purl.org/nidash/fsl#fsl_000005": {
           "datumType": "ilx_0102597",
           "label": "Left-Amygdala (voxels)",
           "value": "1573",
           "units": "voxel"
         },
         "http://purl.org/nidash/fsl#fsl_000004": {
           "datumType": "ilx_0738276",
           "label": "Left-Accumbens-area (mm^3)",
           "value": "466.0",
           "units": "mm^3"
         },
         "http://purl.org/nidash/fsl#fsl_000003": {
           "datumType": "ilx_0102597",
           "label": "Left-Accumbens-area (voxels)",
           "value": "466",
           "units": "voxel"
         }
       },
       "StatCollectionType": "FSLStatsCollection"
    }
 }
}
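
A subject response like the one above mixes real measurements with "nan" placeholders and non-numeric entries, so a client usually needs to filter it before use. The sketch below is an illustration, not part of PyNIDM: the two helper functions are hypothetical, and SAMPLE is a trimmed copy of the example response, keeping only a few of its entries.

```python
import json
import math

# Trimmed copy of the example response above (a few entries only).
SAMPLE = """
{
  "uuid": "df21cada-10a3-11ea-8779-003ee1ce9545",
  "instruments": {
    "e28dd218-10a3-11ea-a7d3-003ee1ce9545": {
      "SRS_VERSION": "nan",
      "null": "http://www.w3.org/ns/prov#Entity",
      "PIQ": "104.0",
      "FIQ": "109.0",
      "FILE_ID": "CMU_a_0050665",
      "AGE_AT_SCAN": "33.0"
    }
  },
  "derivatives": {
    "b9fe0398-16cc-11ea-8729-003ee1ce9545": {
      "values": {
        "http://purl.org/nidash/fsl#fsl_000004": {
          "label": "Left-Accumbens-area (mm^3)",
          "value": "466.0",
          "units": "mm^3"
        }
      }
    }
  }
}
"""


def measured_instrument_values(subject):
    """Return {variable: float} for instrument entries that hold a real number."""
    result = {}
    for instrument in subject.get("instruments", {}).values():
        for name, raw in instrument.items():
            try:
                number = float(raw)
            except (TypeError, ValueError):
                continue  # non-numeric entries such as FILE_ID
            if not math.isnan(number):  # skip "nan" placeholders
                result[name] = number
    return result


def derivative_rows(subject):
    """Return (label, value, units) tuples for every derivative datum."""
    rows = []
    for collection in subject.get("derivatives", {}).values():
        for datum in collection.get("values", {}).values():
            rows.append((datum["label"], float(datum["value"]), datum["units"]))
    return rows


subject = json.loads(SAMPLE)
print(measured_instrument_values(subject))
print(derivative_rows(subject))
```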

2.4.2 version

Print the installed PyNIDM version.

Usage: pynidm version
