Package to manage mapping of source data into aind-data-schema metadata files.

aind-metadata-mapper

Repository to contain code that will parse source files into aind-data-schema models.

Usage

The GatherMetadataJob creates data_description.json and pulls subject.json and procedures.json from aind-metadata-service. Users are expected to provide instrument.json and acquisition.json, and may optionally provide processing.json, quality_control.json and model.json. If a user provides procedures.json, it is merged with the procedures fetched from the service (subject and specimen procedures are deduplicated). The job then attempts to validate all of the metadata files, displays any errors, and saves the metadata files into the selected folder.

Using the GatherMetadataJob

The following are the minimum required settings:

  • output_dir (str): Location where metadata files will be saved. If a metadata_dir is not provided, this will also be the location that the job searches for metadata files.
  • data_description_settings:
    • project_name (str): Project name used to fetch funding and investigator information.
    • modalities (List[Modality]): List of data modalities for this dataset.

from aind_data_schema_models.modalities import Modality
from aind_metadata_mapper.gather_metadata import GatherMetadataJob
from aind_metadata_mapper.models import JobSettings, DataDescriptionSettings

job_settings = JobSettings(
    output_dir="/path/to/output",
    subject_id="123456",
    data_description_settings=DataDescriptionSettings(
        project_name="<project-name>",
        modalities=[Modality.ECEPHYS],
    ),
)

job = GatherMetadataJob(job_settings=job_settings)
job.run_job()

Default behavior

The GatherMetadataJob attempts to find all of the core metadata files (instrument.json, acquisition.json, etc.) and then validates them as a full Metadata object.

The job will always prioritize an exact match for a core file when it finds one in the metadata_dir.

If no exact match exists, it will construct, fetch, merge or run mappers to generate the appropriate metadata, if it is available.

| File | Method 1 | Method 2 | Method 3 |
|---|---|---|---|
| data_description.json | Exact match in input directory | Construct from settings / fetch from metadata-service | |
| subject.json | Exact match in input directory | Fetch from metadata-service (requires subject_id) | Constructed locally when subject_id is "calibration" |
| procedures.json | Fetch from metadata-service (requires subject_id) | Merge with user-provided procedures but default to user-provided if there are any duplicates | Constructed locally (empty) when subject_id is "calibration" |
| acquisition.json | Exact match in input directory | Run mappers on <mapper>.json files (and merge) | Merge all acquisition*.json files |
| instrument.json | Exact match in input directory | Fetch from metadata-service (requires instrument_id) | Merge all instrument*.json files |
| processing.json | Exact match in input directory | | |
| quality_control.json | Exact match in input directory | Merge all quality_control*.json files | |
| model.json | Exact match in input directory | | |
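
The file-based part of this lookup order can be pictured with a small sketch. It is not the package's implementation: resolve_core_file is a hypothetical helper, and it only covers an exact match followed by a wildcard match (for example instrument*.json), assuming all files live in one directory. Fetching from the metadata-service and running mappers are left out.

```python
from pathlib import Path
from typing import List


def resolve_core_file(metadata_dir: Path, core_name: str) -> List[Path]:
    """Hypothetical helper: return the file(s) that would feed one core file.

    An exact match (e.g. instrument.json) takes priority. Otherwise every
    file matching <core_name>*.json is returned as a merge candidate.
    """
    exact = metadata_dir / f"{core_name}.json"
    if exact.is_file():
        # Exact match wins; no merging is considered.
        return [exact]
    # No exact match: collect candidates such as instrument_a.json.
    return sorted(metadata_dir.glob(f"{core_name}*.json"))
```

An empty list here would mean that neither file-based method applies, which is where the job would fall back to constructing or fetching the file, if it can.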

Automated mappers

Mappers that inherit from the BaseMapper class and are registered in mapper_registry.py are run automatically by the GatherMetadataJob. A file matching the mapper name, <mapper>.json, is turned into a file named acquisition_<mapper>.json and then merged with any other acquisition files.
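
The naming rule above can be illustrated with a short sketch. Everything here is an assumption made for illustration: EXAMPLE_REGISTERED_MAPPERS stands in for whatever mapper_registry.py contains, the "fip" key is a guess, plan_mapper_outputs is a hypothetical helper, and the output is placed next to the input for simplicity.

```python
from pathlib import Path
from typing import Dict, Iterable

# Hypothetical stand-in for the names registered in mapper_registry.py.
EXAMPLE_REGISTERED_MAPPERS = {"fip"}


def plan_mapper_outputs(metadata_dir: Path, registered: Iterable[str]) -> Dict[Path, Path]:
    """Map each detected <mapper>.json to the acquisition_<mapper>.json it would produce."""
    plan = {}
    for name in sorted(registered):
        source = metadata_dir / f"{name}.json"
        # Only mappers whose input file is present are scheduled.
        if source.is_file():
            plan[source] = metadata_dir / f"acquisition_{name}.json"
    return plan
```

The sketch only plans file names; running the mapper and merging the resulting acquisition files is the job's responsibility.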

Optional settings

  • metadata_dir (str, optional): Location of existing metadata files, if different from the output_dir. If a file is found here, it will be used directly instead of constructing/fetching it.

  • subject_id (str): Subject ID used to fetch metadata from the service (subject.json, procedures.json). This setting should only be used when an acquisition.json is not available.

  • acquisition_start_time (datetime, optional): Acquisition start time in ISO 8601 format. This setting should only be used when an acquisition.json is not available.

  • subject_settings (optional): Settings for subject metadata. Only used when subject_id is "calibration".

    • calibration_object (CalibrationObject, optional): A CalibrationObject from aind_data_schema.components.subjects. When subject_id is "calibration", the metadata service is not contacted. Instead, a Subject is constructed locally from this object, together with an empty Procedures (no subject or specimen procedures). If omitted, a default empty CalibrationObject is used.
  • instrument_settings:

    • instrument_id (str): ID of the instrument used in data collection. When set, the job attempts to fetch instrument.json from the metadata-service and saves it as instrument_<modality-abbreviation(s)>.json. If multiple instrument*.json files exist after fetching, they are merged.
  • data_description_settings: See DataDescription for details.

    • tags (list[str], optional)
    • group (str, optional)
    • restrictions (str, optional)
    • data_summary (str, optional)

from datetime import datetime
from aind_data_schema_models.modalities import Modality
from aind_data_schema_models.data_name_patterns import Group
from aind_metadata_mapper.gather_metadata import GatherMetadataJob
from aind_metadata_mapper.models import JobSettings, DataDescriptionSettings, InstrumentSettings

job_settings = JobSettings(
    metadata_dir="/path/to/input/",
    output_dir="/path/to/output",
    subject_id="828422",
    acquisition_start_time=datetime.fromisoformat("2025-11-13T17:38:37.079861+00:00"),
    data_description_settings=DataDescriptionSettings(
        project_name="Cognitive flexibility in patch foraging",
        modalities=[Modality.BEHAVIOR, Modality.BEHAVIOR_VIDEOS, Modality.FIB],
        tags=["foraging"],
        group=Group.BEHAVIOR,
        restrictions="Internal use only",
        data_summary="VR foraging task with fiber photometry recording",
    ),
    instrument_settings=InstrumentSettings(
        instrument_id="13A",
    ),
    raise_if_invalid=True,
    raise_if_mapper_errors=True,
    metadata_service_url="http://aind-metadata-service",
)

job = GatherMetadataJob(job_settings=job_settings)
job.run_job()

Calibration sessions

When collecting data with a calibration object rather than a live subject, set subject_id to "calibration". The job will skip the metadata service entirely and construct subject.json and procedures.json locally.

from aind_data_schema.components.subjects import CalibrationObject
from aind_data_schema_models.modalities import Modality
from aind_metadata_mapper.gather_metadata import GatherMetadataJob
from aind_metadata_mapper.models import JobSettings, DataDescriptionSettings, SubjectSettings

job_settings = JobSettings(
    output_dir="/path/to/output",
    subject_id="calibration",
    data_description_settings=DataDescriptionSettings(
        project_name="<project-name>",
        modalities=[Modality.ECEPHYS],
    ),
    subject_settings=SubjectSettings(
        calibration_object=CalibrationObject(
            description="Neuropixels dummy probe",
            empty=False,
        )
    ),
)

job = GatherMetadataJob(job_settings=job_settings)
job.run_job()

Validation settings

  • raise_if_invalid (bool, default=False): Controls validation behavior:

    • True: Raises an exception if any fetched metadata is invalid.
    • False: Logs a warning or error and continues when validation errors occur.
  • raise_if_mapper_errors (bool, default=True): Controls mapper execution behavior:

    • True: Raises an error if any automated mapper (e.g., for instrument-specific formats) fails.
    • False: Logs a warning and continues without that mapper's output.
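
Both flags follow the same raise-or-log pattern, which the sketch below illustrates. It is not taken from the package: report_problems is a hypothetical helper, and the use of ValueError and of a warning-level log message are assumptions.

```python
import logging
from typing import List

logger = logging.getLogger(__name__)


def report_problems(problems: List[str], raise_on_error: bool) -> bool:
    """Hypothetical helper: raise or log, depending on a boolean flag.

    Returns True when there is nothing to report, False when problems
    were logged and execution continues.
    """
    if not problems:
        return True
    message = "; ".join(problems)
    if raise_on_error:
        # Strict mode: stop the job at the first reported batch of problems.
        raise ValueError(message)
    # Lenient mode: record the problems and let the caller carry on.
    logger.warning(message)
    return False
```

Strict mode suits automated pipelines where invalid metadata should block an upload; lenient mode is useful while metadata files are still being drafted.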

Metadata service settings

In most cases you should not need to modify these.

  • metadata_service_url (str, default=http://aind-metadata-service): Base URL of the metadata service.

  • metadata_service_*_endpoint (str): API endpoints for specific metadata types:

    • metadata_service_subject_endpoint (default="/api/v2/subject/")
    • metadata_service_procedures_endpoint (default="/api/v2/procedures/")
    • metadata_service_instrument_endpoint (default="/api/v2/instrument/")
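
As a rough illustration of how these settings relate, the sketch below joins a base URL, an endpoint and an identifier into one request URL. That the identifier is appended to the endpoint path is an assumption, and build_request_url is a hypothetical helper, not part of the package.

```python
def build_request_url(base_url: str, endpoint: str, identifier: str) -> str:
    """Join base URL, endpoint and ID with exactly one slash between parts."""
    return f"{base_url.rstrip('/')}/{endpoint.strip('/')}/{identifier}"


# Uses the documented defaults and the subject ID from the example above.
subject_url = build_request_url(
    "http://aind-metadata-service", "/api/v2/subject/", "828422"
)
```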

Instrument CLI

The aind-instrument command lets you upload and retrieve instruments from the metadata service without writing code.

# Upload an instrument
aind-instrument upload instrument.json

# Upload and overwrite an existing record
aind-instrument upload instrument.json --replace

# Keep the modification date from the file instead of updating to today
aind-instrument upload instrument.json --no-update-modification-date

# Get the latest record for an instrument
aind-instrument get 422_MESO2_20241017

# Get a specific version by modification date
aind-instrument get 422_MESO2_20241017 --modification-date 2024-10-28

# Save to a file instead of printing to stdout
aind-instrument get 422_MESO2_20241017 --output-directory ./output

Developing Mappers

Each MapperJob class should inherit from BaseMapper in base.py. Its only parameter should be the MapperJobSettings from base.py. Do not add other parameters to your job, or it cannot be run automatically on the data-transfer-service. GatherMetadataJob then runs your mappers automatically when it detects the extracted metadata output.

Writing the output file

In your run_job() function, the final step should be to call the write_standard_file() function and pass it the parameters from the job settings. This ensures that any future changes to how files are written are picked up by your mapper.

acquisition.write_standard_file(output_directory=job_settings.output_directory, filename_suffix=filename_suffix)
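
The overall shape of a mapper (a single settings object in, a written file as the last step) might look like the sketch below. To stay runnable without the package, it uses local stand-ins: ExampleJobSettings and ExampleMapper are hypothetical and do not reproduce the real MapperJobSettings or BaseMapper, the input_filepath field and the mapped field names are assumptions, and a plain JSON write takes the place of write_standard_file().

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ExampleJobSettings:
    """Stand-in for MapperJobSettings; input_filepath is an assumed field."""

    input_filepath: Path
    output_directory: Path


class ExampleMapper:
    """Stand-in for a BaseMapper subclass; takes only the settings object."""

    def run_job(self, job_settings: ExampleJobSettings) -> Path:
        raw = json.loads(Path(job_settings.input_filepath).read_text())
        # Map source fields into the target structure (illustrative only).
        mapped = {"acquisition_type": raw.get("session_type", "unknown")}
        # Final step: write the output. A real mapper would call
        # write_standard_file() on its acquisition model here instead.
        output_path = Path(job_settings.output_directory) / "acquisition_example.json"
        output_path.write_text(json.dumps(mapped, indent=2))
        return output_path
```

Keeping every input inside the settings object is what allows a generic runner to construct and invoke the mapper without knowing anything specific about it.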

Individual mappers

FIP (Fiber photometry)
