
Package to manage mapping of source data into aind-data-schema metadata files.


aind-metadata-mapper


Repository to contain code that will parse source files into aind-data-schema models.

Usage

The GatherMetadataJob creates the data_description.json and pulls the subject.json and procedures.json from aind-metadata-service. Users are expected to provide the instrument.json and the acquisition.json, and may optionally provide processing.json, quality_control.json and model.json. The job attempts to validate all of the metadata files, reports any errors, and then saves the metadata files into the selected folder.

Using the GatherMetadataJob

The following are the minimum required settings:

  • output_dir (str): Location where metadata files will be saved. If a metadata_dir is not provided, this will also be the location that the job searches for metadata files.
  • data_description_settings:
    • project_name (str): Project name used to fetch funding and investigator information.
    • modalities (List[Modality]): List of data modalities for this dataset.

The example below also sets subject_id so that subject.json and procedures.json can be fetched from the metadata-service.

from aind_data_schema_models.modalities import Modality
from aind_metadata_mapper.gather_metadata import GatherMetadataJob
from aind_metadata_mapper.models import JobSettings, DataDescriptionSettings

job_settings = JobSettings(
    # Metadata files are saved here; also searched for inputs when no metadata_dir is given
    output_dir="/path/to/output",
    # Needed to fetch subject.json and procedures.json
    subject_id="123456",
    data_description_settings=DataDescriptionSettings(
        project_name="<project-name>",
        modalities=[Modality.ECEPHYS],
    ),
)

job = GatherMetadataJob(job_settings=job_settings)
job.run_job()

Default behavior

The GatherMetadataJob attempts to find all of the core metadata files (instrument.json, acquisition.json, etc.) and then validates them together as a full Metadata object.

The job will always prioritize an exact match for a core file when it finds one in the metadata_dir.

If no exact match exists, the job will construct, fetch or merge the metadata, or run mappers to generate it, whenever the required inputs are available.

| File | Method 1 | Method 2 | Method 3 |
| --- | --- | --- | --- |
| data_description.json | Exact match in input directory | Construct from settings / fetch from metadata-service | |
| subject.json | Exact match in input directory | Fetch from metadata-service (requires subject_id) | |
| procedures.json | Exact match in input directory | Fetch from metadata-service (requires subject_id) | |
| acquisition.json | Exact match in input directory | Run mappers on <mapper>.json files (and merge) | Merge all acquisition*.json files |
| instrument.json | Exact match in input directory | Fetch from metadata-service (requires instrument_id) | Merge all instrument*.json files |
| processing.json | Exact match in input directory | | |
| quality_control.json | Exact match in input directory | Merge all quality_control*.json files | |
| model.json | Exact match in input directory | | |
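
The lookup order in the table can be read as a simple fallback chain. The sketch below is illustrative only: resolve_core_file and the fallback callables are hypothetical names that are not part of aind-metadata-mapper, and the sketch models nothing beyond "exact match first, then each fallback in order".

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Optional, Sequence


def resolve_core_file(
    metadata_dir: Path,
    filename: str,
    fallbacks: Sequence[Callable[[], Optional[dict]]] = (),
) -> Optional[dict]:
    """Return the contents of a core file, preferring an exact match on disk."""
    exact = metadata_dir / filename
    if exact.is_file():
        # Method 1: an exact match always wins.
        return json.loads(exact.read_text())
    for fallback in fallbacks:
        # Methods 2 and 3: try each fallback in order and keep the first result.
        result = fallback()
        if result is not None:
            return result
    return None


with tempfile.TemporaryDirectory() as tmp:
    metadata_dir = Path(tmp)
    (metadata_dir / "subject.json").write_text(json.dumps({"subject_id": "123456"}))

    # An exact match exists, so the fallback is never consulted.
    subject = resolve_core_file(
        metadata_dir, "subject.json", [lambda: {"subject_id": "from-service"}]
    )
    # No procedures.json on disk, so the first fallback that returns data is used.
    procedures = resolve_core_file(
        metadata_dir, "procedures.json", [lambda: None, lambda: {"source": "fallback"}]
    )
```

In this sketch a fallback signals "not available" by returning None, which lets the chain move on to the next method.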

Automated mappers

When mappers are developed from the BaseMapper class and registered in mapper_registry.py, they can be run automatically by the GatherMetadataJob. A file named after the mapper, <mapper>.json, will be turned into a file acquisition_<mapper>.json and then merged with any other acquisition files.
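
The naming convention can be sketched as follows. MAPPER_NAMES and mapper_output_name are hypothetical stand-ins for whatever mapper_registry.py actually exposes, and "fip" is assumed here purely as an example mapper name.

```python
from pathlib import Path
from typing import Optional

# Hypothetical registry contents; the real registry lives in mapper_registry.py.
MAPPER_NAMES = {"fip"}


def mapper_output_name(input_file: Path) -> Optional[str]:
    """Map <mapper>.json to acquisition_<mapper>.json for registered mappers."""
    if input_file.suffix != ".json" or input_file.stem not in MAPPER_NAMES:
        # Not a mapper input: leave it alone.
        return None
    return f"acquisition_{input_file.stem}.json"


output_name = mapper_output_name(Path("/path/to/input/fip.json"))
```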

Optional settings

  • metadata_dir (str, optional): Location of existing metadata files, if different from the output_dir. If a file is found here, it will be used directly instead of constructing/fetching it.

  • subject_id (str): Subject ID used to fetch metadata from the service (subject.json, procedures.json). This setting should only be used when an acquisition.json is not available.

  • acquisition_start_time (datetime, optional): Acquisition start time in ISO 8601 format. This setting should only be used when an acquisition.json is not available.

  • instrument_settings:

    • instrument_id (str): ID for the instrument used in data collection. When set, the job attempts to fetch the instrument.json from the metadata-service and saves it as instrument_<modality-abbreviation(s)>.json. If multiple instrument*.json files exist after fetching, they will be merged.
  • data_description_settings: See DataDescription for details.

    • tags (list[str], optional)
    • group (str, optional)
    • restrictions (str, optional)
    • data_summary (str, optional)

from datetime import datetime
from aind_data_schema_models.modalities import Modality
from aind_data_schema_models.data_name_patterns import Group
from aind_metadata_mapper.gather_metadata import GatherMetadataJob
from aind_metadata_mapper.models import JobSettings, DataDescriptionSettings, InstrumentSettings

job_settings = JobSettings(
    # Existing metadata files are read from metadata_dir; results are written to output_dir
    metadata_dir="/path/to/input/",
    output_dir="/path/to/output",
    # Only needed when no acquisition.json is available
    subject_id="828422",
    acquisition_start_time=datetime.fromisoformat("2025-11-13T17:38:37.079861+00:00"),
    data_description_settings=DataDescriptionSettings(
        project_name="Cognitive flexibility in patch foraging",
        modalities=[Modality.BEHAVIOR, Modality.BEHAVIOR_VIDEOS, Modality.FIB],
        tags=["foraging"],
        group=Group.BEHAVIOR,
        restrictions="Internal use only",
        data_summary="VR foraging task with fiber photometry recording",
    ),
    # Triggers a fetch of instrument.json from the metadata-service
    instrument_settings=InstrumentSettings(
        instrument_id="13A",
    ),
    raise_if_invalid=True,
    raise_if_mapper_errors=True,
    metadata_service_url="http://aind-metadata-service",
)

job = GatherMetadataJob(job_settings=job_settings)
job.run_job()
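
The acquisition_start_time in the example is built with the standard library's datetime.fromisoformat. The short illustration below shows what that call produces for an ISO 8601 string carrying a UTC offset. Whether the job requires a timezone-aware value is not stated here, so the awareness check is a precaution and not a documented requirement.

```python
from datetime import datetime, timedelta

start_time = datetime.fromisoformat("2025-11-13T17:38:37.079861+00:00")

# An explicit offset in the string yields a timezone-aware datetime.
is_aware = start_time.tzinfo is not None and start_time.utcoffset() == timedelta(0)

# isoformat() reproduces the original string, microseconds and offset included.
round_trip = start_time.isoformat()
```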

Validation settings

  • raise_if_invalid (bool, default=False): Controls validation behavior:

    • True: Raises an exception if any fetched metadata is invalid.
    • False: Logs a warning or error and continues when validation errors occur.
  • raise_if_mapper_errors (bool, default=True): Controls mapper execution behavior:

    • True: Raises an error if any automated mapper (e.g., for instrument-specific formats) fails.
    • False: Logs a warning and continues without that mapper's output.
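
Both flags follow the same raise-or-log pattern, which can be sketched as below. handle_problem is a hypothetical helper and not the job's actual code, and ValueError is an arbitrary choice of exception for the illustration.

```python
import logging

logger = logging.getLogger("gather_metadata_sketch")


def handle_problem(message: str, raise_on_error: bool) -> bool:
    """Raise when the flag is True; otherwise log a warning and continue."""
    if raise_on_error:
        raise ValueError(message)
    logger.warning(message)
    # False tells the caller that the step did not succeed but the run goes on.
    return False


# With the flag off, the problem is logged and execution continues.
continued = handle_problem("acquisition.json failed validation", raise_on_error=False)
```

Setting a flag to True is the stricter choice: the run stops at the first problem instead of saving files that are known to be incomplete or invalid.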

Metadata service settings

You probably shouldn't be modifying these.

  • metadata_service_url (str, default="http://aind-metadata-service"): Base URL of the metadata service.

  • metadata_service_*_endpoint (str): API endpoints for specific metadata types:

    • metadata_service_subject_endpoint (default="/api/v2/subject/")
    • metadata_service_procedures_endpoint (default="/api/v2/procedures/")
    • metadata_service_instrument_endpoint (default="/api/v2/instrument/")
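
As a rough illustration of how these settings might combine, the sketch below assumes that a request URL is the base URL, followed by the endpoint, followed by the identifier. That composition is an assumption made for this example, and build_request_url is a hypothetical helper that does not come from the package.

```python
def build_request_url(base_url: str, endpoint: str, identifier: str) -> str:
    """Join base URL, endpoint and identifier with exactly one slash between parts."""
    return f"{base_url.rstrip('/')}/{endpoint.strip('/')}/{identifier}"


# Assumed composition: <metadata_service_url><subject endpoint><subject_id>
subject_url = build_request_url(
    "http://aind-metadata-service", "/api/v2/subject/", "123456"
)
```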

Developing Mappers

Each MapperJob class should inherit from BaseMapper in base.py. Its only parameter should be the MapperJobSettings from base.py. Do not add additional parameters to your job; otherwise it cannot be run automatically on the data-transfer-service. GatherMetadataJob will then run your mappers automatically when it detects the extracted metadata output.

Writing the output file

In your run_job() function, the final step should be to call write_standard_file() and pass it the parameters from the job settings. This ensures that any future changes to how files are written will be picked up by your mapper.

acquisition.write_standard_file(output_directory=job_settings.output_directory, filename_suffix=filename_suffix)
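
To show the overall shape of a mapper without depending on the real classes, the sketch below uses local stand-ins. SketchMapperJobSettings, SketchAcquisition and SketchMapper are hypothetical, their field names and the acquisition_<suffix>.json naming are assumptions, and the actual BaseMapper and MapperJobSettings interfaces in base.py may differ.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SketchMapperJobSettings:
    """Stand-in for MapperJobSettings; the real field names may differ."""

    input_filepath: Path
    output_directory: Path
    output_filename_suffix: str


class SketchAcquisition:
    """Stand-in for an aind-data-schema acquisition model."""

    def __init__(self, payload: dict):
        self.payload = payload

    def write_standard_file(self, output_directory: Path, filename_suffix: str) -> Path:
        # Assumed naming: acquisition_<suffix>.json.
        path = Path(output_directory) / f"acquisition_{filename_suffix}.json"
        path.write_text(json.dumps(self.payload))
        return path


class SketchMapper:
    """Stand-in for a BaseMapper subclass; run_job takes only the job settings."""

    def run_job(self, job_settings: SketchMapperJobSettings) -> Path:
        raw = json.loads(Path(job_settings.input_filepath).read_text())
        acquisition = SketchAcquisition({"subject_id": raw["subject_id"]})
        # Final step: delegate writing to write_standard_file.
        return acquisition.write_standard_file(
            output_directory=job_settings.output_directory,
            filename_suffix=job_settings.output_filename_suffix,
        )


with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)
    source = tmp_path / "fip.json"
    source.write_text(json.dumps({"subject_id": "123456"}))
    settings = SketchMapperJobSettings(source, tmp_path, "fip")
    written = SketchMapper().run_job(settings)
    written_name = written.name
    written_payload = json.loads(written.read_text())
```

Because everything the mapper needs arrives through the single settings object, a caller can construct and run it without knowing anything mapper-specific.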

Individual mappers

FIP (Fiber photometry)
