aind-metadata-mapper

Package to manage mapping of source data into aind-data-schema metadata files. This repository contains code that parses source files into aind-data-schema models.
Usage
The GatherMetadataJob is used to create the data_description.json and to pull the subject.json and procedures.json from aind-metadata-service. Users are expected to provide the instrument.json and the acquisition.json, as well as the optional processing.json, quality_control.json, and model.json. The job will attempt to validate all of the metadata files, displaying any errors, and will then save all metadata files into the selected output folder.
Using the GatherMetadataJob
The following are the minimum required settings:
- output_dir (str): Location where metadata files will be saved. If a metadata_dir is not provided, this is also the location the job searches for metadata files.
- data_description_settings:
  - project_name (str): Project name used to fetch funding and investigator information.
  - modalities (List[Modality]): List of data modalities for this dataset.
```python
from aind_data_schema_models.modalities import Modality

from aind_metadata_mapper.gather_metadata import GatherMetadataJob
from aind_metadata_mapper.models import JobSettings, DataDescriptionSettings

job_settings = JobSettings(
    output_dir="/path/to/output",
    subject_id="123456",
    data_description_settings=DataDescriptionSettings(
        project_name="<project-name>",
        modalities=[Modality.ECEPHYS],
    ),
)
job = GatherMetadataJob(job_settings=job_settings)
job.run_job()
```
Default behavior
The GatherMetadataJob attempts to find all of the core metadata files (instrument.json, acquisition.json, etc.) and then validates them together as a full Metadata object.
The job always prioritizes an exact match for a core file when one is found in the metadata_dir.
If no exact match exists, it will construct, fetch, merge, or run mappers to generate the appropriate metadata where possible.
| File | Method 1 | Method 2 | Method 3 |
|---|---|---|---|
| data_description.json | Exact match in input directory | Construct from settings / fetch from metadata-service | |
| subject.json | Exact match in input directory | Fetch from metadata-service (requires subject_id) | |
| procedures.json | Exact match in input directory | Fetch from metadata-service (requires subject_id) | |
| acquisition.json | Exact match in input directory | Run mappers on <mapper>.json files (and merge) | Merge all acquisition*.json files |
| instrument.json | Exact match in input directory | Fetch from metadata-service (requires instrument_id) | Merge all instrument*.json files |
| processing.json | Exact match in input directory | | |
| quality_control.json | Exact match in input directory | Merge all quality_control*.json files | |
| model.json | Exact match in input directory | | |
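The priority order in the table can be sketched as a small helper. The function name and return labels below are illustrative only, not the package's actual API:

```python
from pathlib import Path


def resolve_core_file(name: str, metadata_dir: Path) -> str:
    """Illustrative sketch: report which strategy would produce <name>.json."""
    exact = metadata_dir / f"{name}.json"
    if exact.exists():
        return "exact match"          # Method 1: always takes priority
    partials = sorted(metadata_dir.glob(f"{name}*.json"))
    if partials:
        return "merge partial files"  # e.g. instrument_behavior.json + instrument_fib.json
    return "construct or fetch"       # fall back to settings / metadata-service
```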
Automated mappers
When mappers inherit from the BaseMapper class and are registered in mapper_registry.py, they can be run automatically by the GatherMetadataJob. A file matching the mapper name (<mapper>.json) will be turned into a file named acquisition_<mapper>.json and then merged with any other acquisition files.
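The naming convention above amounts to a simple transformation; this helper is purely illustrative and not part of the package:

```python
def mapper_output_name(mapper_input: str) -> str:
    """Map a '<mapper>.json' input file name to 'acquisition_<mapper>.json'."""
    stem = mapper_input.removesuffix(".json")  # e.g. "fib.json" -> "fib"
    return f"acquisition_{stem}.json"
```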
Optional settings
- metadata_dir (str, optional): Location of existing metadata files, if different from the output_dir. If a file is found here, it will be used directly instead of being constructed or fetched.
- subject_id (str): Subject ID used to fetch metadata from the service (subject.json, procedures.json). This setting should only be used when an acquisition.json is not available.
- acquisition_start_time (datetime, optional): Acquisition start time in ISO 8601 format. This setting should only be used when an acquisition.json is not available.
- instrument_settings:
  - instrument_id (str): ID of the instrument used in data collection. When set, the job will attempt to fetch the instrument.json from the metadata-service and save it as instrument_<modality-abbreviation(s)>.json. If multiple instrument*.json files exist after fetching, they will be merged.
- data_description_settings: See DataDescription for details.
  - tags (list[str], optional)
  - group (str, optional)
  - restrictions (str, optional)
  - data_summary (str, optional)
```python
from datetime import datetime

from aind_data_schema_models.modalities import Modality
from aind_data_schema_models.data_name_patterns import Group

from aind_metadata_mapper.gather_metadata import GatherMetadataJob
from aind_metadata_mapper.models import (
    JobSettings,
    DataDescriptionSettings,
    InstrumentSettings,
)

job_settings = JobSettings(
    metadata_dir="/path/to/input/",
    output_dir="/path/to/output",
    subject_id="828422",
    acquisition_start_time=datetime.fromisoformat("2025-11-13T17:38:37.079861+00:00"),
    data_description_settings=DataDescriptionSettings(
        project_name="Cognitive flexibility in patch foraging",
        modalities=[Modality.BEHAVIOR, Modality.BEHAVIOR_VIDEOS, Modality.FIB],
        tags=["foraging"],
        group=Group.BEHAVIOR,
        restrictions="Internal use only",
        data_summary="VR foraging task with fiber photometry recording",
    ),
    instrument_settings=InstrumentSettings(
        instrument_id="13A",
    ),
    raise_if_invalid=True,
    raise_if_mapper_errors=True,
    metadata_service_url="http://aind-metadata-service",
)
job = GatherMetadataJob(job_settings=job_settings)
job.run_job()
```
Validation settings
- raise_if_invalid (bool, default=False): Controls validation behavior.
  - True: Raises an exception if any fetched metadata is invalid.
  - False: Logs a warning or error and continues when validation errors occur.
- raise_if_mapper_errors (bool, default=True): Controls mapper execution behavior.
  - True: Raises an error if any automated mapper (e.g., for instrument-specific formats) fails.
  - False: Logs a warning and continues without that mapper's output.
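The effect of these flags can be summarized by a sketch like the following; the helper is hypothetical and not the package's internal code:

```python
import logging


def handle_validation_error(err: Exception, raise_if_invalid: bool) -> None:
    """Hypothetical sketch of the raise_if_invalid flag's effect."""
    if raise_if_invalid:
        raise err                      # strict mode: stop the job
    logging.warning("Validation failed, continuing: %s", err)  # lenient mode
```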
Metadata service settings
You probably shouldn't be modifying these.
- metadata_service_url (str, default="http://aind-metadata-service"): Base URL of the metadata service.
- metadata_service_*_endpoint (str): API endpoints for specific metadata types:
  - metadata_service_subject_endpoint (default="/api/v2/subject/")
  - metadata_service_procedures_endpoint (default="/api/v2/procedures/")
  - metadata_service_instrument_endpoint (default="/api/v2/instrument/")
Developing Mappers
Each MapperJob class should inherit from BaseMapper in base.py. The only constructor parameter should be the MapperJobSettings from base.py. You cannot add additional parameters to your job, or it will not be possible to run it automatically on the data-transfer-service. GatherMetadataJob will then run your mappers automatically when it detects the extracted metadata output.
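A minimal sketch of that shape is below. In the real package, BaseMapper and MapperJobSettings come from aind_metadata_mapper.base; the stand-in stubs here exist only so the example is self-contained, and the class body is illustrative:

```python
from dataclasses import dataclass


@dataclass
class MapperJobSettings:
    """Stand-in for aind_metadata_mapper.base.MapperJobSettings."""
    input_source: str
    output_directory: str


class BaseMapper:
    """Stand-in for aind_metadata_mapper.base.BaseMapper."""
    def __init__(self, job_settings: MapperJobSettings):
        self.job_settings = job_settings


class MyMapperJob(BaseMapper):
    """Takes only MapperJobSettings -- no extra constructor parameters,
    so the data-transfer-service can run it automatically."""

    def run_job(self) -> dict:
        # Parse the source file into an acquisition model (elided here),
        # then finish by calling acquisition.write_standard_file(...).
        return {"status": "ok", "output": self.job_settings.output_directory}
```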
Writing the output file
In your run_job() function, the final step should be to call the write_standard_file() function, passing it the parameters from the job settings. This ensures that any future changes to how files are written will carry over to your mapper.
```python
acquisition.write_standard_file(
    output_directory=job_settings.output_directory,
    filename_suffix=filename_suffix,
)
```
Individual mappers