Generated from aind-library-template

aind-metadata-extractor

Extractors handle pulling metadata from acquisition data files. The output of an extractor is a data model (stored in the models/ subfolder) which is a contract with the corresponding mapper in aind-metadata-mapper.

Extractors need to be run on the rig immediately following acquisition.

Mappers are run automatically by the GatherMetadataJob on the data-transfer-service.

Install

Install only the dependencies for the specific extractor you plan to run. The available extractors are listed in the pyproject.toml file and correspond to the folders in src/aind_metadata_extractor.

During installation pass the extractor as an optional dependency:

pip install 'aind-metadata-extractor[<your-extractor>]'

Run

Each extractor takes a JobSettings object that describes the data and metadata files. Pass it to the extractor class, call .run_job() to build the metadata, then call .write() to save it. For example, for smartspim:

from pathlib import Path

from aind_metadata_extractor.smartspim.job_settings import JobSettings
from aind_metadata_extractor.smartspim.extractor import SmartspimExtractor

# Folder that contains the acquisition directory
DATA_DIR = Path("<path-to-your-data>")

job_settings = JobSettings(
    subject_id="786846",
    metadata_service_path="http://aind-metadata-service/slims/smartspim_imaging",
    # Join path components with "/", not "+"
    input_source=DATA_DIR / "SmartSPIM_786846_2025-04-22_16-44-50",
    output_directory=".",
    slims_datetime="2025-04-22T18:30:08.915000Z",
)
extractor = SmartspimExtractor(job_settings=job_settings)
extractor.run_job()  # build the metadata model
extractor.write()  # save it to the output directory

The results will be saved in smartspim.json in the output directory.

Why

Every data acquisition is required to capture Acquisition metadata. In many situations this requires accessing the raw data files, which can mean installing custom rig-specific libraries. To maintain a clean separation of logic we are putting all rig-specific code into the extractors in this repository and keeping any code related to transforming to aind-data-schema in the mapper. In between the extractor and the mapper there is a contract, a pydantic model that contains all of the necessary information to run the mapper.

This pattern also allows us to keep any code that accesses metadata services (e.g. aind-metadata-service) off of the rigs.

Finally, this separation means that your mappers can be run automatically! You can find more details about mappers in the aind-metadata-mapper repository.

Develop

The only requirement for extractors is that you output a file <your-extractor-name>.json which validates against the corresponding model in the models/ subfolder.

Define a model

Define a new contract model in the models/ folder. Your model class should inherit from pydantic.BaseModel. You can nest sub-models if that helps organize your metadata; see models/smartspim.py for an example.
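A contract model with one nested sub-model might look like the sketch below. The class names and fields (MyRigModel, ChannelInfo, wavelength_nm, and so on) are hypothetical and chosen only for illustration; they are not the fields used by models/smartspim.py.

```python
from typing import List, Optional

from pydantic import BaseModel


class ChannelInfo(BaseModel):
    """Hypothetical sub-model describing one acquisition channel."""

    name: str
    wavelength_nm: int


class MyRigModel(BaseModel):
    """Hypothetical top-level contract handed from the extractor to the mapper."""

    subject_id: str
    session_start: str
    channels: List[ChannelInfo]  # nested sub-models
    notes: Optional[str] = None  # optional field with a default


example = MyRigModel(
    subject_id="000000",
    session_start="2025-01-01T00:00:00Z",
    channels=[ChannelInfo(name="ch0", wavelength_nm=488)],
)
# model_dump() converts the model, including nested sub-models, to plain dicts.
print(example.model_dump())
```

Keeping each logical group of fields in its own sub-model tends to make validation errors easier to trace back to the part of the metadata that is wrong.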

Define extractor code

You do not need to keep your extractor code in this repository, but if you do put it here it will make it easier for us to coordinate updates with you in the future as metadata requirements evolve.

Option 1: Extractor code maintained elsewhere

Have your extractor code (in your acquisition code) output a file named <your-extractor-name>.json that is validated against your model. The intermediate model file should be stored alongside any other metadata files you are providing (usually the instrument.json, at a minimum).
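If your acquisition code writes the file itself, one way to check it against the model before transfer is to load it back with pydantic. This is a sketch under assumptions: MyRigModel stands in for your model from the models/ folder, and validate_output and the file names are hypothetical.

```python
import tempfile
from pathlib import Path

from pydantic import BaseModel, ValidationError


class MyRigModel(BaseModel):
    """Hypothetical stand-in for a model from the models/ folder."""

    subject_id: str
    frame_count: int


def validate_output(json_path: Path) -> MyRigModel:
    """Parse the file; raises pydantic.ValidationError if it does not match the model."""
    return MyRigModel.model_validate_json(json_path.read_text())


with tempfile.TemporaryDirectory() as tmp:
    # A file that matches the model validates cleanly.
    out_file = Path(tmp) / "my_rig.json"
    out_file.write_text(MyRigModel(subject_id="000000", frame_count=10).model_dump_json())
    validated = validate_output(out_file)

    # A file with a missing required field is rejected.
    bad_file = Path(tmp) / "bad.json"
    bad_file.write_text('{"subject_id": "000000"}')
    try:
        validate_output(bad_file)
        bad_rejected = False
    except ValidationError:
        bad_rejected = True

print(validated.frame_count, bad_rejected)
```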

Option 2: Extractor code in aind-metadata-extractor

Create a new extractor folder with a matching name and inherit from BaseExtractor. Implement the functions:

  • .run_job() accepts a JobSettings object as a parameter and should store the metadata output object (matching the model) in self.metadata. Return a dictionary with the model_dump() contents.
  • ._extract() should perform the actual data loading, metadata-service calls, etc, necessary to build the metadata model and return it. This function should return the actual model, validated against what is in the models/ folder.

Extractor classes inherit the .write() function, which writes the metadata to the file <your-extractor-name>.json. Users can then run your extractor by following the instructions in the Run section above.
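The overall shape of such an extractor can be sketched as follows. So that the example runs on its own, SketchBaseExtractor is a stand-in written here and is not the real BaseExtractor; the models, the settings fields, and the name attribute are likewise hypothetical. Following the smartspim example in the Run section, the sketch assumes the settings are passed to the constructor.

```python
import json
import tempfile
from pathlib import Path

from pydantic import BaseModel


class MyRigModel(BaseModel):
    """Hypothetical contract model."""

    subject_id: str
    frame_count: int


class JobSettings(BaseModel):
    """Hypothetical settings; real JobSettings fields differ per extractor."""

    subject_id: str
    output_directory: str


class SketchBaseExtractor:
    """Stand-in for BaseExtractor (an assumption, not the library class)."""

    name = "my_rig"  # assumed to determine the output file name

    def write(self) -> Path:
        out = Path(self.job_settings.output_directory) / f"{self.name}.json"
        out.write_text(json.dumps(self.metadata.model_dump(), indent=2))
        return out


class MyRigExtractor(SketchBaseExtractor):
    def __init__(self, job_settings: JobSettings):
        self.job_settings = job_settings
        self.metadata = None

    def _extract(self) -> MyRigModel:
        # Real code would read the acquisition files here.
        return MyRigModel(subject_id=self.job_settings.subject_id, frame_count=10)

    def run_job(self) -> dict:
        # Store the validated model, then return its dumped contents.
        self.metadata = self._extract()
        return self.metadata.model_dump()


with tempfile.TemporaryDirectory() as tmp:
    extractor = MyRigExtractor(JobSettings(subject_id="000000", output_directory=tmp))
    dumped = extractor.run_job()
    written = json.loads(extractor.write().read_text())

print(dumped)
```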

Testing

When testing locally you only need to run your own tests (i.e. coverage run -m unittest discover -s tests/<new-extractor>). Do not modify the tests for other extractors in your PRs.
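A test module that unittest discovery would pick up under the test_*.py pattern might look like this sketch. The helper extract_frame_count is hypothetical and defined inline so the example is self-contained; in a real test it would be imported from your extractor package.

```python
import unittest


def extract_frame_count(header: dict) -> int:
    """Hypothetical helper that an extractor might expose."""
    return int(header["frames"])


class TestMyRigExtractor(unittest.TestCase):
    """Would live in a file such as tests/<new-extractor>/test_extractor.py."""

    def test_frame_count_is_parsed(self):
        self.assertEqual(extract_frame_count({"frames": "10"}), 10)

    def test_missing_key_raises(self):
        with self.assertRaises(KeyError):
            extract_frame_count({})
```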

Before opening a PR, modify the file test_and_lint.yml and add a new test-group:

test-group: ['core', 'smartspim', 'mesoscope', 'utils', '<new-extractor>']

Then add the test-group settings below that:

    - test-group: '<new-extractor>'
      dependencies: '[dev,<new-extractor>]'
      test-path: 'tests/<new-extractor>'
      test-pattern: 'test_*.py'

On GitHub, each test group runs independently with its own dependencies, and the coverage results are then combined in a final step.
