Generated from aind-library-template

aind-metadata-extractor

Extractors handle pulling metadata from acquisition data files. The output of an extractor is a data model (stored in the models/ subfolder) which is a contract with the corresponding mapper in aind-metadata-mapper.

Extractors need to be run on the rig immediately following acquisition.

Mappers are run by the GatherMetadataJob on the data-transfer-service.

Install

Install only the dependencies for the specific extractor you plan to run. The available extractors are listed in the pyproject.toml file and as folders in src/aind_metadata_extractor.

During installation pass the extractor as an optional dependency:

pip install 'aind-metadata-extractor[<your-extractor>]'

Develop

To build a new extractor, define a new output model in the models/ folder. Then create a new extractor folder and inherit from BaseExtractor. Implement the functions:

  • .run_job() should store the metadata output object (matching the model) in self.metadata and return a dictionary with the model_dump() contents
  • ._extract() should perform the actual work needed to build the metadata model (data loading, metadata-service calls, etc.) and return the model

Your extractor inherits a .write() function, which writes the stored metadata to a JSON file named after the extractor (for example, smartspim.json for the smartspim extractor).
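The pattern above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the library's code: BaseExtractor here is a simplified stand-in for the real class (whose constructor and .write() behaviour may differ), and ExampleModel, ExampleExtractor, output_name and the field names are hypothetical. A real output model would live in models/ and provide model_dump() itself.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Optional


@dataclass
class ExampleModel:
    """Hypothetical output model; a real one would live in models/."""

    subject_id: str
    session_name: str

    def model_dump(self) -> dict:
        # Stand-in for the model_dump() call the real models provide.
        return asdict(self)


class BaseExtractor:
    """Simplified stand-in for the library's BaseExtractor (assumption)."""

    metadata: Optional[ExampleModel] = None
    output_directory: Path = Path(".")
    output_name: str = "example.json"  # hypothetical file name

    def write(self) -> Path:
        # Serialize whatever run_job() stored in self.metadata.
        if self.metadata is None:
            raise ValueError("run_job() must be called before write()")
        output_path = Path(self.output_directory) / self.output_name
        output_path.write_text(json.dumps(self.metadata.model_dump(), indent=2))
        return output_path


class ExampleExtractor(BaseExtractor):
    """Hypothetical extractor implementing the two required functions."""

    def __init__(self, input_source, subject_id: str, output_directory="."):
        self.input_source = Path(input_source)
        self.subject_id = subject_id
        self.output_directory = Path(output_directory)

    def _extract(self) -> ExampleModel:
        # A real extractor would read acquisition files or call the
        # metadata service here; this sketch only uses the folder name.
        return ExampleModel(
            subject_id=self.subject_id,
            session_name=self.input_source.name,
        )

    def run_job(self) -> dict:
        # Store the model on the instance, then return its dictionary form.
        self.metadata = self._extract()
        return self.metadata.model_dump()
```

Keeping the loading logic in ._extract() and the bookkeeping in .run_job() means the inherited .write() only ever needs to look at self.metadata.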

Testing

When testing locally you only need to run your own tests (i.e. coverage run -m unittest discover -s tests/<new-extractor>). Do not modify the tests for other extractors in your PRs.
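As a rough sketch of what a file such as tests/<new-extractor>/test_example.py might contain (so that it matches the test_*.py pattern), the example below uses only the standard library. The helper extract_subject_id and the test class are hypothetical and defined inline to keep the sketch runnable; in a real test you would import your extractor instead.

```python
import unittest


def extract_subject_id(folder_name: str) -> str:
    """Hypothetical helper under test: pull the subject ID from a folder name."""
    # Assumes names shaped like "SmartSPIM_786846_2025-04-22_16-44-50".
    return folder_name.split("_")[1]


class TestExtractSubjectId(unittest.TestCase):
    """Tests for the hypothetical extract_subject_id helper."""

    def test_subject_id_is_second_field(self):
        folder = "SmartSPIM_786846_2025-04-22_16-44-50"
        self.assertEqual(extract_subject_id(folder), "786846")

    def test_name_without_separator_raises(self):
        # A name with no underscore has no second field to return.
        with self.assertRaises(IndexError):
            extract_subject_id("nounderscore")
```

The discover command shown above picks up any unittest.TestCase subclasses in files that match the pattern, so no explicit runner code is needed in the file.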

Before opening a PR, modify the file test_and_lint.yml and add a new test-group:

test-group: ['core', 'smartspim', 'mesoscope', 'utils', '<new-extractor>']

Then add the test-group settings below that:

    - test-group: '<new-extractor>'
      dependencies: '[dev,<new-extractor>]'
      test-path: 'tests/<new-extractor>'
      test-pattern: 'test_*.py'

When running on GitHub, all of the test groups will be run independently with their separate dependencies and then their coverage results are gathered together in a final step.

Run

Each extractor uses a JobSettings object to collect the information it needs about the data and metadata files. Pass the settings to the extractor, then run it by calling .run_job(). For example, for smartspim:

from pathlib import Path

from aind_metadata_extractor.smartspim.job_settings import JobSettings
from aind_metadata_extractor.smartspim.extractor import SmartspimExtractor

# Folder that contains the acquisition directory
DATA_DIR = Path("<path-to-your-data>")

job_settings = JobSettings(
    subject_id="786846",
    metadata_service_path="http://aind-metadata-service/slims/smartspim_imaging",
    # Join path components with "/"; Path objects do not support "+"
    input_source=DATA_DIR / "SmartSPIM_786846_2025-04-22_16-44-50",
    output_directory=".",
    slims_datetime="2025-04-22T18:30:08.915000Z",
)
extractor = SmartspimExtractor(job_settings=job_settings)
extractor.run_job()  # builds the metadata model and stores it in extractor.metadata
extractor.write()  # writes the stored metadata to a JSON file

The results will be saved in smartspim.json.
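To check the output after a run, the JSON file can be read back with the standard library. This is only a sketch: load_metadata is a hypothetical helper, not part of the package, and the keys you see depend on the extractor's output model.

```python
import json
from pathlib import Path


def load_metadata(json_path: Path) -> dict:
    """Read an extractor's JSON output back into a plain dictionary."""
    with open(json_path, "r", encoding="utf-8") as f:
        return json.load(f)
```

For the smartspim example, load_metadata(Path("smartspim.json")) would return the written contents as a dictionary, assuming the file was written to the current directory.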
