
# aind-data-schema


A library that defines AIND data schema and validates JSON files.

User documentation available on readthedocs.

## Overview

This repository contains the schemas needed to ingest and validate metadata that are essential to ensuring AIND data collection is completely reproducible. Our general approach is to semantically version core schema classes and include those version numbers in serialized metadata so that we can flexibly evolve the schemas over time without requiring difficult data migrations. In the future, we will provide a browsable list of these classes rendered to JSON Schema, including all historic versions.

Be aware that this package is still under heavy preliminary development. Expect regular breaking changes, which we will communicate through semantic versioning.

A simple example:

```python
import datetime

from aind_data_schema.core.subject import BreedingInfo, Housing, Subject
from aind_data_schema_models.organizations import Organization
from aind_data_schema_models.species import Species

t = datetime.datetime(2022, 11, 22, 8, 43, 0)

s = Subject(
    species=Species.MUS_MUSCULUS,
    subject_id="12345",
    sex="Male",
    date_of_birth=t.date(),
    genotype="Emx1-IRES-Cre;Camk2a-tTA;Ai93(TITL-GCaMP6f)",
    housing=Housing(home_cage_enrichment=["Running wheel"], cage_id="123"),
    background_strain="C57BL/6J",
    source=Organization.AI,
    breeding_info=BreedingInfo(
        breeding_group="Emx1-IRES-Cre(ND)",
        maternal_id="546543",
        maternal_genotype="Emx1-IRES-Cre/wt; Camk2a-tTa/Camk2a-tTA",
        paternal_id="232323",
        paternal_genotype="Ai93(TITL-GCaMP6f)/wt",
    ),
)

s.write_standard_file()  # writes subject.json
```
```json
{
   "describedBy": "https://raw.githubusercontent.com/AllenNeuralDynamics/aind-data-schema/main/src/aind_data_schema/core/subject.py",
   "schema_version": "0.5.6",
   "subject_id": "12345",
   "sex": "Male",
   "date_of_birth": "2022-11-22",
   "genotype": "Emx1-IRES-Cre;Camk2a-tTA;Ai93(TITL-GCaMP6f)",
   "species": {
      "name": "Mus musculus",
      "abbreviation": null,
      "registry": {
         "name": "National Center for Biotechnology Information",
         "abbreviation": "NCBI"
      },
      "registry_identifier": "10090"
   },
   "alleles": [],
   "background_strain": "C57BL/6J",
   "breeding_info": {
      "breeding_group": "Emx1-IRES-Cre(ND)",
      "maternal_id": "546543",
      "maternal_genotype": "Emx1-IRES-Cre/wt; Camk2a-tTa/Camk2a-tTA",
      "paternal_id": "232323",
      "paternal_genotype": "Ai93(TITL-GCaMP6f)/wt"
   },
   "source": {
      "name": "Allen Institute",
      "abbreviation": "AI",
      "registry": {
         "name": "Research Organization Registry",
         "abbreviation": "ROR"
      },
      "registry_identifier": "03cpe7c52"
   },
   "rrid": null,
   "restrictions": null,
   "wellness_reports": [],
   "housing": {
      "cage_id": "123",
      "room_id": null,
      "light_cycle": null,
      "home_cage_enrichment": [
         "Running wheel"
      ],
      "cohoused_subjects": []
   },
   "notes": null
}
```

## Installing and Upgrading

To install the latest version:

```
pip install aind-data-schema
```

Every merge to the main branch is automatically tagged with a new major/minor/patch version and uploaded to PyPI. To upgrade to the latest version:

```
pip install aind-data-schema --upgrade
```
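To confirm which release is installed, the version can be read from the installed distribution metadata. This is a plain standard-library sketch, not part of the aind-data-schema API:

```python
from importlib.metadata import PackageNotFoundError, version

try:
    installed = version("aind-data-schema")  # a version string such as "0.35.2"
except PackageNotFoundError:
    installed = None

print(installed or "aind-data-schema is not installed")
```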

## Controlled Vocabularies

Controlled vocabularies and other enumerated lists are maintained in a separate repository: aind-data-schema-models. This allows us to specify these lists without changing aind-data-schema. Controlled vocabularies include lists of organizations, manufacturers, species, modalities, platforms, units, harp devices, and registries.

To upgrade to the latest data models version:

```
pip install aind-data-schema-models --upgrade
```

## Contributing

To develop the code, check out this repo and run the following in the cloned directory:

```
pip install -e .[dev]
```


If you've found a bug in the schemas or would like to make a minor change, open an [Issue](https://github.com/AllenNeuralDynamics/aind-data-schema/issues) on this repository. If you'd like to propose a large change or addition, or generally have a question about how things work, start a new [Discussion](https://github.com/AllenNeuralDynamics/aind-data-schema/discussions)!


### Linters and testing

There are several libraries used to run linters, check documentation, and run tests.

- To run tests locally, navigate to the cloned `aind-data-schema` directory in a terminal and run the following (this will not run any online-only tests):

```
python -m unittest
```


- Please test your changes using the **coverage** library, which will run the tests and log a coverage report:

```
coverage run -m unittest discover && coverage report
```


- To run any of the following tools, pip or conda install the relevant package (interrogate, flake8, black, isort), navigate to the relevant directory, and run the command below with the tool's name in place of [command]:

```
[command] -v .
```


- Use **interrogate** to check that modules, methods, etc. have been documented thoroughly:

```
interrogate .
```


- Use **flake8** to check that code is up to standards (no unused imports, etc.):

```
flake8 .
```


- Use **black** to automatically format the code into PEP standards:

```
black .
```


- Use **isort** to automatically sort import statements:

```
isort .
```


### Pull requests

For internal members, please create a branch. For external members, please fork the repo and open a pull request from the fork. We'll primarily use [Angular](https://github.com/angular/angular/blob/main/CONTRIBUTING.md#commit) style for commit messages. Roughly, they should follow the pattern:

```
<type>(<scope>): <short summary>
```


where scope (optional) describes the packages affected by the code changes and type (mandatory) is one of:

- **build**: Changes that affect the build system or external dependencies (example scopes: pyproject.toml, setup.py)
- **ci**: Changes to our CI configuration files and scripts (examples: .github/workflows/ci.yml)
- **docs**: Documentation only changes
- **feat**: A new feature
- **fix**: A bug fix
- **perf**: A code change that improves performance
- **refactor**: A code change that neither fixes a bug nor adds a feature
- **test**: Adding missing tests or correcting existing tests
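A hypothetical commit message following this pattern:

```
fix(subject): correct date_of_birth serialization
```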

### Documentation

To generate the rst source files for the documentation, run:

```
sphinx-apidoc -o docs/source/ src
```


Then to create the documentation html files, run:

```
sphinx-build -b html docs/source/ docs/build/html
```


More info on Sphinx installation can be found here: https://www.sphinx-doc.org/en/master/usage/installation.html
