A collection of commonly used value sets

Common Value Sets

A comprehensive collection of standardized enumerations and value sets for data science, bioinformatics, materials science, and beyond.

Why Common Value Sets?

Data standardization is hard. Every project reinvents the wheel with custom enums, inconsistent naming, and no semantic meaning.
Common Value Sets solves this by providing:

  • Rich, standardized enumerations – Pre-defined value sets across multiple domains
  • Semantic meaning – Every value is linked to an ontology term where one exists
  • Python-first convenience – Work with simple enums, get semantics for free
  • Multi-language support – Generate JSON Schema, TypeScript, and more
  • Interoperability – Built on LinkML standards for maximum compatibility

A Simple Example

Different datasets often represent the same concept in incompatible ways:

  • M / F
  • male / female
  • 1 / 2

They all mean the same thing, but they don't interoperate.
With Common Value Sets, you can instead use a shared enum:

from valuesets.enums.core import SexEnum

s = SexEnum.MALE
print(s.value)            # "MALE"
print(s.get_meaning())    # "NCIT:C20197"
print(s.get_description())  # "Male sex"
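
As a rough illustration of the idea, the sketch below maps the three incompatible encodings onto one shared enum. It uses a locally defined stand-in (`Sex`) and a hypothetical `normalize_sex` helper rather than the real `SexEnum`, so it runs without the package installed. The alias table, including the 1 = male / 2 = female convention, is an assumption made for the example.

```python
from enum import Enum


class Sex(Enum):
    """Local stand-in for a shared sex enum (not the valuesets class)."""
    MALE = "MALE"
    FEMALE = "FEMALE"


# Hypothetical alias table covering the three encodings listed above.
# The numeric convention (1 = male, 2 = female) is an assumption; check
# each dataset's codebook before relying on it.
_ALIASES = {
    "m": Sex.MALE, "male": Sex.MALE, "1": Sex.MALE,
    "f": Sex.FEMALE, "female": Sex.FEMALE, "2": Sex.FEMALE,
}


def normalize_sex(raw) -> Sex:
    """Map a raw dataset value onto the shared enum, or raise ValueError."""
    key = str(raw).strip().lower()
    try:
        return _ALIASES[key]
    except KeyError:
        raise ValueError(f"Unrecognized sex value: {raw!r}") from None


print([normalize_sex(v).value for v in ["M", "male", 1]])
# ['MALE', 'MALE', 'MALE']
```

Raising on unknown input, rather than guessing, keeps bad codes from silently entering the harmonized data.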

Quick Start

For Python Developers

from valuesets.enums.bio.structural_biology import StructuralBiologyTechnique
from valuesets.enums.spatial.spatial_qualifiers import AnatomicalSide

# Rich enums with metadata and ontology mappings
technique = StructuralBiologyTechnique.CRYO_EM
print(technique.value)  # "CRYO_EM"
print(technique.get_description())  # "Cryo-electron microscopy"
print(technique.get_meaning())  # "CHMO:0002413" (Chemical Methods Ontology)
print(technique.get_annotations())  # {'resolution_range': '2-30 Å typical', ...}

# Spatial relationships with BSPO mappings
side = AnatomicalSide.LEFT
print(side.get_meaning())  # "BSPO:0000000" (Biological Spatial Ontology)

# Look up enums by their ontology terms
found = AnatomicalSide.from_meaning("BSPO:0000000")  # Returns LEFT
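
For readers curious how a lookup like `from_meaning` might work, here is a minimal sketch built on the standard library `enum` module. `RichEnumSketch` and `Side` are hypothetical names invented for this example; this is not the package's actual implementation, only one plausible way to attach an ontology CURIE to each member.

```python
from enum import Enum


class RichEnumSketch(Enum):
    """Hypothetical base class: each member carries a value plus a CURIE."""

    def __new__(cls, value, meaning=None):
        obj = object.__new__(cls)
        obj._value_ = value        # what .value returns
        obj._meaning = meaning     # ontology CURIE, or None if unmapped
        return obj

    def get_meaning(self):
        return self._meaning

    @classmethod
    def from_meaning(cls, curie):
        """Return the first member mapped to `curie`, or None if absent."""
        if curie is None:
            return None
        for member in cls:
            if member.get_meaning() == curie:
                return member
        return None


class Side(RichEnumSketch):
    LEFT = ("LEFT", "BSPO:0000000")
    RIGHT = ("RIGHT", None)  # mapping deliberately left out of this sketch


print(Side.from_meaning("BSPO:0000000"))  # Side.LEFT
```

A linear scan is fine for small enums; a large value set would probably want a cached CURIE-to-member dictionary instead.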

For Data Scientists

from valuesets.enums.statistics import StatisticalTest, PValueThreshold
from valuesets.enums.data_science import DatasetSplitType, ModelType

# Standardized statistical tests with STATO ontology mappings
test = StatisticalTest.STUDENTS_T_TEST
print(test.get_meaning())  # "STATO:0000176"
print(test.get_description())  # "Student's t-test for comparing means"

# ML pipeline with standard splits
split = DatasetSplitType.TRAIN
model = ModelType.RANDOM_FOREST

# P-value thresholds with clear semantics
threshold = PValueThreshold.SIGNIFICANT
print(threshold.get_annotations())  # {'value': 0.05, 'symbol': '*'}
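
One way such threshold annotations could be put to work is to label p-values with significance symbols. The sketch below is self-contained and does not call the package. Only the 0.05 / `*` row comes from the annotations shown above; the other rows follow a common convention and are assumptions, as is the `significance_symbol` helper.

```python
# Threshold table in the shape of the annotations shown above.
# Only (0.05, "*") is taken from the text; the stricter rows are a
# common convention and are assumptions for this sketch.
THRESHOLDS = [(0.001, "***"), (0.01, "**"), (0.05, "*")]


def significance_symbol(p_value: float) -> str:
    """Return the symbol for the strictest threshold that p_value falls under."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError(f"p-value must be in [0, 1], got {p_value}")
    for cutoff, symbol in THRESHOLDS:  # ordered strictest first
        if p_value < cutoff:
            return symbol
    return "ns"  # not significant


for p in (0.0005, 0.03, 0.2):
    print(p, significance_symbol(p))
```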

For Bioinformaticians

from valuesets.enums.bio.taxonomy import CommonOrganismTaxaEnum, BiologicalKingdom
from valuesets.enums.bio.cell_biology import CellCyclePhase, CellType

# Model organisms with NCBI Taxonomy IDs
human = CommonOrganismTaxaEnum.HUMAN
print(human.get_meaning())  # "NCBITaxon:9606"
print(human.get_description())  # "Homo sapiens (human)"

# Cell biology with CL and GO mappings
phase = CellCyclePhase.S_PHASE
print(phase.get_meaning())  # "GO:0000084"

neuron = CellType.NEURON
print(neuron.get_meaning())  # "CL:0000540"

# Select organisms by a substring of the member name
# (this matches names only; it does not query taxonomic rank)
mammals = [org for org in CommonOrganismTaxaEnum
           if 'MAMMALIA' in org.name]

๐Ÿ—๏ธ Available Domains

Core Domains (Most Mature)

  • Biology:
    • Structural Biology: Cryo-EM techniques, crystallization methods, detectors
    • Cell Biology: Cell types, cell cycle phases, organelles
    • Taxonomy: Model organisms (all with NCBI Taxonomy IDs)
  • Spatial: Anatomical directions, planes, relationships (BSPO mapped)
  • Statistics: Statistical tests (STATO mapped), p-value thresholds

Expanding Domains

  • Data Science: ML model types, dataset splits, metrics
  • Materials Science: Crystal structures, characterization methods
  • Clinical/Medical: Blood types (SNOMED), vital status
  • Environmental: Exposure routes, pollutants
  • Energy: Sources, storage methods, efficiency ratings

Coming Soon

  • Geography: Country codes (ISO), time zones, coordinate systems
  • Time: Temporal relationships, periods, frequencies
  • Academic: Publication types, research roles, funding sources
  • Industrial: Manufacturing processes, quality standards

Multiple Use Cases

1. LinkML Standards (YAML schemas)

Use the raw LinkML schemas for data modeling, validation, and documentation:

# Direct schema usage
Person:
  attributes:
    vital_status:
      range: VitalStatusEnum  # ALIVE, DECEASED, UNKNOWN
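
To show what constraining a slot to an enum range buys you, the following sketch checks a record's `vital_status` against the three permissible values named in the comment above. It is a hand-rolled stand-in, not LinkML's validator; `VitalStatus` and `validate_person` are hypothetical names defined only for this example.

```python
from enum import Enum


class VitalStatus(Enum):
    """Local stand-in listing the three permissible values named above."""
    ALIVE = "ALIVE"
    DECEASED = "DECEASED"
    UNKNOWN = "UNKNOWN"


def validate_person(record: dict) -> list:
    """Return a list of validation problems (an empty list means it passes)."""
    problems = []
    allowed = {member.value for member in VitalStatus}
    status = record.get("vital_status")
    # A missing slot is treated as acceptable here; only bad values are flagged.
    if status is not None and status not in allowed:
        problems.append(
            f"vital_status {status!r} is not one of {sorted(allowed)}"
        )
    return problems


print(validate_person({"vital_status": "ALIVE"}))  # []
print(validate_person({"vital_status": "alive"}))  # one problem reported
```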

2. Python Programming (Rich Enums)

Get Python enums with full IDE support, type checking, and semantic metadata:

# Type-safe enums with ontology mappings
status = VitalStatusEnum.ALIVE
print(status.get_meaning())  # "NCIT:C37987"

3. "Stealth Semantics"

Write simple code, get semantic meaning automatically:

# Example: Different systems use different names for the same concept
from valuesets.enums.medical import BloodTypeEnum
from external_system import PatientBloodType  # Hypothetical third-party enum

# Even though the enum members are named differently
# (BloodTypeEnum.A_POSITIVE vs PatientBloodType.A_POS),
# they map to the same SNOMED code: SNOMED:278149003
blood_type = BloodTypeEnum.A_POSITIVE
patient_blood = PatientBloodType.A_POS

if blood_type.get_meaning() == patient_blood.get_meaning():
    # Semantic interoperability: works across different naming conventions
    process_compatible_blood_type()

# Or use the same_meaning_as utility function
if same_meaning_as(blood_type, patient_blood):
    process_compatible_blood_type()
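
The comparison above can be tried without either package installed. The sketch below defines two stand-in enums and a hypothetical `same_meaning_sketch` helper; it is not the library's `same_meaning_as`, just one way such a check could behave. The `EXAMPLE:0000001` CURIE is a placeholder, not a real code.

```python
from enum import Enum


class MeaningMixin:
    """Hypothetical mixin: the member value doubles as the ontology CURIE."""

    def get_meaning(self):
        return self.value


class LocalBloodType(MeaningMixin, Enum):
    A_POSITIVE = "SNOMED:278149003"
    UNMAPPED = None  # a member with no ontology mapping


class ExternalBloodType(MeaningMixin, Enum):
    A_POS = "SNOMED:278149003"
    OTHER = "EXAMPLE:0000001"  # placeholder CURIE, not a real code


def same_meaning_sketch(a, b) -> bool:
    """True when both members expose the same non-empty ontology CURIE."""
    meaning_a = a.get_meaning()
    return meaning_a is not None and meaning_a == b.get_meaning()


print(same_meaning_sketch(LocalBloodType.A_POSITIVE, ExternalBloodType.A_POS))
# True
```

Treating unmapped members as never equal avoids matching two values merely because neither has a mapping.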

4. Multi-language Interoperability

Generate schemas and types for any language:

# Generate JSON Schema for web apps
gen-json-schema schema.yaml

# Generate TypeScript definitions
gen-typescript schema.yaml -t typescript

# Generate JSON-LD
gen-jsonld schema.yaml
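
To give a feel for what the generated artifacts contain, this sketch builds a JSON Schema fragment for an enum by hand. It only approximates the idea and is not the output of the LinkML generators; `VitalStatus` and `enum_to_json_schema` are hypothetical names for the example.

```python
import json
from enum import Enum


class VitalStatus(Enum):
    """Local stand-in enum used only for this illustration."""
    ALIVE = "ALIVE"
    DECEASED = "DECEASED"
    UNKNOWN = "UNKNOWN"


def enum_to_json_schema(enum_cls) -> dict:
    """Build a JSON Schema fragment restricting a string to the enum's values."""
    return {
        "title": enum_cls.__name__,
        "type": "string",
        "enum": [member.value for member in enum_cls],
    }


schema = enum_to_json_schema(VitalStatus)
print(json.dumps(schema, indent=2))
```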

5. Integration & Tooling

  • Excel/Google Sheets: Generate dropdown validation lists
  • Web forms: Auto-generate select options with descriptions
  • APIs: Standardized response codes and classifications
  • Databases: Consistent foreign key constraints
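
As an example of the web-form case, the sketch below renders an enum as HTML `<option>` elements and as a comma-separated list of the kind spreadsheet validation rules accept. The stand-in enum, the `DESCRIPTIONS` table, and the `to_select_options` helper are all assumptions for illustration; with the real package the labels would presumably come from `get_description()`.

```python
from enum import Enum
from html import escape


class VitalStatus(Enum):
    """Local stand-in enum used only for this illustration."""
    ALIVE = "ALIVE"
    DECEASED = "DECEASED"
    UNKNOWN = "UNKNOWN"


# Hypothetical human-readable labels, keyed by member name.
DESCRIPTIONS = {
    "ALIVE": "Person is living",
    "DECEASED": "Person has died",
    "UNKNOWN": "Vital status not known",
}


def to_select_options(enum_cls, descriptions) -> str:
    """Render one <option> per member, falling back to the name as label."""
    lines = []
    for member in enum_cls:
        label = descriptions.get(member.name, member.name)
        lines.append(
            f'<option value="{escape(member.value)}">{escape(label)}</option>'
        )
    return "\n".join(lines)


options_html = to_select_options(VitalStatus, DESCRIPTIONS)
dropdown_list = ",".join(member.value for member in VitalStatus)
print(options_html)
print(dropdown_list)  # ALIVE,DECEASED,UNKNOWN
```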

Advanced Features

Hierarchical Relationships

# Some enums support hierarchical is_a relationships
from valuesets.enums import ViralGenomeTypeEnum

# Baltimore classification with hierarchy
positive_rna = ViralGenomeTypeEnum.SSRNA_POSITIVE  # Group IV
# inherits from SSRNA (single-stranded RNA)
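
The snippet above does not show how the hierarchy is queried, so here is a minimal sketch of walking `is_a` links with a plain parent table. The `GenomeType` enum, the `IS_A` table, and both helpers are hypothetical. Only the SSRNA_POSITIVE to SSRNA link comes from the text; the RNA root is an assumption added to show a two-step chain.

```python
from enum import Enum


class GenomeType(Enum):
    """Local stand-in enum; RNA is an assumed root for illustration."""
    RNA = "RNA"
    SSRNA = "SSRNA"
    SSRNA_POSITIVE = "SSRNA_POSITIVE"


# Hypothetical child -> parent table representing is_a links.
IS_A = {
    GenomeType.SSRNA_POSITIVE: GenomeType.SSRNA,
    GenomeType.SSRNA: GenomeType.RNA,
}


def ancestors(member):
    """Walk is_a links upward, nearest parent first; stops on a cycle."""
    chain = []
    seen = {member}
    current = IS_A.get(member)
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = IS_A.get(current)
    return chain


def is_a(member, ancestor) -> bool:
    """True if `member` is `ancestor` or descends from it."""
    return member is ancestor or ancestor in ancestors(member)


print([m.name for m in ancestors(GenomeType.SSRNA_POSITIVE)])
# ['SSRNA', 'RNA']
```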

Rich Metadata

from valuesets.enums.bio.structural_biology import CryoEMGridType

grid = CryoEMGridType.QUANTIFOIL
metadata = grid.get_metadata()
print(metadata)
# {
#   'name': 'QUANTIFOIL',
#   'value': 'QUANTIFOIL',
#   'description': 'Quantifoil holey carbon grid',
#   'annotations': {
#     'hole_sizes': '1.2/1.3, 2/1, 2/2 μm common',
#     'manufacturer': 'Quantifoil'
#   }
# }

# Get all grid types with their descriptions at once
all_grids = CryoEMGridType.get_all_descriptions()
# {'C_FLAT': 'C-flat holey carbon grid', 'QUANTIFOIL': ...}

Utility Functions

from valuesets.enums.spatial import AnatomicalPlane

# Get all ontology mappings for an enum
mappings = AnatomicalPlane.get_all_meanings()
print(mappings)
# {'SAGITTAL': 'BSPO:0000417', 'CORONAL': 'BSPO:0000019', ...}

# List all metadata for every value in an enum
all_metadata = AnatomicalPlane.list_metadata()
for name, meta in all_metadata.items():
    print(f"{name}: {meta.get('description', 'No description')}")

# Find enum by ontology term (useful for data integration)
plane = AnatomicalPlane.from_meaning("BSPO:0000417")  # Returns SAGITTAL
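
For bulk data integration, a name-to-CURIE mapping of the shape returned by `get_all_meanings()` can be inverted once and reused across a whole column. The sketch below works on a plain dictionary holding only the two entries quoted above, so it runs without the package; `invert_meanings` is a hypothetical helper.

```python
# Mapping in the shape shown above; only the two quoted entries are included.
meanings = {"SAGITTAL": "BSPO:0000417", "CORONAL": "BSPO:0000019"}


def invert_meanings(name_to_curie: dict) -> dict:
    """Build a CURIE -> name lookup, rejecting ambiguous mappings."""
    curie_to_name = {}
    for name, curie in name_to_curie.items():
        if curie is None:
            continue  # unmapped members cannot be looked up by CURIE
        if curie in curie_to_name:
            raise ValueError(
                f"{curie} maps to both {curie_to_name[curie]} and {name}"
            )
        curie_to_name[curie] = name
    return curie_to_name


lookup = invert_meanings(meanings)
column = ["BSPO:0000019", "BSPO:0000417", "BSPO:9999999"]
print([lookup.get(curie) for curie in column])
# ['CORONAL', 'SAGITTAL', None]
```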

Dynamic Enums

Some enums in this collection are defined as dynamic enums: their values are specified by an ontology query rather than listed by hand, so they can be expanded by querying the ontology. This uses LinkML's Dynamic Enum feature.

# Example: A dynamic enum that pulls values from an ontology
CellTypeEnum:
  # Dynamic expansion from Cell Ontology
  reachable_from:
    source_ontology: obo:cl
    source_nodes:
      - CL:0000540  # neuron
    include_self: false
    relationship_types:
      - rdfs:subClassOf

Note: Runtime expansion support is coming soon! Currently, dynamic enums provide:

  • ✅ Static values with ontology mappings
  • ✅ Metadata and descriptions
  • 🚧 Runtime expansion from ontologies (coming in the next release)

When runtime expansion is available, you'll be able to:

# Future: Dynamically expand enum with all neuron subtypes
cell_types = CellTypeEnum.expand_from_ontology()
# Would add: MOTOR_NEURON, SENSORY_NEURON, INTERNEURON, etc.
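
Until runtime expansion lands, the semantics of a `reachable_from` query can be illustrated on a toy graph. The sketch below collects every term with a subclass path to the source nodes, honoring `include_self`. It is not the package's or LinkML's implementation; `SUBCLASS_OF` and the `EX:` identifiers are placeholders, and only CL:0000540 (neuron) comes from the text.

```python
from collections import deque

# Toy child -> parents table standing in for rdfs:subClassOf edges.
# Only CL:0000540 (neuron) is taken from the text; EX: IDs are placeholders.
SUBCLASS_OF = {
    "EX:motor_neuron": ["CL:0000540"],
    "EX:sensory_neuron": ["CL:0000540"],
    "EX:alpha_motor_neuron": ["EX:motor_neuron"],
    "EX:hepatocyte": ["EX:epithelial_cell"],
}


def reachable_from(source_nodes, include_self=False):
    """Return every term with a subClassOf path to one of source_nodes."""
    # Invert the table so we can walk from parents down to children.
    children = {}
    for child, parents in SUBCLASS_OF.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)

    found = set()
    queue = deque(source_nodes)
    while queue:  # breadth-first traversal
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in found:
                found.add(child)
                queue.append(child)

    if include_self:
        found.update(source_nodes)
    else:
        found.difference_update(source_nodes)
    return sorted(found)


print(reachable_from(["CL:0000540"]))
# ['EX:alpha_motor_neuron', 'EX:motor_neuron', 'EX:sensory_neuron']
```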

Documentation

Full Documentation Website →

OWL/RDF Representation

The value sets are also available as an OWL ontology for semantic web applications and ontology browsers.

The OWL representation allows you to:

  • Browse value sets in ontology browsers
  • Perform SPARQL queries
  • Integrate with semantic web applications
  • Link to other biomedical ontologies

Future Directions

Maturity Levels

We plan to add maturity level metadata to each enum to help users understand their readiness:

  • 🟢 Stable: Production-ready, well-tested, unlikely to change
  • 🟡 Beta: Usable but may have minor changes
  • 🔴 Draft: Under development, expect changes

# Future: Check maturity before use
if enum_def.maturity_level == MaturityLevel.STABLE:
    use_in_production()

Modularization

Split the package into domain-specific modules for lighter installs:

# Future: Install only what you need
pip install valuesets-core        # Core functionality
pip install valuesets-bio         # Biological domains
pip install valuesets-materials   # Materials science
pip install valuesets-clinical    # Clinical/medical

Community Extensions

  • Domain Packages: Community-maintained domain-specific value sets
  • Organization Standards: Company/institution-specific enums that extend base sets
  • Mapping Tables: Cross-ontology and cross-standard mappings

Advanced Features

  • AI/LLM Integration: Semantic annotations optimized for language models
  • Usage Analytics: Track which enums are most used, identify gaps
  • Version Management: Handle enum evolution with deprecation warnings
  • Multi-ontology Support: Map single values to multiple ontologies
  • Fuzzy Matching: Find enums by approximate string matching
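
Fuzzy matching is not available yet, but the idea can be sketched with the standard library's `difflib`. Everything here is hypothetical: the `Technique` stand-in, its member names other than CRYO_EM, and the `fuzzy_find` helper with its 0.6 cutoff are assumptions, not a planned API.

```python
from difflib import get_close_matches
from enum import Enum


class Technique(Enum):
    """Local stand-in; member names other than CRYO_EM are assumptions."""
    CRYO_EM = "CRYO_EM"
    X_RAY_CRYSTALLOGRAPHY = "X_RAY_CRYSTALLOGRAPHY"
    NMR = "NMR"


def fuzzy_find(enum_cls, text, cutoff=0.6):
    """Return the member whose name is closest to `text`, or None."""
    names = [member.name for member in enum_cls]
    # Normalize free text toward the UPPER_SNAKE_CASE naming style.
    query = text.strip().upper().replace("-", "_").replace(" ", "_")
    matches = get_close_matches(query, names, n=1, cutoff=cutoff)
    return enum_cls[matches[0]] if matches else None


print(fuzzy_find(Technique, "cryo em"))
print(fuzzy_find(Technique, "x-ray crystalography"))  # misspelled on purpose
print(fuzzy_find(Technique, "zzz"))
```

A real feature would likely also match against descriptions and synonyms, which name similarity alone cannot capture.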

๐Ÿ—๏ธ Development

Installation

git clone https://github.com/linkml/valuesets
cd valuesets
uv sync

Available Commands

just --list  # Show all available commands
just test    # Run tests  
just doctest # Run doctests
just lint    # Run linting
just site    # Build documentation site

Contributing

We welcome contributions! Whether you're adding new domains, improving existing enums, or fixing bugs:

  1. Domain Experts: Contribute standardized value sets for your field
  2. Developers: Add utility functions, improve tooling, fix issues
  3. Users: Report missing enums, suggest improvements, share use cases

Repository Structure

├── src/valuesets/
│   ├── schema/              # LinkML YAML schemas (source of truth)
│   │   ├── bio/             # Biological domains
│   │   │   ├── cell_biology.yaml
│   │   │   ├── structural_biology.yaml
│   │   │   └── taxonomy.yaml
│   │   ├── spatial/         # Spatial and anatomical
│   │   │   └── spatial_qualifiers.yaml
│   │   ├── statistics.yaml
│   │   └── core.yaml
│   ├── enums/               # Generated Python enums
│   │   └── <auto-generated from schemas>
│   ├── generators/          # Rich enum generator
│   │   └── rich_enum.py
│   └── validators/          # Ontology validation
│       └── enum_evaluator.py
├── docs/                    # Documentation
└── tests/                   # Test cases
    ├── test_rich_enums.py   # Rich enum functionality
    └── validators/          # Ontology validation tests

Credits

Built with LinkML and the linkml-project-copier template.


Making data standardization simple, semantic, and scalable

Download files

Source Distribution

valuesets-0.4.0.tar.gz (13.4 MB)

Built Distribution

valuesets-0.4.0-py3-none-any.whl (2.0 MB)
