A collection of commonly used value sets

Common Value Sets

A comprehensive collection of standardized enumerations and value sets for data science, bioinformatics, materials science, and beyond.

Why Common Value Sets?

Data standardization is hard. Every project reinvents the wheel with custom enums, inconsistent naming, and no semantic meaning.
Common Value Sets solves this by providing:

  • Rich, standardized enumerations: Pre-defined value sets across multiple domains
  • Semantic meaning: Every value is linked to ontology terms (when possible)
  • Python-first convenience: Work with simple enums, get semantics for free
  • Multi-language support: Generate JSON Schema, TypeScript, and more
  • Interoperability: Built on LinkML standards for maximum compatibility

A Simple Example

Different datasets often represent the same concept in incompatible ways:

  • M / F
  • male / female
  • 1 / 2

They all mean the same thing, but they don't interoperate.
With Common Value Sets, you can instead use a shared enum:

from valuesets.enums.core import SexEnum

s = SexEnum.MALE
print(s.value)            # "MALE"
print(s.get_meaning())    # "NCIT:C20197"
print(s.get_description())  # "Male sex"
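
To make the mismatch concrete, here is a minimal sketch of mapping the three encodings above onto one shared enum. It uses only the standard library rather than valuesets itself. `Sex`, `_ALIASES`, and `normalize_sex` are hypothetical names, and reading 1 / 2 as male / female is an assumption about the source dataset.

```python
from enum import Enum


class Sex(Enum):
    """Hypothetical stand-in for a shared enum such as SexEnum."""
    MALE = "MALE"
    FEMALE = "FEMALE"


# Assumed source encodings; real datasets may use other codes.
_ALIASES = {
    "m": Sex.MALE, "male": Sex.MALE, "1": Sex.MALE,
    "f": Sex.FEMALE, "female": Sex.FEMALE, "2": Sex.FEMALE,
}


def normalize_sex(raw) -> Sex:
    """Map a raw dataset value onto the shared enum, or raise ValueError."""
    key = str(raw).strip().lower()
    try:
        return _ALIASES[key]
    except KeyError:
        raise ValueError(f"Unrecognized sex code: {raw!r}") from None


print([normalize_sex(v).value for v in ("M", "female", 1)])
```

Raising on unknown codes, rather than guessing, keeps unmapped values visible during data cleaning.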

Quick Start

For Python Developers

from valuesets.enums.bio.structural_biology import StructuralBiologyTechnique
from valuesets.enums.spatial.spatial_qualifiers import AnatomicalSide

# Rich enums with metadata and ontology mappings
technique = StructuralBiologyTechnique.CRYO_EM
print(technique.value)  # "CRYO_EM"
print(technique.get_description())  # "Cryo-electron microscopy"
print(technique.get_meaning())  # "CHMO:0002413" (Chemical Methods Ontology)
print(technique.get_annotations())  # {'resolution_range': '2-30 Å typical', ...}

# Spatial relationships with BSPO mappings
side = AnatomicalSide.LEFT
print(side.get_meaning())  # "BSPO:0000000" (Biological Spatial Ontology)

# Look up enums by their ontology terms
found = AnatomicalSide.from_meaning("BSPO:0000000")  # Returns LEFT

For Data Scientists

from valuesets.enums.statistics import StatisticalTest, PValueThreshold
from valuesets.enums.data_science import DatasetSplitType, ModelType

# Standardized statistical tests with STATO ontology mappings
test = StatisticalTest.STUDENTS_T_TEST
print(test.get_meaning())  # "STATO:0000176"
print(test.get_description())  # "Student's t-test for comparing means"

# ML pipeline with standard splits
split = DatasetSplitType.TRAIN
model = ModelType.RANDOM_FOREST

# P-value thresholds with clear semantics
threshold = PValueThreshold.SIGNIFICANT
print(threshold.get_annotations())  # {'value': 0.05, 'symbol': '*'}
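
As an illustration of how threshold annotations like these could drive reporting, the sketch below picks the significance symbol for a p-value. It is standard-library only and not the valuesets implementation. `PThreshold` and `significance_symbol` are hypothetical, and every cutoff and symbol other than 0.05 / '*' is an assumed convention.

```python
from enum import Enum


class PThreshold(Enum):
    """Hypothetical thresholds; only 0.05 / '*' appears in the text above."""
    HIGHLY_SIGNIFICANT = (0.001, "***")
    VERY_SIGNIFICANT = (0.01, "**")
    SIGNIFICANT = (0.05, "*")

    def __init__(self, cutoff, symbol):
        self.cutoff = cutoff
        self.symbol = symbol


def significance_symbol(p_value: float) -> str:
    """Return the symbol of the strictest threshold that p_value is below."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must be between 0 and 1")
    # Check the smallest cutoff first so the strictest match wins.
    for level in sorted(PThreshold, key=lambda t: t.cutoff):
        if p_value < level.cutoff:
            return level.symbol
    return "ns"  # Assumed label for "not significant"


print(significance_symbol(0.03))
```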

For Bioinformaticians

from valuesets.enums.bio.taxonomy import CommonOrganismTaxaEnum, BiologicalKingdom
from valuesets.enums.bio.cell_biology import CellCyclePhase, CellType

# Model organisms with NCBI Taxonomy IDs
human = CommonOrganismTaxaEnum.HUMAN
print(human.get_meaning())  # "NCBITaxon:9606"
print(human.get_description())  # "Homo sapiens (human)"

# Cell biology with CL and GO mappings
phase = CellCyclePhase.S_PHASE
print(phase.get_meaning())  # "GO:0000084"

neuron = CellType.NEURON
print(neuron.get_meaning())  # "CL:0000540"

# Filter organisms by name; this matches only members whose string form
# contains "MAMMALIA", so it is not a true taxonomic query
mammals = [org for org in CommonOrganismTaxaEnum
           if 'MAMMALIA' in str(org)]
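
A filter by taxonomic level is more robust when it reads an explicit annotation instead of the member name. The following sketch assumes such an annotation exists. `Organism`, `TAXONOMIC_CLASS`, and `organisms_in_class` are hypothetical and not taken from the valuesets schema.

```python
from enum import Enum


class Organism(Enum):
    """Hypothetical members keyed by NCBI Taxonomy ID."""
    HUMAN = "NCBITaxon:9606"
    MOUSE = "NCBITaxon:10090"
    FRUIT_FLY = "NCBITaxon:7227"


# Assumed per-member annotation giving the taxonomic class.
TAXONOMIC_CLASS = {
    Organism.HUMAN: "Mammalia",
    Organism.MOUSE: "Mammalia",
    Organism.FRUIT_FLY: "Insecta",
}


def organisms_in_class(class_name: str) -> list:
    """Return members whose annotated class matches, in definition order."""
    return [org for org in Organism if TAXONOMIC_CLASS.get(org) == class_name]


print([org.name for org in organisms_in_class("Mammalia")])
```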

Available Domains

Core Domains (Most Mature)

  • Biology:
    • Structural Biology: Cryo-EM techniques, crystallization methods, detectors
    • Cell Biology: Cell types, cell cycle phases, organelles
    • Taxonomy: Model organisms (all with NCBI Taxonomy IDs)
  • Spatial: Anatomical directions, planes, relationships (BSPO mapped)
  • Statistics: Statistical tests (STATO mapped), p-value thresholds

Expanding Domains

  • Data Science: ML model types, dataset splits, metrics
  • Materials Science: Crystal structures, characterization methods
  • Clinical/Medical: Blood types (SNOMED), vital status
  • Environmental: Exposure routes, pollutants
  • Energy: Sources, storage methods, efficiency ratings

Coming Soon

  • Geography: Country codes (ISO), time zones, coordinate systems
  • Time: Temporal relationships, periods, frequencies
  • Academic: Publication types, research roles, funding sources
  • Industrial: Manufacturing processes, quality standards

Multiple Use Cases

1. LinkML Standards (YAML schemas)

Use the raw LinkML schemas for data modeling, validation, and documentation:

# Direct schema usage
Person:
  attributes:
    vital_status:
      range: VitalStatusEnum  # ALIVE, DECEASED, UNKNOWN

2. Python Programming (Rich Enums)

Get Python enums with full IDE support, type checking, and semantic metadata:

# Type-safe enums with ontology mappings
status = VitalStatusEnum.ALIVE
print(status.get_meaning())  # "NCIT:C37987"

3. "Stealth Semantics"

Write simple code, get semantic meaning automatically:

# Example: Different systems use different names for the same concept
from valuesets.enums.medical import BloodTypeEnum
from external_system import PatientBloodType  # Hypothetical third-party enum

blood_type = BloodTypeEnum.A_POSITIVE
patient_blood = PatientBloodType.A_POS

# The member names differ (A_POSITIVE vs A_POS), but both map to
# the same SNOMED code: SNOMED:278149003

if blood_type.get_meaning() == patient_blood.get_meaning():
    # Semantic interoperability - works across different naming conventions
    process_compatible_blood_type()  # Placeholder for application logic

# Or use the same_meaning_as utility function (import not shown here)
if same_meaning_as(blood_type, patient_blood):
    process_compatible_blood_type()
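
The comparison above can be sketched without either library installed. The block below is an assumption about how a helper like `same_meaning_as` might behave, not the package's implementation. Both enum classes are hypothetical, and `EXAMPLE:0000001` is a placeholder identifier rather than a real SNOMED code.

```python
from enum import Enum
from typing import Optional

# Ontology CURIE per member name for each (hypothetical) system.
_LOCAL_MEANINGS = {
    "A_POSITIVE": "SNOMED:278149003",
    "O_NEGATIVE": "EXAMPLE:0000001",  # Placeholder, not a real code
}
_EXTERNAL_MEANINGS = {"A_POS": "SNOMED:278149003"}


class LocalBloodType(Enum):
    A_POSITIVE = "A_POSITIVE"
    O_NEGATIVE = "O_NEGATIVE"

    def get_meaning(self) -> Optional[str]:
        return _LOCAL_MEANINGS.get(self.name)


class ExternalBloodType(Enum):
    A_POS = "A_POS"
    UNMAPPED = "UNMAPPED"

    def get_meaning(self) -> Optional[str]:
        return _EXTERNAL_MEANINGS.get(self.name)


def same_meaning_as(a, b) -> bool:
    """True only when both values carry the same, non-missing meaning."""
    meaning_a, meaning_b = a.get_meaning(), b.get_meaning()
    return meaning_a is not None and meaning_a == meaning_b


print(same_meaning_as(LocalBloodType.A_POSITIVE, ExternalBloodType.A_POS))
```

Treating a missing meaning as a non-match avoids equating two values merely because neither has been mapped.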

4. Multi-language Interoperability

Generate schemas and types for any language:

# Generate JSON Schema for web apps
gen-json-schema schema.yaml

# Generate TypeScript definitions
gen-typescript schema.yaml

# Generate JSON-LD
gen-jsonld schema.yaml

5. Integration & Tooling

  • Excel/Google Sheets: Generate dropdown validation lists
  • Web forms: Auto-generate select options with descriptions
  • APIs: Standardized response codes and classifications
  • Databases: Consistent foreign key constraints
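
As one example of the web-form case, the sketch below renders an enum as HTML select options. It uses only the standard library. `VitalStatus`, `DESCRIPTIONS`, and `select_options` are hypothetical, and the description strings are invented for illustration.

```python
from enum import Enum
from html import escape


class VitalStatus(Enum):
    """Hypothetical stand-in for an enum such as VitalStatusEnum."""
    ALIVE = "ALIVE"
    DECEASED = "DECEASED"
    UNKNOWN = "UNKNOWN"


# Assumed human-readable labels; members without one fall back to their name.
DESCRIPTIONS = {
    "ALIVE": "Living at last contact",
    "DECEASED": "Deceased",
}


def select_options(enum_cls, descriptions) -> str:
    """Build one <option> element per enum member, HTML-escaped."""
    lines = []
    for member in enum_cls:
        label = descriptions.get(member.name, member.name)
        lines.append(
            f'<option value="{escape(member.value)}">{escape(label)}</option>'
        )
    return "\n".join(lines)


print(select_options(VitalStatus, DESCRIPTIONS))
```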

Advanced Features

Hierarchical Relationships

# Some enums support hierarchical is_a relationships
from valuesets.enums import ViralGenomeTypeEnum

# Baltimore classification with hierarchy
positive_rna = ViralGenomeTypeEnum.SSRNA_POSITIVE  # Group IV
# inherits from SSRNA (single-stranded RNA)
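
How an is_a hierarchy could be queried is sketched below with a plain parent map. This is not the valuesets API. `GenomeType`, `IS_A`, `ancestors`, and `is_a` are hypothetical, and the `RNA` root is an assumed extra level.

```python
from enum import Enum


class GenomeType(Enum):
    """Hypothetical subset of a viral genome type enum."""
    RNA = "RNA"
    SSRNA = "SSRNA"
    SSRNA_POSITIVE = "SSRNA_POSITIVE"
    SSRNA_NEGATIVE = "SSRNA_NEGATIVE"


# Child -> parent edges (assumed single inheritance).
IS_A = {
    GenomeType.SSRNA: GenomeType.RNA,
    GenomeType.SSRNA_POSITIVE: GenomeType.SSRNA,
    GenomeType.SSRNA_NEGATIVE: GenomeType.SSRNA,
}


def ancestors(member) -> list:
    """Walk parent links from nearest to farthest, stopping on any cycle."""
    chain, seen = [], {member}
    current = IS_A.get(member)
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = IS_A.get(current)
    return chain


def is_a(member, ancestor) -> bool:
    """True if member is the ancestor itself or descends from it."""
    return member is ancestor or ancestor in ancestors(member)


print([a.name for a in ancestors(GenomeType.SSRNA_POSITIVE)])
```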

Rich Metadata

from valuesets.enums.bio.structural_biology import CryoEMGridType

grid = CryoEMGridType.QUANTIFOIL
metadata = grid.get_metadata()
print(metadata)
# {
#   'name': 'QUANTIFOIL',
#   'value': 'QUANTIFOIL',
#   'description': 'Quantifoil holey carbon grid',
#   'annotations': {
#     'hole_sizes': '1.2/1.3, 2/1, 2/2 μm common',
#     'manufacturer': 'Quantifoil'
#   }
# }

# Get all grid types with their descriptions at once
all_grids = CryoEMGridType.get_all_descriptions()
# {'C_FLAT': 'C-flat holey carbon grid', 'QUANTIFOIL': ...}

Utility Functions

from valuesets.enums.spatial import AnatomicalPlane

# Get all ontology mappings for an enum
mappings = AnatomicalPlane.get_all_meanings()
print(mappings)
# {'SAGITTAL': 'BSPO:0000417', 'CORONAL': 'BSPO:0000019', ...}

# List all metadata for every value in an enum
all_metadata = AnatomicalPlane.list_metadata()
for name, meta in all_metadata.items():
    print(f"{name}: {meta.get('description', 'No description')}")

# Find enum by ontology term (useful for data integration)
plane = AnatomicalPlane.from_meaning("BSPO:0000417")  # Returns SAGITTAL
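
For readers curious how such lookups can work, here is a minimal sketch of a forward and reverse mapping between member names and ontology terms. It is an illustration under stated assumptions, not the generated valuesets code. `Plane` and `_MEANINGS` are hypothetical, and returning None for an unknown term is an assumed behavior.

```python
from enum import Enum
from typing import Dict, Optional

# Member name -> ontology CURIE (the two values shown in the text above).
_MEANINGS = {"SAGITTAL": "BSPO:0000417", "CORONAL": "BSPO:0000019"}


class Plane(Enum):
    """Hypothetical stand-in for an enum such as AnatomicalPlane."""
    SAGITTAL = "SAGITTAL"
    CORONAL = "CORONAL"

    def get_meaning(self) -> Optional[str]:
        return _MEANINGS.get(self.name)

    @classmethod
    def get_all_meanings(cls) -> Dict[str, str]:
        """Map every member that has a meaning to its CURIE."""
        return {m.name: _MEANINGS[m.name] for m in cls if m.name in _MEANINGS}

    @classmethod
    def from_meaning(cls, curie: str) -> Optional["Plane"]:
        """Reverse lookup: find the member carrying this CURIE, if any."""
        for member in cls:
            if member.get_meaning() == curie:
                return member
        return None


print(Plane.from_meaning("BSPO:0000417"))
```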

Dynamic Enums

Some enums in this collection are defined as dynamic enums: their values are specified by an ontology query rather than a fixed list, so they can be expanded by querying the ontology. This uses LinkML's Dynamic Enum feature.

# Example: A dynamic enum that pulls values from an ontology
CellTypeEnum:
  # Dynamic expansion from Cell Ontology
  reachable_from:
    source_ontology: obo:cl
    source_nodes:
      - CL:0000540  # neuron
    include_self: false
    relationship_types:
      - rdfs:subClassOf

Note: Runtime expansion support is coming soon! Currently, dynamic enums provide:

  • ✅ Static values with ontology mappings
  • ✅ Metadata and descriptions
  • 🚧 Runtime expansion from ontologies (coming in next release)

When runtime expansion is available, you'll be able to:

# Future: Dynamically expand enum with all neuron subtypes
cell_types = CellTypeEnum.expand_from_ontology()
# Would add: MOTOR_NEURON, SENSORY_NEURON, INTERNEURON, etc.
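
Until that lands, the idea behind a reachable_from query can be sketched over a toy graph. This is not the planned implementation. `SUBCLASS_OF` is an invented, acyclic stand-in for a real ontology, and `reachable_from` here is a hypothetical helper that follows only subclass edges.

```python
from enum import Enum

# Toy child -> parent subclass edges; a real ontology would be queried instead.
SUBCLASS_OF = {
    "MOTOR_NEURON": "NEURON",
    "SENSORY_NEURON": "NEURON",
    "INTERNEURON": "NEURON",
    "NEURON": "CELL",
    "HEPATOCYTE": "CELL",
}


def reachable_from(source: str, include_self: bool = False) -> list:
    """Collect every term whose chain of parents leads to source."""
    found = [source] if include_self else []
    for term in SUBCLASS_OF:
        current = term
        # Walk up the parent chain (assumes the toy graph has no cycles).
        while current in SUBCLASS_OF:
            current = SUBCLASS_OF[current]
            if current == source:
                found.append(term)
                break
    return sorted(found)


# Build an enum at runtime from the expanded term list.
NeuronSubtype = Enum("NeuronSubtype", {n: n for n in reachable_from("NEURON")})

print([member.name for member in NeuronSubtype])
```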

Documentation

Full Documentation Website →

OWL/RDF Representation

The value sets are also available as an OWL ontology for semantic web applications and ontology browsers.

The OWL representation allows you to:

  • Browse value sets in ontology browsers
  • Perform SPARQL queries
  • Integrate with semantic web applications
  • Link to other biomedical ontologies

Future Directions

Maturity Levels

We plan to add maturity level metadata to each enum to help users understand their readiness:

  • 🟢 Stable: Production-ready, well-tested, unlikely to change
  • 🟡 Beta: Usable but may have minor changes
  • 🔴 Draft: Under development, expect changes

# Future: Check maturity before use
if enum_def.maturity_level == MaturityLevel.STABLE:
    use_in_production()

Modularization

Split the package into domain-specific modules for lighter installs:

# Future: Install only what you need
pip install valuesets-core        # Core functionality
pip install valuesets-bio         # Biological domains
pip install valuesets-materials   # Materials science
pip install valuesets-clinical    # Clinical/medical

Community Extensions

  • Domain Packages: Community-maintained domain-specific value sets
  • Organization Standards: Company/institution-specific enums that extend base sets
  • Mapping Tables: Cross-ontology and cross-standard mappings

Advanced Features

  • AI/LLM Integration: Semantic annotations optimized for language models
  • Usage Analytics: Track which enums are most used, identify gaps
  • Version Management: Handle enum evolution with deprecation warnings
  • Multi-ontology Support: Map single values to multiple ontologies
  • Fuzzy Matching: Find enums by approximate string matching
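
Fuzzy matching of this kind can already be approximated with the standard library. The sketch below is one possible approach using difflib, not a planned valuesets API. `Technique` and `fuzzy_lookup` are hypothetical, and the 0.6 cutoff is difflib's default rather than a tuned value.

```python
import difflib
from enum import Enum


class Technique(Enum):
    """Hypothetical subset of a technique enum."""
    CRYO_EM = "CRYO_EM"
    X_RAY_CRYSTALLOGRAPHY = "X_RAY_CRYSTALLOGRAPHY"
    NMR = "NMR"


def fuzzy_lookup(enum_cls, text: str, cutoff: float = 0.6):
    """Return the member whose name is closest to text, or None."""
    # Normalize free text toward the UPPER_SNAKE_CASE used by member names.
    normalized = text.strip().upper().replace("-", "_").replace(" ", "_")
    names = [member.name for member in enum_cls]
    matches = difflib.get_close_matches(normalized, names, n=1, cutoff=cutoff)
    return enum_cls[matches[0]] if matches else None


print(fuzzy_lookup(Technique, "cryo-em"))
```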

Development

Installation

git clone https://github.com/linkml/valuesets
cd valuesets
uv sync

Available Commands

just --list  # Show all available commands
just test    # Run tests  
just doctest # Run doctests
just lint    # Run linting
just site    # Build documentation site

Contributing

We welcome contributions! Whether you're adding new domains, improving existing enums, or fixing bugs:

  1. Domain Experts: Contribute standardized value sets for your field
  2. Developers: Add utility functions, improve tooling, fix issues
  3. Users: Report missing enums, suggest improvements, share use cases

Repository Structure

├── src/valuesets/
│   ├── schema/              # LinkML YAML schemas (source of truth)
│   │   ├── bio/            # Biological domains
│   │   │   ├── cell_biology.yaml
│   │   │   ├── structural_biology.yaml
│   │   │   └── taxonomy.yaml
│   │   ├── spatial/        # Spatial and anatomical
│   │   │   └── spatial_qualifiers.yaml
│   │   ├── statistics.yaml
│   │   └── core.yaml
│   ├── enums/              # Generated Python enums
│   │   └── <auto-generated from schemas>
│   ├── generators/         # Rich enum generator
│   │   └── rich_enum.py
│   └── validators/         # Ontology validation
│       └── enum_evaluator.py
├── docs/                   # Documentation
└── tests/                  # Test cases
    ├── test_rich_enums.py  # Rich enum functionality
    └── validators/         # Ontology validation tests

Credits

Built with LinkML and the linkml-project-copier template.


Making data standardization simple, semantic, and scalable
