A collection of commonly used value sets
Common Value Sets
A comprehensive collection of standardized enumerations and value sets for data science, bioinformatics, materials science, and beyond.
Why Common Value Sets?
Data standardization is hard. Every project reinvents the wheel with custom enums, inconsistent naming, and no semantic meaning.
Common Value Sets solves this by providing:
- Rich, standardized enumerations: Pre-defined value sets across multiple domains
- Semantic meaning: Every value is linked to ontology terms (when possible)
- Python-first convenience: Work with simple enums, get semantics for free
- Multi-language support: Generate JSON Schema, TypeScript, and more
- Interoperability: Built on LinkML standards for maximum compatibility
A Simple Example
Different datasets often represent the same concept in incompatible ways:
- M/F
- male/female
- 1/2
They all mean the same thing, but they don't interoperate.
With Common Value Sets, you can instead use a shared enum:
from valuesets.enums.core import SexEnum
s = SexEnum.MALE
print(s.value) # "MALE"
print(s.get_meaning()) # "NCIT:C20197"
print(s.get_description()) # "Male sex"
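Behind that shared enum sits a normalization step: each dataset's local codes have to be mapped onto the shared members before they can be compared. The stdlib-only sketch below illustrates that step; `Sex`, `SOURCE_CODES`, and `normalize` are hypothetical names, and reading 1/2 as male/female is simply the convention assumed for one imaginary dataset, not something the package defines.

```python
from enum import Enum


class Sex(Enum):
    """Hypothetical stand-in for a shared enum such as valuesets' SexEnum."""
    MALE = "MALE"
    FEMALE = "FEMALE"


# Hypothetical per-source lookup tables: each dataset's local code -> shared member
SOURCE_CODES = {
    "dataset_a": {"M": Sex.MALE, "F": Sex.FEMALE},
    "dataset_b": {"male": Sex.MALE, "female": Sex.FEMALE},
    "dataset_c": {"1": Sex.MALE, "2": Sex.FEMALE},  # assumed coding convention
}


def normalize(source: str, raw) -> Sex:
    """Map a raw value from a known source onto the shared enum.

    Raises KeyError for an unknown source or code instead of guessing.
    """
    return SOURCE_CODES[source][str(raw).strip()]


# Three incompatible encodings collapse to one comparable value
values = [
    normalize("dataset_a", "M"),
    normalize("dataset_b", "male"),
    normalize("dataset_c", 1),
]
print(values)            # [<Sex.MALE: 'MALE'>, <Sex.MALE: 'MALE'>, <Sex.MALE: 'MALE'>]
print(len(set(values)))  # 1
```

Failing loudly on unknown codes is deliberate: silently passing an unmapped value through would reintroduce exactly the inconsistency the shared enum is meant to remove.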
Quick Start
For Python Developers
from valuesets.enums.bio.structural_biology import StructuralBiologyTechnique
from valuesets.enums.spatial.spatial_qualifiers import AnatomicalSide
# Rich enums with metadata and ontology mappings
technique = StructuralBiologyTechnique.CRYO_EM
print(technique.value) # "CRYO_EM"
print(technique.get_description()) # "Cryo-electron microscopy"
print(technique.get_meaning()) # "CHMO:0002413" (Chemical Methods Ontology)
print(technique.get_annotations()) # {'resolution_range': '2-30 Å typical', ...}
# Spatial relationships with BSPO mappings
side = AnatomicalSide.LEFT
print(side.get_meaning()) # "BSPO:0000000" (Biological Spatial Ontology)
# Look up enums by their ontology terms
found = AnatomicalSide.from_meaning("BSPO:0000000") # Returns LEFT
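To picture what a "rich" enum carries, here is a stdlib-only sketch of the pattern: each member keeps a plain string `.value` plus a description, an ontology CURIE, and an annotations dict, and the class offers a reverse `from_meaning` lookup. `Technique` is a hypothetical stand-in, not the generated `StructuralBiologyTechnique`; only the CRYO_EM strings are taken from the example above, and the valuesets generator may well store this data differently.

```python
from enum import Enum
from typing import Any, Dict, Optional


class Technique(Enum):
    """Hypothetical stand-in for a generated rich enum.

    Each member is declared as (value, description, meaning, annotations).
    """

    def __new__(cls, value: str, description: str,
                meaning: Optional[str], annotations: Optional[Dict[str, Any]]):
        member = object.__new__(cls)
        member._value_ = value            # .value stays a plain string
        member._description = description
        member._meaning = meaning         # ontology CURIE, or None if unmapped
        member._annotations = dict(annotations or {})
        return member

    CRYO_EM = ("CRYO_EM", "Cryo-electron microscopy", "CHMO:0002413",
               {"resolution_range": "2-30 Å typical"})
    OTHER = ("OTHER", "Illustrative member with no ontology mapping", None, None)

    def get_description(self) -> str:
        return self._description

    def get_meaning(self) -> Optional[str]:
        return self._meaning

    def get_annotations(self) -> Dict[str, Any]:
        # Return a copy so callers cannot mutate the member's metadata
        return dict(self._annotations)

    @classmethod
    def from_meaning(cls, curie: str) -> Optional["Technique"]:
        """Reverse lookup: the member mapped to an ontology CURIE, or None."""
        for member in cls:
            if member._meaning == curie:
                return member
        return None


t = Technique.CRYO_EM
print(t.value)                                      # CRYO_EM
print(t.get_meaning())                              # CHMO:0002413
print(Technique.from_meaning("CHMO:0002413") is t)  # True
```

Keeping `.value` a bare string means ordinary lookups such as `Technique("CRYO_EM")` and plain serialization still work; the semantic metadata rides along without changing how the enum behaves.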
For Data Scientists
from valuesets.enums.statistics import StatisticalTest, PValueThreshold
from valuesets.enums.data_science import DatasetSplitType, ModelType
# Standardized statistical tests with STATO ontology mappings
test = StatisticalTest.STUDENTS_T_TEST
print(test.get_meaning()) # "STATO:0000176"
print(test.get_description()) # "Student's t-test for comparing means"
# ML pipeline with standard splits
split = DatasetSplitType.TRAIN
model = ModelType.RANDOM_FOREST
# P-value thresholds with clear semantics
threshold = PValueThreshold.SIGNIFICANT
print(threshold.get_annotations()) # {'value': 0.05, 'symbol': '*'}
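Annotations such as `{'value': 0.05, 'symbol': '*'}` lend themselves to turning a computed p-value into a label. A minimal sketch, assuming a hypothetical threshold table in that annotation shape: only the 0.05/`*` entry comes from the example above; the 0.01/`**` and 0.001/`***` rows follow the common star convention, and the member names are illustrative.

```python
from typing import Dict, List, Tuple

# Hypothetical thresholds, strictest first, shaped like the annotations above
THRESHOLDS: List[Tuple[str, Dict[str, object]]] = [
    ("VERY_HIGHLY_SIGNIFICANT", {"value": 0.001, "symbol": "***"}),
    ("HIGHLY_SIGNIFICANT", {"value": 0.01, "symbol": "**"}),
    ("SIGNIFICANT", {"value": 0.05, "symbol": "*"}),
]


def significance_symbol(p: float) -> str:
    """Return the symbol of the strictest threshold that p falls strictly below."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"p-value must be in [0, 1], got {p}")
    # Ordered strictest first, so the first matching threshold wins
    for _name, ann in THRESHOLDS:
        if p < ann["value"]:
            return str(ann["symbol"])
    return "ns"  # not significant at any listed threshold


print(significance_symbol(0.03))  # *
print(significance_symbol(0.2))   # ns
```

Using a strict `<` matches the usual reading of "p < alpha"; if your field treats p equal to alpha as significant, that comparison is the one line to change.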
For Bioinformaticians
from valuesets.enums.bio.taxonomy import CommonOrganismTaxaEnum, BiologicalKingdom
from valuesets.enums.bio.cell_biology import CellCyclePhase, CellType
# Model organisms with NCBI Taxonomy IDs
human = CommonOrganismTaxaEnum.HUMAN
print(human.get_meaning()) # "NCBITaxon:9606"
print(human.get_description()) # "Homo sapiens (human)"
# Cell biology with CL and GO mappings
phase = CellCyclePhase.S_PHASE
print(phase.get_meaning()) # "GO:0000084"
neuron = CellType.NEURON
print(neuron.get_meaning()) # "CL:0000540"
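# Get all organisms at a specific taxonomic level. str(org) yields the member
# name, not its lineage, so filter on metadata (the 'class' key is an assumption)
mammals = [org for org in CommonOrganismTaxaEnum
           if (org.get_annotations() or {}).get('class') == 'Mammalia']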
Available Domains
Core Domains (Most Mature)
- Biology:
  - Structural Biology: Cryo-EM techniques, crystallization methods, detectors
  - Cell Biology: Cell types, cell cycle phases, organelles
  - Taxonomy: Model organisms (all with NCBI Taxonomy IDs)
- Spatial: Anatomical directions, planes, relationships (BSPO mapped)
- Statistics: Statistical tests (STATO mapped), p-value thresholds
Expanding Domains
- Data Science: ML model types, dataset splits, metrics
- Materials Science: Crystal structures, characterization methods
- Clinical/Medical: Blood types (SNOMED), vital status
- Environmental: Exposure routes, pollutants
- Energy: Sources, storage methods, efficiency ratings
Coming Soon
- Geography: Country codes (ISO), time zones, coordinate systems
- Time: Temporal relationships, periods, frequencies
- Academic: Publication types, research roles, funding sources
- Industrial: Manufacturing processes, quality standards
Multiple Use Cases
1. LinkML Standards (YAML schemas)
Use the raw LinkML schemas for data modeling, validation, and documentation:
# Direct schema usage
Person:
  attributes:
    vital_status:
      range: VitalStatusEnum # ALIVE, DECEASED, UNKNOWN
2. Python Programming (Rich Enums)
Get Python enums with full IDE support, type checking, and semantic metadata:
# Type-safe enums with ontology mappings
# (VitalStatusEnum must first be imported from its valuesets.enums module)
status = VitalStatusEnum.ALIVE
print(status.get_meaning()) # "NCIT:C37987"
3. "Stealth Semantics"
Write simple code, get semantic meaning automatically:
# Example: Different systems use different names for the same concept
from valuesets.enums.medical import BloodTypeEnum
from external_system import PatientBloodType  # hypothetical third-party enum that also exposes get_meaning()
# Even though the enum values are named differently
# (BloodTypeEnum.A_POSITIVE vs PatientBloodType.A_POS),
# both map to the same SNOMED code: SNOMED:278149003
blood_type = BloodTypeEnum.A_POSITIVE
patient_blood = PatientBloodType.A_POS
if blood_type.get_meaning() == patient_blood.get_meaning():
    # Semantic interoperability: works across different naming conventions
    process_compatible_blood_type()  # placeholder for your own handler
# Or use the utility function (same_meaning_as must be imported from valuesets)
if same_meaning_as(blood_type, patient_blood):
    process_compatible_blood_type()
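A `same_meaning_as` check can be sketched as a comparison of CURIEs, with one design choice worth making explicit: two values that carry no ontology mapping should not count as equal. This is a hypothetical implementation, not the one shipped with valuesets; `BloodType` and `PatientBloodType` are stand-in enums, and the SNOMED code is the one quoted above.

```python
from enum import Enum
from typing import Any, Optional


def _meaning_of(value: Any) -> Optional[str]:
    """Return value.get_meaning() if the object offers it, else None."""
    getter = getattr(value, "get_meaning", None)
    return getter() if callable(getter) else None


def same_meaning_as(a: Any, b: Any) -> bool:
    """Hypothetical semantic-equality check.

    Matches only when both values carry an ontology meaning and the CURIEs
    are identical; unmapped values never match, not even each other.
    """
    ma, mb = _meaning_of(a), _meaning_of(b)
    return ma is not None and ma == mb


class BloodType(Enum):
    """Stand-in for a valuesets-style enum."""
    A_POSITIVE = "A_POSITIVE"

    def get_meaning(self) -> Optional[str]:
        return {"A_POSITIVE": "SNOMED:278149003"}.get(self.value)


class PatientBloodType(Enum):
    """Stand-in for a third-party enum with different member names."""
    A_POS = "A+"
    UNKNOWN = "?"

    def get_meaning(self) -> Optional[str]:
        return {"A+": "SNOMED:278149003"}.get(self.value)


print(same_meaning_as(BloodType.A_POSITIVE, PatientBloodType.A_POS))  # True
print(same_meaning_as(PatientBloodType.UNKNOWN, PatientBloodType.UNKNOWN))  # False
```

Treating "both unmapped" as a non-match avoids a subtle failure mode where two unrelated unknown values would otherwise be declared equivalent just because both meanings are `None`.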
4. Multi-language Interoperability
Generate schemas and types for any language:
# Generate JSON Schema for web apps
gen-jsonschema schema.yaml
# Generate TypeScript definitions
gen-typescript schema.yaml -t typescript
# Generate JSON-LD
gen-jsonld schema.yaml
5. Integration & Tooling
- Excel/Google Sheets: Generate dropdown validation lists
- Web forms: Auto-generate select options with descriptions
- APIs: Standardized response codes and classifications
- Databases: Consistent foreign key constraints
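For the form and spreadsheet integrations, the core job is turning name-to-description pairs (such as the `get_all_descriptions()` output shown under Rich Metadata) into UI choices. A minimal sketch with hypothetical helper names: HTML is escaped with the standard library's `html.escape`, and the comma-joined string is the kind of source list a spreadsheet list-validation rule typically accepts (some locales use a different separator).

```python
import html
from typing import Dict, List


def to_select_options(descriptions: Dict[str, str]) -> List[str]:
    """Render one <option> per member: the name is submitted, the description shown."""
    return [
        f'<option value="{html.escape(name, quote=True)}">{html.escape(desc)}</option>'
        for name, desc in descriptions.items()
    ]


def to_dropdown_list(descriptions: Dict[str, str]) -> str:
    """Comma-separated member names, e.g. as the source of a spreadsheet list rule."""
    bad = [name for name in descriptions if "," in name]
    if bad:
        raise ValueError(f"names containing commas cannot be listed: {bad}")
    return ",".join(descriptions)


# Input shaped like the get_all_descriptions() output quoted under Rich Metadata
grids = {"C_FLAT": "C-flat holey carbon grid",
         "QUANTIFOIL": "Quantifoil holey carbon grid"}
for option in to_select_options(grids):
    print(option)
print(to_dropdown_list(grids))  # C_FLAT,QUANTIFOIL
```

Submitting the member name rather than the description keeps stored data stable even if a description is later reworded.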
Advanced Features
Hierarchical Relationships
# Some enums support hierarchical is_a relationships
from valuesets.enums import ViralGenomeTypeEnum
# Baltimore classification with hierarchy
positive_rna = ViralGenomeTypeEnum.SSRNA_POSITIVE # Group IV
# inherits from SSRNA (single-stranded RNA)
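An `is_a` hierarchy is essentially a child-to-parent map, so ancestor checks reduce to walking upward. A stdlib sketch under stated assumptions: the `IS_A` table and the `ancestors`/`is_a` helpers are hypothetical; only "SSRNA_POSITIVE is_a SSRNA" comes from the text above, and SSRNA_NEGATIVE (Baltimore Group V) is added for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical is_a edges (child -> parent); None marks a root
IS_A: Dict[str, Optional[str]] = {
    "SSRNA_POSITIVE": "SSRNA",  # Group IV, as stated above
    "SSRNA_NEGATIVE": "SSRNA",  # Group V, illustrative
    "SSRNA": None,
}


def ancestors(name: str) -> List[str]:
    """Walk is_a links upward, nearest parent first; fails loudly on cycles."""
    chain: List[str] = []
    seen = {name}
    parent = IS_A.get(name)
    while parent is not None:
        if parent in seen:
            raise ValueError(f"cycle in is_a hierarchy at {parent}")
        chain.append(parent)
        seen.add(parent)
        parent = IS_A.get(parent)
    return chain


def is_a(name: str, ancestor: str) -> bool:
    """True if `ancestor` is the term itself or any transitive parent."""
    return name == ancestor or ancestor in ancestors(name)


print(ancestors("SSRNA_POSITIVE"))        # ['SSRNA']
print(is_a("SSRNA_NEGATIVE", "SSRNA"))    # True
print(is_a("SSRNA", "SSRNA_POSITIVE"))    # False
```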
Rich Metadata
from valuesets.enums.bio.structural_biology import CryoEMGridType
grid = CryoEMGridType.QUANTIFOIL
metadata = grid.get_metadata()
print(metadata)
# {
# 'name': 'QUANTIFOIL',
# 'value': 'QUANTIFOIL',
# 'description': 'Quantifoil holey carbon grid',
# 'annotations': {
# 'hole_sizes': '1.2/1.3, 2/1, 2/2 μm common',
# 'manufacturer': 'Quantifoil'
# }
# }
# Get all grid types with their descriptions at once
all_grids = CryoEMGridType.get_all_descriptions()
# {'C_FLAT': 'C-flat holey carbon grid', 'QUANTIFOIL': ...}
Utility Functions
from valuesets.enums.spatial import AnatomicalPlane
# Get all ontology mappings for an enum
mappings = AnatomicalPlane.get_all_meanings()
print(mappings)
# {'SAGITTAL': 'BSPO:0000417', 'CORONAL': 'BSPO:0000019', ...}
# List all metadata for every value in an enum
all_metadata = AnatomicalPlane.list_metadata()
for name, meta in all_metadata.items():
    print(f"{name}: {meta.get('description', 'No description')}")
# Find enum by ontology term (useful for data integration)
plane = AnatomicalPlane.from_meaning("BSPO:0000417") # Returns SAGITTAL
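For batch data integration, rather than one reverse lookup per record, you can build a CURIE-to-member index once and catch duplicate mappings early. The sketch below also shows how `get_all_meanings`/`list_metadata`-style summaries can be derived from per-member metadata. Everything here (`Plane`, `METADATA`, and the helper functions) is hypothetical; only the two BSPO CURIEs come from the example above.

```python
from enum import Enum
from typing import Dict, Optional


class Plane(Enum):
    """Hypothetical stand-in for AnatomicalPlane (two members only)."""
    SAGITTAL = "SAGITTAL"
    CORONAL = "CORONAL"


# Illustrative metadata registry keyed by enum class, then member name
METADATA: Dict[type, Dict[str, Dict[str, str]]] = {
    Plane: {
        "SAGITTAL": {"meaning": "BSPO:0000417"},
        "CORONAL": {"meaning": "BSPO:0000019"},
    },
}


def _meta(member: Enum) -> Dict[str, str]:
    """Metadata for one member, or an empty dict if none is registered."""
    return METADATA.get(type(member), {}).get(member.name, {})


def get_all_meanings(enum_cls) -> Dict[str, Optional[str]]:
    """Member name -> ontology CURIE (None where no mapping exists)."""
    return {m.name: _meta(m).get("meaning") for m in enum_cls}


def list_metadata(enum_cls) -> Dict[str, Dict[str, str]]:
    """Member name -> a copy of its metadata dict."""
    return {m.name: dict(_meta(m)) for m in enum_cls}


def build_meaning_index(enum_cls) -> Dict[str, Enum]:
    """CURIE -> member, built once; raises if two members share a CURIE."""
    index: Dict[str, Enum] = {}
    for m in enum_cls:
        curie = _meta(m).get("meaning")
        if curie is None:
            continue
        if curie in index:
            raise ValueError(f"{curie} mapped by both {index[curie].name} and {m.name}")
        index[curie] = m
    return index


index = build_meaning_index(Plane)
records = ["BSPO:0000417", "BSPO:0000019", "BSPO:0000417"]
print([index[c].name for c in records])  # ['SAGITTAL', 'CORONAL', 'SAGITTAL']
```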
Dynamic Enums
Some enums in this collection are dynamic enums that can be expanded at runtime by querying ontologies. This uses LinkML's Dynamic Enum feature.
# Example: A dynamic enum that pulls values from an ontology
CellTypeEnum:
  # Dynamic expansion from Cell Ontology
  reachable_from:
    source_ontology: obo:cl
    source_nodes:
      - CL:0000540 # neuron
    include_self: false
    relationship_types:
      - rdfs:subClassOf
Note: Runtime expansion support is coming soon! Currently, dynamic enums provide:
- Static values with ontology mappings (available now)
- Metadata and descriptions (available now)
- Runtime expansion from ontologies (coming in next release)
When runtime expansion is available, you'll be able to:
# Future: Dynamically expand enum with all neuron subtypes
cell_types = CellTypeEnum.expand_from_ontology()
# Would add: MOTOR_NEURON, SENSORY_NEURON, INTERNEURON, etc.
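The `reachable_from` block above describes a graph query: collect every term that reaches `source_nodes` through the listed `relationship_types`, optionally including the sources themselves. The toy sketch below makes that concrete with a breadth-first search over placeholder triples; the `EX:` IDs are invented, not Cell Ontology terms, and real expansion would be performed by LinkML tooling against the actual ontology rather than by this code.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

# Toy ontology as (child, relation, parent) triples; IDs are placeholders
EDGES: List[Tuple[str, str, str]] = [
    ("EX:motor_neuron", "rdfs:subClassOf", "EX:neuron"),
    ("EX:sensory_neuron", "rdfs:subClassOf", "EX:neuron"),
    ("EX:interneuron", "rdfs:subClassOf", "EX:neuron"),
    ("EX:upper_motor_neuron", "rdfs:subClassOf", "EX:motor_neuron"),
    ("EX:glial_cell", "rdfs:subClassOf", "EX:cell"),
]


def reachable_from(source_nodes: Iterable[str], relationship_types: Iterable[str],
                   include_self: bool = False) -> Set[str]:
    """Return all descendants of the sources via the given relations (BFS)."""
    rels = set(relationship_types)
    # Invert the edges so we can walk from a parent down to its children
    children: Dict[str, List[str]] = {}
    for child, rel, parent in EDGES:
        if rel in rels:
            children.setdefault(parent, []).append(child)

    sources = list(source_nodes)
    result: Set[str] = set(sources) if include_self else set()
    visited = set(sources)
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            result.add(child)  # reached via a listed relation, so it qualifies
            if child not in visited:
                visited.add(child)
                queue.append(child)
    return result


print(sorted(reachable_from(["EX:neuron"], ["rdfs:subClassOf"])))
```

With `include_self: false`, as in the YAML above, the source term itself is left out unless it is also reachable from another source.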
Documentation
Full Documentation Website
OWL/RDF Representation
The value sets are also available as an OWL ontology for semantic web applications and ontology browsers:
- Direct Download: https://w3id.org/valuesets/valuesets.owl.ttl
- BioPortal: Available at BioPortal
- Ontology Lookup Service (OLS): Submission planned for OLS
The OWL representation allows you to:
- Browse value sets in ontology browsers
- Perform SPARQL queries
- Integrate with semantic web applications
- Link to other biomedical ontologies
Future Directions
Maturity Levels
We plan to add maturity level metadata to each enum to help users understand their readiness:
- Stable: Production-ready, well-tested, unlikely to change
- Beta: Usable but may have minor changes
- Draft: Under development, expect changes
# Future: Check maturity before use
if enum_def.maturity_level == MaturityLevel.STABLE:
    use_in_production()
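Since maturity levels are still a plan, the following is only a sketch of how a maturity gate might work once that metadata exists. `MaturityLevel`, the `ENUM_MATURITY` registry, and `usable_at` are hypothetical, and the enum names and their levels are placeholders rather than real assignments.

```python
from enum import Enum
from typing import Dict, List


class MaturityLevel(Enum):
    """Hypothetical levels mirroring the Stable / Beta / Draft scheme above."""
    DRAFT = 1
    BETA = 2
    STABLE = 3


# Placeholder registry: enum name -> declared maturity
ENUM_MATURITY: Dict[str, MaturityLevel] = {
    "ExampleEnumA": MaturityLevel.STABLE,
    "ExampleEnumB": MaturityLevel.BETA,
    "ExampleEnumC": MaturityLevel.DRAFT,
}


def usable_at(min_level: MaturityLevel) -> List[str]:
    """Names of enums whose maturity is at least `min_level`, sorted."""
    return sorted(name for name, level in ENUM_MATURITY.items()
                  if level.value >= min_level.value)


print(usable_at(MaturityLevel.BETA))  # ['ExampleEnumA', 'ExampleEnumB']
```

Ordering the levels numerically lets a pipeline express "beta or better" as one comparison instead of enumerating the acceptable levels.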
Modularization
Split the package into domain-specific modules for lighter installs:
# Future: Install only what you need
pip install valuesets-core # Core functionality
pip install valuesets-bio # Biological domains
pip install valuesets-materials # Materials science
pip install valuesets-clinical # Clinical/medical
Community Extensions
- Domain Packages: Community-maintained domain-specific value sets
- Organization Standards: Company/institution-specific enums that extend base sets
- Mapping Tables: Cross-ontology and cross-standard mappings
Advanced Features
- AI/LLM Integration: Semantic annotations optimized for language models
- Usage Analytics: Track which enums are most used, identify gaps
- Version Management: Handle enum evolution with deprecation warnings
- Multi-ontology Support: Map single values to multiple ontologies
- Fuzzy Matching: Find enums by approximate string matching
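Fuzzy matching is listed as a future feature; in the meantime, the standard library's `difflib.get_close_matches` can already suggest member names for messy input. A sketch, assuming member names are available as strings; `fuzzy_lookup`, its normalization rules, and the sample names are illustrative choices, not the package's planned API.

```python
import difflib
from typing import Iterable, List


def _norm(s: str) -> str:
    """Upper-case and unify separators so 'cryo em' lines up with 'CRYO_EM'."""
    return s.strip().upper().replace("-", "_").replace(" ", "_")


def fuzzy_lookup(query: str, names: Iterable[str],
                 n: int = 3, cutoff: float = 0.6) -> List[str]:
    """Return up to n member names approximately matching `query`, best first."""
    by_norm = {_norm(name): name for name in names}
    hits = difflib.get_close_matches(_norm(query), list(by_norm), n=n, cutoff=cutoff)
    return [by_norm[h] for h in hits]


techniques = ["CRYO_EM", "X_RAY_CRYSTALLOGRAPHY", "NMR"]  # illustrative names
print(fuzzy_lookup("cryo em", techniques))                # ['CRYO_EM']
```

Normalizing before matching matters more than tuning `cutoff`: case and separator differences would otherwise eat most of the similarity score.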
Development
Installation
git clone https://github.com/linkml/valuesets
cd valuesets
uv sync
Available Commands
just --list # Show all available commands
just test # Run tests
just doctest # Run doctests
just lint # Run linting
just site # Build documentation site
Contributing
We welcome contributions! Whether you're adding new domains, improving existing enums, or fixing bugs:
- Domain Experts: Contribute standardized value sets for your field
- Developers: Add utility functions, improve tooling, fix issues
- Users: Report missing enums, suggest improvements, share use cases
Repository Structure
├── src/valuesets/
│   ├── schema/                # LinkML YAML schemas (source of truth)
│   │   ├── bio/               # Biological domains
│   │   │   ├── cell_biology.yaml
│   │   │   ├── structural_biology.yaml
│   │   │   └── taxonomy.yaml
│   │   ├── spatial/           # Spatial and anatomical
│   │   │   └── spatial_qualifiers.yaml
│   │   ├── statistics.yaml
│   │   └── core.yaml
│   ├── enums/                 # Generated Python enums
│   │   └── <auto-generated from schemas>
│   ├── generators/            # Rich enum generator
│   │   └── rich_enum.py
│   └── validators/            # Ontology validation
│       └── enum_evaluator.py
├── docs/                      # Documentation
└── tests/                     # Test cases
    ├── test_rich_enums.py     # Rich enum functionality
    └── validators/            # Ontology validation tests
Credits
Built with LinkML and the linkml-project-copier template.
Making data standardization simple, semantic, and scalable
File details
Details for the file valuesets-0.4.1.tar.gz.
File metadata
- Download URL: valuesets-0.4.1.tar.gz
- Upload date:
- Size: 13.5 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dac1a1b1bf5f1bc00407853ee0b8aa1d89a37817db55d762f4e8ae502ca6a7ed |
| MD5 | fe77e7291b2d3d82cdffe7fb3a350c58 |
| BLAKE2b-256 | 87a899df4f25bcf2f68a02d1808c22f0780e5b1669414353e36835257e28c3b2 |
Provenance
The following attestation bundles were made for valuesets-0.4.1.tar.gz:
Publisher: pypi-publish.yaml on linkml/valuesets
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: valuesets-0.4.1.tar.gz
- Subject digest: dac1a1b1bf5f1bc00407853ee0b8aa1d89a37817db55d762f4e8ae502ca6a7ed
- Sigstore transparency entry: 775977848
- Sigstore integration time:
- Permalink: linkml/valuesets@a16c234eee6debd6da14c9327e8a8a8a1c63125d
- Branch / Tag: refs/tags/v0.4.1
- Owner: https://github.com/linkml
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yaml@a16c234eee6debd6da14c9327e8a8a8a1c63125d
- Trigger Event: release
File details
Details for the file valuesets-0.4.1-py3-none-any.whl.
File metadata
- Download URL: valuesets-0.4.1-py3-none-any.whl
- Upload date:
- Size: 2.1 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 688a041d593c4500870ddfeed6efd622c17efee5eabb1d92717bd85abeb6bc7d |
| MD5 | 7f7ba74e8ee28ff01517ebf79f14cca7 |
| BLAKE2b-256 | 7817c4b16d78968dba444b235842f008503766e35910d7cf8f84751e562afbc0 |
Provenance
The following attestation bundles were made for valuesets-0.4.1-py3-none-any.whl:
Publisher: pypi-publish.yaml on linkml/valuesets
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: valuesets-0.4.1-py3-none-any.whl
- Subject digest: 688a041d593c4500870ddfeed6efd622c17efee5eabb1d92717bd85abeb6bc7d
- Sigstore transparency entry: 775977849
- Sigstore integration time:
- Permalink: linkml/valuesets@a16c234eee6debd6da14c9327e8a8a8a1c63125d
- Branch / Tag: refs/tags/v0.4.1
- Owner: https://github.com/linkml
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pypi-publish.yaml@a16c234eee6debd6da14c9327e8a8a8a1c63125d
- Trigger Event: release