OpenSportTaxonomy
An open taxonomy for classifying sports and physical activities.
Every platform has invented its own list of sports. Apple HealthKit calls it Cycling, Strava calls it Ride, Garmin calls it ROAD_CYCLING. None of them are hierarchical, none map to each other, and none are open standards.
OpenSportTaxonomy provides a single canonical set of sport codes that any application can reference.
> [!WARNING]
> This taxonomy is young and only covers a few sports at the moment. If yours is missing, open an issue. We'd love to expand it together.
How it works
An activity is identified by a sport string. Dots (.) separate the sport from its disciplines in the sport hierarchy, and plus signs (+) attach modifiers.
Example: cycling.road+stationary+virtual
cycling . road + stationary + virtual
\-----/   \--/   \--------/   \-----/
 sport discipline modifier    modifier
\____________/   \__________________/
  sport code          modifiers
More examples:
| Sport string | Meaning |
|---|---|
| cycling.road | road cycling |
| cycling.road+race | road cycling race |
| cycling.road+stationary+virtual | road cycling, for example on Zwift |
| cycling.gravel+assisted+commute | e-bike gravel commute |
| running.trail+race | trail running race |
| xc_skiing.classic+roller | classic roller skiing |
Sport codes form a tree using dot notation. cycling contains cycling.road, cycling.gravel, cycling.track, and so on. The hierarchy is encoded in the code itself: the parent of cycling.road is cycling. Querying for cycling should naturally include all its children.
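Because the hierarchy lives in the code itself, parent lookup and subtree checks need nothing beyond string operations. The following is a minimal sketch in plain Python; the helper names `parent_code` and `is_within` are illustrative and not part of the library, and `cycling_indoor` is a made-up code used only to show the edge case.

```python
from typing import Optional


def parent_code(code: str) -> Optional[str]:
    """Return the parent sport code, or None for a top-level sport."""
    head, sep, _ = code.rpartition(".")
    return head if sep else None


def is_within(code: str, ancestor: str) -> bool:
    """True if `code` equals `ancestor` or sits below it in the tree."""
    # Appending the dot prevents a bare string prefix from matching.
    return code == ancestor or code.startswith(ancestor + ".")


print(parent_code("cycling.road"))              # cycling
print(parent_code("cycling"))                   # None
print(is_within("cycling.gravel", "cycling"))   # True
print(is_within("cycling_indoor", "cycling"))   # False (hypothetical code)
```

Comparing against `ancestor + "."` rather than `ancestor` alone is the important detail: a plain prefix test would treat unrelated codes that merely share leading characters as children.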
Modifiers describe circumstances, not disciplines. Road cycling on a trainer is still road cycling, performed on a stationary machine. Modifiers are appended with + and sorted alphabetically. They are independent: a Zwift ride is both stationary and virtual, set separately.
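The alphabetical ordering rule means two strings with the same modifiers in a different order describe the same activity. A rough sketch of normalizing a sport string, assuming no vocabulary validation is needed; the function names are hypothetical, and dropping duplicate modifiers is an assumption of this sketch rather than documented behavior.

```python
from typing import List, Tuple


def split_sport_string(raw: str) -> Tuple[str, List[str]]:
    """Split a sport string into its sport code and sorted modifiers."""
    code, *modifiers = raw.split("+")
    # Assumption: repeated modifiers collapse to a single occurrence.
    return code, sorted(set(modifiers))


def canonical(raw: str) -> str:
    """Rebuild the sport string with modifiers in alphabetical order."""
    code, modifiers = split_sport_string(raw)
    return "+".join([code, *modifiers])


print(canonical("cycling.road+virtual+stationary"))  # cycling.road+stationary+virtual
print(canonical("running.trail"))                    # running.trail
```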
See the full reference for all sport codes and modifiers.
Structured format
When your context needs separate fields (API payloads, database columns), the same information can be represented as:
{ "sport": "cycling.road", "modifiers": ["stationary", "virtual"] }
The sport string is the canonical form. The structured format is derived from it.
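Since the structured format is derived from the sport string, converting in both directions should round-trip. A small sketch using only the standard library; `to_structured` and `to_sport_string` are illustrative names, not library functions.

```python
import json
from typing import Any, Dict


def to_structured(sport_string: str) -> Dict[str, Any]:
    """Derive the structured form from a sport string."""
    code, *modifiers = sport_string.split("+")
    return {"sport": code, "modifiers": sorted(modifiers)}


def to_sport_string(structured: Dict[str, Any]) -> str:
    """Rebuild the canonical sport string from the structured form."""
    return "+".join([structured["sport"], *sorted(structured["modifiers"])])


payload = json.dumps(to_structured("cycling.road+stationary+virtual"))
print(payload)  # {"sport": "cycling.road", "modifiers": ["stationary", "virtual"]}
print(to_sport_string(json.loads(payload)))  # cycling.road+stationary+virtual
```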
Design principles
Sport code or modifier? If you removed it, would an athlete still recognize the activity as the same sport? If yes, it's a modifier. If no, it's a sport code.
One activity, one sport. Multi-sport events like triathlons are composed of separate single-sport activities.
Venues are not modifiers. Track cycling happens in a velodrome. That's its natural setting, not a "modified" version of outdoor cycling.
Modifiers are explicit. No modifier implies another. A Zwift ride is stationary+virtual, with both set separately, because a trainer without a screen is stationary but not virtual. Absence means unspecified, not "the opposite."
Schema format
The canonical schema is schema.yaml, a single YAML file with two flat lists: sports (sorted alphabetically, with the hierarchy carried by the dot notation) and modifiers (each with an optional group that marks mutually exclusive modifiers).
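To illustrate how a group could be used to enforce mutual exclusivity, here is a sketch over a hypothetical in-memory stand-in for the modifiers list. Only the `group` field is taken from the description above; the `name` key, the `example_a` and `example_b` modifiers, and the `example_group` value are all invented for illustration and may not match the real file.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical stand-in for the `modifiers` list in schema.yaml.
MODIFIERS = [
    {"name": "race"},
    {"name": "stationary"},
    {"name": "virtual"},
    {"name": "example_a", "group": "example_group"},  # invented entry
    {"name": "example_b", "group": "example_group"},  # invented entry
]


def exclusivity_conflicts(selected: Iterable[str]) -> Dict[str, List[str]]:
    """Return every group from which more than one modifier was selected."""
    group_of = {m["name"]: m.get("group") for m in MODIFIERS}
    by_group: Dict[str, List[str]] = defaultdict(list)
    for name in sorted(set(selected)):
        group = group_of.get(name)
        if group is not None:  # ungrouped modifiers never conflict
            by_group[group].append(name)
    return {g: names for g, names in by_group.items() if len(names) > 1}


print(exclusivity_conflicts(["race", "virtual"]))         # {}
print(exclusivity_conflicts(["example_b", "example_a"]))  # {'example_group': ['example_a', 'example_b']}
```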
Platform mappings
Mapping files in mappings/ translate OST codes to platform-specific identifiers. One file per platform:
- apple_healthkit.yaml: HKWorkoutActivityType integer values
- garmin_fit.yaml: sport + sub_sport integer pairs
- garmin_training_api.yaml: Training API V2 sport type strings
- strava.yaml: SportType string values
Translations are lossy by design. Some platforms are less granular than the taxonomy: all cycling disciplines map to a single HealthKit value (13). This is the platform's limitation, not an error.
# The same OST code on three platforms:
- ost: cycling.road
  target: 13 # Apple HealthKit
- ost: cycling.road
  target: { sport: 2, sub_sport: 7 } # Garmin FIT
- ost: cycling.road
  target: Ride # Strava
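One way to picture a lossy translation is a lookup that falls back to the nearest mapped ancestor. This is a sketch under assumptions, not how the mapping files or the library necessarily work: the excerpt above lists codes explicitly, and the fallback behavior, the `HEALTHKIT` table, and the `translate` helper here are hypothetical. Only the value 13 for cycling comes from the text.

```python
from typing import Dict, Optional

# Illustrative excerpt: all cycling disciplines share HealthKit value 13.
HEALTHKIT: Dict[str, int] = {"cycling": 13}


def translate(code: str, mapping: Dict[str, int]) -> Optional[int]:
    """Look up `code`, walking up the hierarchy until a mapping is found."""
    while code:
        if code in mapping:
            return mapping[code]
        code = code.rpartition(".")[0]  # drop the most specific discipline
    return None  # no ancestor is mapped on this platform


print(translate("cycling.road", HEALTHKIT))    # 13
print(translate("cycling.gravel", HEALTHKIT))  # 13
print(translate("running.trail", HEALTHKIT))   # None
```

The discipline is lost on the way out, so translating the platform value back can recover at most `cycling`.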
Python library
Install the reference implementation:
pip install open-sport-taxonomy
Working with sport strings
The library has two entry points for creating Sport objects:
| Method | Use when |
|---|---|
| Sport(raw) | Application code, constants, prescriptions. Enforces the standard vocabulary. |
| Sport.parse(raw) | Receiving external input. Accepts any structurally valid sport string. |
A standard sport is one whose code and modifiers are all defined in the current taxonomy version. A non-standard sport is structurally valid but contains codes or modifiers not yet in the taxonomy, typically because they come from a newer version. Non-standard does not mean invalid; it means unrecognized.
from open_sport_taxonomy import Sport, Modifier
# Strict constructor for application code
sport = Sport("cycling.road+race+virtual")
sport.code # "cycling.road"
sport.label # "road cycling"
sport.modifiers # frozenset({Modifier.RACE, Modifier.VIRTUAL})
sport.is_standard # True
str(sport) # "cycling.road+race+virtual"
# Unknown codes and modifiers are rejected
Sport("cycling.road.criterium") # ValueError: Unknown sport code
Sport("cycling.road+rainy") # ValueError (unknown modifier)
# Parse: for external input, preserves everything
sport = Sport.parse("cycling.road.criterium+race+rainy")
sport.code # "cycling.road.criterium" (preserved)
sport.modifiers # frozenset({Modifier.RACE, "rainy"})
sport.is_standard # False
str(sport) # "cycling.road.criterium+race+rainy" (round-trips)
# Resolve: map a non-standard sport to the nearest standard equivalent
resolved = sport.resolve()
resolved.code # "cycling.road"
resolved.modifiers # frozenset({Modifier.RACE})
resolved.is_standard # True
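The resolve step above can be pictured as walking up the hierarchy to the nearest known code and keeping only known modifiers. The sketch below reproduces the result shown in the example with plain strings; it is not the library's implementation. The vocabulary sets are a small hypothetical excerpt, and raising an error when no ancestor is known is an assumption.

```python
from typing import FrozenSet, Iterable, Tuple

# Hypothetical excerpt of the standard vocabulary, for illustration only.
KNOWN_CODES = frozenset({"cycling", "cycling.road", "running", "running.trail"})
KNOWN_MODIFIERS = frozenset({"race", "stationary", "virtual"})


def resolve(code: str, modifiers: Iterable[str]) -> Tuple[str, FrozenSet[str]]:
    """Map a code to its nearest known ancestor and drop unknown modifiers."""
    while code and code not in KNOWN_CODES:
        code = code.rpartition(".")[0]  # step one level up the tree
    if not code:
        # Assumption: a sport with no known ancestor cannot be resolved.
        raise ValueError("no known ancestor for this sport code")
    return code, frozenset(modifiers) & KNOWN_MODIFIERS


print(resolve("cycling.road.criterium", {"race", "rainy"}))
# ('cycling.road', frozenset({'race'}))
```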
Storage pattern
Always store str(sport) in your database. It preserves the original sport string with full fidelity. Use Sport.parse() when loading, then .resolve() for application logic. When you upgrade the library, previously non-standard sports become standard automatically. No data migration needed.
# On ingest
sport = Sport.parse(api_response["sport"])
db.activity.sport = str(sport) # store faithfully
# On load
sport = Sport.parse(db.activity.sport)
resolved = sport.resolve() # for application logic
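Storing the plain sport string also keeps subtree queries simple at the database level. The following sketch uses SQLite from the standard library; the `activity` table and `sport` column mirror the names in the snippet above, while `like_escape` and `activities_within` are hypothetical helpers. Because `_` is a wildcard in SQL `LIKE`, codes such as xc_skiing need escaping.

```python
import sqlite3
from typing import List


def like_escape(text: str) -> str:
    """Escape LIKE wildcards so codes such as xc_skiing match literally."""
    return text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def activities_within(conn: sqlite3.Connection, code: str) -> List[str]:
    """Return stored sport strings whose code is `code` or a descendant."""
    prefix = like_escape(code)
    rows = conn.execute(
        "SELECT sport FROM activity "
        "WHERE sport = ? "                # the bare code, no modifiers
        "OR sport LIKE ? ESCAPE '\\' "    # a descendant: code.<discipline>...
        "OR sport LIKE ? ESCAPE '\\' "    # the code itself with modifiers
        "ORDER BY sport",
        (code, prefix + ".%", prefix + "+%"),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity (sport TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO activity (sport) VALUES (?)",
    [("cycling.road+race",), ("cycling+stationary",),
     ("running.trail",), ("xc_skiing.classic+roller",)],
)

print(activities_within(conn, "cycling"))    # ['cycling+stationary', 'cycling.road+race']
print(activities_within(conn, "xc_skiing"))  # ['xc_skiing.classic+roller']
```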
Class constants
For known sports in application code, use class constants:
Sport.CYCLING_ROAD
Sport.RUNNING_TRAIL
Sport.SWIMMING_OPEN_WATER
Taxonomy navigation
Sport.CYCLING.disciplines # (Sport('cycling.cyclocross'), Sport('cycling.gravel'), ...)
Sport.CYCLING_ROAD.parent # Sport('cycling')
Sport.all() # all standard sports
# Parent preserves modifiers
Sport("cycling.road+stationary").parent # Sport('cycling+stationary')
Sport matching
Check if a sport is a more specific version of another:
# Prescription matching: does the execution satisfy the prescription?
executed = Sport("cycling.road+stationary")
prescribed = Sport("cycling+stationary")
executed.is_subsport_of(prescribed) # True
# Extra modifiers are fine
Sport("cycling.road+stationary+race").is_subsport_of(Sport("cycling+stationary")) # True
# Missing modifiers or wrong hierarchy: no match
Sport("cycling.road").is_subsport_of(Sport("cycling+stationary")) # False
Sport("running").is_subsport_of(Sport("cycling")) # False
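The matching rule in these examples amounts to two conditions: the executed code is at or below the prescribed code, and the prescribed modifiers are a subset of the executed ones. A sketch on plain strings, independent of the library; `satisfies` is a hypothetical name, and treating an identical sport as a match is an assumption.

```python
from typing import FrozenSet, Tuple


def split(raw: str) -> Tuple[str, FrozenSet[str]]:
    """Split a sport string into its code and a set of modifiers."""
    code, *modifiers = raw.split("+")
    return code, frozenset(modifiers)


def satisfies(executed: str, prescribed: str) -> bool:
    """True if `executed` is the same as, or more specific than, `prescribed`."""
    e_code, e_mods = split(executed)
    p_code, p_mods = split(prescribed)
    same_branch = e_code == p_code or e_code.startswith(p_code + ".")
    # Every prescribed modifier must be present; extra ones are allowed.
    return same_branch and p_mods <= e_mods


print(satisfies("cycling.road+stationary", "cycling+stationary"))       # True
print(satisfies("cycling.road+stationary+race", "cycling+stationary"))  # True
print(satisfies("cycling.road", "cycling+stationary"))                  # False
print(satisfies("running", "cycling"))                                  # False
```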
Platform translation
from open_sport_taxonomy.platforms import strava, apple_healthkit, garmin_fit, garmin_training_api
strava.translate(Sport("cycling.road+virtual")) # "VirtualRide"
apple_healthkit.translate(Sport.CYCLING_ROAD) # 13
garmin_fit.translate(Sport.CYCLING_ROAD) # GarminFitCode(sport=2, sub_sport=7)
garmin_training_api.translate(Sport.CYCLING_ROAD) # "CYCLING"
Pydantic integration
Install with the pydantic extra:
pip install open-sport-taxonomy[pydantic]
Use SportField in Pydantic models for permissive parsing, or StrictSportField to enforce the standard vocabulary:
from pydantic import BaseModel
from open_sport_taxonomy.pydantic import SportField, StrictSportField
class Workout(BaseModel):
    sport: SportField # accepts any structurally valid sport string

class Prescription(BaseModel):
    sport: StrictSportField # rejects unknown codes and modifiers
w = Workout(sport="cycling.road+stationary")
w.sport.code # "cycling.road"
w.model_dump() # {"sport": "cycling.road+stationary"}
What the taxonomy does not cover
- Venue properties like pool length (25m vs 50m) or track size. These matter for records and performance but are not distinct disciplines. Planned for a future version.
Versioning
The taxonomy follows Semantic Versioning. Each release is a git tag and a GitHub Release. Sport codes are stable: once published, never removed, only deprecated.
# Latest
https://raw.githubusercontent.com/sweatstack/open-sport-taxonomy/main/schema.yaml
# Pinned to a version
https://raw.githubusercontent.com/sweatstack/open-sport-taxonomy/v0.1.0/schema.yaml
Contributing
See CONTRIBUTING.md.
License
MIT. Maintained by SweatStack.