LabelSmith
Turn messy human labels into clean, consistent, code-safe field names.
LabelSmith takes the kind of strings that show up on real-world spreadsheets,
form captions, checksheets, and PDF tables — "Part Number", "Op. #2 (mm)",
"Café — naïve" — and converts them into deterministic identifiers your code
can rely on.
It is intentionally small. No AI, no LLM calls, no Excel or PDF parsing — just a focused, well-tested core for naming things.
Install
pip install labelsmith
LabelSmith runs on Python 3.10+ and depends only on the standard library.
Quick start
from labelsmith import field_name, field_names, field_map
field_name("Part Number")
# 'part_number'
field_names(["Part Number", "Part Number", "Op. #2"])
# ['part_number', 'part_number_2', 'op_2']
field_map(["Part Number", "Part Number"])
# {'Part Number': 'part_number', 'Part Number (2)': 'part_number_2'}
Styles
LabelSmith supports four output styles:
| Style | Example output |
|---|---|
| snake | part_number |
| camel | partNumber |
| pascal | PartNumber |
| kebab | part-number |
field_name("Part Number") # 'part_number'
field_name("Part Number", style="camel") # 'partNumber'
field_name("Part Number", style="pascal") # 'PartNumber'
field_name("Part Number", style="kebab") # 'part-number'
Any other value for style raises ValueError
(labelsmith.UnsupportedStyleError).
Acronyms in camelCase and PascalCase
All-uppercase tokens are preserved as acronyms in camel and pascal
styles, so manufacturing/checksheet labels with industry-standard
acronyms stay recognizable:
field_name("AIAG/VDA Severity", style="pascal") # 'AIAGVDASeverity'
field_name("AIAG/VDA Severity", style="camel") # 'aiagVDASeverity'
field_name("PFMEA Cause(s)", style="pascal") # 'PFMEACauseS'
field_name("N Gage Length (MACH)", style="pascal") # 'NGageLengthMACH'
field_name("HTTPResponseCode", style="pascal") # 'HTTPResponseCode'
camelCase always lowercases the first token, even when it's an acronym:
field_name("AIAG", style="camel") # 'aiag'
field_name("AIAG", style="pascal") # 'AIAG'
snake and kebab always lowercase every token, so acronym handling
doesn't apply there:
field_name("AIAG/VDA Severity", style="snake") # 'aiag_vda_severity'
field_name("AIAG/VDA Severity", style="kebab") # 'aiag-vda-severity'
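The acronym rules above can be sketched as a small rendering step over tokens that have already been split. This is an illustration only, not LabelSmith's implementation: the `render` function and its token-list input are hypothetical names chosen for this example.

```python
def render(tokens: list[str], style: str = "snake") -> str:
    """Join pre-split tokens in one of the four documented styles (sketch)."""
    if style == "snake":
        return "_".join(t.lower() for t in tokens)
    if style == "kebab":
        return "-".join(t.lower() for t in tokens)
    if style not in ("camel", "pascal"):
        # Unknown styles are rejected, mirroring the documented ValueError.
        raise ValueError(f"unsupported style: {style!r}")

    def cap(token: str) -> str:
        # All-uppercase tokens are treated as acronyms and kept as-is;
        # everything else gets an initial capital.
        return token if token.isupper() else token[:1].upper() + token[1:].lower()

    parts = [cap(t) for t in tokens]
    if style == "camel" and parts:
        # camelCase lowercases the first token, acronym or not.
        parts[0] = parts[0].lower()
    return "".join(parts)


print(render(["AIAG", "VDA", "Severity"], "pascal"))  # AIAGVDASeverity
print(render(["AIAG", "VDA", "Severity"], "camel"))   # aiagVDASeverity
print(render(["PFMEA", "Cause", "s"], "pascal"))      # PFMEACauseS
```

Keeping acronym detection inside the capitalization helper means snake and kebab never consult it, which matches the note that acronym handling does not apply to those styles.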
Cleaning behavior
LabelSmith trims whitespace, decomposes Unicode to ASCII where reasonable, splits on punctuation, symbols, and case boundaries, then re-joins using the requested style.
field_name(" Café — Naïve ") # 'cafe_naive'
field_name("Op. #2 (mm)") # 'op_2_mm'
field_name("HTTPResponseCode") # 'http_response_code'
field_name("first/second-third") # 'first_second_third'
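For readers curious how such a cleaning pipeline might work, here is a minimal standard-library sketch. It is an approximation written for this README, not LabelSmith's actual code: `tokenize`, `to_snake`, and the regular expression are assumptions that happen to reproduce the snake-case examples above.

```python
import re
import unicodedata

# Token pattern (assumed for this sketch): an acronym that is followed by a
# capitalized word, an optionally capitalized lowercase word, a run of
# capitals, or a run of digits. Everything else acts as a separator.
_TOKEN_RE = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+")


def tokenize(label: str) -> list[str]:
    # NFKD splits accented letters into a base letter plus combining marks;
    # encoding to ASCII with "ignore" then drops the marks and other
    # non-ASCII symbols such as dashes.
    decomposed = unicodedata.normalize("NFKD", label)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return _TOKEN_RE.findall(ascii_only)


def to_snake(label: str) -> str:
    return "_".join(token.lower() for token in tokenize(label))


print(tokenize("HTTPResponseCode"))   # ['HTTP', 'Response', 'Code']
print(to_snake(" Café — Naïve "))     # cafe_naive
print(to_snake("Op. #2 (mm)"))        # op_2_mm
```

Because the pattern only matches ASCII letters and digits, surrounding whitespace and punctuation never reach the output, so no separate trimming step is needed in this sketch.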
If a label normalizes to nothing, you get the prefix rendered in the
chosen style. The default prefix is "field", so:
field_name("") # 'field'
field_name("***") # 'field'
field_name("", prefix="col") # 'col'
field_name("", style="pascal") # 'Field'
field_name("", style="kebab", prefix="my field") # 'my-field'
field_name("", style="camel", prefix="my field") # 'myField'
If prefix itself is empty or contains no usable alphanumeric content
("", "_", "---", whitespace), LabelSmith falls back to "field" so
you never get back an unusable identifier:
field_name("", prefix="") # 'field'
field_name("", prefix="_") # 'field'
field_name("***", prefix="---") # 'field'
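The fallback order described above (label, then prefix, then "field") can be sketched as a chain of `or` expressions. The helper names below are hypothetical, the tokenizer is deliberately simplified, and only snake style is shown; LabelSmith's own logic may differ.

```python
import re

DEFAULT_PREFIX = "field"


def usable_tokens(text: str) -> list[str]:
    # Simplified tokenizer for this sketch: runs of ASCII letters or digits.
    return re.findall(r"[A-Za-z0-9]+", text)


def snake_or_prefix(label: str, prefix: str = DEFAULT_PREFIX) -> str:
    # An empty list is falsy, so each stage is tried only when the previous
    # one produced no usable tokens.
    tokens = usable_tokens(label) or usable_tokens(prefix) or [DEFAULT_PREFIX]
    return "_".join(t.lower() for t in tokens)


print(snake_or_prefix(""))                     # field
print(snake_or_prefix("", prefix="col"))       # col
print(snake_or_prefix("***", prefix="---"))    # field
print(snake_or_prefix("", prefix="my field"))  # my_field
```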
Labels that start with a digit
By default, a name that would start with a digit gets the configured prefix prepended in the chosen style, so the result is a safe identifier that still matches the style you asked for:
field_name("123 Part Number", style="snake") # 'field_123_part_number'
field_name("123 Part Number", style="kebab") # 'field-123-part-number'
field_name("123 Part Number", style="camel") # 'field123PartNumber'
field_name("123 Part Number", style="pascal") # 'Field123PartNumber'
Opt out with allow_leading_digit=True, or supply a different prefix:
field_name("1st Place", allow_leading_digit=True) # '1_st_place'
field_name("1st", prefix="col") # 'col_1_st'
field_name("1st", prefix="col", style="kebab") # 'col-1-st'
If prefix is empty or contains no usable alphanumeric content
("", "_", "---", whitespace), LabelSmith falls back to "field" so
the result is always a safe identifier:
field_name("123 Part", prefix="") # 'field_123_part'
field_name("123 Part", prefix="---", style="kebab") # 'field-123-part'
Multi-token prefixes are tokenized and re-styled along with the label, so the whole result stays consistent:
field_name("123 Part Number", prefix="my field", style="camel")
# 'myField123PartNumber'
field_name("123 Part Number", prefix="my field", style="pascal")
# 'MyField123PartNumber'
Reserved words
Names that collide with Python keywords get a trailing underscore so they remain usable as identifiers:
field_name("class") # 'class_'
field_name("for") # 'for_'
You can supply your own reserved set — useful for ORM column names, dataframe columns, or framework-reserved attributes:
field_name("id", reserved_words={"id", "type"})
# 'id_'
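A rough sketch of the reserved-word check, using the standard library's `keyword.kwlist`. The helper name is hypothetical. This README does not say whether a custom set replaces or extends the Python keywords; the sketch assumes it replaces them, which may not match LabelSmith.

```python
import keyword


def escape_reserved(name: str, reserved_words=None) -> str:
    # Assumption: a caller-supplied set replaces the default keyword list.
    if reserved_words is None:
        reserved = frozenset(keyword.kwlist)
    else:
        reserved = frozenset(reserved_words)
    return name + "_" if name in reserved else name


print(escape_reserved("class"))                # class_
print(escape_reserved("part_number"))          # part_number
print(escape_reserved("id", {"id", "type"}))   # id_
```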
Duplicate handling
field_names guarantees unique outputs. Suffix style follows the chosen
naming style so the output stays consistent:
field_names(["Part Number", "Part Number", "Part Number"])
# ['part_number', 'part_number_2', 'part_number_3']
field_names(["Part Number", "Part Number"], style="kebab")
# ['part-number', 'part-number-2']
field_names(["Part Number", "Part Number"], style="camel")
# ['partNumber', 'partNumber2']
field_names(["Part Number", "Part Number"], style="pascal")
# ['PartNumber', 'PartNumber2']
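The style-aware suffixing shown above can be approximated with a set of seen names and a per-style separator. This is a sketch with hypothetical names; in particular, how LabelSmith handles a generated name that collides with a later input is not documented here, so the loop below simply keeps counting until the candidate is free.

```python
# Separator placed between the base name and the numeric suffix.
_SEPARATORS = {"snake": "_", "kebab": "-", "camel": "", "pascal": ""}


def make_unique(names: list[str], style: str = "snake") -> list[str]:
    sep = _SEPARATORS[style]
    seen: set[str] = set()
    result: list[str] = []
    for name in names:
        candidate, n = name, 1
        # Bump the counter until the candidate has not been used yet.
        while candidate in seen:
            n += 1
            candidate = f"{name}{sep}{n}"
        seen.add(candidate)
        result.append(candidate)
    return result


print(make_unique(["part_number"] * 3))
# ['part_number', 'part_number_2', 'part_number_3']
print(make_unique(["partNumber", "partNumber"], style="camel"))
# ['partNumber', 'partNumber2']
```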
field_map returns a dictionary, so when the original label is repeated
the key is disambiguated with an occurrence marker. The values still follow
the same uniqueness rules as field_names:
field_map(["Part Number", "Part Number", "Notes"])
# {
# 'Part Number': 'part_number',
# 'Part Number (2)': 'part_number_2',
# 'Notes': 'notes',
# }
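The occurrence marker on repeated keys can be sketched with a `collections.Counter`. The helper below is hypothetical and does not handle the edge case where an input label already looks like "Part Number (2)"; how LabelSmith treats that case is not stated in this README.

```python
from collections import Counter


def disambiguate_keys(labels: list[str]) -> list[str]:
    counts: Counter[str] = Counter()
    keys: list[str] = []
    for label in labels:
        counts[label] += 1
        n = counts[label]
        # The first occurrence keeps the label; repeats get " (n)".
        keys.append(label if n == 1 else f"{label} ({n})")
    return keys


keys = disambiguate_keys(["Part Number", "Part Number", "Notes"])
print(keys)  # ['Part Number', 'Part Number (2)', 'Notes']

# Pairing the keys with already-unique field names gives the mapping shape.
print(dict(zip(keys, ["part_number", "part_number_2", "notes"])))
```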
API surface
labelsmith.field_name(label, *, style="snake", prefix="field",
                      allow_leading_digit=False, reserved_words=None) -> str
labelsmith.field_names(labels, *, style="snake", prefix="field",
                       allow_leading_digit=False, reserved_words=None) -> list[str]
labelsmith.field_map(labels, *, style="snake", prefix="field",
                     allow_leading_digit=False, reserved_words=None) -> dict[str, str]
LabelSmith ships with a py.typed marker so type checkers will read the
inline annotations directly from the installed package.
Development
pip install -e ".[dev]"
python -m pytest
License
MIT — see LICENSE.