LabelSmith

Turn messy human labels into clean, consistent, code-safe field names.

LabelSmith takes the kind of strings that show up on real-world spreadsheets, form captions, checksheets, and PDF tables — "Part Number", "Op. #2 (mm)", "Café — naïve" — and converts them into deterministic identifiers your code can rely on.

It is intentionally small. No AI, no LLM calls, no Excel or PDF parsing — just a focused, well-tested core for naming things.

Install

pip install labelsmith

LabelSmith runs on Python 3.10+ and depends only on the standard library.

Quick start

from labelsmith import field_name, field_names, field_map

field_name("Part Number")
# 'part_number'

field_names(["Part Number", "Part Number", "Op. #2"])
# ['part_number', 'part_number_2', 'op_2']

field_map(["Part Number", "Part Number"])
# {'Part Number': 'part_number', 'Part Number (2)': 'part_number_2'}

Styles

LabelSmith supports four output styles:

Style    Example output
snake    part_number
camel    partNumber
pascal   PartNumber
kebab    part-number

field_name("Part Number")                     # 'part_number'
field_name("Part Number", style="camel")      # 'partNumber'
field_name("Part Number", style="pascal")     # 'PartNumber'
field_name("Part Number", style="kebab")      # 'part-number'

Any other value for style raises labelsmith.UnsupportedStyleError, which can be caught as a ValueError.

Acronyms in camelCase and PascalCase

All-uppercase tokens are preserved as acronyms in camel and pascal styles, so manufacturing/checksheet labels with industry-standard acronyms stay recognizable:

field_name("AIAG/VDA Severity", style="pascal")     # 'AIAGVDASeverity'
field_name("AIAG/VDA Severity", style="camel")      # 'aiagVDASeverity'
field_name("PFMEA Cause(s)", style="pascal")        # 'PFMEACauseS'
field_name("N Gage Length (MACH)", style="pascal")  # 'NGageLengthMACH'
field_name("HTTPResponseCode", style="pascal")      # 'HTTPResponseCode'

camelCase always lowercases the first token, even when it's an acronym:

field_name("AIAG", style="camel")   # 'aiag'
field_name("AIAG", style="pascal")  # 'AIAG'

snake and kebab always lowercase every token, so acronym handling doesn't apply there:

field_name("AIAG/VDA Severity", style="snake")  # 'aiag_vda_severity'
field_name("AIAG/VDA Severity", style="kebab")  # 'aiag-vda-severity'
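
One way to picture the acronym rule is a joiner that leaves all-uppercase tokens alone in camel and pascal styles. The sketch below is not LabelSmith's implementation; join_tokens is a hypothetical helper, and it assumes the label has already been split into tokens.

```python
def join_tokens(tokens: list[str], style: str = "snake") -> str:
    """Hypothetical helper: join pre-split tokens in one of the four styles."""
    if style == "snake":
        return "_".join(t.lower() for t in tokens)
    if style == "kebab":
        return "-".join(t.lower() for t in tokens)
    if style not in ("camel", "pascal"):
        raise ValueError(f"unsupported style: {style!r}")

    def cap(token: str) -> str:
        # All-uppercase tokens are treated as acronyms and kept as-is;
        # everything else gets an initial capital.
        return token if token.isupper() else token[:1].upper() + token[1:].lower()

    parts = [cap(t) for t in tokens]
    if style == "camel" and parts:
        # camelCase lowercases the first token, even when it is an acronym
        parts[0] = tokens[0].lower()
    return "".join(parts)


join_tokens(["AIAG", "VDA", "Severity"], "pascal")  # 'AIAGVDASeverity'
join_tokens(["AIAG", "VDA", "Severity"], "camel")   # 'aiagVDASeverity'
join_tokens(["AIAG", "VDA", "Severity"], "snake")   # 'aiag_vda_severity'
```

Checking token.isupper() keeps the rule local to each token, which is why a single-letter token such as "N" also survives unchanged in pascal style.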

Cleaning behavior

LabelSmith trims whitespace, decomposes Unicode to ASCII where reasonable, splits on punctuation, symbols, and case boundaries, then re-joins using the requested style.

field_name("  Café — Naïve  ")        # 'cafe_naive'
field_name("Op. #2 (mm)")             # 'op_2_mm'
field_name("HTTPResponseCode")        # 'http_response_code'
field_name("first/second-third")      # 'first_second_third'
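
The cleaning steps above can be approximated with the standard library alone. This is a minimal sketch under stated assumptions, not LabelSmith's actual code: tokenize and the regular expression are hypothetical, and the sketch simply drops any character that has no ASCII decomposition.

```python
import re
import unicodedata

# Hypothetical pattern: an acronym run, a capitalized word, a lowercase word,
# or a run of digits. Punctuation and symbols match nothing and act as separators.
_TOKEN_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z][a-z]*|[a-z]+|[0-9]+")


def tokenize(label: str) -> list[str]:
    # NFKD splits "é" into "e" plus a combining accent; encoding to ASCII
    # with errors="ignore" then drops the accent and other non-ASCII characters.
    ascii_text = (
        unicodedata.normalize("NFKD", label)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return _TOKEN_RE.findall(ascii_text)


tokenize("  Café — Naïve  ")   # ['Cafe', 'Naive']
tokenize("Op. #2 (mm)")        # ['Op', '2', 'mm']
tokenize("HTTPResponseCode")   # ['HTTP', 'Response', 'Code']

"_".join(t.lower() for t in tokenize("first/second-third"))
# 'first_second_third'
```

The negative lookahead is what splits "HTTPResponseCode" after "HTTP": the uppercase run stops one letter early so that "R" can start the next capitalized word.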

If a label normalizes to nothing, you get the prefix rendered in the chosen style. The default prefix is "field", so:

field_name("")                       # 'field'
field_name("***")                    # 'field'
field_name("", prefix="col")         # 'col'
field_name("", style="pascal")       # 'Field'
field_name("", style="kebab", prefix="my field")   # 'my-field'
field_name("", style="camel", prefix="my field")   # 'myField'

If prefix itself is empty or contains no usable alphanumeric content ("", "_", "---", whitespace), LabelSmith falls back to "field" so you never get back an unusable identifier:

field_name("", prefix="")        # 'field'
field_name("", prefix="_")       # 'field'
field_name("***", prefix="---")  # 'field'

Labels that start with a digit

By default, a name that would start with a digit gets the configured prefix prepended and joined in the chosen style, so the result is a safe identifier that still matches the style you asked for:

field_name("123 Part Number", style="snake")    # 'field_123_part_number'
field_name("123 Part Number", style="kebab")    # 'field-123-part-number'
field_name("123 Part Number", style="camel")    # 'field123PartNumber'
field_name("123 Part Number", style="pascal")   # 'Field123PartNumber'

Opt out with allow_leading_digit=True, or supply a different prefix:

field_name("1st Place", allow_leading_digit=True)    # '1_st_place'
field_name("1st", prefix="col")                      # 'col_1_st'
field_name("1st", prefix="col", style="kebab")       # 'col-1-st'

The same fallback applies here: if prefix is empty or contains no usable alphanumeric content ("", "_", "---", whitespace), LabelSmith uses "field" so the result is always a safe identifier:

field_name("123 Part", prefix="")        # 'field_123_part'
field_name("123 Part", prefix="---", style="kebab")   # 'field-123-part'

Multi-token prefixes are tokenized and re-styled along with the label, so the whole result stays consistent:

field_name("123 Part Number", prefix="my field", style="camel")
# 'myField123PartNumber'
field_name("123 Part Number", prefix="my field", style="pascal")
# 'MyField123PartNumber'
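
The leading-digit rule can be thought of as a step that runs on the token list before styling. The following is a sketch only: guard_leading_digit and split_words are hypothetical names, the tokenizer is deliberately simplified, and only the snake join is shown.

```python
import re


def split_words(text: str) -> list[str]:
    # Simplified tokenizer for this sketch: runs of letters or runs of digits
    return re.findall(r"[A-Za-z]+|[0-9]+", text)


def guard_leading_digit(label: str, prefix: str = "field",
                        allow_leading_digit: bool = False) -> list[str]:
    tokens = split_words(label)
    if allow_leading_digit or not tokens or not tokens[0][0].isdigit():
        return tokens
    # A prefix with no usable alphanumeric content falls back to "field"
    prefix_tokens = split_words(prefix) or ["field"]
    # Prefix tokens join the label tokens, so both are styled together later
    return prefix_tokens + tokens


def snake(tokens: list[str]) -> str:
    return "_".join(t.lower() for t in tokens)


snake(guard_leading_digit("123 Part Number"))                  # 'field_123_part_number'
snake(guard_leading_digit("1st", prefix="col"))                # 'col_1_st'
snake(guard_leading_digit("123 Part", prefix="---"))           # 'field_123_part'
snake(guard_leading_digit("1st Place", allow_leading_digit=True))  # '1_st_place'
```

Tokenizing the prefix rather than concatenating it as a raw string is what lets a multi-word prefix like "my field" come out consistently in every style.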

Reserved words

Names that collide with Python reserved keywords get a trailing underscore so they remain usable as identifiers:

field_name("class")     # 'class_'
field_name("for")       # 'for_'

You can supply your own reserved set — useful for ORM column names, dataframe columns, or framework-reserved attributes:

field_name("id", reserved_words={"id", "type"})
# 'id_'
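
A check like this can be built on the standard library's keyword module. The sketch below is illustrative, not LabelSmith's code: escape_reserved is a hypothetical helper, and it assumes a custom set replaces the default keyword check rather than extending it, which the text above does not specify.

```python
from __future__ import annotations

import keyword


def escape_reserved(name: str, reserved_words: set[str] | None = None) -> str:
    if reserved_words is None:
        # Default: Python's own reserved keywords
        is_reserved = keyword.iskeyword(name)
    else:
        # Assumption: the caller's set is used instead of the default
        is_reserved = name in reserved_words
    return name + "_" if is_reserved else name


escape_reserved("class")                 # 'class_'
escape_reserved("part_number")           # 'part_number'
escape_reserved("id", {"id", "type"})    # 'id_'
```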

Duplicate handling

field_names guarantees unique outputs. Suffix style follows the chosen naming style so the output stays consistent:

field_names(["Part Number", "Part Number", "Part Number"])
# ['part_number', 'part_number_2', 'part_number_3']

field_names(["Part Number", "Part Number"], style="kebab")
# ['part-number', 'part-number-2']

field_names(["Part Number", "Part Number"], style="camel")
# ['partNumber', 'partNumber2']

field_names(["Part Number", "Part Number"], style="pascal")
# ['PartNumber', 'PartNumber2']
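
A style-aware suffix can be sketched with a set of names already handed out. This is not LabelSmith's implementation: dedupe is a hypothetical helper that works on already-styled names, and it assumes the counter keeps incrementing when a suffixed candidate is itself taken.

```python
def dedupe(names: list[str], style: str = "snake") -> list[str]:
    # snake and kebab use their separator before the number;
    # camel and pascal append the number directly.
    sep = {"snake": "_", "kebab": "-"}.get(style, "")
    seen: set[str] = set()
    result = []
    for name in names:
        candidate, n = name, 1
        while candidate in seen:
            n += 1
            candidate = f"{name}{sep}{n}"
        seen.add(candidate)
        result.append(candidate)
    return result


dedupe(["part_number", "part_number", "part_number"])
# ['part_number', 'part_number_2', 'part_number_3']

dedupe(["partNumber", "partNumber"], style="camel")
# ['partNumber', 'partNumber2']
```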

field_map returns a dictionary keyed by the original labels. Because dictionary keys must be unique, a repeated label gets an occurrence marker such as " (2)" appended to its key; the values follow the same uniqueness rules as field_names:

field_map(["Part Number", "Part Number", "Notes"])
# {
#     'Part Number': 'part_number',
#     'Part Number (2)': 'part_number_2',
#     'Notes': 'notes',
# }
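
The key disambiguation shown above can be sketched with a per-label counter. disambiguate_keys is a hypothetical helper, not part of LabelSmith's API, and the generated names are passed in by hand here to keep the example self-contained.

```python
from collections import Counter


def disambiguate_keys(labels: list[str]) -> list[str]:
    counts = Counter()
    keys = []
    for label in labels:
        counts[label] += 1
        n = counts[label]
        # The first occurrence keeps the label as-is; later ones get " (n)"
        keys.append(label if n == 1 else f"{label} ({n})")
    return keys


labels = ["Part Number", "Part Number", "Notes"]
names = ["part_number", "part_number_2", "notes"]  # as field_names would return

dict(zip(disambiguate_keys(labels), names))
# {'Part Number': 'part_number', 'Part Number (2)': 'part_number_2', 'Notes': 'notes'}
```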

API surface

labelsmith.field_name(label, *, style="snake", prefix="field",
                      allow_leading_digit=False, reserved_words=None) -> str

labelsmith.field_names(labels, *, style="snake", prefix="field",
                       allow_leading_digit=False, reserved_words=None) -> list[str]

labelsmith.field_map(labels, *, style="snake", prefix="field",
                     allow_leading_digit=False, reserved_words=None) -> dict[str, str]

LabelSmith ships with a py.typed marker so type checkers will read the inline annotations directly from the installed package.

Development

pip install -e ".[dev]"
python -m pytest

License

MIT — see LICENSE.
