clean-column-names

Clean pandas DataFrame column names into predictable, consistent case styles.

Installation

uv add clean-column-names

Usage

import pandas as pd
import clean_column_names

df = pd.DataFrame(
    columns=[
        "First Name",
        "Café Sales ($)",
        "HTTPStatusCode",
        "",
        None,
        "First Name",
    ]
)

df = df.pipe(clean_column_names.clean_column_names)

print(df.columns.tolist())

Output:

[
    "first_name",
    "cafe_sales_$",
    "http_status_code",
    "column",
    "column_1",
    "first_name_1",
]

The original DataFrame is not modified.

API

df = df.pipe(
    clean_column_names.clean_column_names,
    case="snake",
    replace=None,
    remove_accents=True,
)

Arguments

df: A pandas DataFrame.

case: The target case style. Defaults to "snake".

replace: Optional mapping of literal text replacements to apply before case conversion. Matching is case-insensitive.

remove_accents: When True, accented characters are transliterated to ASCII where possible. Defaults to True.
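As a rough illustration of what "transliterated to ASCII where possible" can mean, the sketch below strips accents with the standard library. It is not taken from the package's source: strip_accents is a hypothetical helper, and the package may use a different technique.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Hypothetical helper: drop combining marks after NFKD decomposition."""
    # NFKD splits "é" into "e" plus a combining acute accent (U+0301).
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep every character that is not a combining mark.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("Café Sales ($)"))  # Cafe Sales ($)
```

Characters without a Unicode decomposition, such as "ø", pass through unchanged with this approach, which is one reason the result is only ASCII "where possible".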

Case Styles

case          Example
"snake"       column_name
"kebab"       column-name
"camel"       columnName
"pascal"      ColumnName
"const"       COLUMN_NAME
"sentence"    Column name
"title"       Column Name
"lower"       column name
"upper"       COLUMN NAME
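The table above can be read as "split the name into words, then join them per style". The following is a minimal sketch of that idea, assuming ASCII input; split_words and to_case are hypothetical names, not part of the package's API, and symbols such as "$" are simply ignored here, unlike in the package's documented output.

```python
import re

# Words are: an acronym followed by a capitalized word, a (capitalized)
# lowercase run, a remaining uppercase run, or a digit run.
WORD_RE = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+")


def split_words(name: str) -> list[str]:
    """Hypothetical helper: split a name into lowercase ASCII words."""
    return [word.lower() for word in WORD_RE.findall(name)]


def to_case(name: str, case: str = "snake") -> str:
    """Hypothetical helper: join the words of `name` in the given style."""
    words = split_words(name)
    if not words:
        return ""
    if case == "snake":
        return "_".join(words)
    if case == "kebab":
        return "-".join(words)
    if case == "camel":
        return words[0] + "".join(w.capitalize() for w in words[1:])
    if case == "pascal":
        return "".join(w.capitalize() for w in words)
    if case == "const":
        return "_".join(words).upper()
    if case == "sentence":
        return " ".join(words).capitalize()
    if case == "title":
        return " ".join(w.capitalize() for w in words)
    if case == "lower":
        return " ".join(words)
    if case == "upper":
        return " ".join(words).upper()
    raise ValueError(f"unknown case style: {case!r}")


print(to_case("HTTPStatusCode", "snake"))  # http_status_code
print(to_case("Column Name", "camel"))     # columnName
```

The lookahead in the first regex alternative is what keeps an acronym such as "HTTP" together while still separating it from the following word "Status".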

Examples

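Each example below starts from the original DataFrame defined under Usage, not from an already-cleaned one.

Use kebab case: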

df = df.pipe(
    clean_column_names.clean_column_names,
    case="kebab",
)

print(df.columns.tolist())

Output:

[
    "first-name",
    "cafe-sales-$",
    "http-status-code",
    "column",
    "column-1",
    "first-name-1",
]

Apply replacements before cleaning:

df = df.pipe(
    clean_column_names.clean_column_names,
    replace={"HTTP": "API"},
)

print(df.columns.tolist())

Output:

[
    "first_name",
    "cafe_sales_$",
    "api_status_code",
    "column",
    "column_1",
    "first_name_1",
]
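To make "literal" and "case-insensitive" concrete, here is a small standard-library sketch of such a replacement step. It is an assumption about how the behavior could be implemented, not the package's actual code; apply_replacements is a hypothetical name.

```python
import re


def apply_replacements(name: str, replace: dict[str, str]) -> str:
    """Hypothetical helper: literal, case-insensitive text replacement."""
    for old, new in replace.items():
        if not old:
            # An empty pattern would match at every position; skip it.
            continue
        # re.escape makes `old` match literally; the lambda keeps `new`
        # literal too, so backslashes in it are not treated as escapes.
        name = re.sub(
            re.escape(old),
            lambda _match, new=new: new,
            name,
            flags=re.IGNORECASE,
        )
    return name


print(apply_replacements("HTTPStatusCode", {"http": "API"}))  # APIStatusCode
```

Because the replacement runs before case conversion, the replaced text is then split and cased like any other part of the name, which is how "HTTPStatusCode" ends up as "api_status_code" in the example above.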

Keep accented characters:

df = df.pipe(
    clean_column_names.clean_column_names,
    case="title",
    remove_accents=False,
)

print(df.columns.tolist())

Output:

[
    "First Name",
    "Café Sales ($)",
    "Http Status Code",
    "Column",
    "Column 1",
    "First Name 1",
]

Behavior Notes

Blank and null column names are converted to "column", rendered in the target case style (for example, "Column" in title case).

If multiple columns clean to the same name, numeric suffixes are added using the target case style's separator:

df = pd.DataFrame(columns=["Name", "Name", "Name"])
df = df.pipe(clean_column_names.clean_column_names)

print(df.columns.tolist())

Output:

["name", "name_1", "name_2"]
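The blank-name fallback and the suffixing rule can be sketched together in plain Python. This is an illustrative sketch, not the package's implementation: dedupe is a hypothetical name, and the handling of a suffixed name that collides with an existing column is an assumption.

```python
from typing import Optional


def dedupe(
    names: list[Optional[str]], sep: str = "_", blank: str = "column"
) -> list[str]:
    """Hypothetical helper: fill blank names and suffix duplicates."""
    used: set[str] = set()
    next_suffix: dict[str, int] = {}
    result: list[str] = []
    for name in names:
        # Empty strings and None both fall back to the placeholder name.
        base = name if name else blank
        candidate = base
        n = next_suffix.get(base, 1)
        # Try base, base_1, base_2, ... until an unused name is found.
        while candidate in used:
            candidate = f"{base}{sep}{n}"
            n += 1
        next_suffix[base] = n
        used.add(candidate)
        result.append(candidate)
    return result


print(dedupe(["name", "name", "name"]))  # ['name', 'name_1', 'name_2']
print(dedupe(["a", "", None], sep="-"))  # ['a', 'column', 'column-1']
```

Passing the separator in as an argument mirrors the note above that the suffix uses the target case style's separator, for example "-" for kebab case or a space for title case.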

This package supports ordinary flat columns and pandas MultiIndex columns.

License

MIT
