
A smart tool for preprocessing messy tabular data.

Project description

PyNorma

This project is currently under construction.

"You gotta do it, you can do it, but you just don't wanna do it."

PyNorma is a Python library that provides insights into, and automation for, the preprocessing of messy, real-world tabular data. It's designed for data scientists, analysts, and anyone who's tired of the tedious work of cleaning up unstructured spreadsheets.

Key Features

  • Smart Table Detection: Automatically finds the main data table within messy Excel or CSV files, ignoring surrounding comments and empty spaces.
  • Advanced Preprocessing: Includes powerful tools like:
    • Flattener: Converts wide tables with multi-level headers into a tidy, long format.
    • Atomizer: Splits cells with multiple values into distinct rows or columns.
    • Clarifier: Standardizes data based on a custom dictionary.
    • ...and more.
  • Developer-Friendly: Designed by a lazy developer for lazy (but smart) developers.
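
The page doesn't show PyNorma's API for the Flattener, Atomizer, or Clarifier, so the sketch below uses plain pandas to illustrate what each kind of transformation does. The helper names flatten_wide, atomize, and clarify, the ";" separator, and the toy data are assumptions for illustration only; this is not PyNorma's implementation.

```python
import pandas as pd


def flatten_wide(wide: pd.DataFrame, level_names: list, value_name: str = "value") -> pd.DataFrame:
    """Hypothetical helper: reshape a wide table with multi-level column headers to long format."""
    out = wide.copy()
    # Name each header level so melt() reuses those names as the new column labels.
    out.columns = out.columns.set_names(level_names)
    # ignore_index=False keeps the row labels (here: town) so they survive the reshape.
    return out.melt(value_name=value_name, ignore_index=False).reset_index()


def atomize(df: pd.DataFrame, column: str, sep: str = ";") -> pd.DataFrame:
    """Hypothetical helper: split multi-valued cells so each value gets its own row."""
    out = df.copy()
    out[column] = out[column].str.split(sep)
    out = out.explode(column, ignore_index=True)
    out[column] = out[column].str.strip()  # remove spaces left around the separator
    return out


def clarify(df: pd.DataFrame, column: str, mapping: dict) -> pd.DataFrame:
    """Hypothetical helper: replace known variant spellings with a canonical value."""
    out = df.copy()
    out[column] = out[column].replace(mapping)  # exact-match replacement only
    return out


# Toy data: business counts per town under a two-level (year, sector) header.
header = pd.MultiIndex.from_tuples(
    [("2022", "Retail"), ("2022", "Food"), ("2023", "Retail"), ("2023", "Food")]
)
wide = pd.DataFrame(
    [[10, 4, 12, 5], [7, 2, 8, 3]],
    index=pd.Index(["Northvale", "Southport"], name="town"),
    columns=header,
)
long_df = flatten_wide(wide, ["year", "sector"], value_name="businesses")
# long_df has 8 rows and the columns: town, year, sector, businesses

owners = pd.DataFrame({"business": ["Cafe A", "Shop B"], "owners": ["Kim; Lee", "Park"]})
print(atomize(owners, "owners"))  # "Cafe A" now appears twice, once per owner

towns = pd.DataFrame({"town": ["Northvale", "N. Vale", "NORTHVALE"]})
print(clarify(towns, "town", {"N. Vale": "Northvale", "NORTHVALE": "Northvale"}))
```

Splitting into rows (explode) keeps the table tidy when the number of values per cell varies; splitting into columns instead would suit a fixed number of parts.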

Installation

pip install pynorma

Quickstart

Here's a simple example of reading a messy CSV file and automatically trimming it to the core data table.

from pynorma.io import parser
from pynorma.preprocessor import trimmer

# 1. Parse the file - PyNorma automatically detects the file type.
raw_df = parser.parse("examples/townbusiness1.csv")

# 2. Automatically trim the dataframe to the main table area.
clean_df = trimmer.trim_dataframe(raw_df, trim_mode="auto")

print("Successfully cleaned the dataframe!")
print(clean_df.head())
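
How trim_mode="auto" decides where the table starts and ends isn't described here. As a rough illustration of the idea only, here is a heuristic sketch in plain pandas: keep the longest contiguous run of rows in which at least min_fill of the columns hold a value, then promote that run's first row to the header. The function name trim_to_table, the min_fill parameter, and the toy input are hypothetical; this is not PyNorma's algorithm.

```python
import pandas as pd


def trim_to_table(raw: pd.DataFrame, min_fill: float = 0.5) -> pd.DataFrame:
    """Hypothetical heuristic: keep the longest run of well-filled rows as the table."""
    filled = raw.notna()  # NaN/None count as empty cells
    keep_cols = filled.any(axis=0)  # drop columns that are empty everywhere
    if not keep_cols.any():
        raise ValueError("input contains no data")
    cells = raw.loc[:, keep_cols]
    filled = filled.loc[:, keep_cols]

    # A row is "dense" if at least min_fill of the remaining columns hold a value.
    dense = ((filled.sum(axis=1) / filled.shape[1]) >= min_fill).to_list()

    # Locate the longest contiguous run of dense rows.
    best_start, best_len, start = 0, 0, None
    for i, is_dense in enumerate(dense + [False]):  # trailing False closes a final run
        if is_dense and start is None:
            start = i
        elif not is_dense and start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    if best_len < 2:
        raise ValueError("no block with a header row plus data rows was found")

    block = cells.iloc[best_start : best_start + best_len]
    block = block.loc[:, block.notna().any(axis=0)]  # drop columns empty inside the block
    body = block.iloc[1:].reset_index(drop=True).infer_objects()
    body.columns = [str(v).strip() for v in block.iloc[0]]  # first dense row becomes the header
    return body


# Toy input read without a header row: a title, a blank row, the table, a footnote.
messy = pd.DataFrame([
    ["Town business survey", None, None],
    [None, None, None],
    ["town", "sector", "count"],
    ["Northvale", "Retail", 10],
    ["Southport", "Food", 4],
    [None, None, None],
    ["Source: survey notes", None, None],
])
print(trim_to_table(messy))
```

A single fill-ratio threshold is crude: a sparse data table or a dense block of notes can fool it, so treat min_fill as a knob to tune per file.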

Author

nash-dir (https://github.com/nash-dir)

License

This project is licensed under the MIT License.

Download files

Source Distribution

pynorma-1.0.0a1.tar.gz (17.5 kB)

Built Distribution

pynorma-1.0.0a1-py3-none-any.whl (21.4 kB)

File details

Details for the file pynorma-1.0.0a1.tar.gz.

File metadata

  • Download URL: pynorma-1.0.0a1.tar.gz
  • Upload date:
  • Size: 17.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for pynorma-1.0.0a1.tar.gz
Algorithm    Hash digest
SHA256       1109703f2a1b8028fd3d6b3032177f5abbe1b0b5ae1b2883c71ed020c35b20c2
MD5          20339b6f0824d1e2b3d1fe30cbccb9ec
BLAKE2b-256  8e98922bc619b104e9227629aa9ae0c4dcefd26efe9b0b169533f36b857f9288

File details

Details for the file pynorma-1.0.0a1-py3-none-any.whl.

File metadata

  • Download URL: pynorma-1.0.0a1-py3-none-any.whl
  • Upload date:
  • Size: 21.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for pynorma-1.0.0a1-py3-none-any.whl
Algorithm    Hash digest
SHA256       766dae0e8a541d58541a5006a98391d0f1eb1ea8398cd22c5367f27ca90f0c28
MD5          3ecb1853277ec1846aed145d1407c9f7
BLAKE2b-256  e0f9789c19fa6131082daec1cc454f03f2bb1bc5887028cea89f11e60a3ff350
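
To check a downloaded file against the SHA256 digests listed above, here is a minimal sketch using only Python's standard-library hashlib. The helper names sha256_of and matches_published_hash are illustrative and not part of PyNorma.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_sha256: str) -> bool:
    """Compare a file's digest with a SHA256 hex string copied from a listing."""
    return sha256_of(path) == expected_sha256.strip().lower()


# Example, assuming the wheel has been downloaded into the current directory:
# matches_published_hash(
#     Path("pynorma-1.0.0a1-py3-none-any.whl"),
#     "766dae0e8a541d58541a5006a98391d0f1eb1ea8398cd22c5367f27ca90f0c28",
# )
```

The same chunked loop works for the other listed digests: use hashlib.md5() for MD5, or hashlib.blake2b(digest_size=32) for BLAKE2b-256.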
