A smart tool for preprocessing messy tabular data.
PyNorma
This project is currently under construction.
"You gotta do it, you can do it, but you just don't wanna do it."
PyNorma is a Python library that provides insights and automation for preprocessing messy, real-world tabular data. It's designed for data scientists, analysts, and anyone who's tired of the tedious task of cleaning up unstructured spreadsheets.
Key Features
- Smart Table Detection: Automatically finds the main data table within messy Excel or CSV files, ignoring surrounding comments and empty spaces.
- Advanced Preprocessing: Includes powerful tools like:
  - Flattener: Converts wide, multi-level header tables into a tidy, long format.
  - Atomizer: Splits cells with multiple values into distinct rows or columns.
  - Clarifier: Standardizes data based on a custom dictionary.
  - ...and more.
- Developer-Friendly: Designed by a lazy developer for lazy (but smart) developers.
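To make the three preprocessing ideas above concrete, here is a minimal sketch written in plain pandas. It does not use PyNorma's own API, which this README does not document in detail; the column names, sample values, and the `;` separator are illustrative assumptions, and the Flattener example uses a single-level header for brevity.

```python
import pandas as pd

# Flattener idea: one column per year (wide) becomes one row per
# (town, year) pair (long). Sample data is made up for illustration.
wide = pd.DataFrame({
    "town": ["A", "B"],
    "2022": [10, 20],
    "2023": [11, 21],
})
long_df = wide.melt(id_vars="town", var_name="year", value_name="count")

# Atomizer idea: a cell holding several values becomes several rows.
# The ";" separator is an assumption about the input.
multi = pd.DataFrame({"shop": ["X", "Y"], "tags": ["food;drink", "books"]})
atomized = (
    multi.assign(tags=multi["tags"].str.split(";"))
    .explode("tags")
    .reset_index(drop=True)
)

# Clarifier idea: map spelling variants onto one canonical value
# using a user-supplied dictionary.
canonical = {"N.Y.": "New York", "NYC": "New York"}
cities = pd.Series(["N.Y.", "NYC", "Boston"])
clarified = cities.replace(canonical)

print(long_df)
print(atomized)
print(clarified)
```

Values missing from the dictionary, such as "Boston" here, pass through unchanged, which is usually what you want for a partial mapping.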
Installation
pip install pynorma
Quickstart
Here's a simple example of reading a messy file (a CSV in this case) and automatically trimming it to the core data table.
from pynorma.io import parser
from pynorma.preprocessor import trimmer
# 1. Parse the file - PyNorma automatically detects the file type.
raw_df = parser.parse("examples/townbusiness1.csv")
# 2. Automatically trim the dataframe to the main table area.
clean_df = trimmer.trim_dataframe(raw_df, trim_mode="auto")
print("Successfully cleaned the dataframe!")
print(clean_df.head())
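For intuition about what trimming to the main table area can mean, the sketch below crops a dataframe to the bounding box of its non-empty cells using pandas and NumPy. This is only an assumption about the general idea, not PyNorma's actual detection logic; `trim_empty_border` is a hypothetical helper, and it handles empty borders only, not stray comment cells.

```python
import numpy as np
import pandas as pd


def trim_empty_border(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: crop to the bounding box of non-empty cells."""
    filled = df.notna().to_numpy()
    rows = np.flatnonzero(filled.any(axis=1))  # positions of rows with data
    cols = np.flatnonzero(filled.any(axis=0))  # positions of columns with data
    if rows.size == 0:
        # Nothing but empty cells: return an empty frame.
        return df.iloc[0:0, 0:0]
    return df.iloc[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]


# A small table surrounded by empty rows and columns (made-up data).
raw = pd.DataFrame([
    [None, None, None, None],
    [None, "town", "shops", None],
    [None, "A", 3, None],
    [None, "B", 5, None],
    [None, None, None, None],
])
trimmed = trim_empty_border(raw)
print(trimmed)
```

A real detector also has to decide which row is the header and skip free-text notes around the table, which is the part the library automates.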
Author
nash-dir (https://github.com/nash-dir)
License
This project is licensed under the MIT License.
File details
Details for the file pynorma-1.0.0a1.tar.gz.
File metadata
- Download URL: pynorma-1.0.0a1.tar.gz
- Upload date:
- Size: 17.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1109703f2a1b8028fd3d6b3032177f5abbe1b0b5ae1b2883c71ed020c35b20c2 |
| MD5 | 20339b6f0824d1e2b3d1fe30cbccb9ec |
| BLAKE2b-256 | 8e98922bc619b104e9227629aa9ae0c4dcefd26efe9b0b169533f36b857f9288 |
File details
Details for the file pynorma-1.0.0a1-py3-none-any.whl.
File metadata
- Download URL: pynorma-1.0.0a1-py3-none-any.whl
- Upload date:
- Size: 21.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 766dae0e8a541d58541a5006a98391d0f1eb1ea8398cd22c5367f27ca90f0c28 |
| MD5 | 3ecb1853277ec1846aed145d1407c9f7 |
| BLAKE2b-256 | e0f9789c19fa6131082daec1cc454f03f2bb1bc5887028cea89f11e60a3ff350 |
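If you download a file manually, you can compare its SHA256 digest with the value listed for it. The sketch below uses only the standard library; `sha256_of_file` and `matches_expected` are hypothetical helper names, and the local file path in the usage comment is an assumption about where you saved the wheel.

```python
import hashlib

# SHA256 digest listed above for pynorma-1.0.0a1-py3-none-any.whl.
EXPECTED_WHEEL_SHA256 = (
    "766dae0e8a541d58541a5006a98391d0f1eb1ea8398cd22c5367f27ca90f0c28"
)


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_WHEEL_SHA256):
    """Hypothetical helper: True if the file's digest equals `expected`."""
    return sha256_of_file(path) == expected.lower()


# Usage (assumes the wheel was saved in the current directory):
# print(matches_expected("pynorma-1.0.0a1-py3-none-any.whl"))
```

pip can perform the same check during installation when a requirements file pins hashes and `--require-hashes` is used.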