Infer column names, data types, schema, and CREATE TABLE/VIEW DDL from a file or a pandas DataFrame — Pandas/ANSI SQL or PySpark/Spark SQL.
Project description
pyde_toolkit
Infer column names, data types, schema definitions, and CREATE TABLE / CREATE VIEW DDL from a CSV/TSV/Excel file — or directly from a pandas DataFrame already in memory (including a Spark DataFrame converted via .toPandas()). Outputs either Pandas/ANSI SQL or PySpark/Spark SQL, with optional Databricks medallion-layer (bronze/silver/gold) support.
Installation
pip install pyde_toolkit
Reading Excel files requires the optional excel extra:
pip install "pyde_toolkit[excel]"
Not yet on PyPI? See Building & Publishing below to build and install it locally first.
Quick Start — pass a DataFrame directly
This is the primary intended use case: no file I/O, just hand it a DataFrame.
import pandas as pd
from pyde_toolkit import infer_file
df = pd.DataFrame({
    "Plant Description": ["Mumbai Plant", "Pune Plant"],
    "ZODI/ZLDI": ["ZODI", "ZLDI"],
    "Cost %": [12.5, 8.0],
})
result = infer_file(df, pyspark=True, casing="snake", table_name="plant_master")
print(result["schema"]) # PySpark StructType, ready to paste
print(result["create_table"]) # CREATE TABLE ... USING DELTA
print(result["rename_code"]) # df.withColumnRenamed(...) snippet
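To give a feel for what casing="snake" is meant to do with awkward headers like the ones above, here is a minimal sketch of a snake-case normalizer. The function to_snake is hypothetical and written only for illustration; the library's actual casing rules are documented in docs/USAGE.md and may differ (for example in how symbols such as % are handled).

```python
import re


def to_snake(name: str) -> str:
    """Hypothetical normalizer, NOT the library's implementation.

    Collapses every run of non-alphanumeric characters into a single
    underscore, trims leading/trailing underscores, and lower-cases.
    """
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", name).strip("_")
    return cleaned.lower()


columns = ["Plant Description", "ZODI/ZLDI", "Cost %"]
renamed = {col: to_snake(col) for col in columns}

for old, new in renamed.items():
    print(f"{old!r} -> {new!r}")
```

A mapping like renamed is the kind of input a rename snippet needs, whether that is pandas df.rename(columns=...) or a chain of withColumnRenamed calls in PySpark.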
Works the same way from a Spark DataFrame in a Databricks notebook:
result = infer_file(spark_df.toPandas(), pyspark=True, casing="snake",
                    table_name="sales_fact", layer="silver", catalog="prod")
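One practical caveat with this pattern: .toPandas() collects every row to the driver, which can be slow or exhaust memory on large tables. A possible workaround, sketched below, is to cap the row count with Spark's DataFrame.limit() before converting. The helper sample_to_pandas and its max_rows default are assumptions made for this example and are not part of pyde_toolkit.

```python
def sample_to_pandas(spark_df, max_rows=10_000):
    """Collect at most max_rows rows to the driver as a pandas DataFrame.

    Hypothetical helper: relies only on the standard PySpark DataFrame
    methods limit() and toPandas().
    """
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    return spark_df.limit(max_rows).toPandas()


# Usage in a notebook where spark_df and infer_file already exist:
# result = infer_file(sample_to_pandas(spark_df), pyspark=True, casing="snake",
#                     table_name="sales_fact", layer="silver", catalog="prod")
```

Types inferred from a capped sample reflect only the rows that were collected, so a column that looks all-integer in the first rows may still hold other values further down the table.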
Quick Start — pass a file path
from pyde_toolkit import infer_file
result = infer_file("Sales1.csv", casing="pascal") # Pandas + ANSI SQL by default
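For readers curious what inferring types from a delimited file involves, the following is a rough standard-library sketch of the general idea: try the narrowest type first and fall back to a string. It is not the library's algorithm; the function infer_type, the sample data, and the type names BIGINT/DOUBLE/STRING are all assumptions chosen for illustration.

```python
import csv
import io


def infer_type(values):
    """Return 'BIGINT', 'DOUBLE', or 'STRING' for a list of raw strings.

    Illustrative only: empty cells are ignored, and a column is given the
    narrowest type that every remaining value can be parsed as.
    """
    non_empty = [v for v in values if v.strip() != ""]
    if not non_empty:
        return "STRING"
    for sql_type, caster in (("BIGINT", int), ("DOUBLE", float)):
        try:
            for v in non_empty:
                caster(v)
            return sql_type
        except ValueError:
            continue
    return "STRING"


# In-memory stand-in for a small CSV file (made-up data).
sample = io.StringIO("order_id,amount,region\n1,12.5,West\n2,8,East\n3,,North\n")
rows = list(csv.DictReader(sample))
inferred = {col: infer_type([row[col] for row in rows]) for col in rows[0]}
print(inferred)
```

Here amount is classified as DOUBLE because one value has a decimal point and the empty cell is skipped, while region falls back to STRING.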
Command line
The same engine is also available as a CLI, installed as pyde_toolkit:
pyde_toolkit Sales1.csv
pyde_toolkit Sales1.csv --pyspark true --case pascal
pyde_toolkit Sales1.csv --pyspark true --layer all --catalog prod
pyde_toolkit --help
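If the CLI needs to be run over several files from a script, one option is to drive it with subprocess. The sketch below uses only the flags shown above (--pyspark, --case, --layer, --catalog); the helper build_command is hypothetical, and the assumption that the generated output arrives on stdout should be checked against pyde_toolkit --help.

```python
import shutil
import subprocess


def build_command(path, pyspark=False, case=None, layer=None, catalog=None):
    """Assemble an argv list for the pyde_toolkit CLI (hypothetical helper)."""
    cmd = ["pyde_toolkit", path]
    if pyspark:
        cmd += ["--pyspark", "true"]
    if case:
        cmd += ["--case", case]
    if layer:
        cmd += ["--layer", layer]
    if catalog:
        cmd += ["--catalog", catalog]
    return cmd


files = ["Sales1.csv"]
for path in files:
    cmd = build_command(path, pyspark=True, case="pascal")
    if shutil.which(cmd[0]) is None:
        # CLI not installed in this environment: show the command instead.
        print("pyde_toolkit not on PATH; would run:", " ".join(cmd))
        continue
    completed = subprocess.run(cmd, capture_output=True, text=True, check=False)
    print(completed.stdout)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.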
Full documentation
See docs/USAGE.md for the complete reference: every flag/parameter, casing rules, type-inference behaviour, sampling, medallion layers, table types, and the full infer_file() return value.
Building & Publishing
This repo is set up as a standard pyproject.toml package, so it can be built and installed without needing PyPI:
# Install locally, editable (changes to source take effect immediately)
pip install -e .
# Or build a wheel/sdist you can distribute internally
pip install build
python -m build # creates dist/*.whl and dist/*.tar.gz
pip install dist/pyde_toolkit-1.0.0-py3-none-any.whl
To publish to PyPI so that a plain pip install pyde_toolkit works, you need your own PyPI account and API token. Then run:
pip install twine
twine upload dist/*
(Double-check the name pyde_toolkit isn't already taken on PyPI before publishing — rename it in pyproject.toml if it is.)
License
MIT — see LICENSE.
Download files
File details
Details for the file pyde_toolkit-1.0.0.tar.gz.
File metadata
- Download URL: pyde_toolkit-1.0.0.tar.gz
- Upload date:
- Size: 21.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2469b8b93ffbc4756b8eeb2832f64820a7b18e2e429b29834ba508e917a04ea5 |
| MD5 | 167e6da3e131ad36886534c2c36edee6 |
| BLAKE2b-256 | 333046e8526cb000671f15ef3c0f69e0f9762d6f25eed200bc4ae0adfdecd577 |
File details
Details for the file pyde_toolkit-1.0.0-py3-none-any.whl.
File metadata
- Download URL: pyde_toolkit-1.0.0-py3-none-any.whl
- Upload date:
- Size: 19.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 065ddad876eaceb5002c9c2c9fccf38537833a5d8a501fb0da430064bfc9b36b |
| MD5 | 84e9e1a0c7363bb96be746857a995d6f |
| BLAKE2b-256 | f923e0f3634b485775eb569f82448ccd27f4803f34d240afed59a86957705295 |
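The SHA256 digests listed above can be used to check a downloaded file before installing it. The following is a small sketch using hashlib; the helper names sha256_of and verify are made up for this example, and it assumes the files were downloaded into the current working directory. A wheel or sdist you build locally is not expected to match these published digests.

```python
import hashlib
from pathlib import Path

# Published SHA256 digests for the 1.0.0 release files, copied from the tables above.
EXPECTED = {
    "pyde_toolkit-1.0.0.tar.gz":
        "2469b8b93ffbc4756b8eeb2832f64820a7b18e2e429b29834ba508e917a04ea5",
    "pyde_toolkit-1.0.0-py3-none-any.whl":
        "065ddad876eaceb5002c9c2c9fccf38537833a5d8a501fb0da430064bfc9b36b",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA256 of a file, read in chunks so large files stay cheap."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """True only if the file name is known and its digest matches."""
    expected = EXPECTED.get(Path(path).name)
    return expected is not None and sha256_of(path) == expected


for name in EXPECTED:
    candidate = Path(name)  # assumed download location: current directory
    if candidate.exists():
        print(name, "OK" if verify(candidate) else "MISMATCH")
```

pip can enforce the same check at install time through hash-pinned requirements files (--require-hashes), which may be a better fit for automated environments.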