pyde_toolkit

Infer column names, data types, schema definitions, and CREATE TABLE / CREATE VIEW DDL from a CSV/TSV/Excel file — or directly from a pandas DataFrame already in memory (including a Spark DataFrame converted via .toPandas()). Outputs either Pandas/ANSI SQL or PySpark/Spark SQL, with optional Databricks medallion-layer (bronze/silver/gold) support.

Installation

pip install pyde_toolkit

Reading Excel files needs the optional extras:

pip install "pyde_toolkit[excel]"

Not yet on PyPI? See Building & Publishing below to build and install it locally first.

Quick Start — pass a DataFrame directly

This is the primary intended use case: no file I/O, just hand it a DataFrame.

import pandas as pd
from pyde_toolkit import infer_file

df = pd.DataFrame({
    "Plant Description": ["Mumbai Plant", "Pune Plant"],
    "ZODI/ZLDI":          ["ZODI", "ZLDI"],
    "Cost %":              [12.5, 8.0],
})

result = infer_file(df, pyspark=True, casing="snake", table_name="plant_master")

print(result["schema"])        # PySpark StructType, ready to paste
print(result["create_table"])  # CREATE TABLE ... USING DELTA
print(result["rename_code"])   # df.withColumnRenamed(...) snippet
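
The exact casing rules are documented in docs/USAGE.md. As a rough illustration of what snake-casing column names like the ones above can involve, here is a minimal sketch. It assumes that every run of non-alphanumeric characters collapses to a single underscore; to_snake is a hypothetical helper written for this example, not part of pyde_toolkit, and the library's real output may differ (for instance in how it treats "%").

```python
import re

def to_snake(name: str) -> str:
    """Illustrative snake_case normaliser (assumption, not pyde_toolkit's rules)."""
    # Replace every run of non-alphanumeric characters with one underscore.
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", name)
    # Drop underscores left at either edge, then lower-case the result.
    return cleaned.strip("_").lower()

columns = ["Plant Description", "ZODI/ZLDI", "Cost %"]
renamed = {col: to_snake(col) for col in columns}

for old, new in renamed.items():
    print(f"{old!r} -> {new!r}")
```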

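Works the same way from a Spark DataFrame in a Databricks notebook. Note that .toPandas() collects the entire DataFrame onto the driver, so for large tables convert only a sample (for example, spark_df.limit(1000).toPandas()):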

result = infer_file(spark_df.toPandas(), pyspark=True, casing="snake",
                     table_name="sales_fact", layer="silver", catalog="prod")
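
How layer and catalog end up in the generated DDL is described in docs/USAGE.md. One common Databricks convention is a three-level name in which the medallion layer acts as the schema. The sketch below assumes that convention purely for illustration; qualified_name is a hypothetical helper, and pyde_toolkit may compose names differently.

```python
from typing import Optional

def qualified_name(table: str,
                   layer: Optional[str] = None,
                   catalog: Optional[str] = None) -> str:
    """Join catalog, layer, and table into a dotted name, skipping missing parts.

    Assumption: the medallion layer is used as the schema level.
    """
    parts = [part for part in (catalog, layer, table) if part]
    return ".".join(parts)

print(qualified_name("sales_fact", layer="silver", catalog="prod"))
print(qualified_name("plant_master"))
```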

Quick Start — pass a file path

result = infer_file("Sales1.csv", casing="pascal")   # Pandas + ANSI SQL by default
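
To give a feel for what inferring types and DDL from a file involves, here is a standard-library sketch that reads CSV text, picks a generic SQL type per column, and emits a CREATE TABLE statement. It is not pyde_toolkit's implementation: the function names, the three-type ladder (INTEGER, DOUBLE PRECISION, VARCHAR(255)), and the sample data are all assumptions made for this example.

```python
import csv
import io

def infer_sql_type(values):
    """Return the narrowest illustrative SQL type that fits every non-empty value."""
    non_empty = [v for v in values if v.strip() != ""]
    if not non_empty:
        return "VARCHAR(255)"
    # Try the narrower type first; fall through when any value fails to parse.
    for sql_type, caster in (("INTEGER", int), ("DOUBLE PRECISION", float)):
        try:
            for value in non_empty:
                caster(value)
            return sql_type
        except ValueError:
            continue
    return "VARCHAR(255)"

def create_table_ddl(csv_text, table_name):
    """Build a CREATE TABLE statement from CSV text with a header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = []
    for index, name in enumerate(header):
        sql_type = infer_sql_type([row[index] for row in data])
        columns.append(f'    "{name}" {sql_type}')
    return f"CREATE TABLE {table_name} (\n" + ",\n".join(columns) + "\n);"

# Made-up sample data, used only to exercise the sketch.
sample = "OrderId,Amount,Region\n1,10.5,West\n2,8,East\n"
ddl = create_table_ddl(sample, "sales1")
print(ddl)
```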

Command line

The same engine is also available as a CLI, installed as pyde_toolkit:

pyde_toolkit Sales1.csv
pyde_toolkit Sales1.csv --pyspark true --case pascal
pyde_toolkit Sales1.csv --pyspark true --layer all --catalog prod
pyde_toolkit --help
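
If you want to drive the CLI from a Python script, one option is to assemble the argument list and hand it to subprocess. The sketch below uses only the flags shown above; build_cli_args is a hypothetical helper for this example, and it builds the command without running it.

```python
import shlex

def build_cli_args(path, pyspark=False, case=None, layer=None, catalog=None):
    """Assemble a pyde_toolkit command line from the flags shown in this README."""
    args = ["pyde_toolkit", path]
    if pyspark:
        # The CLI examples pass the value as the literal string "true".
        args += ["--pyspark", "true"]
    if case:
        args += ["--case", case]
    if layer:
        args += ["--layer", layer]
    if catalog:
        args += ["--catalog", catalog]
    return args

cmd = build_cli_args("Sales1.csv", pyspark=True, layer="all", catalog="prod")
print(shlex.join(cmd))
```

With the package installed, the resulting list can be passed to subprocess.run(cmd, check=True).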

Full documentation

See docs/USAGE.md for the complete reference: every flag/parameter, casing rules, type-inference behaviour, sampling, medallion layers, table types, and the full infer_file() return value.

Building & Publishing

This repo is set up as a standard pyproject.toml package, so it can be built and installed without needing PyPI:

# Install locally, editable (changes to source take effect immediately)
pip install -e .

# Or build a wheel/sdist you can distribute internally
pip install build
python -m build              # creates dist/*.whl and dist/*.tar.gz
pip install dist/pyde_toolkit-1.0.0-py3-none-any.whl

To publish to PyPI so that a plain pip install pyde_toolkit works, you need your own PyPI account and an API token. Then run:

pip install twine
twine upload dist/*

(Double-check the name pyde_toolkit isn't already taken on PyPI before publishing — rename it in pyproject.toml if it is.)

License

MIT — see LICENSE.

Download files


Source Distribution

pyde_toolkit-1.0.0.tar.gz (21.2 kB)

Uploaded Source

Built Distribution


pyde_toolkit-1.0.0-py3-none-any.whl (19.9 kB)

Uploaded Python 3

File details

Details for the file pyde_toolkit-1.0.0.tar.gz.

File metadata

  • Download URL: pyde_toolkit-1.0.0.tar.gz
  • Size: 21.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for pyde_toolkit-1.0.0.tar.gz
Algorithm Hash digest
SHA256 2469b8b93ffbc4756b8eeb2832f64820a7b18e2e429b29834ba508e917a04ea5
MD5 167e6da3e131ad36886534c2c36edee6
BLAKE2b-256 333046e8526cb000671f15ef3c0f69e0f9762d6f25eed200bc4ae0adfdecd577
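
A downloaded archive can be checked against the SHA256 digest listed above using Python's hashlib. This is a minimal sketch; the function names and the chunk size are choices made for this example.

```python
import hashlib

def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, matches_published_hash("pyde_toolkit-1.0.0.tar.gz", "2469b8b93ffbc4756b8eeb2832f64820a7b18e2e429b29834ba508e917a04ea5") should return True for an intact download of the source distribution.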


File details

Details for the file pyde_toolkit-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: pyde_toolkit-1.0.0-py3-none-any.whl
  • Size: 19.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for pyde_toolkit-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 065ddad876eaceb5002c9c2c9fccf38537833a5d8a501fb0da430064bfc9b36b
MD5 84e9e1a0c7363bb96be746857a995d6f
BLAKE2b-256 f923e0f3634b485775eb569f82448ccd27f4803f34d240afed59a86957705295

