A growing toolkit of data-engineering helper functions and CLI commands — starting with schema inference (column standardisation, type inference, schema + DDL generation for Pandas/ANSI SQL or PySpark/Spark SQL).
pyde-toolkit
A growing toolkit of data-engineering helper functions and CLI commands. Each tool lives in its own submodule so the package can keep expanding without things colliding.
Tools currently included:
| Submodule | What it does |
|---|---|
| `pyde_toolkit.schema_inferencer` | Infers column names, data types, schema definitions, and CREATE TABLE/CREATE VIEW DDL from a CSV/TSV/Excel file or a pandas DataFrame already in memory. Outputs Pandas/ANSI SQL or PySpark/Spark SQL, with optional Databricks medallion-layer (bronze/silver/gold) support. |
Installation
pip install pyde-toolkit
Reading Excel files (for schema_inferencer) needs the optional extra:
pip install "pyde-toolkit[excel]"
Not yet on PyPI? See Building, releasing & installing upgrades below to build and install it locally first.
Quick Start — Schema Inferencer
Pass a DataFrame directly — no file I/O required:
import pandas as pd
from pyde_toolkit.schema_inferencer import infer_file
df = pd.DataFrame({
"Plant Description": ["Mumbai Plant", "Pune Plant"],
"ZODI/ZLDI": ["ZODI", "ZLDI"],
"Cost %": [12.5, 8.0],
})
result = infer_file(df, pyspark=True, casing="snake", table_name="plant_master")
print(result["schema"]) # PySpark StructType, ready to paste
print(result["create_table"]) # CREATE TABLE ... USING DELTA
print(result["rename_code"]) # df.withColumnRenamed(...) snippet
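To give a feel for what `casing="snake"` implies for the column names above, here is a minimal sketch of snake-case standardisation. The helper `to_snake_case` is hypothetical and is not the package's implementation; the real casing rules (for example, how symbols such as `%` are treated) are documented in docs/schema_inferencer.md and may differ.

```python
import re


def to_snake_case(name: str) -> str:
    """Hypothetical helper: keep alphanumeric runs, lower-case them, join with '_'."""
    parts = re.findall(r"[A-Za-z0-9]+", name)
    return "_".join(part.lower() for part in parts)


columns = ["Plant Description", "ZODI/ZLDI", "Cost %"]

# Map each original column name to its standardised form.
renamed = {col: to_snake_case(col) for col in columns}

for old, new in renamed.items():
    print(f"{old!r} -> {new!r}")
```

A mapping like `renamed` is the kind of information a rename snippet needs, whether it is applied with pandas `DataFrame.rename(columns=...)` or a chain of `withColumnRenamed` calls.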
A top-level convenience import also works for the most common function:
from pyde_toolkit import infer_file
Works the same way from a Spark DataFrame in a Databricks notebook:
result = infer_file(spark_df.toPandas(), pyspark=True, casing="snake",
table_name="sales_fact", layer="silver", catalog="prod")
Or from a file path:
result = infer_file("Sales1.csv", casing="pascal") # Pandas + ANSI SQL by default
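When the input is a file, every value starts out as text, so type inference has to decide which type fits a whole column. The sketch below illustrates one simple way to do that with the standard library only. The function `infer_sql_type` and the chosen type names are assumptions made for illustration; the package's actual inference and sampling behaviour is described in docs/schema_inferencer.md.

```python
from datetime import date


def infer_sql_type(values):
    """Hypothetical: pick the narrowest type that parses every non-empty value."""
    non_empty = [v.strip() for v in values if v is not None and v.strip() != ""]
    if not non_empty:
        return "VARCHAR"  # nothing to learn from, fall back to text

    def all_parse(parser):
        # True only if the parser accepts every sampled value.
        for value in non_empty:
            try:
                parser(value)
            except ValueError:
                return False
        return True

    # Try candidates from narrowest to widest.
    if all_parse(int):
        return "BIGINT"
    if all_parse(float):
        return "DOUBLE PRECISION"
    if all_parse(date.fromisoformat):
        return "DATE"
    return "VARCHAR"


print(infer_sql_type(["1", "2", ""]))
print(infer_sql_type(["12.5", "8.0"]))
print(infer_sql_type(["Mumbai Plant", "Pune Plant"]))
```

Note that Python's `int` and `float` are lenient (they accept strings such as `"1_000"` or `"nan"`), so a production inferencer would likely use stricter parsing.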
Command line
The package installs a single pyde-toolkit command. Each tool is a subcommand:
pyde-toolkit schema-infer Sales1.csv
pyde-toolkit schema-infer Sales1.csv --pyspark true --case pascal
pyde-toolkit schema-infer Sales1.csv --pyspark true --layer all --catalog prod
pyde-toolkit schema-infer --help
pyde-toolkit --version
Full documentation
- `docs/schema_inferencer.md`: complete reference for the schema inferencer, covering every flag/parameter, casing rules, type-inference behaviour, sampling, medallion layers, table types, and the full `infer_file()` return value.
- `docs/RELEASING.md`: step-by-step checklist for making a change, bumping the version, building, publishing, and installing the upgrade.
(As more tools are added, each gets its own docs/<tool_name>.md.)
Adding a new tool to the toolkit
The package is structured so new tools drop in without touching existing ones:
- Create `src/pyde_toolkit/<your_tool>/` with its own `core.py` (the logic) and `cli.py` exposing two functions: `add_arguments(parser)` to register its flags, and `run(args)` to execute. See `schema_inferencer/cli.py` for the pattern.
- In `src/pyde_toolkit/cli.py`, register it as a new subcommand: one `subparsers.add_parser(...)` call plus `your_tool_cli.add_arguments(...)`. Dispatch is generic, so nothing else needs to change.
- Optionally re-export its main function from `src/pyde_toolkit/__init__.py` for a top-level convenience import.
- Add `docs/<your_tool>.md` and `tests/<your_tool>/`.
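The steps above can be sketched in a single file as follows. The `hello` tool, its `--name` flag, and the use of `set_defaults(func=...)` for generic dispatch are all assumptions made for illustration; only the `add_arguments(parser)` / `run(args)` contract and the `subparsers.add_parser(...)` registration come from the description above. See `schema_inferencer/cli.py` for the real pattern.

```python
import argparse


# --- would live in src/pyde_toolkit/hello/cli.py (hypothetical tool) ---
def add_arguments(parser):
    """Register this tool's flags on the subparser it is given."""
    parser.add_argument("--name", default="world", help="who to greet")


def run(args):
    """Execute the tool and return a process exit code."""
    print(f"Hello, {args.name}!")
    return 0


# --- would live in src/pyde_toolkit/cli.py ---
def build_parser():
    parser = argparse.ArgumentParser(prog="pyde-toolkit")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # One add_parser(...) call plus add_arguments(...) per tool.
    hello = subparsers.add_parser("hello", help="hypothetical example tool")
    add_arguments(hello)
    hello.set_defaults(func=run)  # assumed dispatch mechanism
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # Generic dispatch: call whichever run() the chosen subcommand registered.
    return args.func(args)


if __name__ == "__main__":
    raise SystemExit(main())
```

With this layout, `main(["hello", "--name", "Ada"])` prints `Hello, Ada!` and returns 0, and adding another tool only requires another registration block in `build_parser()`.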
Building, releasing & installing upgrades
Quick version — see docs/RELEASING.md for the full checklist (versioning rules, publishing options, troubleshooting):
pip install -e ".[dev]" # 1. dev install
pytest # 2. test your changes
# bump version = "X.Y.Z" in pyproject.toml # 3. one-line version bump
rm -rf build dist src/*.egg-info
python -m build # 4. build dist/*.whl and dist/*.tar.gz
twine upload dist/* # 5. publish (PyPI or your private index)
pip install --upgrade pyde-toolkit # 6. install the new version
pyde_toolkit.__version__ and pyde-toolkit --version are read automatically from whatever version is installed — no need to edit any source file besides pyproject.toml.
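One common way to get this behaviour is to read the version from the installed distribution's metadata at import time. The sketch below assumes that approach; the function name `installed_version` and the fallback string are hypothetical, and the package may implement it differently.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str = "pyde-toolkit") -> str:
    """Return the installed distribution's version, or a placeholder if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Assumed fallback, e.g. when running from an uninstalled source tree.
        return "0.0.0+unknown"


# A package's __init__.py could then expose: __version__ = installed_version()
print(installed_version())
```

Because the value comes from package metadata, bumping `version` in pyproject.toml and reinstalling is enough to change what is reported.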
License
MIT — see LICENSE.