A simple and easy to use Data Quality (DQ) tool built with Python.
Tiny Tim
Tiny Tim uses the Python bindings for Polars, a Rust-based DataFrame tool.
Support includes ...
- polars
- pandas
- pyspark
- csv files
- parquet files
Both dataframe and file sources are supported. Simply "point and shoot."
Usage
You can pass Tiny Tim a dataframe, specifying what type it is (pandas, polars, or pyspark), and ask for default_checks. Alternatively, you can simply pass a file URI pointing to a csv or parquet file.
You can also pass custom DQ checks in the form of SQL statements that would be found in a normal WHERE clause. Results of your checks are returned as a Polars dataframe.
Current functionality ...
default_checks()
- check all columns for null values
- check if dataset is distinct or contains duplicates
- check all columns for
run_custom_check("{some SQL WHERE clause}")
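To make the null and duplicate checks listed above concrete, here is a minimal sketch using only the Python standard library. It is not Tiny Tim's implementation (Tiny Tim is built on Polars); the helper names null_counts and duplicate_count, the sample data, and the choice to count each extra copy of a row as one duplicate are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter


def null_counts(rows, columns):
    """Count empty or missing values per column (hypothetical helper)."""
    counts = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            if row.get(col) in (None, ""):
                counts[col] += 1
    return counts


def duplicate_count(rows, columns):
    """Count rows that exactly repeat another row (hypothetical helper).

    Assumption: each extra copy of a row counts as one duplicate.
    """
    seen = Counter(tuple(row.get(col) for col in columns) for row in rows)
    return sum(n - 1 for n in seen.values() if n > 1)


# Tiny made-up sample standing in for a real CSV file.
sample = io.StringIO(
    "ride_id,start_station_name\n"
    "a1,Clark St\n"
    "a2,\n"
    "a1,Clark St\n"
)
reader = csv.DictReader(sample)
rows = list(reader)
columns = reader.fieldnames

nulls = null_counts(rows, columns)
dupes = duplicate_count(rows, columns)

for col, n in nulls.items():
    if n:
        print(f"Column {col} has {n} null values")
print(f"Your dataset has {dupes} duplicates" if dupes else "Your dataset has no duplicates")
```

The sketch treats empty CSV fields as nulls; a real dataframe library distinguishes empty strings from missing values, so the counts may differ from what Tiny Tim reports.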
Example Usage
CSV support.
tm = TinyTim(source_type="csv", file_path="202306-divvy-tripdata.csv")
results = tm.default_checks()
>> Column start_station_name has 978 null values
>> Column start_station_id has 978 null values
>> Column end_station_name has 978 null values
>> Column end_station_id has 978 null values
>> Your dataset has 45 duplicates
Pandas support.
import pandas as pd
df = pd.read_csv("202306-divvy-tripdata.csv")
tm = TinyTim(source_type="pandas", dataframe=df)
results = tm.default_checks()
>> Column start_station_name has 978 null values
>> Column start_station_id has 978 null values
>> Column end_station_name has 978 null values
>> Column end_station_id has 978 null values
>> Your dataset has no duplicates
Custom Data Quality checks are supported in a SQL-based format. They are given as they would appear in a WHERE clause.
tm = TinyTim(source_type="csv", file_path="202306-divvy-tripdata.csv")
tm.default_checks()
results = tm.run_custom_check("start_station_name IS NULL")
>> Your custom check found 978 records that match your filter statement
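For readers who want to see what "a check written as a WHERE clause" amounts to, the following sketch reproduces the idea with the standard library's sqlite3 module. It is only an illustration under stated assumptions: the count_matches helper, the trips table, and the sample rows are hypothetical, and Tiny Tim itself runs its checks through Polars rather than SQLite.

```python
import sqlite3


def count_matches(conn, table, where_clause):
    """Count rows matching a WHERE-clause fragment (hypothetical helper).

    The clause is interpolated into the query as-is, so only pass
    text you trust.
    """
    query = f"SELECT COUNT(*) FROM {table} WHERE {where_clause}"
    return conn.execute(query).fetchone()[0]


# In-memory table with made-up rows standing in for the trip data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (ride_id TEXT, start_station_name TEXT)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [("a1", "Clark St"), ("a2", None), ("a3", None)],
)

matches = count_matches(conn, "trips", "start_station_name IS NULL")
print(f"Your custom check found {matches} records that match your filter statement")
```

Because the check is just the body of a WHERE clause, any predicate that is valid there (comparisons, IS NULL, AND/OR combinations) can serve as a data quality rule.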