Mine for functional dependencies in dataframes with polars
Project description
FDToolDF
This is a fork of https://github.com/kristian10007/FDTool, which is itself a fork of https://github.com/USEPA/FDTool
This fork introduces:
- Support for pandas and polars dataframes as inputs.
- Better logging.
- (Experiments planned) Multithreaded search optimization.
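For readers new to the term: a functional dependency X -> Y holds when any two rows that agree on the columns in X also agree on column Y. The sketch below checks a single dependency in pure Python. It is only an illustration of the definition, not the search FDTool performs; the helper name `fd_holds` and the toy rows are hypothetical.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the columns in `lhs` functionally determine column `rhs`.

    `rows` is a list of dicts (one dict per row). Hypothetical helper,
    not part of fdtooldf.
    """
    seen = {}  # maps each observed lhs value combination to its rhs value
    for row in rows:
        key = tuple(row[col] for col in lhs)
        # setdefault stores the first rhs value seen for this key and
        # returns the stored one; a mismatch means the dependency is violated
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


# Toy data, made up for this example
rows = [
    {"city": "Oslo", "country": "NO", "visits": 3},
    {"city": "Oslo", "country": "NO", "visits": 5},
    {"city": "Bergen", "country": "NO", "visits": 3},
]

print(fd_holds(rows, ["city"], "country"))  # True: each city maps to one country
print(fd_holds(rows, ["country"], "city"))  # False: "NO" maps to two cities
```

A dependency-mining tool has to answer this question for many candidate column sets, which is why the size of the search space matters.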
Usage
# in a shell (prefix with "!" to run it from a Jupyter notebook cell)
pip install fdtooldf
# in a Jupyter notebook
from fdtooldf.runner import run_fdtool
import seaborn as sns
df = sns.load_dataset("tips") # example dataset, for demonstration only
result = run_fdtool(df) # result has two elements: [printable string, raw containers]
print(result[0])
# >>> FD (functional dependancies):
# total_bill | day -> size
# total_bill | day -> time
# total_bill | size -> time
# total_bill | tip -> sex
# total_bill | tip -> size
# total_bill | tip -> time
# total_bill | smoker | size -> day
# total_bill | tip | day -> smoker
# total_bill | tip | smoker -> day
# total_bill | sex | smoker | day -> tip
# >>> EQ (equivalences):
# total_bill | smoker | size <-> total_bill | smoker | day
# total_bill | tip | day <-> total_bill | tip | smoker
# >>> CK (candidate keys):
# total_bill | tip | day
# total_bill | tip | smoker
# total_bill | sex | smoker | day
# total_bill | sex | smoker | size
result[1]
# {'FD': frozenset({(frozenset({'day', 'total_bill'}), 'time'),
# (frozenset({'size', 'total_bill'}), 'time'),
# (frozenset({'size', 'smoker', 'total_bill'}), 'day'),
# (frozenset({'tip', 'total_bill'}), 'size'),
# (frozenset({'day', 'total_bill'}), 'size'),
# (frozenset({'day', 'sex', 'smoker', 'total_bill'}), 'tip'),
# (frozenset({'tip', 'total_bill'}), 'sex'),
# (frozenset({'tip', 'total_bill'}), 'time'),
# (frozenset({'smoker', 'tip', 'total_bill'}), 'day'),
# (frozenset({'day', 'tip', 'total_bill'}), 'smoker')}),
# 'EQ': frozenset({frozenset({frozenset({'day', 'tip', 'total_bill'}),
# frozenset({'smoker', 'tip', 'total_bill'})}),
# frozenset({frozenset({'size', 'smoker', 'total_bill'}),
# frozenset({'day', 'smoker', 'total_bill'})})}),
# 'CK': frozenset({frozenset({'day', 'sex', 'smoker', 'total_bill'}),
# frozenset({'sex', 'size', 'smoker', 'total_bill'}),
# frozenset({'day', 'tip', 'total_bill'}),
# frozenset({'smoker', 'tip', 'total_bill'})})}
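The second element is a plain dict of frozensets, so it can be processed with ordinary Python. The sketch below assumes the structure shown in the output above; the sample dict is a hand-copied subset of that output, and the helper names `fd_lines` and `determinants_of` are hypothetical, not part of fdtooldf.

```python
# Hand-copied subset of the result[1] structure shown above
containers = {
    "FD": frozenset({
        (frozenset({"day", "total_bill"}), "time"),
        (frozenset({"size", "total_bill"}), "time"),
        (frozenset({"tip", "total_bill"}), "sex"),
    }),
    "CK": frozenset({
        frozenset({"day", "tip", "total_bill"}),
        frozenset({"smoker", "tip", "total_bill"}),
    }),
}


def fd_lines(containers):
    """Render each dependency as 'a | b -> c', in a stable sorted order."""
    lines = []
    for lhs, rhs in containers["FD"]:
        # frozensets are unordered, so sort the column names for stable output
        lines.append(f"{' | '.join(sorted(lhs))} -> {rhs}")
    return sorted(lines)


def determinants_of(containers, column):
    """List every left-hand side that determines `column`."""
    return sorted(sorted(lhs) for lhs, rhs in containers["FD"] if rhs == column)


for line in fd_lines(containers):
    print(line)
print(determinants_of(containers, "time"))
```

Because the left-hand sides are frozensets, the column order here is alphabetical and may differ from the order in the printed string in `result[0]`.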
License
Note: the module REPO/fdtooldf/modules/dbschema is released under the C-FSL license; its copyright is held by Elmar Stellnberger.