Mine for functional dependencies in dataframes with polars
Project description
FDToolDF
This is a fork of https://github.com/kristian10007/FDTool, which is itself a fork of https://github.com/USEPA/FDTool
This fork introduces:
- Support for pandas and polars dataframes as inputs.
- Better logging.
- Multithreaded search optimization (experiments planned).
Usage
# in a terminal (inside a Jupyter notebook, use !pip instead of pip)
pip install fdtooldf
# in a Jupyter notebook
from fdtooldf.runner import run_fdtool
import seaborn as sns
df = sns.load_dataset("tips") # just to demonstrate
result = run_fdtool(df) # result has two elements: [string to print, dict of result containers]
print(result[0])
# >>> FD (functional dependancies):
# total_bill tip -> sex
# total_bill day -> size
# total_bill day -> time
# total_bill tip -> size
# total_bill tip -> time
# total_bill size -> time
# total_bill tip day -> smoker
# total_bill tip smoker -> day
# total_bill smoker size -> day
# total_bill sex smoker day -> tip
# >>> EQ (equivalences):
# size smoker total_bill <-> smoker total_bill day
# tip total_bill day <-> tip smoker total_bill
# >>> CK (candidate keys):
# day tip total_bill
# smoker tip total_bill
# day sex smoker total_bill
# sex size smoker total_bill
result[1]
# {'FD': frozenset({(frozenset({'day', 'total_bill'}), 'size'),
# (frozenset({'tip', 'total_bill'}), 'size'),
# (frozenset({'size', 'total_bill'}), 'time'),
# (frozenset({'tip', 'total_bill'}), 'sex'),
# (frozenset({'tip', 'total_bill'}), 'time'),
# (frozenset({'smoker', 'tip', 'total_bill'}), 'day'),
# (frozenset({'day', 'total_bill'}), 'time'),
# (frozenset({'size', 'smoker', 'total_bill'}), 'day'),
# (frozenset({'day', 'sex', 'smoker', 'total_bill'}), 'tip'),
# (frozenset({'day', 'tip', 'total_bill'}), 'smoker')}),
# 'EQ': frozenset({(frozenset({'size', 'smoker', 'total_bill'}),
# frozenset({'day', 'smoker', 'total_bill'})),
# (frozenset({'day', 'tip', 'total_bill'}),
# frozenset({'smoker', 'tip', 'total_bill'}))}),
# 'CK': frozenset({frozenset({'day', 'tip', 'total_bill'}),
# frozenset({'day', 'sex', 'smoker', 'total_bill'}),
# frozenset({'sex', 'size', 'smoker', 'total_bill'}),
# frozenset({'smoker', 'tip', 'total_bill'})})}
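The entries in result[1] can be cross-checked independently of the tool. The sketch below is not part of fdtooldf: `fd_holds` and `is_superkey` are hypothetical helper names, and the toy rows are invented for illustration. It assumes the data is available as a list of row dicts (for example via `df.to_dict("records")` in pandas or `df.to_dicts()` in polars) and that the dependencies have the same `(frozenset_of_columns, column)` shape as result[1]['FD'] above.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if rows that agree on every lhs column also agree on rhs."""
    lhs = sorted(lhs)
    seen = {}  # lhs value tuple -> first rhs value observed for it
    for row in rows:
        key = tuple(row[col] for col in lhs)
        # setdefault stores the first rhs value; any later mismatch breaks the FD.
        # Note: NaN never equals NaN, so missing values count as violations here.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def is_superkey(rows, columns):
    """Return True if no two rows share the same values on `columns`."""
    columns = sorted(columns)
    keys = [tuple(row[col] for col in columns) for row in rows]
    return len(set(keys)) == len(keys)


# Toy rows, invented for this example (not the "tips" dataset).
rows = [
    {"order_id": 1, "customer": "ann", "city": "Oslo"},
    {"order_id": 2, "customer": "bob", "city": "Rome"},
    {"order_id": 3, "customer": "ann", "city": "Oslo"},
    {"order_id": 4, "customer": "cy", "city": "Rome"},
]

# Same shape as result[1]['FD']: a frozenset of (lhs frozenset, rhs column) pairs.
fds = frozenset({
    (frozenset({"customer"}), "city"),
    (frozenset({"city"}), "customer"),
})

# Sort for a stable order, since frozensets are unordered.
report = [
    (" ".join(sorted(lhs)), rhs, fd_holds(rows, lhs, rhs))
    for lhs, rhs in sorted(fds, key=lambda fd: (sorted(fd[0]), fd[1]))
]
for lhs_text, rhs, ok in report:
    print(f"{lhs_text} -> {rhs}: {'holds' if ok else 'violated'}")
```

Passing a candidate key from result[1]['CK'] to `is_superkey` checks uniqueness only; it does not check minimality.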
License
Note: the module REPO/fdtooldf/modules/dbschema is released under the C-FSL license; its copyright is held by Elmar Stellnberger.
Download files
Source Distribution
fdtooldf-0.0.0.tar.gz (13.0 kB)
Built Distribution
fdtooldf-0.0.0-py3-none-any.whl (15.8 kB)
File details
Details for the file fdtooldf-0.0.0.tar.gz.
File metadata
- Download URL: fdtooldf-0.0.0.tar.gz
- Upload date:
- Size: 13.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.9.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8819dc54254447c16d6b3aab15977b4f731030bdd3146b9d2f5bc2c5cf918344 |
| MD5 | 14e5e58fc3579d2682710d3890c6002c |
| BLAKE2b-256 | 26852db058e71ef963b1bbe4a62cf9d635247a1c916351953846bb928c9221a9 |
File details
Details for the file fdtooldf-0.0.0-py3-none-any.whl.
File metadata
- Download URL: fdtooldf-0.0.0-py3-none-any.whl
- Upload date:
- Size: 15.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.9.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 52e749866f6218706323d13d896a7a4f152f15df0adb999ad70c1edd3512e97a |
| MD5 | 3a6cf28e2f59f067b7a6357212642c97 |
| BLAKE2b-256 | 1a3e592a61cdc0085bb37e6741b954d0c2f12f2c6f01efced42017ee649e0f4d |
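A downloaded copy of either file can be compared against the SHA256 digests listed above. This is a minimal sketch using only the standard library. `sha256_of` and `matches` are hypothetical helper names, and it assumes the files sit in the current working directory under their published names.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops at the first empty read (end of file).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# Digests copied from the "File hashes" tables above.
EXPECTED = {
    "fdtooldf-0.0.0.tar.gz":
        "8819dc54254447c16d6b3aab15977b4f731030bdd3146b9d2f5bc2c5cf918344",
    "fdtooldf-0.0.0-py3-none-any.whl":
        "52e749866f6218706323d13d896a7a4f152f15df0adb999ad70c1edd3512e97a",
}

for name, expected in EXPECTED.items():
    path = Path(name)  # assumed location: current directory
    if not path.exists():
        print(f"{name}: not found, skipping")
    elif matches(path, expected):
        print(f"{name}: OK")
    else:
        print(f"{name}: MISMATCH")
```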