
Mine for functional dependencies in dataframes with polars


FDToolDF

This is a fork of https://github.com/kristian10007/FDTool, which is itself a fork of https://github.com/USEPA/FDTool

This fork introduces:

  • Support for pandas and polars dataframes as inputs.
  • Better logging.
  • (Experiments planned) Multithreaded search optimization.

Usage

# in a shell (inside a Jupyter notebook, prefix the command with !)
pip install fdtooldf

# in a Jupyter notebook
from fdtooldf.runner import run_fdtool
import seaborn as sns

df = sns.load_dataset("tips")  # just to demonstrate
result = run_fdtool(df)  # result has two elements: [printable str, dict of containers]

print(result[0])
# >>> FD (functional dependancies):
# total_bill tip -> sex
# total_bill day -> size
# total_bill day -> time
# total_bill tip -> size
# total_bill tip -> time
# total_bill size -> time
# total_bill tip day -> smoker
# total_bill tip smoker -> day
# total_bill smoker size -> day
# total_bill sex smoker day -> tip

# >>> EQ (equivalences):
# size smoker total_bill <-> smoker total_bill day
# tip total_bill day <-> tip smoker total_bill

# >>> CK (candidate keys):
# day tip total_bill
# smoker tip total_bill
# day sex smoker total_bill
# sex size smoker total_bill



result[1]
# {'FD': frozenset({(frozenset({'day', 'total_bill'}), 'size'),
#             (frozenset({'tip', 'total_bill'}), 'size'),
#             (frozenset({'size', 'total_bill'}), 'time'),
#             (frozenset({'tip', 'total_bill'}), 'sex'),
#             (frozenset({'tip', 'total_bill'}), 'time'),
#             (frozenset({'smoker', 'tip', 'total_bill'}), 'day'),
#             (frozenset({'day', 'total_bill'}), 'time'),
#             (frozenset({'size', 'smoker', 'total_bill'}), 'day'),
#             (frozenset({'day', 'sex', 'smoker', 'total_bill'}), 'tip'),
#             (frozenset({'day', 'tip', 'total_bill'}), 'smoker')}),
#  'EQ': frozenset({(frozenset({'size', 'smoker', 'total_bill'}),
#              frozenset({'day', 'smoker', 'total_bill'})),
#             (frozenset({'day', 'tip', 'total_bill'}),
#              frozenset({'smoker', 'tip', 'total_bill'}))}),
#  'CK': frozenset({frozenset({'day', 'tip', 'total_bill'}),
#             frozenset({'day', 'sex', 'smoker', 'total_bill'}),
#             frozenset({'sex', 'size', 'smoker', 'total_bill'}),
#             frozenset({'smoker', 'tip', 'total_bill'})})}
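The containers in result[1] are plain frozensets, so they can be post-processed without the library. The sketch below is only an illustration: format_fds and fd_holds are hypothetical helpers (not part of fdtooldf), fds is a two-entry excerpt of the 'FD' container shown above, and rows is made-up toy data rather than the real tips dataset. It renders the (lhs, rhs) pairs as readable strings and checks by hand what "lhs -> rhs" means: rows that agree on every lhs column must also agree on rhs.

```python
from typing import Dict, FrozenSet, Iterable, List, Mapping, Set, Tuple

FD = Tuple[FrozenSet[str], str]


def format_fds(fds: Iterable[FD]) -> List[str]:
    """Render (lhs, rhs) pairs as sorted 'a b -> c' strings (hypothetical helper)."""
    return sorted(f"{' '.join(sorted(lhs))} -> {rhs}" for lhs, rhs in fds)


def fd_holds(rows: Iterable[Mapping[str, object]], lhs: Set[str], rhs: str) -> bool:
    """Return True if rows with equal lhs values always share the same rhs value."""
    seen: Dict[tuple, object] = {}
    for row in rows:
        key = tuple(row[col] for col in sorted(lhs))
        # Remember the first rhs value seen for this key; any later mismatch breaks the FD.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


# Two-entry excerpt of the 'FD' container from result[1].
fds = frozenset({
    (frozenset({"day", "total_bill"}), "size"),
    (frozenset({"tip", "total_bill"}), "sex"),
})

# Toy rows for illustration only (assumption, not the real tips data).
rows = [
    {"total_bill": 10.0, "day": "Sun", "size": 2},
    {"total_bill": 10.0, "day": "Sun", "size": 2},
    {"total_bill": 10.0, "day": "Sat", "size": 3},
]

for line in format_fds(fds):
    print(line)

print(fd_holds(rows, {"day", "total_bill"}, "size"))  # True on the toy rows
print(fd_holds(rows, {"total_bill"}, "size"))         # False: same bill, different size
```

With a real run, result[1]['FD'] could be passed to format_fds in place of the excerpt, and the dataframe rows (for example via df.to_dict("records") in pandas) to fd_holds.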

License

Note: the module REPO/fdtooldf/modules/dbschema is released under the C-FSL license; its copyright is held by Elmar Stellnberger.
