
Mine for functional dependencies in dataframes with polars

FDToolDF

This is a fork of https://github.com/kristian10007/FDTool, which is itself a fork of https://github.com/USEPA/FDTool

This fork introduces:

  • Support for pandas and polars dataframes as input.
  • Better logging.
  • Multithreaded search optimization (planned, experimental).

Usage

# in a shell (inside a Jupyter notebook, prefix the command with !)
pip install fdtooldf

# in a Jupyter notebook
from fdtooldf.runner import run_fdtool
import seaborn as sns

df = sns.load_dataset("tips")  # sample dataset, for demonstration only
result = run_fdtool(df)  # result has two elements: [printable string, dict of raw containers]

print(result[0])
# >>> FD (functional dependancies):
# total_bill | day -> size
# total_bill | day -> time
# total_bill | size -> time
# total_bill | tip -> sex
# total_bill | tip -> size
# total_bill | tip -> time
# total_bill | smoker | size -> day
# total_bill | tip | day -> smoker
# total_bill | tip | smoker -> day
# total_bill | sex | smoker | day -> tip

# >>> EQ (equivalences):
# total_bill | smoker | size <-> total_bill | smoker | day
# total_bill | tip | day <-> total_bill | tip | smoker

# >>> CK (candidate keys):
# total_bill | tip | day
# total_bill | tip | smoker
# total_bill | sex | smoker | day
# total_bill | sex | smoker | size



result[1]
# {'FD': frozenset({(frozenset({'day', 'total_bill'}), 'time'),
#             (frozenset({'size', 'total_bill'}), 'time'),
#             (frozenset({'size', 'smoker', 'total_bill'}), 'day'),
#             (frozenset({'tip', 'total_bill'}), 'size'),
#             (frozenset({'day', 'total_bill'}), 'size'),
#             (frozenset({'day', 'sex', 'smoker', 'total_bill'}), 'tip'),
#             (frozenset({'tip', 'total_bill'}), 'sex'),
#             (frozenset({'tip', 'total_bill'}), 'time'),
#             (frozenset({'smoker', 'tip', 'total_bill'}), 'day'),
#             (frozenset({'day', 'tip', 'total_bill'}), 'smoker')}),
#  'EQ': frozenset({frozenset({frozenset({'day', 'tip', 'total_bill'}),
#                        frozenset({'smoker', 'tip', 'total_bill'})}),
#             frozenset({frozenset({'size', 'smoker', 'total_bill'}),
#                        frozenset({'day', 'smoker', 'total_bill'})})}),
#  'CK': frozenset({frozenset({'day', 'sex', 'smoker', 'total_bill'}),
#             frozenset({'sex', 'size', 'smoker', 'total_bill'}),
#             frozenset({'day', 'tip', 'total_bill'}),
#             frozenset({'smoker', 'tip', 'total_bill'})})}
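
The containers in result[1] are nested frozensets, so they are unordered and awkward to read directly. The following is a minimal sketch of post-processing them in plain Python. It assumes the dictionary has exactly the shape printed above; the `containers` literal is a hand-copied subset of that output, and the helper names `fds_by_target` and `smallest_keys` are hypothetical, not part of fdtooldf.

```python
# Hand-copied subset of the result[1] output shown above (an assumption
# about its shape); in real use this would be run_fdtool(df)[1].
containers = {
    "FD": frozenset({
        (frozenset({"day", "total_bill"}), "time"),
        (frozenset({"tip", "total_bill"}), "sex"),
    }),
    "EQ": frozenset({
        frozenset({
            frozenset({"day", "tip", "total_bill"}),
            frozenset({"smoker", "tip", "total_bill"}),
        }),
    }),
    "CK": frozenset({
        frozenset({"day", "tip", "total_bill"}),
        frozenset({"smoker", "tip", "total_bill"}),
    }),
}


def fds_by_target(fds):
    """Group determinant column sets by the column they determine."""
    grouped = {}
    for lhs, rhs in fds:
        grouped.setdefault(rhs, []).append(sorted(lhs))
    # Sort everything so the output is stable (frozensets have no order).
    return {rhs: sorted(lhs_list) for rhs, lhs_list in sorted(grouped.items())}


def smallest_keys(candidate_keys):
    """Return the candidate keys with the fewest columns, as sorted lists."""
    size = min(len(key) for key in candidate_keys)
    return sorted(sorted(key) for key in candidate_keys if len(key) == size)


by_target = fds_by_target(containers["FD"])
for rhs, lhs_list in by_target.items():
    for lhs in lhs_list:
        print(" | ".join(lhs), "->", rhs)

print(smallest_keys(containers["CK"]))
```

Sorting each frozenset before use keeps the output deterministic between runs, which helps when the results are compared or written to a file.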

License

Note: the module REPO/fdtooldf/modules/dbschema is released under the C-FSL license, and its copyright is held by Elmar Stellnberger.
