Subgroup discovery and visualisation for Python based on the VIKAMINE kernel

Project description

SD4Py

  • License: LGPL (GNU Lesser General Public License v3.0)
  • © 2021-2023 Dan Hudson, Martin Atzmueller - UOS, Semantic Information Systems Group

SD4Py is a package for performing subgroup discovery on tabular data. The main entry point is a single function: call sd4py.discover_subgroups() on a Pandas dataframe and a collection of subgroups is returned.

This package provides a Python interface for using the Java application VIKAMINE.

Before installing

Because SD4Py is an interface to the Java application VIKAMINE, Java and the Java Development Kit (JDK) must be installed before installing SD4Py.
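As a rough pre-install check, the sketch below looks for the `java` and `javac` executables on the PATH and reports the JAVA_HOME environment variable. It is a heuristic only and not part of SD4Py: the function name is hypothetical, and a Java installation can exist without either signal being present.

```python
import os
import shutil


def java_tools_on_path():
    """Return the resolved path (or None) for the `java` and `javac` executables."""
    # `java` ships with a Java runtime; `javac` is only present with a JDK.
    return {tool: shutil.which(tool) for tool in ("java", "javac")}


status = java_tools_on_path()
for tool, path in status.items():
    print(f"{tool}: {path or 'not found on PATH'}")

# JAVA_HOME is optional, but it is a common way to point tools at a JDK.
print("JAVA_HOME:", os.environ.get("JAVA_HOME", "not set"))
```

If `javac` is reported as missing, the JDK is probably not installed or not on the PATH.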

Quick overview of subgroup discovery

Subgroup discovery finds patterns in some (explanatory) columns of data that help to explain another (target) column. The goal is to understand the circumstances in which the target is extreme. With a numeric target, this means finding circumstances in which the value is exceptionally high (or exceptionally low) on average. With a non-numeric target, it means finding circumstances in which a particular value is especially likely to occur. A key benefit of this approach is that the outputs are interpretable: each subgroup is expressed as a readable combination of rules such as "'Temperature'=high AND 'Pressure'=low".
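To make the idea concrete, here is a minimal pure-Python sketch of subgroup discovery with a numeric target. It does not use SD4Py or VIKAMINE: the toy data, the function names, and the quality measure (subgroup size multiplied by the difference between the subgroup mean and the overall mean) are all assumptions chosen for illustration, and the exhaustive search shown here is not claimed to be the algorithm VIKAMINE uses.

```python
from itertools import combinations
from statistics import mean

# Toy data (made up): two explanatory columns and a numeric target, "Yield".
rows = [
    {"Temperature": "high", "Pressure": "low", "Yield": 9.0},
    {"Temperature": "high", "Pressure": "low", "Yield": 8.0},
    {"Temperature": "high", "Pressure": "high", "Yield": 4.0},
    {"Temperature": "low", "Pressure": "low", "Yield": 4.0},
    {"Temperature": "low", "Pressure": "high", "Yield": 3.0},
    {"Temperature": "low", "Pressure": "high", "Yield": 2.0},
]
TARGET = "Yield"


def covers(rule, row):
    """A rule is a tuple of (column, value) selectors joined by AND."""
    return all(row[col] == val for col, val in rule)


def quality(rule, rows, target):
    """Size-weighted mean shift: n * (subgroup mean - overall mean)."""
    covered = [r[target] for r in rows if covers(rule, r)]
    if not covered:
        return 0.0
    overall = mean(r[target] for r in rows)
    return len(covered) * (mean(covered) - overall)


def candidate_rules(rows, target, max_selectors=2):
    """Yield every conjunction of up to max_selectors selectors on distinct columns."""
    selectors = sorted({(c, v) for r in rows for c, v in r.items() if c != target})
    for n in range(1, max_selectors + 1):
        for combo in combinations(selectors, n):
            if len({col for col, _ in combo}) == n:
                yield combo


def describe(rule):
    """Render a rule in the readable form used in the text above."""
    return " AND ".join(f"'{col}'={val}" for col, val in rule)


ranked = sorted(
    candidate_rules(rows, TARGET),
    key=lambda rule: quality(rule, rows, TARGET),
    reverse=True,
)
best = ranked[0]
print(describe(best), quality(best, rows, TARGET))
```

On this toy data the top-ranked subgroup combines high temperature with low pressure: its two rows average 8.5 against an overall mean of 5.0. Weighting by size keeps very small subgroups from winning on an extreme average alone.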

Subgroup discovery supports an iterative approach. An initial run produces subgroups that may suggest changes, such as adding or removing variables or refining the search parameters, after which the process can be re-run.

The outputs of the process are discovered subgroups that help to explain the target.

Important note on dependencies

There are two important dependencies for SD4Py. The first is JPype, which starts the Java Virtual Machine (JVM) behind the scenes; the user does not need to interact with JPype directly. For JPype to work, you must first install Java and the Java Development Kit (JDK). The second is Pandas, which is used to store and manipulate tabular data. The data you want to use must be in a Pandas dataframe.
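The following sketch checks whether the two Python-side dependencies can be imported in the current environment. The helper name is hypothetical. It relies on JPype being imported under the module name `jpype` and Pandas under `pandas`, and it only confirms that the modules can be located, not that a working Java installation is behind them.

```python
from importlib.util import find_spec


def missing_modules(names=("jpype", "pandas")):
    """Return the subset of top-level module names that cannot be located."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing Python modules:", ", ".join(missing))
else:
    print("JPype and Pandas are both importable.")
```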

Basic usage example

To get started quickly, see the basic usage example provided as a notebook on the SD4Py GitHub page: https://github.com/cslab-hub/sd4py

Detailed documentation

Documentation can be accessed with the built-in help() function within Python, or on the website https://cslab-hub.github.io/sd4py.
