Clustering of association rules based on user-defined thresholds.

coar

coar is an implementation of clustering of association rules based on user-defined thresholds.

Installation

Use the package manager pip to install coar.

pip install coar

Usage

The example below clusters association rules mined with CleverMiner, using a modified version of the CleverMiner quickstart example. Install cleverminer first.

pip install cleverminer

Mining association rules using cleverminer:

# imports
import json
import pandas as pd
from cleverminer import cleverminer

# getting the source file
df = pd.read_csv(
    'https://www.cleverminer.org/hotel.zip', 
    encoding='cp1250', 
    sep='\t'
)

# selecting the columns
df = df[['VTypeOfVisit', 'GState', 'GCity']]


# mining association rules
clm = cleverminer(
    df=df, proc='4ftMiner',
    quantifiers={'conf': 0.6, 'Base': 50},
    ante={
        'attributes': [
            {'name': 'GState', 'type': 'subset', 'minlen': 1, 'maxlen': 1},
            {'name': 'GCity', 'type': 'subset', 'minlen': 1, 'maxlen': 1},
        ], 'minlen': 1, 'maxlen': 2, 'type': 'con'},
    succ={
        'attributes': [
            {'name': 'VTypeOfVisit', 'type': 'subset', 'minlen': 1, 'maxlen': 1}
        ], 'minlen': 1, 'maxlen': 1, 'type': 'con'},
)

# saving rules to file
with open('rules.json', 'w') as save_file:
    save_file.write(json.dumps(clm.rulelist))
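
The clustering step below reads only a few fields from each saved rule: the antecedent and succedent strings under 'cedents_str', and the 'rel_base' and 'conf' values under 'params'. As a minimal sketch of that structure, the following uses a hand-written record with made-up values (the record contents and the helper name `cedent_to_set` are assumptions for illustration, not output from cleverminer) and splits each cedent string on ' & ' into a set.

```python
import json

# Hypothetical rule record: only the keys read later in this README
# ('cedents_str' and 'params') are included, and all values are made up.
sample_rules = [
    {
        "cedents_str": {
            "ante": "GState(A) & GCity(B)",
            "succ": "VTypeOfVisit(C)",
        },
        "params": {"rel_base": 0.12, "conf": 0.75},
    }
]

# Round-trip through JSON, mirroring how rules.json is written and read.
loaded = json.loads(json.dumps(sample_rules))


def cedent_to_set(cedent: str) -> set:
    """Split a cedent string such as 'X(a) & Y(b)' into a set of its parts."""
    return set(cedent.split(" & "))


rule = loaded[0]
antecedent = cedent_to_set(rule["cedents_str"]["ante"])
succedent = cedent_to_set(rule["cedents_str"]["succ"])
print(antecedent, succedent, rule["params"]["conf"])
```

Using sets means the order of the parts within a cedent does not matter when two rules are compared.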

Clustering rules using coar:

# imports
import json
import pandas as pd

from coar.cluster import agglomerative_clustering, cluster_representative


# loading rules
with open('rules.json') as rule_file:
    rule_list = json.load(rule_file)

# creating dataframe
df = pd.DataFrame.from_records([{
    'antecedent': set(rule['cedents_str']['ante'].split(' & ')),
    'succedent': set(rule['cedents_str']['succ'].split(' & ')),
    'support': rule['params']['rel_base'],
    'confidence': rule['params']['conf']
} for rule in rule_list])

# clustering
clustering = agglomerative_clustering(
    df,
    abs_ante_attr_diff_threshold=1,
    abs_succ_attr_diff_threshold=0,
    abs_supp_diff_threshold=1,
    abs_conf_diff_threshold=1,
)

# getting cluster representatives
clusters_repr = cluster_representative(clustering)
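
To give some intuition for the four thresholds, here is a minimal sketch of one possible reading of them. It is an assumption, not coar's actual implementation: it treats the attribute difference as the size of the symmetric difference between two attribute sets, and the support and confidence differences as absolute differences. The function name `within_thresholds` and the sample rules are hypothetical.

```python
# Illustrative only: an assumed reading of the four thresholds, not coar's code.

def within_thresholds(rule_a, rule_b, ante_diff=1, succ_diff=0,
                      supp_diff=1.0, conf_diff=1.0):
    """Return True when two rules differ by no more than each threshold."""
    # Assumption: attribute difference = size of the symmetric set difference.
    ante_gap = len(rule_a["antecedent"] ^ rule_b["antecedent"])
    succ_gap = len(rule_a["succedent"] ^ rule_b["succedent"])
    # Assumption: numeric differences are plain absolute differences.
    supp_gap = abs(rule_a["support"] - rule_b["support"])
    conf_gap = abs(rule_a["confidence"] - rule_b["confidence"])
    return (ante_gap <= ante_diff and succ_gap <= succ_diff
            and supp_gap <= supp_diff and conf_gap <= conf_diff)


# Made-up rules with the same fields as the DataFrame rows built above.
r1 = {"antecedent": {"GState(A)", "GCity(B)"}, "succedent": {"VTypeOfVisit(C)"},
      "support": 0.12, "confidence": 0.75}
r2 = {"antecedent": {"GState(A)"}, "succedent": {"VTypeOfVisit(C)"},
      "support": 0.20, "confidence": 0.70}
r3 = {"antecedent": {"GState(D)"}, "succedent": {"VTypeOfVisit(E)"},
      "support": 0.15, "confidence": 0.80}

close_12 = within_thresholds(r1, r2)  # antecedents differ by one attribute
close_13 = within_thresholds(r1, r3)  # antecedents and succedents both differ
print(close_12, close_13)
```

Under this reading, a succedent threshold of 0 only allows rules with identical succedents to be grouped, while thresholds of 1 on support and confidence place no practical limit on values that lie between 0 and 1.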

Contributing

If you find a bug 🐛, please open a bug report. If you have an idea for an improvement, new feature or enhancement 🚀, please open a feature request.

License

MIT
