
Clustering of association rules based on user-defined thresholds.

coar

coar is an implementation of clustering of association rules based on user-defined thresholds.
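To make the idea of threshold-based clustering concrete, here is a minimal, hypothetical sketch in plain Python. It is not coar's implementation. It assumes two rules are compatible when the number of items in the symmetric difference of their antecedents (and of their succedents) and the absolute differences in support and confidence all stay within user-chosen limits, loosely mirroring the `abs_*_threshold` parameters in the Usage example. Clusters grow bottom-up, and two clusters merge only when every pair of rules across them is compatible (complete linkage). The `Rule` class, the helper names, and the sample rules are illustrative assumptions.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule record: item sets plus support and confidence."""
    antecedent: frozenset
    succedent: frozenset
    support: float
    confidence: float


def within_thresholds(a, b, ante_thr, succ_thr, supp_thr, conf_thr):
    """Assumed compatibility test: every difference is at most its threshold."""
    return (
        len(a.antecedent ^ b.antecedent) <= ante_thr  # items in only one antecedent
        and len(a.succedent ^ b.succedent) <= succ_thr  # items in only one succedent
        and abs(a.support - b.support) <= supp_thr
        and abs(a.confidence - b.confidence) <= conf_thr
    )


def threshold_clustering(rules, **thr):
    """Agglomerative sketch: start from singletons and merge two clusters
    only if every cross pair of rules passes within_thresholds."""
    clusters = [[r] for r in rules]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(clusters)), 2):
            if all(within_thresholds(a, b, **thr)
                   for a in clusters[i] for b in clusters[j]):
                clusters[i].extend(clusters[j])
                del clusters[j]  # j > i, so index i stays valid
                merged = True
                break  # restart the scan after every merge
    return clusters


# Illustrative rules (made-up values, not mined results).
rules = [
    Rule(frozenset({"GState(CA)"}), frozenset({"VTypeOfVisit(Business)"}), 0.10, 0.70),
    Rule(frozenset({"GState(CA)", "GCity(LA)"}), frozenset({"VTypeOfVisit(Business)"}), 0.08, 0.75),
    Rule(frozenset({"GState(NY)"}), frozenset({"VTypeOfVisit(Leisure)"}), 0.12, 0.65),
]
clusters = threshold_clustering(rules, ante_thr=1, succ_thr=0, supp_thr=1, conf_thr=1)
```

With these limits the first two rules merge (their antecedents differ by one item and their succedents are identical), while the third stays alone because its antecedent differs by two items. Restarting the scan after each merge keeps the sketch simple at the cost of efficiency.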

Installation

Use the package manager pip to install coar.

```bash
pip install coar
```

Usage

Usage is demonstrated on association rules mined with CleverMiner, using a modified version of the CleverMiner quickstart example. CleverMiner must be installed first:

```bash
pip install cleverminer
```

Mining association rules with CleverMiner:

```python
# imports
import json
import pandas as pd
from cleverminer import cleverminer

# getting the source file
df = pd.read_csv(
    'https://www.cleverminer.org/hotel.zip',
    encoding='cp1250',
    sep='\t'
)

# selecting the columns
df = df[['VTypeOfVisit', 'GState', 'GCity']]

# mining association rules
clm = cleverminer(
    df=df, proc='4ftMiner',
    quantifiers={'conf': 0.6, 'Base': 50},
    ante={
        'attributes': [
            {'name': 'GState', 'type': 'subset', 'minlen': 1, 'maxlen': 1},
            {'name': 'GCity', 'type': 'subset', 'minlen': 1, 'maxlen': 1},
        ], 'minlen': 1, 'maxlen': 2, 'type': 'con'},
    succ={
        'attributes': [
            {'name': 'VTypeOfVisit', 'type': 'subset', 'minlen': 1, 'maxlen': 1}
        ], 'minlen': 1, 'maxlen': 1, 'type': 'con'},
)

# saving rules to file
with open('rules.json', 'w') as save_file:
    json.dump(clm.rulelist, save_file)
```

Clustering rules using coar:

```python
# imports
import json
import pandas as pd

from coar.cluster import agglomerative_clustering, cluster_representative

# loading rules (the context manager closes the file afterwards)
with open('rules.json') as rule_file:
    rule_list = json.load(rule_file)

# creating dataframe: one row per rule, with the antecedent and succedent
# strings split on ' & ' into sets of items
df = pd.DataFrame.from_records([{
    'antecedent': set(rule['cedents_str']['ante'].split(' & ')),
    'succedent': set(rule['cedents_str']['succ'].split(' & ')),
    'support': rule['params']['rel_base'],
    'confidence': rule['params']['conf']
} for rule in rule_list])

# clustering
clustering = agglomerative_clustering(
    df,
    abs_ante_attr_diff_threshold=1,
    abs_succ_attr_diff_threshold=0,
    abs_supp_diff_threshold=1,
    abs_conf_diff_threshold=1,
)

# getting cluster representatives
clusters_repr = cluster_representative(clustering)
```
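`cluster_representative` condenses each cluster into a representative; the policy coar uses is not described here. As a purely hypothetical illustration of the idea, the sketch below picks, from each cluster, the rule with the highest confidence, breaking ties by higher support and then by a shorter antecedent. The `Rule` class, `pick_representative`, and the sample clusters are assumptions for illustration, not part of coar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule record: item sets plus support and confidence."""
    antecedent: frozenset
    succedent: frozenset
    support: float
    confidence: float


def pick_representative(cluster):
    """Assumed policy: highest confidence, then highest support,
    then the shortest antecedent (fewest items)."""
    return max(cluster, key=lambda r: (r.confidence, r.support, -len(r.antecedent)))


# Illustrative clusters (made-up values, not mined results).
clusters = [
    [
        Rule(frozenset({"GState(CA)"}), frozenset({"VTypeOfVisit(Business)"}), 0.10, 0.70),
        Rule(frozenset({"GState(CA)", "GCity(LA)"}), frozenset({"VTypeOfVisit(Business)"}), 0.08, 0.75),
    ],
    [Rule(frozenset({"GState(NY)"}), frozenset({"VTypeOfVisit(Leisure)"}), 0.12, 0.65)],
]
representatives = [pick_representative(c) for c in clusters]
```

Other reasonable policies exist, such as preferring the most general rule or the one with the highest support; which one fits depends on how the clustered rules will be interpreted.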

Contributing

If you find a bug 🐛, please open a bug report. If you have an idea for an improvement, new feature or enhancement 🚀, please open a feature request.

License

MIT


Download files

Source distribution: coar-1.22.tar.gz (6.5 kB)
