Clustering of association rules based on user-defined thresholds.

Project description

coar

coar is an implementation of clustering of association rules based on user-defined thresholds.

Installation

Use the package manager pip to install coar.

pip install coar

Usage

Usage is demonstrated on association rules mined with CleverMiner, using a modified version of the CleverMiner quickstart example. You need to install cleverminer first.

pip install cleverminer

Mining association rules using cleverminer:

# imports
import json
import pandas as pd
from cleverminer import cleverminer

# getting the source file
df = pd.read_csv(
    'https://www.cleverminer.org/hotel.zip', 
    encoding='cp1250', 
    sep='\t'
)

# selecting the columns
df = df[['VTypeOfVisit', 'GState', 'GCity']]


# mining association rules
clm = cleverminer(
    df=df, proc='4ftMiner',
    quantifiers={'conf': 0.6, 'Base': 50},
    ante={
        'attributes': [
            {'name': 'GState', 'type': 'subset', 'minlen': 1, 'maxlen': 1},
            {'name': 'GCity', 'type': 'subset', 'minlen': 1, 'maxlen': 1},
        ], 'minlen': 1, 'maxlen': 2, 'type': 'con'},
    succ={
        'attributes': [
            {'name': 'VTypeOfVisit', 'type': 'subset', 'minlen': 1, 'maxlen': 1}
        ], 'minlen': 1, 'maxlen': 1, 'type': 'con'},
)

# saving rules to file
with open('rules.json', 'w') as save_file:
    save_file.write(json.dumps(clm.rulelist))
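
Before clustering, it can help to check what was saved. The sketch below reads a rule file and prints one line per rule, using only the fields that the clustering step later relies on (`cedents_str.ante`, `cedents_str.succ`, `params.conf`, `params.rel_base`). The helper `summarize_rules` and the example record are hypothetical and made up for illustration; real records come from `clm.rulelist` and may carry additional fields.

```python
import json
import os
import tempfile


def summarize_rules(path):
    """Return one readable line per rule stored in a JSON rule file."""
    with open(path) as rule_file:
        rules = json.load(rule_file)
    lines = []
    for rule in rules:
        ante = rule['cedents_str']['ante']
        succ = rule['cedents_str']['succ']
        conf = rule['params']['conf']
        rel_base = rule['params']['rel_base']
        lines.append(f'{ante} => {succ} (conf={conf:.2f}, rel_base={rel_base:.2f})')
    return lines


# Hypothetical record (assumption): values are invented, only the keys follow the usage in this README.
example_rules = [{
    'cedents_str': {'ante': 'GState(A) & GCity(B)', 'succ': 'VTypeOfVisit(C)'},
    'params': {'conf': 0.75, 'rel_base': 0.1},
}]

# Write the example to a temporary rules.json and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    example_path = os.path.join(tmp_dir, 'rules.json')
    with open(example_path, 'w') as save_file:
        json.dump(example_rules, save_file)
    summary = summarize_rules(example_path)

print(summary[0])
```

With a real rule file, calling `summarize_rules('rules.json')` after the mining step should give a quick overview of the rules that will be clustered.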

Clustering rules using coar:

# imports
import json
import pandas as pd

from coar.cluster import agglomerative_clustering, cluster_representative


# loading rules (the context manager closes the file after reading)
with open('rules.json') as rule_file:
    rule_list = json.load(rule_file)

# creating dataframe: one row per rule, cedent strings split into sets of attributes
df = pd.DataFrame.from_records([{
    'antecedent': set(rule['cedents_str']['ante'].split(' & ')),
    'succedent': set(rule['cedents_str']['succ'].split(' & ')),
    'support': rule['params']['rel_base'],
    'confidence': rule['params']['conf']
} for rule in rule_list])

# clustering
clustering = agglomerative_clustering(
    df,
    abs_ante_attr_diff_threshold=1,
    abs_succ_attr_diff_threshold=0,
    abs_supp_diff_threshold=1,
    abs_conf_diff_threshold=1,
)

# getting cluster representatives
clusters_repr = cluster_representative(clustering)
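
This README does not describe how the four thresholds are applied. Judging only from the parameter names, one plausible reading is that two rules may share a cluster when their antecedent and succedent attribute sets differ by at most the given number of attributes, and their support and confidence differ by at most the given amounts. The sketch below illustrates that reading with a hypothetical helper, `within_thresholds`; it is an assumption for illustration, not coar's implementation, and the example rules are invented.

```python
def within_thresholds(rule_a, rule_b, ante_diff=1, succ_diff=0,
                      supp_diff=1.0, conf_diff=1.0):
    """Hypothetical check (assumption): are two rules close under all four thresholds?"""
    # Symmetric difference counts attributes that appear in exactly one of the two sets.
    ante_ok = len(rule_a['antecedent'] ^ rule_b['antecedent']) <= ante_diff
    succ_ok = len(rule_a['succedent'] ^ rule_b['succedent']) <= succ_diff
    # Support and confidence are compared by absolute difference.
    supp_ok = abs(rule_a['support'] - rule_b['support']) <= supp_diff
    conf_ok = abs(rule_a['confidence'] - rule_b['confidence']) <= conf_diff
    return ante_ok and succ_ok and supp_ok and conf_ok


# Invented example rules in the same shape as the dataframe rows above.
r1 = {'antecedent': {'GState(A)', 'GCity(B)'}, 'succedent': {'VTypeOfVisit(C)'},
      'support': 0.10, 'confidence': 0.75}
r2 = {'antecedent': {'GState(A)'}, 'succedent': {'VTypeOfVisit(C)'},
      'support': 0.20, 'confidence': 0.65}
r3 = {'antecedent': {'GState(A)'}, 'succedent': {'VTypeOfVisit(D)'},
      'support': 0.20, 'confidence': 0.65}

same_succedent = within_thresholds(r1, r2)    # antecedents differ by one attribute
other_succedent = within_thresholds(r2, r3)   # succedents differ, threshold is 0
print(same_succedent, other_succedent)
```

Under this reading, a threshold of 1 on support and confidence never excludes a pair, because both measures lie between 0 and 1. The values in the usage example would then group rules by their attributes alone.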

Contributing

If you find a bug 🐛, please open a bug report. If you have an idea for an improvement, new feature or enhancement 🚀, please open a feature request.

License

MIT
