High-Utility Itemset Mining

Implements high-utility itemset mining algorithms.

High-utility itemset mining (HUIM) generalizes frequent itemset mining (FIM) by taking item values and weights into account. A popular application of HUIM is discovering sets of items that customers buy together and that yield a high profit for the retailer. In that case, item values record not just that a loaf of bread is in a basket but how many loaves there are, and weights record the profit from a single loaf.

More technically, HUIM requires two inputs. First, each transaction in the transaction "database" must record an internal utility (i.e. an item value, such as a quantity) for every item it contains. Second, a separate "database" must give the external utility (i.e. the weight) of each item.
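
The sketch below makes these definitions concrete, assuming the usual HUIM definition of an itemset's utility. Within one transaction, each item's internal utility is multiplied by its external utility and the products are summed. A transaction that lacks any item of the itemset contributes zero. The transaction totals are then summed over the database. The function names and the bakery data are hypothetical and not part of this package's API.

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumed shape, matching the example further down this page: a transaction is a
# list of (item, internal utility) pairs; external utilities map item -> weight.
Transaction = List[Tuple[str, int]]


def utility_in_transaction(itemset: FrozenSet[str], transaction: Transaction,
                           external: Dict[str, float]) -> float:
    """u(X, T): sum of internal * external utility over X's items, or 0 if T lacks any of them."""
    quantities = dict(transaction)
    if not itemset.issubset(quantities):
        return 0.0
    return sum(quantities[item] * external[item] for item in itemset)


def utility(itemset: FrozenSet[str], transactions: List[Transaction],
            external: Dict[str, float]) -> float:
    """u(X): the itemset's utility summed over the whole transaction database."""
    return sum(utility_in_transaction(itemset, t, external) for t in transactions)


# Hypothetical bakery data: two loaves and one butter in one basket, one loaf in another.
baskets = [[("Bread", 2), ("Butter", 1)], [("Bread", 1)]]
profit = {"Bread": 0.50, "Butter": 1.25}
print(utility(frozenset({"Bread"}), baskets, profit))            # 1.5
print(utility(frozenset({"Bread", "Butter"}), baskets, profit))  # 2.25
```

If every internal and external utility is set to 1, u(X) reduces to the number of transactions containing X (its support count). In that sense HUIM generalizes FIM.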

Currently implemented algorithms:

Two-Phase (with max length support)
Liu Y., Liao W., Choudhary A. (2005) A Two-Phase Algorithm for Fast Discovery of High Utility Itemsets. In: Ho T.B., Cheung D., Liu H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2005. Lecture Notes in Computer Science, vol 3518. Springer, Berlin, Heidelberg
Link: http://cucis.ece.northwestern.edu/publications/pdf/LiuLia05A.pdf
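
The sketch below shows the general Two-Phase idea as described by Liu et al. A transaction's utility (TU) is the total utility of all its items. An itemset's transaction-weighted utilization (TWU) is the sum of TU over the transactions that contain it. TWU never underestimates an itemset's utility and never increases when items are added. Phase 1 can therefore run an Apriori-style level-wise search on TWU without losing any true high-utility itemset. Phase 2 makes one more pass to compute exact utilities. This is not the package's `TwoPhase` class: the function name `two_phase` and the `max_length` parameter are hypothetical. Candidate generation is also a naive pairwise join.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical input shape: each transaction is a list of (item, quantity) pairs.
Transaction = List[Tuple[str, int]]


def two_phase(transactions: List[Transaction], external: Dict[str, float],
              minutil: float, max_length: Optional[int] = None) -> Dict[FrozenSet[str], float]:
    """Sketch of Two-Phase: level-wise search on TWU, then an exact-utility check."""
    db = [dict(t) for t in transactions]
    # Transaction utility (TU): total utility of every item in the transaction.
    tu = [sum(q * external[i] for i, q in t.items()) for t in db]

    def twu(itemset: FrozenSet[str]) -> float:
        # Transaction-weighted utilization: sum of TU over transactions containing the itemset.
        return sum(u for t, u in zip(db, tu) if itemset.issubset(t))

    # Phase 1: TWU is anti-monotone, so prune any itemset whose TWU is below minutil.
    level = {frozenset([i]) for t in db for i in t}
    level = {c for c in level if twu(c) >= minutil}
    candidates = set(level)
    k = 1
    while level and (max_length is None or k < max_length):
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Keep a k-itemset only if all its (k-1)-subsets survived and its own TWU is high enough.
        level = {c for c in joined
                 if all(frozenset(s) in level for s in combinations(c, k - 1))
                 and twu(c) >= minutil}
        candidates |= level

    # Phase 2: compute exact utilities of the surviving candidates.
    result: Dict[FrozenSet[str], float] = {}
    for c in candidates:
        exact = sum(sum(t[i] * external[i] for i in c) for t in db if c.issubset(t))
        if exact >= minutil:
            result[c] = exact
    return result
```

On the example data below, this sketch prunes Dip in phase 1 because its TWU is 17.21, which is under 20. It drops {Coke 12oz} in phase 2 because its exact utility is 19.35. It returns {Chips, Coke 12oz} (about 30.02) and {Chips} (about 20.93).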

Roadmap (high to low priority):

Address low-correlation HUIs with one of bond, all-confidence, or affinity. Itemsets that have high utility but whose items aren't correlated can be misleading for marketing decisions. For example, if an itemset of a TV and a pen is a HUI, that is likely just because the TV is expensive, not because the pair is an interesting pattern.
Add average-utility measure support, making minutil easier and more intuitive to choose.
Support discount strategies via a discount strategy table and an upgraded external utilities table.
Add top-k HUI support.
Support identifying periodic high-utility itemsets. This allows detecting purchase patterns among high-utility itemsets, enabling cross-promotions for customers who buy sets of items periodically.
Support items' on-shelf time. Ignoring on-shelf time biases results toward items that spend more time on the shelf, since they have more chance to generate high utility.
Allow incremental transaction updates without rerunning everything.
Support concise representations of HUIs, specifically closed HUIs. This can make the algorithm more efficient and shows only the longer itemsets, which may be the most interesting ones (correlation issues aside).
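
Two of the roadmap measures can be illustrated on the TV-and-pen case. The sketch below uses the standard definitions, none of which the package implements yet. Average utility is u(X) / |X|. Bond is the number of transactions containing all of X divided by the number containing any item of X. All-confidence is the support of X divided by the largest single-item support in X. Affinity is not shown. The function names, the dict-per-transaction shape, and the prices are hypothetical.

```python
from typing import Dict, FrozenSet, List

# Each transaction is a {item: quantity} dict here (an assumption made for brevity).
DB = List[Dict[str, int]]


def support(itemset: FrozenSet[str], db: DB) -> int:
    """Number of transactions containing every item of the itemset."""
    return sum(1 for t in db if itemset.issubset(t))


def utility(itemset: FrozenSet[str], db: DB, external: Dict[str, float]) -> float:
    """Sum of quantity * weight over the itemset's items, in transactions containing it."""
    return sum(sum(t[i] * external[i] for i in itemset) for t in db if itemset.issubset(t))


def average_utility(itemset: FrozenSet[str], db: DB, external: Dict[str, float]) -> float:
    # Dividing by the itemset length keeps long itemsets from scoring high
    # just because they contain more items.
    return utility(itemset, db, external) / len(itemset)


def bond(itemset: FrozenSet[str], db: DB) -> float:
    # Transactions containing all of the items / transactions containing any of them.
    any_count = sum(1 for t in db if not itemset.isdisjoint(t))
    return support(itemset, db) / any_count


def all_confidence(itemset: FrozenSet[str], db: DB) -> float:
    # Support of the itemset / largest support of any single item in it.
    return support(itemset, db) / max(support(frozenset([i]), db) for i in itemset)


# Hypothetical data: one basket with a TV and a pen, several pen-only baskets, one TV-only basket.
db = [{"TV": 1, "Pen": 1}, {"Pen": 3}, {"Pen": 2}, {"Pen": 1}, {"TV": 1}]
external = {"TV": 400.00, "Pen": 1.50}
tv_pen = frozenset({"TV", "Pen"})
print(utility(tv_pen, db, external))                  # 401.5
print(average_utility(tv_pen, db, external))          # 200.75
print(bond(tv_pen, db), all_confidence(tv_pen, db))   # 0.2 0.25
```

The itemset's utility is dominated by a single TV sale. Bond (0.2) and all-confidence (0.25) are both low, which shows that the TV and the pen rarely appear together.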

Example:

    >>> from operator import attrgetter

    >>> transactions = [
    ...     [("Coke 12oz", 6), ("Chips", 2), ("Dip", 1)],
    ...     [("Coke 12oz", 1)],
    ...     [("Coke 12oz", 2), ("Chips", 1)],
    ...     [("Chips", 1)],
    ...     [("Chips", 2)],
    ...     [("Coke 12oz", 6), ("Chips", 1)]
    ... ]

    >>> # ARP (average retail price) of each item, used as its external utility
    >>> external_utilities = {
    ...     "Coke 12oz": 1.29,
    ...     "Chips": 2.99,
    ...     "Dip": 3.49
    ... }

    >>> # Minimum total dollar value an itemset must generate across all transactions to be reported
    >>> minutil = 20.00

    >>> hui = TwoPhase(transactions, external_utilities, minutil)
    >>> result = hui.get_hui()
    >>> # By default, results are ordered by itemset length (ascending), then alphabetically.
    >>> # Ranking by utility (descending), as below, is often more useful.
    >>> sorted(result, key=attrgetter('itemset_utility'), reverse=True)
    [HUIRecord(items=frozenset({'Chips', 'Coke 12oz'}), itemset_utility=30.02),
     HUIRecord(items=frozenset({'Chips'}), itemset_utility=20.93)]
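
The output above can be cross-checked without the package. The exhaustive search below scores every non-empty combination of items and keeps those whose total utility reaches minutil. This is feasible only for a handful of items, since there are 2^n - 1 combinations. It is a naive brute-force check, not Two-Phase and not how `TwoPhase` works internally. The helper `utility` is hypothetical.

```python
from itertools import combinations

# Same data as the example above.
transactions = [
    [("Coke 12oz", 6), ("Chips", 2), ("Dip", 1)],
    [("Coke 12oz", 1)],
    [("Coke 12oz", 2), ("Chips", 1)],
    [("Chips", 1)],
    [("Chips", 2)],
    [("Coke 12oz", 6), ("Chips", 1)],
]
external_utilities = {"Coke 12oz": 1.29, "Chips": 2.99, "Dip": 3.49}
minutil = 20.00

db = [dict(t) for t in transactions]
items = sorted({item for t in db for item in t})


def utility(itemset):
    """Total utility of `itemset` over the transactions that contain all of its items."""
    return sum(sum(t[i] * external_utilities[i] for i in itemset)
               for t in db if set(itemset).issubset(t))


huis = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        u = utility(combo)
        if u >= minutil:
            huis[frozenset(combo)] = round(u, 2)  # rounded to cents for display

for itemset, u in sorted(huis.items(), key=lambda kv: kv[1], reverse=True):
    print(sorted(itemset), u)
# ['Chips', 'Coke 12oz'] 30.02
# ['Chips'] 20.93
```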

Project details

Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: GNU Lesser General Public License v3 (LGPLv3)
Natural Language
- English

Release history

- 0.2.2 (Apr 23, 2022)
- 0.2.1 (Oct 13, 2020)
- 0.2.0 (Aug 2, 2020)
- 0.1.0 (Jul 24, 2020), this version

Download files

Source Distribution

itemset_mining-0.1.0.tar.gz (11.7 kB)

Uploaded Jul 24, 2020 Source

Built Distribution

itemset_mining-0.1.0-py3-none-any.whl (8.2 kB)

Uploaded Jul 24, 2020 Python 3

Hashes for itemset_mining-0.1.0.tar.gz

Algorithm	Hash digest
SHA256	`d274ad097451fab2e0ddb660f5df81f2521a2fb1ee8b67577107cec2afc990e7`
MD5	`38b3c602d5caa2ae7850283cd12ebef1`
BLAKE2b-256	`dc90ad94d47da52e6f8c09bf83e0aa19bb771c94b86e92ce516fc632fad297aa`

Hashes for itemset_mining-0.1.0-py3-none-any.whl

Algorithm	Hash digest
SHA256	`738816b1664b1ef1fd1c50fb887c295d0d1f32b5aa5e21931af241979f22d608`
MD5	`69c5d973aaa9916c5c228816d471e7fa`
BLAKE2b-256	`c07984a25655846f5e16e6a95e2a80b68aa1e2082cc9bb6cc40f9146ec7dd758`