A Python package for itemset mining algorithms.
Itemset Mining
Implements itemset mining algorithms.
Algorithms
High-utility itemset mining (HUIM)
HUIM generalizes frequent itemset mining (FIM) by considering item values and weights. A popular application of HUIM is discovering the sets of items that customers buy together and that yield a high profit for the retailer. In that case, the item values show not just that a loaf of bread is in a basket but how many loaves there are, and the weights capture the profit from each loaf.
More technically, HUIM requires every transaction in the transaction "database" to record an internal utility (i.e. an item value, such as a quantity) for each item it contains, plus a separate "database" of external utilities (i.e. weights, such as unit profit) for each item.
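To make these definitions concrete, here is a minimal sketch of how the utility of an itemset X is commonly computed: in each transaction that contains every item of X, multiply each item's internal utility (quantity) by its external utility (weight) and add the products, then sum over all such transactions. The function name `itemset_utility` and the data layout (lists of `(item, quantity)` pairs plus a dict of weights, mirroring the usage example in this README) are illustrative assumptions, not this package's API.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical data layout: each transaction is a list of (item, internal utility) pairs.
Transaction = List[Tuple[str, int]]

# Same data as the usage example in this README (quantities and per-item weights).
TRANSACTIONS: List[Transaction] = [
    [("Coke 12oz", 6), ("Chips", 2), ("Dip", 1)],
    [("Coke 12oz", 1)],
    [("Coke 12oz", 2), ("Chips", 1), ("Filet Mignon 1lb", 1)],
    [("Chips", 1)],
    [("Chips", 2)],
    [("Coke 12oz", 6), ("Chips", 1)],
]
EXTERNAL_UTILITIES: Dict[str, float] = {
    "Coke 12oz": 1.29,
    "Chips": 2.99,
    "Dip": 3.49,
    "Filet Mignon 1lb": 22.99,
}


def itemset_utility(
    itemset: Iterable[str],
    transactions: List[Transaction],
    external_utilities: Dict[str, float],
) -> float:
    """u(X): sum of u(X, T) over every transaction T that contains all items of X."""
    items = set(itemset)
    total = 0.0
    for transaction in transactions:
        internal = dict(transaction)
        if items.issubset(internal):
            # u(X, T) = sum over items i in X of internal(i, T) * external(i)
            total += sum(internal[i] * external_utilities[i] for i in items)
    return total


print(round(itemset_utility(["Chips", "Coke 12oz"], TRANSACTIONS, EXTERNAL_UTILITIES), 2))  # 30.02
print(round(itemset_utility(["Coke 12oz"], TRANSACTIONS, EXTERNAL_UTILITIES), 2))  # 19.35
```

Chips and Coke 12oz together give 30.02, matching the top record in the doctest output, while Coke 12oz alone gives only 19.35, which is why it falls short of `minutil = 20.00` there.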
Algorithm | Class | How to Cite |
---|---|---|
Two-Phase* | itemset_mining.two_phase_huim.TwoPhase | Link |
* Includes max length support
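The Two-Phase idea can be sketched as follows. This is a simplified, hypothetical reimplementation for illustration only, not the code behind `itemset_mining.two_phase_huim.TwoPhase`. Phase 1 uses the transaction-weighted utilization (TWU) of an itemset: the summed utility of the whole transactions that contain it. TWU is an upper bound on the itemset's real utility and never increases when items are added, so it can prune candidates level by level, Apriori-style. Phase 2 rescans the transactions to compute each surviving candidate's exact utility. The function name `two_phase` and its `max_length` parameter (a stand-in for the max length support noted above) are assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical data layout: each transaction is a list of (item, quantity) pairs.
Transaction = List[Tuple[str, int]]

# Same data as the usage example in this README.
TRANSACTIONS: List[Transaction] = [
    [("Coke 12oz", 6), ("Chips", 2), ("Dip", 1)],
    [("Coke 12oz", 1)],
    [("Coke 12oz", 2), ("Chips", 1), ("Filet Mignon 1lb", 1)],
    [("Chips", 1)],
    [("Chips", 2)],
    [("Coke 12oz", 6), ("Chips", 1)],
]
EXTERNAL_UTILITIES: Dict[str, float] = {
    "Coke 12oz": 1.29,
    "Chips": 2.99,
    "Dip": 3.49,
    "Filet Mignon 1lb": 22.99,
}


def two_phase(
    transactions: List[Transaction],
    external_utilities: Dict[str, float],
    minutil: float,
    max_length: Optional[int] = None,
) -> List[Tuple[Tuple[str, ...], float]]:
    """Illustrative Two-Phase HUI mining; returns (items, utility) sorted by utility."""
    rows = [dict(t) for t in transactions]
    # Transaction utility: total utility of every item in the transaction.
    tus = [sum(q * external_utilities[i] for i, q in row.items()) for row in rows]

    def twu(itemset: FrozenSet[str]) -> float:
        # TWU(X): sum of the utilities of whole transactions that contain X.
        # It upper-bounds u(X) and never grows as items are added to X.
        return sum(tu for row, tu in zip(rows, tus) if itemset.issubset(row))

    # Phase 1: Apriori-style level-wise search over itemsets with TWU >= minutil.
    items = sorted({item for row in rows for item in row})
    level = [frozenset([i]) for i in items if twu(frozenset([i])) >= minutil]
    candidates = list(level)
    k = 1
    while level and (max_length is None or k < max_length):
        k += 1
        previous = set(level)
        # Join: union pairs of (k-1)-itemsets that yield a k-itemset.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset must have survived, and TWU must reach minutil.
        level = [
            c
            for c in joined
            if all(frozenset(s) in previous for s in combinations(c, k - 1))
            and twu(c) >= minutil
        ]
        candidates.extend(level)

    # Phase 2: rescan the transactions to compute each candidate's exact utility u(X).
    results = []
    for c in candidates:
        utility = sum(
            sum(row[i] * external_utilities[i] for i in c)
            for row in rows
            if c.issubset(row)
        )
        if utility >= minutil:
            results.append((tuple(sorted(c)), utility))
    return sorted(results, key=lambda r: r[1], reverse=True)
```

With the usage-example data and `minutil = 20.00`, this sketch reports the same six itemsets as the doctest output. Dip is pruned in phase 1 (its TWU is only 17.21), while Coke 12oz survives phase 1 but is dropped in phase 2 because its exact utility is 19.35.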
Roadmap (high to low priority):
- Address low-correlation HUIs with one of bond, all-confidence, or affinity. Itemsets that are high utility but whose items aren't correlated can be misleading for marketing decisions. E.g. if an itemset of a TV and a pen is a HUI, it's likely just because the TV is expensive, not because it's an interesting pattern.
- Add support for an average utility measure, making minutil easier and more intuitive to choose.
- Support discount strategies via a discount strategy table and upgraded external utilities table.
- Add top-k HUI support.
- Support identifying periodic high utility itemsets. Detecting periodic purchase patterns among high-utility itemsets would allow cross-promotions to customers who buy sets of items periodically.
- Support items' on-shelf time. Ignoring on-shelf time biases results toward items that have been on the shelf longer, since they have had more chances to generate high utility.
- Allow incremental transaction updates without rerunning everything.
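- Support concise representations of HUIs, specifically closed HUIs. This can make mining more efficient and the output more compact, since a shorter itemset is not reported separately when a longer itemset with the same support subsumes it; the longer itemsets may be the most interesting ones (correlation issues aside).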
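For the correlation and average-utility items on the roadmap, here is a hypothetical sketch of three standard measures (affinity is omitted). Bond(X) is the number of transactions containing all items of X divided by the number containing at least one of them; all-confidence(X) is the support of X divided by the largest support of any single item in X; and one common definition of average utility is u(X) / |X|. None of these functions exist in the package; their names and signatures are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical data layout: each transaction is a list of (item, quantity) pairs.
Transaction = List[Tuple[str, int]]

# Same data as the usage example in this README.
TRANSACTIONS: List[Transaction] = [
    [("Coke 12oz", 6), ("Chips", 2), ("Dip", 1)],
    [("Coke 12oz", 1)],
    [("Coke 12oz", 2), ("Chips", 1), ("Filet Mignon 1lb", 1)],
    [("Chips", 1)],
    [("Chips", 2)],
    [("Coke 12oz", 6), ("Chips", 1)],
]
EXTERNAL_UTILITIES: Dict[str, float] = {
    "Coke 12oz": 1.29,
    "Chips": 2.99,
    "Dip": 3.49,
    "Filet Mignon 1lb": 22.99,
}


def _baskets(transactions: List[Transaction]) -> List[Set[str]]:
    # Correlation measures only look at which items occur, not at quantities.
    return [{item for item, _ in t} for t in transactions]


def bond(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Transactions containing ALL of X divided by transactions containing ANY of X."""
    x = set(itemset)
    baskets = _baskets(transactions)
    conjunctive = sum(1 for b in baskets if x <= b)
    disjunctive = sum(1 for b in baskets if x & b)
    return conjunctive / disjunctive if disjunctive else 0.0


def all_confidence(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """sup(X) divided by the highest support of any single item in X."""
    x = set(itemset)
    baskets = _baskets(transactions)
    support = sum(1 for b in baskets if x <= b)
    max_item_support = max(sum(1 for b in baskets if i in b) for i in x)
    return support / max_item_support if max_item_support else 0.0


def average_utility(
    itemset: Iterable[str],
    transactions: List[Transaction],
    external_utilities: Dict[str, float],
) -> float:
    """u(X) / |X|: the itemset's total utility divided by its number of items."""
    x = set(itemset)
    total = 0.0
    for t in transactions:
        internal = dict(t)
        if x.issubset(internal):
            total += sum(internal[i] * external_utilities[i] for i in x)
    return total / len(x)
```

On the usage-example data, Chips and Coke 12oz have a bond of 0.5 and an all-confidence of 0.6, whereas Chips and Filet Mignon 1lb, a HUI largely because the steak is expensive, score only 0.2 on both. The average utility of Chips and Coke 12oz is 30.02 / 2 = 15.01.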
Installation:
pip install itemset-mining
Example:
>>> from operator import attrgetter
>>> from itemset_mining.two_phase_huim import TwoPhase
>>> transactions = [
... [("Coke 12oz", 6), ("Chips", 2), ("Dip", 1)],
... [("Coke 12oz", 1)],
... [("Coke 12oz", 2), ("Chips", 1), ("Filet Mignon 1lb", 1)],
... [("Chips", 1)],
... [("Chips", 2)],
... [("Coke 12oz", 6), ("Chips", 1)]
... ]
>>> # ARP for each item
>>> external_utilities = {
... "Coke 12oz": 1.29,
... "Chips": 2.99,
... "Dip": 3.49,
... "Filet Mignon 1lb": 22.99
... }
>>> # Minimum total dollar value, summed across all transactions, that an itemset must generate to be reported
>>> minutil = 20.00
>>> hui = TwoPhase(transactions, external_utilities, minutil)
>>> result = hui.get_hui()
>>> sorted(result, key=attrgetter('itemset_utility'), reverse=True)
... # doctest: +NORMALIZE_WHITESPACE
[HUIRecord(items=('Chips', 'Coke 12oz'), itemset_utility=30.02),
HUIRecord(items=('Chips', 'Coke 12oz', 'Filet Mignon 1lb'), itemset_utility=28.56),
HUIRecord(items=('Chips', 'Filet Mignon 1lb'), itemset_utility=25.979999999999997),
HUIRecord(items=('Coke 12oz', 'Filet Mignon 1lb'), itemset_utility=25.57),
HUIRecord(items=('Filet Mignon 1lb',), itemset_utility=22.99),
HUIRecord(items=('Chips',), itemset_utility=20.93)]
Hashes for itemset_mining-0.2.2-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | c74e8d4f2666097b208f5edda33d83b1797fdc005a0b243885aaa4ac46c0cad8 |
MD5 | 68d3a3992ed2aec145e15d4c8dbcdf8a |
BLAKE2b-256 | 5c5bde33c0a8c118c3b06f086ac78b1e0a3cc316ae83389da70637b59cc22c01 |