A Python package for itemset mining algorithms.

Itemset Mining

Implements itemset mining algorithms.

Algorithms

High-utility itemset mining (HUIM)

HUIM generalizes the problem of frequent itemset mining (FIM) by considering item values and weights. A popular application of HUIM is to discover all sets of items purchased together by customers that yield a high profit for the retailer. In that case, item values record not just that a loaf of bread is in a basket but how many loaves there are, and weights capture the profit from each loaf.

More technically, HUIM requires two inputs: a transaction "database" in which every item in every transaction carries an internal utility (i.e. an item value), and a "database" of external utilities (i.e. weights), one per item.
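
To make the definition concrete, here is a minimal sketch that computes the utility of a single itemset by hand. It is an illustration only, not the package's implementation: the helper name `itemset_utility` and the bread-and-butter data are assumptions made for this example. The utility of an itemset is the sum, over every transaction containing all of its items, of internal utility times external utility.

```python
def itemset_utility(itemset, transactions, external_utilities):
    """Sum the utility of `itemset` over transactions that contain all of its items."""
    wanted = set(itemset)
    total = 0.0
    for transaction in transactions:
        # Map each item to its internal utility (here, the purchased quantity).
        quantities = dict(transaction)
        if wanted.issubset(quantities):
            total += sum(quantities[item] * external_utilities[item] for item in wanted)
    return total


# Hypothetical data: (item, quantity) pairs per transaction, profit per unit per item.
transactions = [
    [("bread", 2), ("butter", 1)],
    [("bread", 1)],
    [("bread", 3), ("butter", 2)],
]
external_utilities = {"bread": 0.5, "butter": 1.0}

bread_butter = itemset_utility(("bread", "butter"), transactions, external_utilities)
print(bread_butter)  # 5.5: (2*0.5 + 1*1.0) + (3*0.5 + 2*1.0)
```

An itemset counts as high utility when this total reaches the chosen minimum utility threshold.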

Algorithm     Class                                      How to Cite
Two-Phase*    itemset_mining.two_phase_huim.TwoPhase     Link

* Includes max length support
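
For orientation, the Two-Phase approach as generally described in the literature first prunes candidates using the transaction-weighted utility (TWU), the summed utility of all transactions that contain an itemset. TWU is an upper bound on the itemset's real utility, so an itemset whose TWU falls below the threshold cannot be high utility, and neither can any of its supersets. A second phase then computes exact utilities for the surviving candidates. The sketch below shows the first-phase idea on single items only. It is not the code of `TwoPhase`; the function names and data are hypothetical.

```python
def transaction_utility(transaction, external_utilities):
    """Total utility of one transaction: sum of quantity * weight over its items."""
    return sum(qty * external_utilities[item] for item, qty in transaction)


def twu(itemset, transactions, external_utilities):
    """Transaction-weighted utility: summed utility of transactions containing the itemset."""
    wanted = set(itemset)
    return sum(
        transaction_utility(t, external_utilities)
        for t in transactions
        if wanted.issubset(item for item, _ in t)
    )


# Hypothetical data with integer weights to keep the arithmetic exact.
transactions = [
    [("tv", 1), ("pen", 2)],    # transaction utility 302
    [("pen", 5)],               # transaction utility 5
    [("tv", 1), ("cable", 1)],  # transaction utility 310
]
external_utilities = {"tv": 300, "pen": 1, "cable": 10}
minutil = 400

# Phase 1 (single items only): keep items whose TWU reaches the threshold.
candidates = [
    item for item in external_utilities
    if twu((item,), transactions, external_utilities) >= minutil
]
print(candidates)  # ['tv']: pen (TWU 307) and cable (TWU 310) are pruned
```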

Roadmap (high to low priority):

  • Address low-correlation HUIs with one of bond, all-confidence, or affinity. Itemsets that are high utility but whose items aren't correlated can be misleading when making marketing decisions. E.g. if an itemset of a TV and a pen is a HUI, it's likely just because the TV is expensive, not because it's an interesting pattern.
  • Add average utility measure support, for an easier, more intuitive minutil.
  • Support discount strategies via a discount strategy table and upgraded external utilities table.
  • Add top-k HUI support.
  • Support identifying periodic high utility itemsets. This allows detection of purchase patterns among high-utility itemsets to allow cross-promotions to customers who buy sets of items periodically.
  • Support items' on-shelf time. Ignoring on-shelf time will bias results toward items that have more shelf time, since they have more chances to generate higher utility.
  • Allow incremental transaction updates without rerunning everything.
  • Support concise HUI representations, specifically closed itemsets. This allows the algorithm to be more efficient by showing only the longer itemsets, which may be the most interesting ones (correlation issues aside).
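
As an illustration of the first roadmap item, the sketch below computes two of the correlation measures it names, bond and all-confidence, using their standard definitions: bond is the number of transactions containing every item divided by the number containing at least one item, and all-confidence is the itemset's support divided by the largest single-item support. None of this exists in the package yet; the function names and baskets are hypothetical, and quantities are ignored because both measures depend only on item presence.

```python
def support(itemset, transactions):
    """Number of transactions that contain every item in the itemset."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted.issubset(t))


def bond(itemset, transactions):
    """Transactions containing all items divided by transactions containing any item."""
    wanted = set(itemset)
    disjunctive = sum(1 for t in transactions if wanted & set(t))
    return support(itemset, transactions) / disjunctive if disjunctive else 0.0


def all_confidence(itemset, transactions):
    """Support of the itemset divided by the largest single-item support."""
    largest = max(support((item,), transactions) for item in itemset)
    return support(itemset, transactions) / largest if largest else 0.0


# Hypothetical baskets, items only.
baskets = [
    {"tv", "pen"},
    {"pen"},
    {"pen", "paper"},
    {"pen", "paper"},
    {"tv"},
]

print(bond(("tv", "pen"), baskets))               # 0.2
print(all_confidence(("tv", "pen"), baskets))     # 0.25
print(bond(("pen", "paper"), baskets))            # 0.5
print(all_confidence(("pen", "paper"), baskets))  # 0.5
```

In this toy data the TV-and-pen itemset scores low on both measures, so a correlation threshold would filter it out even if the TV's price made it high utility.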

Installation:

pip install itemset-mining

Example:

    >>> from operator import attrgetter
    >>> from itemset_mining.two_phase_huim import TwoPhase
    >>> transactions = [
    ...     [("Coke 12oz", 6), ("Chips", 2), ("Dip", 1)],
    ...     [("Coke 12oz", 1)],
    ...     [("Coke 12oz", 2), ("Chips", 1), ("Filet Mignon 1lb", 1)],
    ...     [("Chips", 1)],
    ...     [("Chips", 2)],
    ...     [("Coke 12oz", 6), ("Chips", 1)]
    ... ]
    >>> # ARP for each item
    >>> external_utilities = {
    ...     "Coke 12oz": 1.29,
    ...     "Chips": 2.99,
    ...     "Dip": 3.49,
    ...     "Filet Mignon 1lb": 22.99
    ... }
    >>> # Minimum dollar value generated by an itemset we care about across all transactions
    >>> minutil = 20.00

    >>> hui = TwoPhase(transactions, external_utilities, minutil)
    >>> result = hui.get_hui()
    >>> sorted(result, key=attrgetter('itemset_utility'), reverse=True)
    ... # doctest: +NORMALIZE_WHITESPACE
    [HUIRecord(items=('Chips', 'Coke 12oz'), itemset_utility=30.02),
     HUIRecord(items=('Chips', 'Coke 12oz', 'Filet Mignon 1lb'), itemset_utility=28.56),
     HUIRecord(items=('Chips', 'Filet Mignon 1lb'), itemset_utility=25.979999999999997),
     HUIRecord(items=('Coke 12oz', 'Filet Mignon 1lb'), itemset_utility=25.57),
     HUIRecord(items=('Filet Mignon 1lb',), itemset_utility=22.99),
     HUIRecord(items=('Chips',), itemset_utility=20.93)]
