
Pattern Structures miner in Python. An add-on for the Caspailleur package. Part of the SmartFCA project.



paspailleur

An add-on for caspailleur to work with Pattern Structures

A Pattern Structure (D, ⊑) represents a description space D where any two descriptions can be checked with a "less precise" relation ⊑. The relation is a partial order, so some pairs of descriptions are incomparable. For example, if D is a set of ngrams, then the ngram (hello,) is less precise than (hello, world): (hello,) ⊑ (hello, world), that is, every ngram that contains (hello, world) also contains (hello,).

[!WARNING] The package is under active development. Things can change often.

Implemented Pattern Structures

from paspailleur import pattern_structures as PS

General use

IntervalPS

Every description is a closed interval of real numbers [a, b], written in code as the tuple (a, b). Description [a, b] is less precise than description [c, d] if a <= c and d <= b, i.e. if [c, d] lies inside [a, b]. For example, description [1.5, 3.14] is less precise than [2, 3], i.e. [1.5, 3.14] ⊑ [2, 3] (yes, the notation is counterintuitive here).

d1, d2 = (1.5, 3.14), (2, 3)
ps = PS.IntervalPS()
assert ps.is_less_precise(d1, d2)
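To make the rule above concrete, here is a minimal, library-independent sketch of the interval comparison. The function name `interval_is_less_precise` is hypothetical and this is not paspailleur's implementation; it only encodes the condition a <= c and d <= b stated in the text.

```python
def interval_is_less_precise(d1: tuple[float, float], d2: tuple[float, float]) -> bool:
    """Return True if interval d1 = [a, b] is less precise than d2 = [c, d].

    A wider interval says less about the value, so [a, b] ⊑ [c, d]
    holds exactly when [c, d] lies inside [a, b]: a <= c and d <= b.
    """
    (a, b), (c, d) = d1, d2
    return a <= c and d <= b


# The README example: [1.5, 3.14] ⊑ [2, 3], but not the other way round.
assert interval_is_less_precise((1.5, 3.14), (2, 3))
assert not interval_is_less_precise((2, 3), (1.5, 3.14))
```

Disjoint intervals such as [0, 1] and [2, 3] fail the test in both directions, which is one way the partial order leaves pairs incomparable.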

SubSetPS

Every description is a set of values. Description A is less precise than description B if A is a subset of B: A ⊆ B. For example, description {green, cubic} is less precise than {green, cubic, heavy}.

d1, d2 = {'green', 'cubic'}, {'green', 'cubic', 'heavy'}
ps = PS.SubSetPS()
assert ps.is_less_precise(d1, d2)
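A minimal sketch of the subset rule using plain Python sets, independent of paspailleur (the helper name `subset_is_less_precise` is hypothetical). It also shows a pair of descriptions that neither contains the other, so the comparison fails in both directions.

```python
def subset_is_less_precise(d1: set, d2: set) -> bool:
    # SubSetPS rule: A ⊑ B when A ⊆ B, i.e. B keeps every value of A and may add more.
    return d1 <= d2  # for Python sets, <= is the subset test


assert subset_is_less_precise({'green', 'cubic'}, {'green', 'cubic', 'heavy'})

# Neither set contains the other, so the two descriptions are incomparable.
a, b = {'green', 'round'}, {'green', 'cubic', 'heavy'}
assert not subset_is_less_precise(a, b) and not subset_is_less_precise(b, a)
```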

SuperSetPS

Every description is a set of values. Description A is less precise than description B if A is a superset of B: A ⊇ B. For example, description {green, yellow, red} is less precise than {green, yellow}.

d1, d2 = {'green', 'yellow', 'red'}, {'green', 'yellow'}
ps = PS.SuperSetPS()
assert ps.is_less_precise(d1, d2)
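The superset rule is the mirror image of the subset one. One possible reading (an interpretation, not something the package states) is that a description lists the values still considered possible, so ruling out options makes it more precise. A minimal sketch with a hypothetical helper name, independent of paspailleur:

```python
def superset_is_less_precise(d1: set, d2: set) -> bool:
    # SuperSetPS rule: A ⊑ B when A ⊇ B, so the larger set is the vaguer description.
    return d1 >= d2  # for Python sets, >= is the superset test


assert superset_is_less_precise({'green', 'yellow', 'red'}, {'green', 'yellow'})
assert not superset_is_less_precise({'green', 'yellow'}, {'green', 'yellow', 'red'})
```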

CartesianPS

A pattern structure that combines several independent basic pattern structures into one.

# Combining the three previous examples
d1 = [(1.5, 3.14), {'green', 'cubic'}, {'green', 'yellow', 'red'}]
d2 = [(2, 3), {'green', 'cubic', 'heavy'}, {'green', 'yellow'}]
basic_structures = [PS.IntervalPS(), PS.SubSetPS(), PS.SuperSetPS()]
ps = PS.CartesianPS(basic_structures)
assert ps.is_less_precise(d1, d2)
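A natural way to combine independent structures is the product order: one description is less precise than another if every component is less precise under its own basic structure. The example above is consistent with that reading, but we assume it here rather than take it from the package documentation. The sketch below is library-independent; `cartesian_is_less_precise` and the lambda comparators are hypothetical stand-ins for the three basic structures.

```python
from typing import Any, Callable, Sequence

Comparator = Callable[[Any, Any], bool]


def cartesian_is_less_precise(
    comparators: Sequence[Comparator], d1: Sequence[Any], d2: Sequence[Any]
) -> bool:
    # Assumed product order: component i of d1 must be less precise than
    # component i of d2 under comparator i, for every i.
    if not (len(comparators) == len(d1) == len(d2)):
        raise ValueError("each description needs one component per basic structure")
    return all(cmp(x, y) for cmp, x, y in zip(comparators, d1, d2))


comparators = [
    lambda x, y: x[0] <= y[0] and y[1] <= x[1],  # interval rule
    lambda x, y: x <= y,                         # subset rule
    lambda x, y: x >= y,                         # superset rule
]
d1 = [(1.5, 3.14), {'green', 'cubic'}, {'green', 'yellow', 'red'}]
d2 = [(2, 3), {'green', 'cubic', 'heavy'}, {'green', 'yellow'}]
assert cartesian_is_less_precise(comparators, d1, d2)
```

A single failing component is enough to make the whole comparison fail, so swapping `d1` and `d2` returns False.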

NLP

NgramPS

Every description is a set of pairwise incomparable ngrams, i.e. a set of pairwise incomparable tuples of words.

Ngram A = (a_1, a_2, ..., a_n) is less precise than ngram B = (b_1, b_2, ..., b_m) if A can be embedded into B as a contiguous block of words, i.e. there exists i in {0, ..., m-n} such that A = B[i:i+n] (0-based Python slicing). For example, (hello, world) is less precise than (hello, world, !).
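The embedding condition for single ngrams can be sketched directly with tuple slicing. This is a library-independent illustration with a hypothetical function name, not paspailleur's code; it scans every 0-based start position i from 0 to m-n.

```python
def ngram_is_less_precise(a: tuple[str, ...], b: tuple[str, ...]) -> bool:
    # A ⊑ B when A occurs in B as a contiguous run of words, i.e. there is a
    # 0-based start i in 0..len(B)-len(A) with B[i:i+len(A)] == A.
    n = len(a)
    return any(b[i:i + n] == a for i in range(len(b) - n + 1))


assert ngram_is_less_precise(('hello', 'world'), ('hello', 'world', '!'))
# Order and adjacency matter: the words must appear as one unbroken block.
assert not ngram_is_less_precise(('hello', '!'), ('hello', 'world', '!'))
# (hello,) and (world,) do not embed into each other: they are incomparable.
assert not ngram_is_less_precise(('hello',), ('world',))
assert not ngram_is_less_precise(('world',), ('hello',))
```

If A is longer than B, the range is empty and the function returns False, as expected.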

Description D_1 = {A_1, A_2, ...} is less precise than description D_2 = {B_1, B_2, ...} if every ngram in D_1 is less precise than some ngram in D_2.

d1 = {('hello', 'world'), ('!',)}
d2 = {('hello', 'world', '!')}
ps = PS.NgramPS()
assert ps.is_less_precise(d1, d2)
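The description-level rule lifts the single-ngram test: every ngram of D_1 must embed into at least one ngram of D_2. The sketch below is independent of paspailleur, and all function names are hypothetical. The extra helper `keep_maximal` shows one way to turn an arbitrary set of ngrams into a set of incomparable ngrams, as the description format requires, by dropping every ngram that embeds into another one.

```python
def ngram_is_less_precise(a: tuple[str, ...], b: tuple[str, ...]) -> bool:
    # A ⊑ B when A occurs in B as a contiguous run of words.
    n = len(a)
    return any(b[i:i + n] == a for i in range(len(b) - n + 1))


def ngram_set_is_less_precise(d1: set, d2: set) -> bool:
    # D1 ⊑ D2 when every ngram of D1 embeds into at least one ngram of D2.
    return all(any(ngram_is_less_precise(a, b) for b in d2) for a in d1)


def keep_maximal(ngrams: set) -> set:
    # Hypothetical helper: drop every ngram that embeds into a different ngram
    # of the same set, leaving pairwise incomparable ngrams.
    return {
        a for a in ngrams
        if not any(a != b and ngram_is_less_precise(a, b) for b in ngrams)
    }


d1 = {('hello', 'world'), ('!',)}
d2 = {('hello', 'world', '!')}
assert ngram_set_is_less_precise(d1, d2)
assert not ngram_set_is_less_precise(d2, d1)
assert keep_maximal({('hello',), ('hello', 'world'), ('!',)}) == {('hello', 'world'), ('!',)}
```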

SynonymPS

Every description is a set of words representing the synonyms of the words contained in some text. Description A is less precise than description B if A is a subset of B: A ⊆ B.

d1, d2 = 'hello', 'hello world'
ps = PS.SynonymPS()
pattern1, pattern2 = ps.preprocess_data([d1, d2])
assert ps.is_less_precise(pattern1, pattern2)

print('pattern1:', pattern1)
print('pattern2:', pattern2)

pattern1: {'hello'}
pattern2: {'hello', 'universe'}
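The README does not say where SynonymPS gets its synonyms, so the sketch below replaces that source with a hypothetical toy lexicon chosen only to reproduce the output above. It shows the shape of the preprocessing (text to set of words) followed by a comparison by set inclusion; `TOY_SYNONYMS` and `lexicon_pattern` are illustrative names, not part of paspailleur.

```python
# Hypothetical toy lexicon chosen only to reproduce the output above;
# it is NOT paspailleur's synonym source.
TOY_SYNONYMS: dict[str, set[str]] = {
    'hello': {'hello'},
    'world': {'universe'},
}


def lexicon_pattern(text: str, lexicon: dict[str, set[str]]) -> set[str]:
    # Split the text on whitespace and union the lexicon entries of its words;
    # words missing from the lexicon contribute nothing to the pattern.
    pattern: set[str] = set()
    for word in text.lower().split():
        pattern |= lexicon.get(word, set())
    return pattern


pattern1 = lexicon_pattern('hello', TOY_SYNONYMS)
pattern2 = lexicon_pattern('hello world', TOY_SYNONYMS)
assert pattern1 <= pattern2  # comparison by set inclusion, as for SubSetPS
```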

AntonymPS

Every description is a set of words representing the antonyms of the words contained in some text. Description A is less precise than description B if A is a subset of B: A ⊆ B.

d1, d2 = 'good', 'good dog'
ps = PS.AntonymPS()
pattern1, pattern2 = ps.preprocess_data([d1, d2])
assert ps.is_less_precise(pattern1, pattern2)

print('pattern1:', pattern1)
print('pattern2:', pattern2)

pattern1: {'evil'}
pattern2: {'evil'}

So the system does not know any antonym of "dog", and both texts get the same pattern.
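Because an unknown word adds nothing, "good" and "good dog" end up with equal patterns, and under set inclusion equal patterns are each less precise than the other. The sketch below illustrates this with a hypothetical toy antonym table (`TOY_ANTONYMS`, with "dog" deliberately missing); it stands in for whatever source AntonymPS actually uses, which this README does not describe.

```python
# Hypothetical toy antonym table; 'dog' is deliberately absent.
TOY_ANTONYMS: dict[str, set[str]] = {'good': {'evil'}}


def lexicon_pattern(text: str, lexicon: dict[str, set[str]]) -> set[str]:
    # Union the lexicon entries of the words in the text; unknown words add nothing.
    pattern: set[str] = set()
    for word in text.lower().split():
        pattern |= lexicon.get(word, set())
    return pattern


p1 = lexicon_pattern('good', TOY_ANTONYMS)
p2 = lexicon_pattern('good dog', TOY_ANTONYMS)
# Equal patterns: inclusion holds in both directions, so each is less precise than the other.
assert p1 == p2
assert p1 <= p2 and p2 <= p1
```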

Tabular data

[!INFO] Coming soon

NumberPS

CategoryPS

Graphs

[!INFO] Coming soon


GraphPS

OrderedGraphPS

How to create a custom Pattern Structure

To be described



Download files


Source distribution: paspailleur-0.0.3.tar.gz (48.6 kB)

Built distribution: paspailleur-0.0.3-py3-none-any.whl (39.5 kB, Python 3)
