Pattern Structures miner in Python. An add-on for the caspailleur package. Part of the SmartFCA project.

paspailleur

An add-on for caspailleur to work with Pattern Structures

A Pattern Structure (D, ⊑) represents a description space D whose descriptions are partially ordered by a "less precise" operator ⊑. For example, if D is a set of ngrams, then ngram (hello,) is less precise than (hello, world): (hello,) ⊑ (hello, world), that is, every ngram that contains (hello, world) also contains (hello,).

[!WARNING] The package is under active development. Things can change often.

Implemented Pattern Structures

from paspailleur import pattern_structures as PS

General use

IntervalPS

Every description is a closed interval of real numbers [a,b]. Description [a,b] is less precise than description [c,d] if a <= c and d <= b, that is, if [a,b] contains [c,d]. For example, description [1.5, 3.14] is less precise than [2, 3], i.e. [1.5, 3.14] ⊑ [2, 3] (yes, the notation is counterintuitive here: the wider interval is the "smaller" description).

d1, d2 = (1.5, 3.14), (2, 3)
ps = PS.IntervalPS()
assert ps.is_less_precise(d1, d2)
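
The rule above can be sketched in plain Python without the package. This is only an illustration of the order itself, not the package's implementation; the function name `interval_is_less_precise` is hypothetical.

```python
def interval_is_less_precise(a, b):
    """Return True if closed interval `a` contains closed interval `b`.

    Hypothetical helper for illustration; intervals are (low, high) tuples.
    """
    (a_low, a_high), (b_low, b_high) = a, b
    return a_low <= b_low and b_high <= a_high


# The wider interval is the less precise description
assert interval_is_less_precise((1.5, 3.14), (2, 3))
# The order is not symmetric
assert not interval_is_less_precise((2, 3), (1.5, 3.14))
# Overlapping intervals that do not contain each other are incomparable
assert not interval_is_less_precise((0, 2), (1, 3))
assert not interval_is_less_precise((1, 3), (0, 2))
```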

SubSetPS

Every description is a set of values. Description A is less precise than description B if A is a subset of B: A ⊆ B. For example, description {green, cubic} is less precise than {green, cubic, heavy}.

d1, d2 = {'green', 'cubic'}, {'green', 'cubic', 'heavy'}
ps = PS.SubSetPS()
assert ps.is_less_precise(d1, d2)

SuperSetPS

Every description is a set of values. Description A is less precise than description B if A is a superset of B: A ⊇ B. For example, description {green, yellow, red} is less precise than {green, yellow}.

d1, d2 = {'green', 'yellow', 'red'}, {'green', 'yellow'}
ps = PS.SuperSetPS()
assert ps.is_less_precise(d1, d2)
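
The two set-based orders are mirror images of each other. A minimal sketch with Python's built-in set comparisons shows this; the helper names are hypothetical and are not part of the package.

```python
def subset_is_less_precise(a, b):
    """SubSetPS-style order (illustrative): A is less precise than B if A ⊆ B."""
    return set(a) <= set(b)


def superset_is_less_precise(a, b):
    """SuperSetPS-style order (illustrative): A is less precise than B if A ⊇ B."""
    return set(a) >= set(b)


small, big = {'green', 'yellow'}, {'green', 'yellow', 'red'}

# Under the subset order the smaller set is less precise...
assert subset_is_less_precise(small, big)
# ...under the superset order the bigger set is less precise
assert superset_is_less_precise(big, small)
# Swapping the arguments of one order gives the other
assert subset_is_less_precise(small, big) == superset_is_less_precise(big, small)
```

One way to read the difference: in the subset order a set lists properties that all hold at once, while in the superset order a set lists the alternatives that are still possible.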

CartesianPS

A pattern structure to combine various independent basic pattern structures in one.

# Combining three previous examples together
d1 = [(1.5, 3.14), {'green', 'cubic'}, {'green', 'yellow', 'red'}]
d2 = [(2, 3), {'green', 'cubic', 'heavy'}, {'green', 'yellow'}]
basic_structures = [PS.IntervalPS(), PS.SubSetPS(), PS.SuperSetPS()]
ps = PS.CartesianPS(basic_structures)
assert ps.is_less_precise(d1, d2)
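
The example suggests that a combined description is less precise only when each of its components is less precise under its own basic structure. The sketch below illustrates that assumption in plain Python; all function names are hypothetical, and the package's actual implementation may differ.

```python
def interval_leq(a, b):
    # Interval a contains interval b
    return a[0] <= b[0] and b[1] <= a[1]


def subset_leq(a, b):
    return a <= b


def superset_leq(a, b):
    return a >= b


def cartesian_is_less_precise(orders, d1, d2):
    """Compare two combined descriptions component by component (assumed semantics)."""
    if not (len(orders) == len(d1) == len(d2)):
        raise ValueError("orders and descriptions must have the same length")
    return all(leq(x, y) for leq, x, y in zip(orders, d1, d2))


orders = [interval_leq, subset_leq, superset_leq]
d1 = [(1.5, 3.14), {'green', 'cubic'}, {'green', 'yellow', 'red'}]
d2 = [(2, 3), {'green', 'cubic', 'heavy'}, {'green', 'yellow'}]

assert cartesian_is_less_precise(orders, d1, d2)
# A single failing component is enough to break the relation
assert not cartesian_is_less_precise(orders, d2, d1)
```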

NLP

NgramPS

Every description is a set of incomparable ngrams, i.e. a set of incomparable tuples of words.

Ngram A = (a_1, a_2, ..., a_n) is less precise than ngram B = (b_1, b_2, ..., b_m) if A can be embedded into B, i.e. there exists an offset i in 0, ..., m-n (0-based, as in Python slicing) such that A = B[i:i+n]. For example, (hello, world) is less precise than (hello, world, !).

Description D_1 = {A_1, A_2, ...} is less precise than description D_2 = {B_1, B_2, ...} if every ngram in D_1 is less precise than some ngram in D_2.

d1 = {('hello', 'world'), ('!',)}
d2 = {('hello', 'world', '!')}
ps = PS.NgramPS()
assert ps.is_less_precise(d1, d2)
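
The two-level definition (ngram embedding, then set comparison) can be sketched in plain Python. It uses 0-based offsets, as Python slicing does. Both function names are hypothetical, and this is an illustration of the order rather than the package's code.

```python
def ngram_is_less_precise(a, b):
    """Return True if ngram `a` occurs in ngram `b` as a contiguous slice."""
    a, b = tuple(a), tuple(b)
    n, m = len(a), len(b)
    # Try every offset i = 0, ..., m - n; the range is empty when n > m
    return any(b[i:i + n] == a for i in range(m - n + 1))


def description_is_less_precise(d1, d2):
    """Every ngram of `d1` must be embedded into some ngram of `d2`."""
    return all(any(ngram_is_less_precise(a, b) for b in d2) for a in d1)


assert ngram_is_less_precise(('hello', 'world'), ('hello', 'world', '!'))
# Embedding must be contiguous: 'hello' and '!' are not adjacent in the longer ngram
assert not ngram_is_less_precise(('hello', '!'), ('hello', 'world', '!'))

d1 = {('hello', 'world'), ('!',)}
d2 = {('hello', 'world', '!')}
assert description_is_less_precise(d1, d2)
assert not description_is_less_precise(d2, d1)
```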

SynonymPS

Every description is a set of words, representing the synonyms of words contained in some text. Description A is less precise than description B if A is a subset of B: A ⊆ B.

d1, d2 = 'hello', 'hello world'
ps = PS.SynonymPS()
pattern1, pattern2 = ps.preprocess_data([d1, d2])
assert ps.is_less_precise(pattern1, pattern2)

print('pattern1:', pattern1)
print('pattern2:', pattern2)

pattern1: {'hello'}
pattern2: {'hello', 'universe'}

AntonymPS

Every description is a set of words, representing the antonyms of words contained in some text. Description A is less precise than description B if A is a subset of B: A ⊆ B.

d1, d2 = 'good', 'good dog'
ps = PS.AntonymPS()
pattern1, pattern2 = ps.preprocess_data([d1, d2])
assert ps.is_less_precise(pattern1, pattern2)

print('pattern1:', pattern1)
print('pattern2:', pattern2)

pattern1: {'evil'}
pattern2: {'evil'}

Both patterns are the same because the system does not know any antonym of "dog".

Tabular data

[!INFO] Coming soon

NumberPS

CategoryPS

Graphs

[!INFO] Coming soon

GraphPS

OrderedGraphPS

How to create a custom Pattern Structure

To be described

