A Pattern Structures miner in Python. An add-on for the caspailleur package. Part of the SmartFCA project.
paspailleur
An add-on for caspailleur to work with Pattern Structures
A Pattern Structure (D, ⊑) represents a description space D where every two descriptions can be compared by a "less precise" operator ⊑.
For example, if D is a set of ngrams, then ngram (hello,) is less precise than (hello, world): (hello,) ⊑ (hello, world), that is, every ngram that contains (hello, world) also contains (hello,).
[!WARNING] The package is under active development. Things can change often.
Implemented Pattern Structures
from paspailleur import pattern_structures as PS
General use
IntervalPS
Every description is a closed interval of real numbers [a, b].
Description [a, b] is less precise than description [c, d] if a <= c and d <= b, that is, if [a, b] contains [c, d].
For example, description [1.5, 3.14] is less precise than [2, 3], i.e. [1.5, 3.14] ⊑ [2, 3] (yes, the notation is counterintuitive here: the wider interval is the "smaller" description).
d1, d2 = (1.5, 3.14), (2, 3)
ps = PS.IntervalPS()
assert ps.is_less_precise(d1, d2)
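The interval ordering described above can also be checked without the package. The following is a minimal pure-Python sketch of that rule; the function name `interval_is_less_precise` is hypothetical and is not part of paspailleur.

```python
from typing import Tuple

Interval = Tuple[float, float]


def interval_is_less_precise(a: Interval, b: Interval) -> bool:
    """Return True if closed interval `a` contains closed interval `b`."""
    (a_low, a_high), (b_low, b_high) = a, b
    return a_low <= b_low and b_high <= a_high


# The wider interval is the less precise description.
assert interval_is_less_precise((1.5, 3.14), (2, 3))
# The order is not symmetric: the narrower interval is strictly more precise.
assert not interval_is_less_precise((2, 3), (1.5, 3.14))
# Every description is less precise than (or equal to) itself.
assert interval_is_less_precise((2, 3), (2, 3))
```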
SubSetPS
Every description is a set of values.
Description A is less precise than description B if A is a subset of B: A ⊆ B.
For example, description {green, cubic} is less precise than {green, cubic, heavy}.
d1, d2 = {'green', 'cubic'}, {'green', 'cubic', 'heavy'}
ps = PS.SubSetPS()
assert ps.is_less_precise(d1, d2)
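One way to read the subset ordering is that a less precise description matches at least as many objects. The sketch below illustrates this in plain Python, assuming each object is described by a set of attributes. The object names and the helper `covered` are made up for illustration and are not part of paspailleur.

```python
def is_less_precise(a: set, b: set) -> bool:
    """Subset ordering: `a` is less precise than `b` if a ⊆ b."""
    return a <= b


def covered(description: set, objects: dict) -> set:
    """Names of the objects whose attribute set is at least as precise as `description`."""
    return {name for name, attrs in objects.items() if is_less_precise(description, attrs)}


# Hypothetical toy data: object name -> set of attributes.
objects = {
    "brick": {"red", "cubic", "heavy"},
    "dice": {"green", "cubic"},
    "safe": {"green", "cubic", "heavy"},
}

d1, d2 = {"green", "cubic"}, {"green", "cubic", "heavy"}
assert is_less_precise(d1, d2)
# The less precise description d1 covers every object that d2 covers.
assert covered(d2, objects) <= covered(d1, objects)
```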
SuperSetPS
Every description is a set of values.
Description A is less precise than description B if A is a superset of B: A ⊇ B.
For example, description {green, yellow, red} is less precise than {green, yellow}.
d1, d2 = {'green', 'yellow', 'red'}, {'green', 'yellow'}
ps = PS.SuperSetPS()
assert ps.is_less_precise(d1, d2)
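The superset ordering fits descriptions that list the allowed values of a single attribute: the more values are allowed, the more objects match. The following plain-Python sketch shows this reading under that assumption. The data and the helper `covered` are hypothetical and are not part of paspailleur.

```python
def is_less_precise(a: set, b: set) -> bool:
    """Superset ordering: `a` is less precise than `b` if a ⊇ b."""
    return a >= b


def covered(description: set, objects: dict) -> set:
    """Names of the objects whose single value is allowed by `description`."""
    return {name for name, value in objects.items() if is_less_precise(description, {value})}


# Hypothetical toy data: object name -> its colour.
objects = {"go": "green", "wait": "yellow", "stop": "red"}

d1, d2 = {"green", "yellow", "red"}, {"green", "yellow"}
assert is_less_precise(d1, d2)
# Allowing more values (d1) covers at least the objects covered by d2.
assert covered(d2, objects) <= covered(d1, objects)
```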
CartesianPS
A pattern structure that combines several independent basic pattern structures into one.
# Combining three previous examples together
d1 = [(1.5, 3.14), {'green', 'cubic'}, {'green', 'yellow', 'red'}]
d2 = [(2, 3), {'green', 'cubic', 'heavy'}, {'green', 'yellow'}]
basic_structures = [PS.IntervalPS(), PS.SubSetPS(), PS.SuperSetPS()]
ps = PS.CartesianPS(basic_structures)
assert ps.is_less_precise(d1, d2)
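A natural way to compare such combined descriptions is component by component: the combined description is less precise only if every component is. The sketch below assumes this component-wise rule; it is an illustration in plain Python, not the package's implementation, and all function names are hypothetical.

```python
from typing import Any, Callable, Sequence

Comparator = Callable[[Any, Any], bool]


def interval_leq(a, b) -> bool:
    """Interval ordering: `a` contains `b`."""
    return a[0] <= b[0] and b[1] <= a[1]


def subset_leq(a, b) -> bool:
    """Subset ordering: a ⊆ b."""
    return a <= b


def superset_leq(a, b) -> bool:
    """Superset ordering: a ⊇ b."""
    return a >= b


def cartesian_is_less_precise(comparators: Sequence[Comparator], a: Sequence, b: Sequence) -> bool:
    """Assumed rule: `a` is less precise than `b` if this holds in every component."""
    if not (len(comparators) == len(a) == len(b)):
        raise ValueError("one comparator is needed per component")
    return all(leq(x, y) for leq, x, y in zip(comparators, a, b))


comparators = [interval_leq, subset_leq, superset_leq]
d1 = [(1.5, 3.14), {'green', 'cubic'}, {'green', 'yellow', 'red'}]
d2 = [(2, 3), {'green', 'cubic', 'heavy'}, {'green', 'yellow'}]
assert cartesian_is_less_precise(comparators, d1, d2)

# A single failing component breaks the comparison: 'blue' is not allowed by d1.
d3 = [(2, 3), {'green', 'cubic', 'heavy'}, {'green', 'yellow', 'blue'}]
assert not cartesian_is_less_precise(comparators, d1, d3)
```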
NLP
NgramPS
Every description is a set of incomparable ngrams, i.e. a set of incomparable tuples of words.
Ngram A = (a_1, a_2, ..., a_n) is less precise than ngram B = (b_1, b_2, ..., b_m) if A can be embedded into B as a contiguous run of words, i.e. there exists i = 0, ..., m-n s.t. A = B[i:i+n] (using 0-based slicing).
For example, (hello, world) is less precise than (hello, world, !).
Description D_1 = {A_1, A_2, ...} is less precise than description D_2 = {B_1, B_2, ...} if every ngram in D_1 is less precise than some ngram in D_2.
d1 = {('hello', 'world'), ('!',)}
d2 = {('hello', 'world', '!')}
ps = PS.NgramPS()
assert ps.is_less_precise(d1, d2)
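The two-level definition above (ngram against ngram, then description against description) can be sketched in plain Python as follows. This is a minimal illustration of the stated rules, not the package's code; both function names are hypothetical.

```python
from typing import Set, Tuple

Ngram = Tuple[str, ...]


def ngram_is_less_precise(a: Ngram, b: Ngram) -> bool:
    """Return True if `a` occurs in `b` as a contiguous slice."""
    n, m = len(a), len(b)
    # range(m - n + 1) is empty when `a` is longer than `b`, so the result is False.
    return any(b[i:i + n] == a for i in range(m - n + 1))


def description_is_less_precise(d1: Set[Ngram], d2: Set[Ngram]) -> bool:
    """Return True if every ngram of `d1` is less precise than some ngram of `d2`."""
    return all(any(ngram_is_less_precise(a, b) for b in d2) for a in d1)


assert ngram_is_less_precise(('hello', 'world'), ('hello', 'world', '!'))
# Words must be adjacent: ('hello', '!') is not a contiguous part of the longer ngram.
assert not ngram_is_less_precise(('hello', '!'), ('hello', 'world', '!'))

d1 = {('hello', 'world'), ('!',)}
d2 = {('hello', 'world', '!')}
assert description_is_less_precise(d1, d2)
assert not description_is_less_precise(d2, d1)
```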
SynonymPS
Every description is a set of words, representing the synonyms of words contained in some text.
Description A is less precise than description B if A is a subset of B: A ⊆ B.
d1, d2 = 'hello', 'hello world'
ps = PS.SynonymPS()
pattern1, pattern2 = ps.preprocess_data([d1, d2])
assert ps.is_less_precise(pattern1, pattern2)
print('pattern1:', pattern1)
print('pattern2:', pattern2)
pattern1: {'hello'}
pattern2: {'hello', 'universe'}
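To make the preprocessing step more concrete, here is a toy sketch of how a text could be turned into a set of synonyms and then compared by the subset ordering. Everything in it is an assumption made for illustration: the lookup table `TOY_SYNONYMS` is invented so that it reproduces the printed output above, and `toy_preprocess` is a hypothetical helper, not the package's `preprocess_data`.

```python
# Hypothetical toy lookup table, chosen to mirror the printed output above.
# It is NOT the synonym source used by the package.
TOY_SYNONYMS = {
    "hello": {"hello"},
    "world": {"universe"},
}


def toy_preprocess(text: str) -> set:
    """Collect the known synonyms of every word in `text`; unknown words add nothing."""
    pattern = set()
    for word in text.split():
        pattern |= TOY_SYNONYMS.get(word, set())
    return pattern


def is_less_precise(a: set, b: set) -> bool:
    """Subset ordering on synonym sets: a ⊆ b."""
    return a <= b


pattern1, pattern2 = toy_preprocess("hello"), toy_preprocess("hello world")
assert pattern1 == {"hello"}
assert pattern2 == {"hello", "universe"}
assert is_less_precise(pattern1, pattern2)
```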
AntonymPS
Every description is a set of words, representing the antonyms of words contained in some text.
Description A is less precise than description B if A is a subset of B: A ⊆ B.
d1, d2 = 'good', 'good dog'
ps = PS.AntonymPS()
pattern1, pattern2 = ps.preprocess_data([d1, d2])
assert ps.is_less_precise(pattern1, pattern2)
print('pattern1:', pattern1)
print('pattern2:', pattern2)
pattern1: {'evil'}
pattern2: {'evil'}
Both patterns are the same, so the system does not know any antonym of "dog".
Tabular data
[!INFO] Coming soon
NumberPS
CategoryPS
Graphs
[!INFO] Coming soon
GraphPS
OrderedGraphPS
How to create a custom Pattern Structure
To be described
File details
Details for the file paspailleur-0.0.3.tar.gz.
File metadata
- Download URL: paspailleur-0.0.3.tar.gz
- Upload date:
- Size: 48.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.3
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 5c66853c34c322022ea8f23fdf8424b9a4e16397c7208418a199e68c3a84b4eb |
MD5 | 7438524bdaa1a4eec1821aeaf2315735 |
BLAKE2b-256 | cd6a0ab579d3d04899dff77260e3f699aec3a6d967bef830498935dacd171acf |
File details
Details for the file paspailleur-0.0.3-py3-none-any.whl.
File metadata
- Download URL: paspailleur-0.0.3-py3-none-any.whl
- Upload date:
- Size: 39.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.3
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 4859e37c50e304cdcfc535bbeb29dfbef3816cbc77fa040405b0e49a629ab409 |
MD5 | 46ee6f001d0a5d3c48bbc5b521205ff2 |
BLAKE2b-256 | 38786a208c3b537f27f14fa91b0720bcbe738992b308a7f3fcb5e84eea9ec592 |