Mines sequence patterns co-occurring with non-compliant windows in a target feature.
nwc-pattern-mining
Non-compliant Window Co-occurrence pattern mining in temporal data
- A Python library that finds sequential association patterns in time-series data that co-occur with anomalous windows of a target variable.
- Anomalous windows are sequences in which a target variable defies expected behavior (e.g. emissions from a vehicle).
- The patterns are found in co-occurrence with a specific non-compliant feature, to help explain the reason behind its irregular behavior.
- The package prunes the search space to speed up pattern mining and uses hashing for fast support counting.
- The algorithm is based on the research paper [Discovering non-compliant window co-occurrence patterns](https://link.springer.com/article/10.1007/s10707-016-0289-3).
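The hashing idea mentioned in the list above can be pictured with a small standard-library sketch. This is not the package's implementation: the helper name `count_window_patterns` is hypothetical, co-occurrence is assumed to mean "the window ends on a row flagged as non-compliant" (lag 0), and support is assumed to be the co-occurrence count divided by the number of windows, which may differ from the definitions in the paper.

```python
from collections import Counter


def count_window_patterns(values, nc_flags, pattern_length):
    """Count every length-`pattern_length` run in `values` and how often it
    ends on a row flagged as non-compliant (hypothetical helper)."""
    total = Counter()    # pattern -> number of occurrences
    with_nc = Counter()  # pattern -> occurrences ending on a flagged row
    for end in range(pattern_length, len(values) + 1):
        # Tuples are hashable, so each window becomes a dictionary key
        # and counting takes a single pass over the series.
        pattern = tuple(values[end - pattern_length:end])
        total[pattern] += 1
        if nc_flags[end - 1] == 1:
            with_nc[pattern] += 1
    return total, with_nc


# Toy discretized feature and binary target, made up for illustration.
engrpm = [4, 4, 4, 4, 3, 4, 4, 4]
nc_window = [0, 0, 1, 1, 0, 0, 0, 1]

total, with_nc = count_window_patterns(engrpm, nc_window, pattern_length=3)
n_windows = len(engrpm) - 3 + 1
support = {p: c / n_windows for p, c in with_nc.items()}
confidence = {p: with_nc[p] / total[p] for p in with_nc}
```

In this toy series the pattern `(4, 4, 4)` occurs three times, each time ending on a flagged row.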
API-description:
from nwc_pattern_miner import mine_sequence_patterns
## API Parameters:
- series_df: pd.DataFrame; Input DataFrame (only features [discretized] and target [binarized] columns)
- nc_window_col: str; Column name with binary target (anomalous windows)
- support_threshold: float; Support threshold for sequence co-occurrence patterns
- crossk_threshold: float; Ripley's cross-K threshold for sequence co-occurrence patterns
- pattern_length: int; Length of feature sequences co-occurring with anomalous windows
- confidence_threshold: float, default=-1; Confidence threshold for sequence co-occurrence patterns
- lag: int, default=0; Lag consideration between sequence patterns and anomalous windows
- invalid_seq_indexes: list, default=list(); List of indexes across which sequence patterns would be invalidated
- output_metric: {'crossk', 'support'}, default='crossk'; Metric used to sort patterns mined
- output_type: {'topk', 'threshold'}, default='topk'; Type of output for sequence patterns mined
- output_threshold: float, default=-1; Threshold cutoff used to get output sequence patterns, if output_type='threshold'
- topk: int, default=100; Top-k sequence patterns obtained based on output_metric, if output_type='topk'
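How the three output parameters interact can be illustrated with a short sketch. This is an assumption about the semantics, not the package's code: `select_patterns` is a hypothetical helper, patterns are assumed to be ranked in descending order of the chosen metric, and the threshold cutoff is assumed to be inclusive. The metric values are made up.

```python
def select_patterns(patterns, output_metric="crossk", output_type="topk",
                    output_threshold=-1, topk=100):
    """Rank pattern records by one metric, then cut (hypothetical helper)."""
    if output_metric not in ("crossk", "support"):
        raise ValueError("output_metric must be 'crossk' or 'support'")
    # Assumption: a higher metric value means a stronger pattern.
    ranked = sorted(patterns, key=lambda p: p[output_metric], reverse=True)
    if output_type == "topk":
        return ranked[:topk]
    if output_type == "threshold":
        # Assumption: the cutoff is inclusive.
        return [p for p in ranked if p[output_metric] >= output_threshold]
    raise ValueError("output_type must be 'topk' or 'threshold'")


# Made-up pattern records for illustration only.
patterns = [
    {"pattern": "A", "crossk": 2.4, "support": 0.005},
    {"pattern": "B", "crossk": 2.3, "support": 0.006},
    {"pattern": "C", "crossk": 1.1, "support": 0.009},
]

top_two = select_patterns(patterns, output_metric="support", topk=2)
above = select_patterns(patterns, output_type="threshold", output_threshold=2.0)
```

Under these assumptions, `top_two` holds the two patterns with the highest support, and `above` holds the patterns whose cross-K value is at least 2.0.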
Sample Input DataFrame:
engrpm | EGRkgph | MSPhum | EngTq | NCWindow |
---|---|---|---|---|
9 | 11 | 5 | 3 | 1 |
3 | 1 | 5 | 4 | 0 |
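A call on a frame shaped like the sample above might look like the following sketch. The parameter names come from the parameter list in this README; passing them by keyword is an assumption, the threshold values are placeholders rather than recommendations, and `mine_patterns` is a hypothetical wrapper.

```python
# Illustrative settings; the numeric values are placeholders only.
mining_params = {
    "nc_window_col": "NCWindow",   # binary target column
    "support_threshold": 0.005,
    "crossk_threshold": 1.5,
    "pattern_length": 3,
    "lag": 0,
    "output_metric": "crossk",
    "output_type": "topk",
    "topk": 10,
}


def mine_patterns(series_df):
    """Run the miner on a prepared DataFrame (discretized feature columns
    plus the binary target column). Hypothetical wrapper."""
    # Imported here so the sketch loads even without the package installed.
    from nwc_pattern_miner import mine_sequence_patterns
    # Assumption: the documented parameters are accepted as keyword arguments.
    return mine_sequence_patterns(series_df=series_df, **mining_params)
```

Calling `mine_patterns(series_df)` with a DataFrame that contains only the feature columns and `NCWindow` would then return the mined patterns.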
Sample Output DataFrame:
engrpm | EGRkgph | MSPhum | EngTq | Count | Support | Kvalue | Confidence | First Occurrence Index |
---|---|---|---|---|---|---|---|---|
4 4 4 | 5 5 5 | 2 2 2 | 146 | 0.00528 | 2.377 | 1.0 | 47167 | |
4 4 4 | 7 7 7 | 250 | 0.00643 | 2.357 | 1.0 | 41984 |