PatternCounter
Features
This tool counts patterns in lists of sequential groups using rules and variables.
For the examples below, consider the following file (sequences.txt):
A -1 -2
B -1 -2
A B -1 -2
A -1 B C -1 -2
B -1 A B -1 A -1 C -1 -2
Example 1: Count how many sequences contain both the elements A and B:
$ patterncounter count "A B" -n -f sequences.txt
Supp((A B)) = 0.6 | 3 lines: 2, 3, 4
Example 2: Count how many sequences contain the elements A and B in the same group:
$ patterncounter count "A & B" -n -f sequences.txt
Supp(A & B) = 0.4 | 2 lines: 2, 4
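The support values in Examples 1 and 2 can be reproduced by hand. The sketch below is not PatternCounter's implementation. It assumes that `-1` closes a group and `-2` closes a sequence (the file above is consistent with this reading, but it is not stated here), and the helper names `parse_sequence` and `support` are hypothetical.

```python
# Assumption: "-1" ends a group and "-2" ends the sequence.
SEQUENCES = """\
A -1 -2
B -1 -2
A B -1 -2
A -1 B C -1 -2
B -1 A B -1 A -1 C -1 -2
"""


def parse_sequence(line):
    """Split one line into a list of groups, each a set of elements."""
    groups, current = [], set()
    for token in line.split():
        if token == "-1":      # end of the current group
            groups.append(current)
            current = set()
        elif token == "-2":    # end of the sequence
            break
        else:
            current.add(token)
    return groups


def support(sequences, predicate):
    """Return (fraction of matching sequences, 0-indexed matching lines)."""
    matches = [i for i, seq in enumerate(sequences) if predicate(seq)]
    return len(matches) / len(sequences), matches


def contains_both(seq):
    """A and B both occur somewhere in the sequence (Example 1)."""
    return {"A", "B"} <= set().union(*seq)


def same_group(seq):
    """A and B occur together in at least one group (Example 2)."""
    return any({"A", "B"} <= group for group in seq)


sequences = [parse_sequence(line) for line in SEQUENCES.splitlines()]
print(support(sequences, contains_both))  # (0.6, [2, 3, 4])
print(support(sequences, same_group))     # (0.4, [2, 4])
```

Support is simply the number of matching lines divided by the total number of lines, which is why 3 matches out of 5 sequences gives 0.6.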
Example 3: Count how many sequences have an element B that occurs after A:
$ patterncounter count "A -> B" -n -f sequences.txt
Supp(A -> B) = 0.2 | 1 lines: 3
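One reading of `A -> B` that reproduces this result is "B appears in a group strictly later than a group containing A". The sketch below checks that reading against the example file. It is an assumption drawn from the output above, not a statement of the tool's full semantics, and the names `parse_sequence` and `occurs_after` are hypothetical.

```python
# Assumption: "-1" ends a group and "-2" ends the sequence.
SEQUENCES = """\
A -1 -2
B -1 -2
A B -1 -2
A -1 B C -1 -2
B -1 A B -1 A -1 C -1 -2
"""


def parse_sequence(line):
    """Split one line into a list of groups, each a set of elements."""
    groups, current = [], set()
    for token in line.split():
        if token == "-1":
            groups.append(current)
            current = set()
        elif token == "-2":
            break
        else:
            current.add(token)
    return groups


def occurs_after(seq, first, second):
    """True if `second` is in a group strictly later than one holding `first`."""
    for i, group in enumerate(seq):
        if first in group:
            # The earliest occurrence of `first` is enough: anything that
            # follows a later occurrence also follows the earliest one.
            return any(second in later for later in seq[i + 1:])
    return False


sequences = [parse_sequence(line) for line in SEQUENCES.splitlines()]
matches = [i for i, seq in enumerate(sequences) if occurs_after(seq, "A", "B")]
print(len(matches) / len(sequences), matches)  # 0.2 [3]
```

Line 2 does not match because A and B share a group, and line 4 does not match because no B appears after the first group that contains A.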
Example 4: Count in how many sequences the element B was removed within an interval of A:
$ patterncounter count "[A OutB]" -n -f sequences.txt
Supp([A OutB]) = 0.2 | 1 lines: 4
Example 5: Count in how many sequences there is an element C that occurs after an interval of A:
$ patterncounter count "[A] -> C" -n -f sequences.txt
Supp([A] -> C) = 0.4 | 2 lines: 3, 4
Example 6: Show results even when the pattern does not exist:
$ patterncounter count "Z" -n -f sequences.txt -z
Supp(Z) = 0.0 | 0 lines
In addition to using simple rules, it is possible to define multiple rules and calculate association rule metrics among them:
Example 7: Count both how many sequences have an interval of A, and how many sequences have an interval of A with an element B inside:
$ patterncounter count "[A]" "[A B]" -n -f sequences.txt
Supp([A], [A B]) = 0.4 | 2 lines: 2, 4
Association Rule: [A] ==> [A B]
Supp([A]) = 0.8 | 4 lines: 0, 2, 3, 4
Supp([A B]) = 0.4 | 2 lines: 2, 4
Conf = 0.5
Lift = 1.25
Association Rule: [A B] ==> [A]
Supp([A B]) = 0.4 | 2 lines: 2, 4
Supp([A]) = 0.8 | 4 lines: 0, 2, 3, 4
Conf = 1.0
Lift = 1.25
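The metrics above are consistent with the standard association rule definitions: confidence is the joint support divided by the antecedent's support, and lift is the confidence divided by the consequent's support. The sketch below recomputes them from the line sets printed in the output. The function name `rule_metrics` is hypothetical, and taking the joint support as the intersection of the two line sets is an assumption that matches the numbers shown.

```python
TOTAL = 5                # number of sequences in sequences.txt
lines_a = {0, 2, 3, 4}   # lines matching [A], copied from the output above
lines_ab = {2, 4}        # lines matching [A B], copied from the output above


def rule_metrics(antecedent, consequent, total):
    """Return (support, confidence, lift) for antecedent ==> consequent."""
    both = antecedent & consequent            # lines matching both rules
    supp = len(both) / total
    conf = len(both) / len(antecedent)
    # lift = conf / supp(consequent), written with integers to avoid
    # floating-point rounding in the intermediate steps
    lift = len(both) * total / (len(antecedent) * len(consequent))
    return supp, conf, lift


print(rule_metrics(lines_a, lines_ab, TOTAL))  # (0.4, 0.5, 1.25)
print(rule_metrics(lines_ab, lines_a, TOTAL))  # (0.4, 1.0, 1.25)
```

Lift is symmetric in the two rules, which is why both directions report 1.25 while the confidence differs.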
It is also possible to define variables.
Example 8: Count how many sequences have groups with two distinct elements:
$ patterncounter count "x & y" -v "x" -v "y" -n -f sequences.txt -z
Supp(x & y) = 0.6 | 3 lines: 2, 3, 4
[BINDING: x = B; y = A]
Supp(B & A) = 0.4 | 2 lines: 2, 4
[BINDING: x = A; y = B]
Supp(A & B) = 0.4 | 2 lines: 2, 4
[BINDING: x = B; y = C]
Supp(B & C) = 0.2 | 1 lines: 3
[BINDING: x = A; y = C]
Supp(A & C) = 0.0 | 0 lines
[BINDING: x = C; y = B]
Supp(C & B) = 0.2 | 1 lines: 3
[BINDING: x = C; y = A]
Supp(C & A) = 0.0 | 0 lines
Note that the result first shows the combined metrics (union).
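To see where the combined figure comes from, the sketch below binds `x` and `y` to every ordered pair of distinct elements found in the file, evaluates `x & y` for each binding, and takes the union of the matching lines. It is an illustration under the assumption that `-1` ends a group and `-2` ends a sequence, not the tool's implementation; `parse_sequence` and `lines_with_pair` are hypothetical names, and the order in which bindings are listed may differ from the tool's output.

```python
from itertools import permutations

# Assumption: "-1" ends a group and "-2" ends the sequence.
SEQUENCES = """\
A -1 -2
B -1 -2
A B -1 -2
A -1 B C -1 -2
B -1 A B -1 A -1 C -1 -2
"""


def parse_sequence(line):
    """Split one line into a list of groups, each a set of elements."""
    groups, current = [], set()
    for token in line.split():
        if token == "-1":
            groups.append(current)
            current = set()
        elif token == "-2":
            break
        else:
            current.add(token)
    return groups


def lines_with_pair(sequences, x, y):
    """0-indexed lines where x and y occur together in some group."""
    return [i for i, seq in enumerate(sequences)
            if any({x, y} <= group for group in seq)]


sequences = [parse_sequence(line) for line in SEQUENCES.splitlines()]
elements = sorted(set().union(*(g for seq in sequences for g in seq)))

combined = set()
bindings = {}
for x, y in permutations(elements, 2):   # every ordered pair with x != y
    bindings[(x, y)] = lines_with_pair(sequences, x, y)
    combined.update(bindings[(x, y)])

print(len(combined) / len(sequences), sorted(combined))  # 0.6 [2, 3, 4]
for (x, y), lines in bindings.items():
    print(f"x = {x}; y = {y}: {len(lines) / len(sequences)} {lines}")
```

The union of lines 2 and 4 (from A and B) with line 3 (from B and C) gives the three lines reported for `x & y`.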
Finally, given a file of sequences, it is also possible to select specific lines (0-indexed):
$ patterncounter select -f sequences.txt -n 4
0| A -1 -2
2| A B -1 -2
4| B -1 A B -1 A -1 C -1 -2
Installation
You can install PatternCounter via pip from PyPI:
$ pip install patterncounter
Usage
Please see the Command-line Reference for details.
Contributing
Contributions are very welcome. To learn more, see the Contributor Guide.
License
Distributed under the terms of the MIT license, PatternCounter is free and open source software.
Issues
If you encounter any problems, please file an issue along with a detailed description.
Credits
This project was generated from @cjolowicz's Hypermodern Python Cookiecutter template.