PatternCounter

Read the documentation at https://patterncounter.readthedocs.io/

Features

This tool counts patterns in lists of sequential groups using rules and variables.

The examples below use the following file (sequences.txt):

A -1 -2
B -1 -2
A B -1 -2
A -1 B C -1 -2
B -1 A B -1 A -1 C -1 -2
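
To make the file layout concrete, here is a minimal Python sketch that reads it into groups. It assumes that `-1` closes a group and `-2` closes a sequence, which is how the examples below read; `parse_sequence` is a hypothetical helper written for this illustration, not part of PatternCounter.

```python
# Assumption: -1 closes a group and -2 closes the sequence.
SEQUENCES_TXT = """\
A -1 -2
B -1 -2
A B -1 -2
A -1 B C -1 -2
B -1 A B -1 A -1 C -1 -2
"""


def parse_sequence(line):
    """Turn one line into a list of groups, each group a set of elements."""
    groups, current = [], set()
    for token in line.split():
        if token == "-2":    # end of the sequence
            break
        if token == "-1":    # end of the current group
            groups.append(current)
            current = set()
        else:
            current.add(token)
    return groups


sequences = [parse_sequence(line) for line in SEQUENCES_TXT.splitlines()]
for index, groups in enumerate(sequences):
    print(index, [sorted(group) for group in groups])
```

Under this reading, line 4 (0-indexed) holds four groups: B, then A and B together, then A, then C.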

Example 1: Count how many sequences contain both the elements A and B:

$ patterncounter count "A B" -n -f sequences.txt
Supp((A B)) = 0.6 | 3 lines: 2, 3, 4

Example 2: Count how many sequences contain the elements A and B in the same group:

$ patterncounter count "A & B" -n -f sequences.txt
Supp(A & B) = 0.4 | 2 lines: 2, 4
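
The difference between Examples 1 and 2 can be sketched in plain Python. This is an illustration of the counting idea only, not PatternCounter's implementation; the data is the example file already split into groups, and `contains_all`, `same_group`, and `support` are hypothetical names.

```python
# The five example sequences, each a list of groups (sets of elements).
sequences = [
    [{"A"}],
    [{"B"}],
    [{"A", "B"}],
    [{"A"}, {"B", "C"}],
    [{"B"}, {"A", "B"}, {"A"}, {"C"}],
]


def contains_all(groups, elements):
    """'A B': every element appears somewhere in the sequence."""
    return elements <= set().union(*groups)


def same_group(groups, elements):
    """'A & B': at least one group holds all the elements at once."""
    return any(elements <= group for group in groups)


def support(predicate):
    """Return (fraction of matching sequences, their 0-indexed line numbers)."""
    hits = [i for i, groups in enumerate(sequences) if predicate(groups)]
    return len(hits) / len(sequences), hits


print(support(lambda groups: contains_all(groups, {"A", "B"})))  # (0.6, [2, 3, 4])
print(support(lambda groups: same_group(groups, {"A", "B"})))    # (0.4, [2, 4])
```

Line 3 matches the first rule but not the second, because A and B appear there in different groups.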

Example 3: Count how many sequences have an element B that occurs after A:

$ patterncounter count "A -> B" -n -f sequences.txt
Supp(A -> B) = 0.2 | 1 lines: 3
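
One way to read this result: B must appear in a group strictly later than a group containing A. The sketch below assumes that interpretation, which matches the output above; `occurs_after` is a hypothetical helper, not the tool's own code.

```python
# The five example sequences, each a list of groups (sets of elements).
sequences = [
    [{"A"}],
    [{"B"}],
    [{"A", "B"}],
    [{"A"}, {"B", "C"}],
    [{"B"}, {"A", "B"}, {"A"}, {"C"}],
]


def occurs_after(groups, first, second):
    """True if `second` is in a group strictly after a group holding `first`."""
    for position, group in enumerate(groups):
        if first in group:
            # Checking from the earliest `first` covers every later one too.
            return any(second in later for later in groups[position + 1:])
    return False


hits = [i for i, groups in enumerate(sequences) if occurs_after(groups, "A", "B")]
print(len(hits) / len(sequences), hits)  # 0.2 [3]
```

Lines 2 and 4 do not match: in both, B only shares a group with A or comes before it.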

Example 4: Count in how many sequences the element B was removed within an interval of A:

$ patterncounter count "[A OutB]" -n -f sequences.txt
Supp([A OutB]) = 0.2 | 1 lines: 4

Example 5: Count in how many sequences there is an element C that occurs after an interval of A:

$ patterncounter count "[A] -> C" -n -f sequences.txt
Supp([A] -> C) = 0.4 | 2 lines: 3, 4

Example 6: Show results even when the pattern does not exist:

$ patterncounter count "Z" -n -f sequences.txt -z
Supp(Z) = 0.0 | 0 lines

In addition to simple rules, it is possible to define multiple rules and calculate association rule metrics among them:

Example 7: Count both how many sequences have an interval of A, and how many sequences have an interval of A with an element B inside:

$ patterncounter count "[A]" "[A B]" -n -f sequences.txt
Supp([A], [A B]) = 0.4 | 2 lines: 2, 4
Association Rule: [A] ==> [A B]
  Supp([A]) = 0.8 | 4 lines: 0, 2, 3, 4
  Supp([A B]) = 0.4 | 2 lines: 2, 4
  Conf = 0.5
  Lift = 1.25
Association Rule: [A B] ==> [A]
  Supp([A B]) = 0.4 | 2 lines: 2, 4
  Supp([A]) = 0.8 | 4 lines: 0, 2, 3, 4
  Conf = 1.0
  Lift = 1.25
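
The Conf and Lift values above are consistent with the standard association rule definitions: confidence is the joint support divided by the support of the left-hand side, and lift is the joint support divided by the product of both supports. A short check using the line counts printed above (the variable names are illustrative only):

```python
# Counts taken from the output above (5 sequences in total).
total = 5
count_interval_a = 4       # lines matching [A]
count_interval_a_b = 2     # lines matching [A B]
count_both = 2             # lines matching both rules


def confidence(count_joint, count_lhs):
    """Conf(X ==> Y) = Supp(X, Y) / Supp(X)."""
    return count_joint / count_lhs


def lift(count_joint, count_lhs, count_rhs, n):
    """Lift(X ==> Y) = Supp(X, Y) / (Supp(X) * Supp(Y)), written with counts."""
    return (count_joint * n) / (count_lhs * count_rhs)


print(confidence(count_both, count_interval_a))                        # 0.5
print(confidence(count_both, count_interval_a_b))                      # 1.0
print(lift(count_both, count_interval_a, count_interval_a_b, total))   # 1.25
```

Lift is symmetric in its two sides, which is why both rules report 1.25.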

It is also possible to define variables.

Example 8: Count how many sequences have groups with two distinct elements:

$ patterncounter count "x & y" -v "x" -v "y" -n -f sequences.txt -z
Supp(x & y) = 0.6 | 3 lines: 2, 3, 4

[BINDING: x = B; y = A]
  Supp(B & A) = 0.4 | 2 lines: 2, 4

[BINDING: x = A; y = B]
  Supp(A & B) = 0.4 | 2 lines: 2, 4

[BINDING: x = B; y = C]
  Supp(B & C) = 0.2 | 1 lines: 3

[BINDING: x = A; y = C]
  Supp(A & C) = 0.0 | 0 lines

[BINDING: x = C; y = B]
  Supp(C & B) = 0.2 | 1 lines: 3

[BINDING: x = C; y = A]
  Supp(C & A) = 0.0 | 0 lines

Note that the output first shows the combined metrics (the union of all bindings), followed by the metrics for each binding.
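
The binding output can be reproduced with a small sketch that tries every ordered pair of distinct elements and then takes the union of the matching lines. How PatternCounter enumerates bindings internally is an assumption here; the names below are illustrative only.

```python
from itertools import permutations

# The five example sequences, each a list of groups (sets of elements).
sequences = [
    [{"A"}],
    [{"B"}],
    [{"A", "B"}],
    [{"A"}, {"B", "C"}],
    [{"B"}, {"A", "B"}, {"A"}, {"C"}],
]

# Every element that appears anywhere in the file.
elements = sorted(set().union(*(group for seq in sequences for group in seq)))

# For each binding (x, y), the lines where x and y share a group.
per_binding = {
    (x, y): [i for i, seq in enumerate(sequences)
             if any({x, y} <= group for group in seq)]
    for x, y in permutations(elements, 2)
}

# Combined result: a line counts if any binding matches it.
combined = sorted(set().union(*per_binding.values()))
print(len(combined) / len(sequences), combined)  # 0.6 [2, 3, 4]
for binding, lines in per_binding.items():
    print(binding, len(lines) / len(sequences), lines)
```

The combined support (0.6) is larger than that of any single binding because lines 2 and 4 match through A and B, while line 3 matches through B and C.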

Finally, given a file of sequences, it is also possible to select its lines (0-indexed):

$ patterncounter select -f sequences.txt -n 4
0| A -1 -2
2| A B -1 -2
4| B -1 A B -1 A -1 C -1 -2

Installation

You can install PatternCounter via pip from PyPI:

$ pip install patterncounter

Usage

Please see the Command-line Reference for details.

Contributing

Contributions are very welcome. To learn more, see the Contributor Guide.

License

Distributed under the terms of the MIT license, PatternCounter is free and open source software.

Issues

If you encounter any problems, please file an issue along with a detailed description.

Credits

This project was generated from @cjolowicz's Hypermodern Python Cookiecutter template.
