This tool clusters lines of text according to a collection of input patterns, each defined by a regular expression.
Pattern clustering
This work has been published in:
[ICPR’2022] A novel pattern-based edit distance for automatic log parsing, Maxime Raynal, Marc-Olivier Buob, Georges Quénot.
Features
Forms groups of homogeneous lines using a pattern-based distance built on customizable patterns.
Configured by default to use common patterns (IP addresses, numeric values, etc.)
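The idea behind these features can be illustrated with a minimal sketch. This is not the package's API or its algorithm: the pattern names, the `PATTERNS` list, and the functions below are hypothetical, and a plain Levenshtein distance over pattern names stands in for the pattern-based edit distance described in the paper.

```python
import re

# Hypothetical pattern collection (illustrative only, not the package defaults).
# Order matters: more specific patterns come first.
PATTERNS = [
    ("ipv4", r"\b\d{1,3}(?:\.\d{1,3}){3}\b"),
    ("float", r"\b\d+\.\d+\b"),
    ("int", r"\b\d+\b"),
    ("word", r"[A-Za-z]+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in PATTERNS))


def to_pattern_sequence(line):
    """Map a line to the tuple of pattern names matched from left to right."""
    return tuple(match.lastgroup for match in TOKEN_RE.finditer(line))


def edit_distance(a, b):
    """Plain Levenshtein distance between two sequences of pattern names."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


def cluster_lines(lines, max_distance=0):
    """Greedy clustering: a line joins the first cluster whose
    representative sequence is within max_distance, else starts a new one."""
    clusters = []  # list of (representative_sequence, member_lines)
    for line in lines:
        seq = to_pattern_sequence(line)
        for rep, members in clusters:
            if edit_distance(seq, rep) <= max_distance:
                members.append(line)
                break
        else:
            clusters.append((seq, [line]))
    return [members for _, members in clusters]


lines = [
    "Accepted connection from 192.168.0.1 port 22",
    "Accepted connection from 10.0.0.7 port 8080",
    "Load average 0.75",
    "Load average 1.20",
]

for group in cluster_lines(lines):
    print(group)
```

With `max_distance=0`, only lines that share exactly the same sequence of pattern names end up together; raising the threshold tolerates a few inserted, removed, or substituted tokens.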
License
This project is licensed under the BSD-3-Clause license; see the LICENSE file.
More about pattern-clustering
For more information, feel free to visit the project wiki.
Acks
The skeleton package was created with Cookiecutter and the francois-durand/package_helper_2 project template.
The Sphinx setup is inspired by Sphinx-Autosummary-Recursion.
History
0.1.0 (2022-05-11): First release
First release on PyPI.
0.2.0 (2022-06-02): CI
Updated tox.ini and GitHub actions, work in progress.
0.3.0 (2022-06-22): Bug fixes and CI improvements
Fixed local Sphinx build
Fixed bumpversion
Added experiment notebooks and datasets
Improved test suite
0.3.1 (2022-06-22): Bug fixes and CI improvements
Fixed Read the Docs build
0.4.1 (2022-06-24): Bug fixes and CI improvements
Fixed Read the Docs build
Implemented console script (cli)
Reworked PatternClusteringEnv class
Bug fixes
Updated documentation
0.4.2 (2022-06-24): Added entry points
Added pattern-distance entry point, see pattern-distance --help.
Added pattern-clustering-mkconf entry point. The resulting JSON may be passed to the pattern-distance and pattern-clustering commands.
0.5.0 (2022-06-25): Added entry points
Bug fixes in notebooks/
Removed unused patterns
1.0.0 (2022-07-01): checked experiments
Checked experiments in notebooks/
Fixed warning related to documentation build
Improved tests