Python implementation of an Aho-Corasick (AC) automaton for multi-pattern text matching
Project description
AhoCorasickAutomation
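An Aho-Corasick (AC) automaton for multi-pattern matching. A trie must be built from the keyword dictionary before use, so constructing the object takes some time. Search runs in O(N) time on average, where N is the length of the input, but very long inputs are still not recommended.
Note that the AC automaton cannot avoid word-segmentation errors: in “佳保安全”, if “保安” is a keyword it will still be reported, even though the match straddles a word boundary. Confirm that this behavior suits your actual use case before relying on it.
# Build the automaton from a keyword list ("关键词1" = "keyword 1", "关键词2" = "keyword 2")
ac_auto_entity = AhoCorasickAutomation(["关键词1", "关键词2"])
# Search a text ("the text to search, which may contain keyword 1")
ac_auto_entity.search("需要搜索的文本,其中可能包含关键词1")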
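To illustrate the build-then-search behavior described above, here is a minimal sketch of an Aho-Corasick automaton. It is not the `ac_auto` implementation: the class name `SimpleACAutomaton` and the `(start, end, keyword)` hit format are assumptions made for this example, and the package's own `search` may return a different structure.

```python
from collections import deque


class SimpleACAutomaton:
    """Illustrative Aho-Corasick automaton (hypothetical, not the ac_auto class)."""

    def __init__(self, keywords):
        # Node 0 is the root. Parallel lists hold, per node: child edges,
        # the failure link, and the keywords recognized at that node.
        self.children = [{}]
        self.fail = [0]
        self.output = [[]]
        for word in keywords:
            if word:  # skip empty keywords
                self._insert(word)
        self._build_failure_links()

    def _insert(self, word):
        node = 0
        for ch in word:
            if ch not in self.children[node]:
                self.children.append({})
                self.fail.append(0)
                self.output.append([])
                self.children[node][ch] = len(self.children) - 1
            node = self.children[node][ch]
        self.output[node].append(word)

    def _build_failure_links(self):
        # Breadth-first order guarantees a node's failure link is final
        # before its children are processed. Depth-1 nodes fail to the root.
        queue = deque(self.children[0].values())
        while queue:
            node = queue.popleft()
            for ch, child in self.children[node].items():
                f = self.fail[node]
                while f and ch not in self.children[f]:
                    f = self.fail[f]
                self.fail[child] = self.children[f].get(ch, 0)
                # Inherit keywords that end at the failure target.
                self.output[child] = self.output[child] + self.output[self.fail[child]]
                queue.append(child)

    def search(self, text):
        """Return (start, end, keyword) for every occurrence; end is exclusive."""
        hits = []
        node = 0
        for i, ch in enumerate(text):
            while node and ch not in self.children[node]:
                node = self.fail[node]
            node = self.children[node].get(ch, 0)
            for word in self.output[node]:
                hits.append((i - len(word) + 1, i + 1, word))
        return hits


automaton = SimpleACAutomaton(["保安", "安全"])
print(automaton.search("佳保安全"))  # [(1, 3, '保安'), (2, 4, '安全')]
```

The last two lines reproduce the caveat from the description: “保安” is reported inside “佳保安全” because the automaton matches raw character sequences and knows nothing about word boundaries.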
AhoCorasickAutomationConditionalFilter
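Applies conditional filtering to the AC automaton's matches: you can require that the text within a given distance before and after a match contains, or does not contain, certain keywords.
ac_auto_entity = AhoCorasickAutomation(["关键词1", "关键词2"])
# "the text to search, which may contain keyword 1; condition 1 is required nearby"
text_to_scan = "需要搜索的文本,其中可能包含关键词1,要求附近有条件1"
hits = ac_auto_entity.search(text_to_scan)
# Whitelist mode: keep a hit on "关键词1" only if "条件1" ("condition 1") appears within 10 characters
ac_filter_entity = AhoCorasickAutomationConditionalFilter(
    {"关键词1": ["条件1"]},
    distance=10,
    mode=AhoCorasickAutomationConditionalFilter.FILTER_MODE_WHITE,
)
hits = ac_filter_entity.filter(text_to_scan, hits)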
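The filtering idea can be sketched roughly as follows. This is a hedged illustration, not the package's code: the function `filter_hits`, its `whitelist` flag, and the `(start, end, keyword)` hit format are all assumptions, and the real `AhoCorasickAutomationConditionalFilter` may define the distance window or handle unlisted keywords differently.

```python
def filter_hits(text, hits, conditions, distance=10, whitelist=True):
    """Keep or drop hits depending on condition words found near them.

    hits:       iterable of (start, end, keyword) tuples, end exclusive
                (an assumed format, not necessarily what ac_auto returns).
    conditions: dict mapping a keyword to a list of condition words.
    whitelist:  True keeps a hit only if a condition word is nearby;
                False drops a hit if a condition word is nearby.
    Keywords with no entry in `conditions` pass through unchanged.
    """
    kept = []
    for start, end, keyword in hits:
        words = conditions.get(keyword)
        if not words:
            kept.append((start, end, keyword))
            continue
        # Context window: `distance` characters on each side of the hit,
        # excluding the hit itself.
        before = text[max(0, start - distance):start]
        after = text[end:end + distance]
        nearby = any(w in before or w in after for w in words)
        if nearby == whitelist:
            kept.append((start, end, keyword))
    return kept
```

With the example text above and `distance=10`, a hit on “关键词1” is kept in whitelist mode because “条件1” lies within the ten characters that follow it; in blacklist mode the same hit would be dropped.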
Download files
Source Distribution
ac_auto-0.1.0.tar.gz (5.0 kB)
Built Distribution
ac_auto-0.1.0-py3-none-any.whl (5.5 kB)
File details
Details for the file ac_auto-0.1.0.tar.gz.
File metadata
- Download URL: ac_auto-0.1.0.tar.gz
- Upload date:
- Size: 5.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6147cc126ed611a855e560a83bccaa94949499aced7e6c51205a0a1c2424afe5
MD5 | 79bb03a1fa7ef641520ae0c303abb98c
BLAKE2b-256 | 9e703528a110bb281e73ed9cce482826c4a73dbc3222c61359adeaca30951bef
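A downloaded file can be checked against a published SHA256 digest with the standard library. The sketch below is only an illustration: the helper names and the local path in the usage comment are hypothetical, and the file is assumed to have been downloaded already.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare the file's digest with a published one, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Hypothetical usage, assuming the sdist was saved in the current directory:
# matches_published_digest(
#     "ac_auto-0.1.0.tar.gz",
#     "6147cc126ed611a855e560a83bccaa94949499aced7e6c51205a0a1c2424afe5",
# )
```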
File details
Details for the file ac_auto-0.1.0-py3-none-any.whl.
File metadata
- Download URL: ac_auto-0.1.0-py3-none-any.whl
- Upload date:
- Size: 5.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8c8ed7a765c97f5ee30d845473d59727acb3c4fe5f2de260fee1e6d922c281ff
MD5 | 383fdb31304670295a7e8699e2b62c48
BLAKE2b-256 | 637d2815c841ced9165f0c612dd30a9bee40efdc6786b52c0debe16ab2c52488