
A Python implementation of the Aho-Corasick (AC) automaton for multi-pattern text matching

Project description

AhoCorasickAutomation

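Aho-Corasick (AC) automaton for multi-pattern matching. Before use, a trie must be built from the keyword dictionary, so constructing the object takes some time. Searching then runs in roughly O(N) time in the length of the input text (plus the number of matches reported), although very long inputs are still not recommended. Note that the AC automaton matches plain substrings and cannot prevent word-segmentation errors: in "佳保安全", for example, "保安" occurs only as a substring across a word boundary, yet it will still be reported if "保安" is a keyword. Confirm that this behavior fits your actual use case before relying on it.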

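# Assumes AhoCorasickAutomation has been imported from the ac_auto package (module path not stated here)
# Build the automaton from a keyword list ("关键词" means "keyword")
ac_auto_entity = AhoCorasickAutomation(["关键词1", "关键词2"])
# Scan a text ("text to search, which may contain 关键词1") for the keywords
ac_auto_entity.search("需要搜索的文本,其中可能包含关键词1")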
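The trie-plus-failure-link construction described above, and the "保安" caveat, can be illustrated with a minimal, self-contained sketch of the classic Aho-Corasick algorithm. This is not the ac_auto package's source: the class name `MiniAhoCorasick` and its `(start, end, keyword)` hit tuples (with `end` exclusive) are hypothetical choices made for illustration.

```python
from collections import deque


class MiniAhoCorasick:
    """Minimal Aho-Corasick automaton (illustrative sketch, not the ac_auto code)."""

    def __init__(self, keywords):
        # Node 0 is the root. Each node has a goto table, a failure link,
        # and the list of keywords that end at this node.
        self.goto = [{}]
        self.fail = [0]
        self.output = [[]]
        for word in keywords:
            self._insert(word)
        self._build_failure_links()

    def _insert(self, word):
        # Walk/extend the trie one character at a time
        node = 0
        for ch in word:
            if ch not in self.goto[node]:
                self.goto.append({})
                self.fail.append(0)
                self.output.append([])
                self.goto[node][ch] = len(self.goto) - 1
            node = self.goto[node][ch]
        self.output[node].append(word)

    def _build_failure_links(self):
        # Breadth-first: a node's failure link points to the longest proper
        # suffix of its path that is also a path in the trie.
        queue = deque(self.goto[0].values())  # depth-1 nodes keep fail = root
        while queue:
            node = queue.popleft()
            for ch, child in self.goto[node].items():
                queue.append(child)
                f = self.fail[node]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(ch, 0)
                # Inherit the keywords that end at the failure target
                self.output[child] += self.output[self.fail[child]]

    def search(self, text):
        """Return (start, end, keyword) for every match; end is exclusive."""
        hits = []
        node = 0
        for i, ch in enumerate(text):
            # Follow failure links until the character can be consumed
            while node and ch not in self.goto[node]:
                node = self.fail[node]
            node = self.goto[node].get(ch, 0)
            for word in self.output[node]:
                hits.append((i - len(word) + 1, i + 1, word))
        return hits


ac = MiniAhoCorasick(["保安", "安全"])
print(ac.search("佳保安全"))  # [(1, 3, '保安'), (2, 4, '安全')]
```

Building the failure links breadth-first guarantees that each node's failure target is finalized before the node inherits its matches, which is what lets `search` report overlapping keywords such as "保安" and "安全" in a single left-to-right pass.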

AhoCorasickAutomationConditionalFilter

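Filters the AC automaton's matches by condition: you can require that the text within a given distance before and after a match does, or does not, contain certain keywords.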

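ac_auto_entity = AhoCorasickAutomation(["关键词1", "关键词2"])
# "Text to search, which may contain 关键词1, and requires 条件1 ("condition 1") nearby"
text_to_scan = "需要搜索的文本,其中可能包含关键词1,要求附近有条件1"
hits = ac_auto_entity.search(text_to_scan)
# For hits of "关键词1", check for "条件1" within distance=10 of the match;
# FILTER_MODE_WHITE (whitelist mode) requires the condition words to be present
ac_filter_entity = AhoCorasickAutomationConditionalFilter({
    "关键词1": ["条件1"]
}, distance=10, mode=AhoCorasickAutomationConditionalFilter.FILTER_MODE_WHITE)
hits = ac_filter_entity.filter(text_to_scan, hits)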
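The filtering rule described above can be sketched on its own. This is an illustration under assumptions, not the package's implementation: the function name `filter_hits`, the `(start, end, keyword)` hit tuples, the `"white"`/`"black"` mode strings, and measuring `distance` in characters on each side of the match are all hypothetical, since the package's exact `distance` semantics are not documented here.

```python
from typing import Dict, List, Tuple

Hit = Tuple[int, int, str]  # (start, end_exclusive, keyword); hypothetical format


def filter_hits(text: str, hits: List[Hit], conditions: Dict[str, List[str]],
                distance: int, mode: str = "white") -> List[Hit]:
    """Keep or drop hits depending on condition words near each match.

    mode="white": keep a hit only if some condition word occurs nearby.
    mode="black": drop a hit if any condition word occurs nearby.
    Keywords without an entry in `conditions` pass through unchanged.
    """
    if mode not in ("white", "black"):
        raise ValueError("mode must be 'white' or 'black'")
    kept = []
    for start, end, keyword in hits:
        words = conditions.get(keyword)
        if not words:
            kept.append((start, end, keyword))
            continue
        # Window: `distance` characters before and after the match itself
        window = text[max(0, start - distance):end + distance]
        found = any(w in window for w in words)
        # Whitelist keeps hits with a nearby condition; blacklist keeps those without
        if found == (mode == "white"):
            kept.append((start, end, keyword))
    return kept


text = "需要搜索的文本,其中可能包含关键词1,要求附近有条件1"
start = text.find("关键词1")
hits = [(start, start + len("关键词1"), "关键词1")]
print(filter_hits(text, hits, {"关键词1": ["条件1"]}, distance=10, mode="white"))
# [(14, 18, '关键词1')]
```

Keeping the keyword-to-conditions mapping separate from the automaton, as the package's own example does, leaves the matching step unchanged and lets the same hits be filtered under different rules.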

Download files


Source Distribution

ac_auto-0.1.0.tar.gz (5.0 kB)

Built Distribution

ac_auto-0.1.0-py3-none-any.whl (5.5 kB)

File details

Details for the file ac_auto-0.1.0.tar.gz.

File metadata

  • Download URL: ac_auto-0.1.0.tar.gz
  • Size: 5.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.8.5

File hashes

Hashes for ac_auto-0.1.0.tar.gz
Algorithm Hash digest
SHA256 6147cc126ed611a855e560a83bccaa94949499aced7e6c51205a0a1c2424afe5
MD5 79bb03a1fa7ef641520ae0c303abb98c
BLAKE2b-256 9e703528a110bb281e73ed9cce482826c4a73dbc3222c61359adeaca30951bef
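The digests above can be checked locally before installing. The following is a minimal sketch using Python's standard `hashlib`, shown for the source distribution (the wheel works the same way with its own SHA256 value); the function names `sha256_of_file` and `verify_download` are hypothetical:

```python
import hashlib

# SHA256 digest listed above for ac_auto-0.1.0.tar.gz
EXPECTED_SHA256 = "6147cc126ed611a855e560a83bccaa94949499aced7e6c51205a0a1c2424afe5"


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare a file's digest with the expected one (case-insensitive)."""
    return sha256_of_file(path) == expected.lower()
```

Calling `verify_download("ac_auto-0.1.0.tar.gz")` on a downloaded copy should return `True`. pip's hash-checking mode (a `--hash=sha256:<digest>` option on a requirements-file line) performs the same comparison during installation.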


File details

Details for the file ac_auto-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: ac_auto-0.1.0-py3-none-any.whl
  • Size: 5.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.8.5

File hashes

Hashes for ac_auto-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 8c8ed7a765c97f5ee30d845473d59727acb3c4fe5f2de260fee1e6d922c281ff
MD5 383fdb31304670295a7e8699e2b62c48
BLAKE2b-256 637d2815c841ced9165f0c612dd30a9bee40efdc6786b52c0debe16ab2c52488

