===============================
py-aho-corasick
===============================


.. image:: https://img.shields.io/pypi/v/py_aho_corasick.svg
        :target: https://pypi.python.org/pypi/py_aho_corasick

.. image:: https://img.shields.io/travis/JanFan/py_aho_corasick.svg
        :target: https://travis-ci.org/JanFan/py_aho_corasick

.. image:: https://readthedocs.org/projects/py-aho-corasick/badge/?version=latest
        :target: https://py-aho-corasick.readthedocs.io/en/latest/?badge=latest
        :alt: Documentation Status

.. image:: https://pyup.io/repos/github/JanFan/py_aho_corasick/shield.svg
        :target: https://pyup.io/repos/github/JanFan/py_aho_corasick/
        :alt: Updates


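py-aho-corasick is a pure Python implementation of the Aho-Corasick algorithm for searching a text for many keywords at once.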


* Free software: MIT license
* The prototype is inspired by, and borrows from, `Carolyn Shen's Aho-Corasick implementation in Python <http://carshen.github.io/data-structures/algorithms/2014/04/07/aho-corasick-implementation-in-python.html>`_

Features
--------

* Pure Python implementation
* Python 2 and Python 3 support
* Unicode and UTF-8 encoding support
* Picklable automaton (can be serialized with ``pickle``)
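
For readers unfamiliar with the algorithm, here is a minimal standalone sketch of an Aho-Corasick automaton built from plain lists and dicts. It is not this package's source code: ``build_automaton`` and ``find_keywords`` are hypothetical names chosen for illustration, and the node layout is an assumption. It shows why a pure Python structure like this handles Unicode text and pickles without extra work.

```python
from collections import deque
import pickle


def build_automaton(keywords):
    """Build a trie with failure links; nodes are plain dicts (illustrative only)."""
    nodes = [{"next": {}, "fail": 0, "out": []}]  # node 0 is the root
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in nodes[state]["next"]:
                nodes.append({"next": {}, "fail": 0, "out": []})
                nodes[state]["next"][ch] = len(nodes) - 1
            state = nodes[state]["next"][ch]
        nodes[state]["out"].append(word)

    # Breadth-first pass: depth-1 nodes keep fail=0 (the root).
    queue = deque(nodes[0]["next"].values())
    while queue:
        state = queue.popleft()
        for ch, child in nodes[state]["next"].items():
            queue.append(child)
            fail = nodes[state]["fail"]
            while fail and ch not in nodes[fail]["next"]:
                fail = nodes[fail]["fail"]
            nodes[child]["fail"] = nodes[fail]["next"].get(ch, 0)
            # Inherit keywords that end at the failure target (proper suffixes).
            nodes[child]["out"] += nodes[nodes[child]["fail"]]["out"]
    return nodes


def find_keywords(nodes, text):
    """Return (start_index, keyword) for every occurrence, in order of end position."""
    state = 0
    found = []
    for i, ch in enumerate(text):
        while state and ch not in nodes[state]["next"]:
            state = nodes[state]["fail"]
        state = nodes[state]["next"].get(ch, 0)
        for word in nodes[state]["out"]:
            found.append((i - len(word) + 1, word))
    return found


nodes = build_automaton(["cash", "shew", "ew"])
matches = find_keywords(nodes, "cashew")

# The structure is plain data, so a pickle round trip preserves behaviour.
restored = pickle.loads(pickle.dumps(nodes))
```

Because the sketch indexes by character rather than by byte, the same code works for non-ASCII text as long as keywords and text are both text strings.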

Usage
--------

Install::

    pip install py_aho_corasick

Usage::

    from py_aho_corasick import py_aho_corasick

    # keywords only
    A = py_aho_corasick.Automaton(['cash', 'shew', 'ew'])
    text = "cashew"
    for idx, k, v in A.get_keywords_found(text):
        assert text[idx:idx+len(k)] == k

    # keywords and values
    kv = [('cash', 1), ('shew', 2), ('ew', 3)]
    A = py_aho_corasick.Automaton(kv)
    text = "cashew"
    for idx, k, v in A.get_keywords_found(text):
        assert text[idx:idx+len(k)] == k
        assert v == dict(kv)[k]
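
The assertions above check that each reported match is correct, but not that every occurrence was reported. One way to check completeness is to compare against a brute-force scan. The sketch below does not use the package; ``naive_matches`` is a hypothetical helper, and it assumes overlapping occurrences should all be counted.

```python
def naive_matches(text, keywords):
    """Return sorted (start_index, keyword) pairs for every occurrence.

    Overlapping occurrences are included because the search restarts
    one character after each hit.
    """
    found = []
    for keyword in keywords:
        start = text.find(keyword)
        while start != -1:
            found.append((start, keyword))
            start = text.find(keyword, start + 1)
    return sorted(found)


expected = naive_matches("cashew", ["cash", "shew", "ew"])
```

The result can then be compared with ``sorted((idx, k) for idx, k, v in A.get_keywords_found(text))``. Sorting both sides avoids assuming any particular order in the automaton's output.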


Performance
-----------

Compared with `pyahocorasick (C extension) <https://github.com/WojciechMula/pyahocorasick>`_:

Run the comparison script to reproduce these numbers::

    # Requirements:
    # pip install pyahocorasick
    python cmp.py

* pyahocorasick: text of length 1000000, 1000 keywords, build time 0.026426076889038086, search time 0.047805070877075195
* py_aho_corasick: text of length 1000000, 1000 keywords, build time 0.47435593605041504, search time 4.24287486076355

Sorry about the poor performance :-(
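
The contents of ``cmp.py`` are not reproduced here. As a rough sketch of how build and search times like those above could be measured separately, the hypothetical helper below times any pair of ``build`` and ``search`` callables on random lowercase text. The helper name, the keyword lengths, and the random data are all assumptions, and it requires Python 3 for ``time.perf_counter``.

```python
import random
import string
import time


def time_build_and_search(build, search, text_len=100000, n_keywords=100, seed=0):
    """Return (build_seconds, search_seconds, match_count) for one run.

    build(keywords) must return a matcher object; search(matcher, text)
    must return an iterable of matches. Both are supplied by the caller.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    letters = string.ascii_lowercase
    text = "".join(rng.choice(letters) for _ in range(text_len))
    keywords = [
        "".join(rng.choice(letters) for _ in range(rng.randint(3, 8)))
        for _ in range(n_keywords)
    ]

    t0 = time.perf_counter()
    matcher = build(keywords)
    t1 = time.perf_counter()
    match_count = sum(1 for _ in search(matcher, text))  # consume lazy results too
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, match_count


# Stand-in matcher for demonstration: a keyword list and a substring test
# that reports only whether each keyword occurs at least once.
def stand_in_search(keywords, text):
    return [k for k in keywords if k in text]


build_s, search_s, count = time_build_and_search(
    list, stand_in_search, text_len=10000, n_keywords=20
)
```

To time this package instead of the stand-in, the usage shown earlier suggests passing ``py_aho_corasick.Automaton`` as ``build`` and ``lambda a, t: a.get_keywords_found(t)`` as ``search``.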

Development
-----------

Run tests::

    # testing against py2 and py3
    tox


TODO
--------

* Performance optimization


=======
History
=======

0.1.0 (2017-04-17)
------------------

* First release on PyPI.
