simex

Ultra-simple human readable DSL for matching text.

Development Status
- 3 - Alpha
Environment
- Console
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- Unix
Programming Language
Topic
- Software Development :: Libraries

SimEx
=====

SimEx is a tool that lets you write simple, readable equivalents of regular expressions that
compile down to regular expressions.

This is useful for:

* Improving the readability and maintainability of code that uses long regexes with a lot of escaped characters.
* Allowing non-developers to read and understand simple regex-equivalents and potentially even write their own.

SimEx is *not* a full replacement for regular expressions, and it is not suitable everywhere a regex is used.

It works best where you usually want to compare two strings exactly but occasionally need to compare
strings that have a pattern embedded within them.

It is an embodiment of `the rule of least power <https://en.wikipedia.org/wiki/Rule_of_least_power>`_.

To install::

    $ pip install simex

Example
-------

.. code-block:: python

    >>> from simex import Simex
    >>> simex = Simex({"url": r".*?", "anything": r".*?"})
    >>> regex = simex.compile("""<a href="{{ url }}">{{ anything }}</a>""")
    >>> regex.match("""<a href="http://www.cnn.com">CNN</a>""") is not None
    True

Do I have to define all of the sub-regular expressions myself?
--------------------------------------------------------------

No. SimEx also contains a built-in library of commonly used regular expressions.

This will also work:

.. code-block:: python

    >>> from simex import DefaultSimex
    >>> my_simex = DefaultSimex()
    >>> regex = my_simex.compile("""<a href="{{ url }}">{{ anything }}</a>""")
    >>> regex
    re.compile(r'\<a\ href\=\"(ht|f)tp(s?)\:\/\/[0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*(:(0-9)*)*(\/?)([a-zA-Z0-9\-\.\?\,\\'\/\\\+&%\$#_]*)?\"\>.*?\<\/a\>', re.UNICODE)
    >>> regex.match("""<a href="http://www.cnn.com">CNN</a>""") is not None
    True

All regexes in the existing library can be overridden, and more can be added, e.g.

.. code-block:: python

    >>> simex = DefaultSimex({"url": r".*?", "mycode": r"[A-Z][0-9][0-9][0-9]"})
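One way to picture how overrides and additions combine with a built-in library is a dictionary merge in which user-supplied entries win. The sketch below is an illustration under assumptions only: ``DEFAULT_REGEXES``, its two placeholder patterns and ``merged_regexes`` are hypothetical and are not taken from simex itself.

```python
import re

# Hypothetical stand-ins for a built-in library; these patterns are invented
# for this sketch and are NOT the ones shipped with simex.
DEFAULT_REGEXES = {"integer": r"[0-9]+", "anything": r".*?"}


def merged_regexes(overrides=None):
    """Return the defaults with user-supplied entries layered on top."""
    merged = dict(DEFAULT_REGEXES)   # copy, so the defaults stay untouched
    merged.update(overrides or {})   # user entries override or extend
    return merged


regexes = merged_regexes({"anything": r".+", "mycode": r"[A-Z][0-9][0-9][0-9]"})
print(sorted(regexes))                                      # ['anything', 'integer', 'mycode']
print(regexes["anything"])                                  # .+
print(re.fullmatch(regexes["mycode"], "A123") is not None)  # True
```

Copying the defaults before updating keeps the shared library unchanged, so two differently configured instances would not affect each other.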

The built-in library currently contains five pre-defined regexes:

* URL
* Email
* Integer
* Number
* Anything

Pull requests with commonly required non-controversial regexes are welcome.

Using {{ and }} creates conflicts for me! Why not [[[ and ]]]?
--------------------------------------------------------------

{{ and }} have a special meaning in some languages that you may want to use
together with simex, such as Jinja2.

To prevent conflicts in such circumstances, you can define your
own delimiters:

.. code-block:: python

    >>> from simex import Simex
    >>> simex = Simex({"url": r".*?", "anything": r".*?"}, open_delimeter="[[[", close_delimeter="]]]")
    >>> regex = simex.compile("""<a href="[[[ url ]]]">[[[ anything ]]]</a>""")
    >>> regex.match("""<a href="http://www.cnn.com">CNN</a>""") is not None
    True
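To see why custom delimiters avoid the clash, it can help to look at how placeholders might be located. The following is a minimal sketch using only the standard ``re`` module; ``placeholder_names`` is a hypothetical helper written for this illustration, not a simex function. With ``[[[`` and ``]]]`` selected, a Jinja2-style ``{{ ... }}`` in the same text is no longer treated as a placeholder.

```python
import re


def placeholder_names(pattern, open_delim="{{", close_delim="}}"):
    """List the names found between the given delimiters (illustrative only)."""
    # re.escape makes the delimiters literal, so "[[[" is not read as a
    # character class and "{{" is not read as a repetition count.
    finder = re.compile(
        re.escape(open_delim) + r"\s*(\w+)\s*" + re.escape(close_delim)
    )
    return finder.findall(pattern)


template = '<a href="[[[ url ]]]">{{ jinja_variable }}</a>'
print(placeholder_names(template, "[[[", "]]]"))  # ['url']
print(placeholder_names(template))                # ['jinja_variable']
```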

Matching exact strings
----------------------

By default, a simex does not require an exact match: the compiled regex also matches strings that have extra text after the pattern. For example:

.. code-block:: python

    >>> from simex import Simex
    >>> simex = Simex({"url": r".*?", "anything": r".*?"})
    >>> regex = simex.compile("""<a href="{{ url }}">{{ anything }}</a>""")
    >>> regex
    re.compile(r'\<a\ href\=\".*?\"\>.*?\<\/a\>', re.UNICODE)
    >>> regex.match("""<a href="http://www.cnn.com">CNN</a> THERE IS MORE TEXT""") is not None
    True

However, if you want, simexes can be used to do exact matching. For example:

.. code-block:: python

    >>> from simex import Simex
    >>> simex = Simex({"url": r".*?", "anything": r".*?"}, exact=True)
    >>> regex = simex.compile("""<a href="{{ url }}">{{ anything }}</a>""")
    >>> regex
    re.compile(r'^\<a\ href\=\".*?\"\>.*?\<\/a\>$', re.UNICODE)
    >>> regex.match("""<a href="http://www.cnn.com">CNN</a>""") is not None
    True
    >>> regex.match("""<a href="http://www.cnn.com">CNN</a> THERE IS MORE TEXT""") is not None
    False
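The difference between the two modes comes down to anchoring in Python's ``re`` module: ``match`` only anchors at the start of the string, so trailing text is accepted unless the pattern ends with ``$`` (or ``fullmatch`` is used). The sketch below shows this with plain ``re`` and a hand-written pattern; it does not use simex, and the variable names are chosen for illustration.

```python
import re

pattern = r'<a href=".*?">.*?</a>'
text = '<a href="http://www.cnn.com">CNN</a>'
longer = text + " THERE IS MORE TEXT"

loose = re.compile(pattern)                 # no anchors
anchored = re.compile("^" + pattern + "$")  # anchored at both ends

print(loose.match(longer) is not None)      # True: trailing text is ignored
print(anchored.match(longer) is not None)   # False: "$" rejects trailing text
print(anchored.match(text) is not None)     # True
print(loose.fullmatch(longer) is not None)  # False: fullmatch must consume everything
```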

Matching can also treat whitespace (tabs, spaces and newlines) as interchangeable. For example:

.. code-block:: python

    >>> from simex import Simex
    >>> simex = Simex({"url": r".*?", "anything": r".*?"}, flexible_whitespace=True)
    >>> regex = simex.compile("""<a href="{{ url }}">{{ anything }}</a>""")
    >>> regex
    re.compile(r'\<a\s+href\=\".*?\"\>.*?\<\/a\>', re.UNICODE)
    >>> regex.match("""<a href="http://www.cnn.com">CNN</a>""") is not None
    True
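A rough way to get this behaviour with the standard library alone is to split the literal text on runs of whitespace, escape each piece, and rejoin the pieces with ``\s+``. This is a sketch of the idea under that assumption, not the library's actual code; ``escape_with_flexible_whitespace`` is a hypothetical helper.

```python
import re


def escape_with_flexible_whitespace(literal):
    """Escape literal text, letting any whitespace run match any other."""
    # Split on whitespace runs, escape what is left, and put \s+ between
    # the pieces so tabs, spaces and newlines become interchangeable.
    chunks = re.split(r"\s+", literal)
    return r"\s+".join(re.escape(chunk) for chunk in chunks)


regex = re.compile(escape_with_flexible_whitespace('<a href="x">a  link</a>'))
print(regex.match('<a\thref="x">a\nlink</a>') is not None)  # True
print(regex.match('<ahref="x">a link</a>') is not None)     # False
```

Because ``\s+`` requires at least one whitespace character, text with the whitespace removed entirely still fails to match.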

How does it work?
-----------------

The compiler simply escapes the entire simexpression, except for the
components surrounded by {{ and }}, which it replaces with the regular
expressions defined for those names in the dict, like "email", "anything" or "number".
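The escape-and-substitute step described above can be approximated in a few lines of standard-library Python. This is a minimal sketch under stated assumptions, not the simex source: ``compile_simex`` and ``PLACEHOLDER`` are hypothetical names, and only the default {{ and }} delimiters are handled.

```python
import re

# Matches "{{ name }}" and captures the name.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def compile_simex(pattern, regexes):
    """Escape the literal text and swap each {{ name }} for regexes[name]."""
    parts = []
    position = 0
    for found in PLACEHOLDER.finditer(pattern):
        # Literal text before the placeholder is escaped so it matches itself.
        parts.append(re.escape(pattern[position:found.start()]))
        # The placeholder becomes the sub-regex registered under its name.
        parts.append(regexes[found.group(1)])
        position = found.end()
    parts.append(re.escape(pattern[position:]))  # trailing literal text
    return re.compile("".join(parts), re.UNICODE)


regex = compile_simex(
    '<a href="{{ url }}">{{ anything }}</a>',
    {"url": r".*?", "anything": r".*?"},
)
print(regex.match('<a href="http://www.cnn.com">CNN</a>') is not None)  # True
print(regex.match('<b>not a link</b>') is not None)                     # False
```

In this sketch, a name that is missing from the dict raises ``KeyError``, which surfaces a misspelled placeholder at compile time.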

Release history

0.3.5

Feb 5, 2017

0.3.4

Dec 22, 2016

0.3.3

Dec 17, 2016

0.3.2

Dec 4, 2016

0.3.1

Nov 20, 2016

0.3.0

Oct 30, 2016

0.2.0

Aug 9, 2016

0.1.1

Aug 9, 2016

0.1

Aug 9, 2016

Download files

Source Distribution

simex-0.3.5.tar.gz (4.4 kB)

Uploaded Feb 5, 2017 Source

File details

Details for the file simex-0.3.5.tar.gz.

File metadata

Download URL: simex-0.3.5.tar.gz
Upload date: Feb 5, 2017
Size: 4.4 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

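Hashes for simex-0.3.5.tar.gz:

=========== ====================================================================
Algorithm   Hash digest
=========== ====================================================================
SHA256      ``78b8fa89edbc6375085715a89365475aa294e3499070243c74c8b515e7c33608``
MD5         ``1921f2f4f5c4f6aaa5a3c57967cd280c``
BLAKE2b-256 ``021fd741eed51732130178e503d3a7978a930ec68f86aee47d3f54627853a952``
=========== ====================================================================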
