
ASK-Orex: Ordinary human-friendly Regular Expressions


ASK-Orex

ASK-Orex is a package designed to simplify regular expressions in Python. It provides a high-level interface for constructing and working with regular expressions, making them easier to read and write.

The package is heavily inspired by Richie Cotton's rebus package for the R language.

Installation

You can install Orex using pip:

pip install ask-orex

To use Orex, import the package into your Python script. Note that the PyPI name ask-orex becomes ask_orex on import:

import ask_orex as ox

Creating regular expressions

The simplest regular expression is a literal string to be found:

s = 'Hello World'
pattern = ox.literal('Hello')
pattern.is_match(s)
True
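For readers coming from Python's built-in re module, a literal match is roughly a search for the escaped text. The sketch below is an illustration only: literal_pattern is a hypothetical helper, and the assumption that ox.literal escapes regex metacharacters in this way is not confirmed by the package documentation.

```python
import re


def literal_pattern(text: str) -> re.Pattern:
    """Hypothetical helper: compile text so every character matches literally."""
    return re.compile(re.escape(text))


s = 'Hello World'
print(literal_pattern('Hello').search(s) is not None)  # True

# Escaping matters once the text contains metacharacters:
# an unescaped '.' matches any character, an escaped one only a dot.
print(re.search('3.50', 'costs 3x50') is not None)               # True
print(literal_pattern('3.50').search('costs 3x50') is not None)  # False
```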

Orex regular expressions are extended by concatenating them with +. A slightly more useful example is a pattern that finds a hex colour:

pattern = ox.literal('#') + ox.HEXDIGIT + ox.HEXDIGIT +\
 ox.HEXDIGIT + ox.HEXDIGIT + ox.HEXDIGIT + ox.HEXDIGIT

s_hex = 'This package is #a83232 hot'
pattern.is_match(s_hex)
True
pattern.findall(s_hex)
['#a83232']

On the other hand, a hashtag that is not a hex colour does not match:

s_nonhex = 'Just a twitter handle: #Red123'
pattern.is_match(s_nonhex)
False
pattern.findall(s_nonhex)
[]
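The same hex-colour search can be sketched with the standard re module, assuming ox.HEXDIGIT stands for the character class [0-9a-fA-F] (an assumption based on the name, not on the package source). Repeating the class six times mirrors the six ox.HEXDIGIT terms above.

```python
import re

# Assumption: ox.HEXDIGIT corresponds to the class [0-9a-fA-F].
HEXDIGIT = '[0-9a-fA-F]'

# '#' followed by six hex-digit classes, concatenated like the Orex terms.
hex_colour = re.compile('#' + HEXDIGIT * 6)

print(hex_colour.findall('This package is #a83232 hot'))     # ['#a83232']
print(hex_colour.findall('Just a twitter handle: #Red123'))  # []
```

Nothing in this re pattern requires the colour to end after six digits, so '#a83232ff' would also yield '#a83232'; appending \b is one way to require a word boundary there.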

Writing out ox.HEXDIGIT six times is tedious. The same pattern can be written more compactly as

pattern = ox.literal('#') + ox.repeat(ox.HEXDIGIT, 6)

or

pattern = ox.literal('#') + ox.n_or_more(ox.HEXDIGIT, min=6, max=6)
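In plain re terms, both shorthands presumably correspond to a counted quantifier: {6} for exactly six repetitions, or the equivalent {6,6} with equal minimum and maximum. This mapping is an assumption about how ox.repeat and ox.n_or_more translate; the sketch only shows that the two re forms behave identically.

```python
import re

# Assumed translations of the two Orex shorthands above.
exactly_six = re.compile(r'#[0-9a-fA-F]{6}')   # like ox.repeat(..., 6)
six_to_six = re.compile(r'#[0-9a-fA-F]{6,6}')  # like min=6, max=6

for text in ['This package is #a83232 hot', 'Too short: #a8323']:
    # Both quantifiers demand exactly six digits, so the results agree.
    print(exactly_six.findall(text), six_to_six.findall(text))
```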

Subpatterns

Most Orex functions accept other regular expressions as arguments, so larger patterns can be built from smaller subpatterns.

Say we want a regex that matches email addresses. First, we define the characters allowed in an address:

allowed_characters = ox.character_class(ox.ALNUM + ox.DASH + '_' + ox.DOT)

where ox.ALNUM stands for all letters and digits. Note that _ is passed as a plain string. In Orex it is fine to use strings as part of a pattern, except at the very start, where we use ox.literal to tell Python that we mean pattern business. Whenever we want to match . or - explicitly, it is best to use ox.DOT or ox.DASH, since . and - can have a special meaning in regular expressions.
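The reason for ox.DOT and ox.DASH can be seen in the standard re module: outside a character class an unescaped . matches any character, and inside a class an unescaped - between two characters defines a range. How Orex escapes these characters internally is not shown here; the sketch only illustrates the underlying regex behaviour.

```python
import re

# Outside a class, '.' is a wildcard; '\.' is a literal dot.
print(re.findall(r'x.y', 'x.y xzy'))   # ['x.y', 'xzy']
print(re.findall(r'x\.y', 'x.y xzy'))  # ['x.y']

# Inside a class, 'a-z' is a range; 'a\-z' is the three characters a, -, z.
print(re.findall(r'[a-z]', 'a-q'))     # ['a', 'q']
print(re.findall(r'[a\-z]', 'a-q'))    # ['a', '-']
```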

Next, we define the parts of the address:

user_name = ox.one_or_more(allowed_characters)
domain_name = ox.one_or_more(allowed_characters)
# between 2 and 4 letters, e.g. 'com'
extension = ox.n_or_more(ox.ALPHA, 2, 4)

email_pattern = user_name + '@' + domain_name + ox.DOT + extension
email = 'captainspamalot@funnyspammail.com'

And indeed

email_pattern.is_match(email)
True
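For comparison, here is a sketch of the same email pattern in the standard re module. The translations are assumptions based on the function names, not taken from the Orex source: ox.ALNUM as [A-Za-z0-9], ox.ALPHA as [A-Za-z], ox.one_or_more(x) as x+, and ox.n_or_more(x, 2, 4) as x{2,4}.

```python
import re

# Assumed plain-re equivalents of the Orex building blocks above.
allowed_characters = r'[A-Za-z0-9\-_.]'
user_name = allowed_characters + '+'
domain_name = allowed_characters + '+'
extension = '[A-Za-z]{2,4}'

# '@' needs no escaping; the dot before the extension does.
email_pattern = re.compile(user_name + '@' + domain_name + r'\.' + extension)

email = 'captainspamalot@funnyspammail.com'
print(email_pattern.search(email) is not None)  # True
```

Because the domain class also contains '.', the engine first consumes 'funnyspammail.com' and then backtracks until \. and the extension can match; the overall match is still the whole address.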

Capturing

If we want to extract the individual components of a match, we have to capture them first with ox.capture. In the email example above, we can simply wrap each part:

email_pattern = ox.capture(user_name) + '@' + ox.capture(domain_name) \
                + ox.DOT + ox.capture(extension)
email_pattern.findall(email)

returns

[('captainspamalot', 'funnyspammail', 'com')]

Handy indeed! We can also name the individual components for even easier extraction:

email_pattern = ox.capture(user_name, 'user') \
                + '@' + ox.capture(domain_name, 'domain') \
                + ox.DOT + ox.capture(extension, 'ext')

The group_dict method then returns the named components as a dictionary:

email_pattern.group_dict(email)
{'user': 'captainspamalot', 'domain': 'funnyspammail', 'ext': 'com'}
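In the standard re module, ox.capture presumably corresponds to a parenthesised group, and a named capture to (?P<name>...). The sketch below assumes the allowed characters are [A-Za-z0-9\-_.] and the extension is two to four letters; it is not the Orex implementation. With several groups, re.findall returns one tuple per match, and Match.groupdict() returns the named groups as a dictionary.

```python
import re

chars = r'[A-Za-z0-9\-_.]+'  # assumed equivalent of one_or_more(allowed_characters)
email = 'captainspamalot@funnyspammail.com'

# Unnamed groups: findall yields a tuple of the three captured parts.
numbered = re.compile('(' + chars + ')@(' + chars + r')\.([A-Za-z]{2,4})')
print(numbered.findall(email))
# [('captainspamalot', 'funnyspammail', 'com')]

# Named groups: (?P<name>...) lets groupdict() key the parts by name.
named = re.compile(
    '(?P<user>' + chars + ')@(?P<domain>' + chars + r')\.(?P<ext>[A-Za-z]{2,4})'
)
print(named.search(email).groupdict())
# {'user': 'captainspamalot', 'domain': 'funnyspammail', 'ext': 'com'}
```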

Capturing has another benefit: with the ox.backreference function, we can match text that repeats an earlier capture, such as a closing tag that must match its opening tag:

tag_name = ox.one_or_more(allowed_characters)
content = ox.one_or_more(ox.ANY_CHAR)
tag_pattern = ox.literal('<') + ox.capture(tag_name, 'tag') + '>' \
              + ox.capture(content, 'content') + '</' \
              + ox.backreference(name='tag') + '>'

message = '<name>Sir Snakington</name>'
tag_pattern.group_dict(message)
{'tag': 'name', 'content': 'Sir Snakington'}
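The equivalent construct in the standard re module is the named backreference (?P=name), which must match exactly the text the named group captured. In this sketch, ox.ANY_CHAR is assumed to correspond to . and the tag characters are assumed to be [A-Za-z0-9\-_.]; neither mapping is taken from the Orex source.

```python
import re

# (?P=tag) repeats whatever the 'tag' group matched.
tag_pattern = re.compile(
    r'<(?P<tag>[A-Za-z0-9\-_.]+)>(?P<content>.+)</(?P=tag)>'
)

message = '<name>Sir Snakington</name>'
print(tag_pattern.search(message).groupdict())
# {'tag': 'name', 'content': 'Sir Snakington'}

# A mismatched closing tag cannot satisfy the backreference.
print(tag_pattern.search('<name>Sir Snakington</title>'))  # None
```

Since .+ is greedy, on a line such as '<b>x</b> and <b>y</b>' the content would run from the first opening tag to the last closing one; the lazy .+? is the usual remedy in re.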

Backreferences can also be used in substitutions, though here named backreferences do not work. Instead, the backreference refers to a capture by its index in the pattern, counting from 1.

tag_pattern = ox.literal('<') + ox.capture(tag_name) + '>' \
              + ox.capture(content) + '</' + ox.backreference(1) + '>'
replace_pattern = ox.literal('<') + ox.backreference(2) + '>' \
                  + ox.backreference(1) \
                  + ox.literal('<') + ox.backreference(2) + '>'
tag_pattern.sub(message, replace_pattern)
'<Sir Snakington>name<Sir Snakington>'
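With the standard re module, the same swap can be written with re.sub, where \1 and \2 in the replacement string refer to groups by position. Unlike the Orex limitation described above, re replacement strings also accept named groups through \g<name>. Both forms appear below as a sketch, with the tag characters assumed to be [A-Za-z0-9\-_.].

```python
import re

message = '<name>Sir Snakington</name>'

# Numbered groups: \1 is the tag name, \2 the content.
numbered = re.compile(r'<([A-Za-z0-9\-_.]+)>(.+)</\1>')
print(numbered.sub(r'<\2>\1<\2>', message))
# <Sir Snakington>name<Sir Snakington>

# Named groups: \g<name> works in re replacement strings.
named = re.compile(r'<(?P<tag>[A-Za-z0-9\-_.]+)>(?P<content>.+)</(?P=tag)>')
print(named.sub(r'<\g<content>>\g<tag><\g<content>>', message))
# <Sir Snakington>name<Sir Snakington>
```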



Download files


Source Distribution

ask-orex-1.1.2.tar.gz (12.3 kB)

Uploaded Source

Built Distribution

ask_orex-1.1.2-py3-none-any.whl (6.1 kB)

Uploaded Python 3

File details

Details for the file ask-orex-1.1.2.tar.gz.

File metadata

  • Download URL: ask-orex-1.1.2.tar.gz
  • Upload date:
  • Size: 12.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.11

File hashes

Hashes for ask-orex-1.1.2.tar.gz
Algorithm Hash digest
SHA256 7291f0f4046f4d5c546a3abff7410e8897ff71649d24d7e0ddba79c8f8b80649
MD5 bfbb31dbfd11b4276441ae1543a78423
BLAKE2b-256 51e0112e2e5176ca1dec1038cbc7bcb2b4741437f52d040629a553ef2081f632


File details

Details for the file ask_orex-1.1.2-py3-none-any.whl.

File metadata

  • Download URL: ask_orex-1.1.2-py3-none-any.whl
  • Upload date:
  • Size: 6.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.11

File hashes

Hashes for ask_orex-1.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 3005e3fa7ea2813f55fc3b07fb3629c0cf476b371c056c9628ece5c7b9ff4427
MD5 38fae49ba86608dd49b0eb9eb7598685
BLAKE2b-256 8f8ec568ea5e01f569fe07c106d158ea19aa52b21f396d37a599914aecf3c316

