
Alternate regular expression module, to replace re.

Project description

Note

For testing and comparison with the current ‘re’ module, the new implementation is provided as a separate module called ‘regex’.

Flags

There are two kinds of flag: scoped and global. Scoped flags can apply to only part of a pattern and can be turned on or off; global flags apply to the entire pattern and can only be turned on.

The scoped flags are: IGNORECASE, MULTILINE, DOTALL, VERBOSE.

The global flags are: ASCII, LOCALE, REVERSE, UNICODE, ZEROWIDTH.

Notes on named capture groups

All capture groups have a group number, starting from 1.

Groups with the same group name will have the same group number, and groups with a different group name will have a different group number.

The same group name can be used on different branches of an alternation because the branches are mutually exclusive, e.g. (?<foo>first)|(?<foo>second). They will, of course, have the same group number.

Group numbers will be reused, where possible, across different branches of a branch reset, e.g. (?|(first)|(second)) has only group 1. If capture groups have different group names then they will, of course, have different group numbers, e.g. (?|(?<foo>first)|(?<bar>second)) has group 1 (“foo”) and group 2 (“bar”).
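
The numbering rules above can be checked with a short sketch. It assumes a current release of the regex package is installed; the variable names are only illustrative.

```python
import regex

# Same name on both branches of an alternation: one shared group number.
same_name = regex.compile(r"(?<foo>first)|(?<foo>second)")
m = same_name.match("second")
print(same_name.groups, m.group("foo"))

# Branch reset with unnamed groups: both branches reuse group 1.
reset = regex.compile(r"(?|(first)|(second))")
print(reset.groups, reset.match("second").group(1))

# Branch reset with different names: the groups get different numbers.
named_reset = regex.compile(r"(?|(?<foo>first)|(?<bar>second))")
print(named_reset.groups, dict(named_reset.groupindex))
```

Inspecting `groups` and `groupindex` on the compiled pattern is a convenient way to see how many distinct group numbers a pattern ended up with.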

Additional features

  • Atomic grouping (issue #433030)

    (?>...)

    The subpattern is matched as a single unit. If the following pattern subsequently fails, the matcher does not backtrack into the subpattern, so the subpattern as a whole will fail.

  • Possessive quantifiers.

    (?:...)?+ ; (?:...)*+ ; (?:...)++ ; (?:...){min,max}+

    The subpattern is matched up to ‘max’ times. If the following pattern subsequently fails, then all of the repeated subpatterns will fail as a whole. For example, (?:...)++ is equivalent to (?>(?:...)+).
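
As a rough illustration of atomic groups and possessive quantifiers, the sketch below compares them with an ordinary greedy repeat on the same input. The sample text and variable names are assumptions chosen for the example.

```python
import regex

text = "aaab"

# Ordinary greedy repeat: a+ can give back one 'a' so that "ab" still matches.
greedy = regex.match(r"a+ab", text)

# Atomic group: once (?>a+) has taken all three 'a's it gives none back,
# so the following "ab" cannot match.
atomic = regex.match(r"(?>a+)ab", text)

# Possessive quantifier: a++ behaves like (?>a+).
possessive = regex.match(r"a++ab", text)

print(greedy is not None, atomic, possessive)
```

The greedy pattern succeeds by backtracking, while the atomic and possessive forms return None because the repeat refuses to release any characters.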

  • Scoped flags (issue #433028)

    (?flags-flags:...)

    The flags will apply only to the subpattern. Flags can be turned on or off.

  • Inline flags (#433024, #433027)

    (?flags-flags)

    The flags will apply to the end of the group or pattern. Flags can be turned on or off.
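
A minimal sketch of the difference between a scoped flag and an inline flag follows. To keep the example simple, the inline flag is placed at the very start of the pattern; the sample string is an assumption.

```python
import regex

words = "ab Ab aB AB"

# Scoped flag: IGNORECASE applies only to the 'a' inside (?i:...).
scoped = regex.findall(r"(?i:a)b", words)

# Inline flag at the start of the pattern: IGNORECASE applies to all of it.
inline = regex.findall(r"(?i)ab", words)

print(scoped)
print(inline)
```

With the scoped form only the first letter is case-insensitive, so "aB" and "AB" are not matched.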

  • Repeated repeats (#2537)

    A regex like ((x|y+)*)* will be accepted and will work correctly, and should complete more quickly.

  • Definition of ‘word’ character (#1693050)

    The definition of a ‘word’ character has been expanded for Unicode. This applies to \w, \W, \b and \B.

  • Groups in lookahead and lookbehind (#814253)

    Groups and group references are permitted in both lookahead and lookbehind.

  • Variable-length lookbehind

    A lookbehind can match a variable-length string.
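
For instance, a lookbehind whose alternatives have different lengths might be used as sketched here. The currency string is made up for the example.

```python
import regex

prices = "USD 10, EUR   25, GBP 7"

# The lookbehind accepts either "USD " (fixed width) or "EUR" followed by
# any amount of whitespace (variable width).
amounts = regex.findall(r"(?<=USD |EUR\s+)\d+", prices)
print(amounts)
```

Only the amounts preceded by one of the two accepted prefixes are returned.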

  • Correct handling of charset with ignore case flag (#3511)

    Ranges within charsets are handled correctly when the ignore-case flag is turned on.

  • Unmatched group in replacement (#1519638)

    An unmatched group is treated as an empty string in a replacement template.
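
A small sketch of this behaviour, using a pattern in which only one of two groups can take part in any given match:

```python
import regex

# Each match fills either group 1 or group 2; the other group is unmatched
# and contributes an empty string to the replacement template.
result = regex.sub(r"(a)|(b)", r"[\1\2]", "ab")
print(result)
```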

  • ‘Pathological’ patterns (#1566086, #1662581, #1448325, #1721518, #1297193)

    ‘Pathological’ patterns should complete more quickly.

  • Flags argument for regex.split, regex.sub and regex.subn (#3482)

    regex.split, regex.sub and regex.subn support a ‘flags’ argument.
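
The ‘flags’ argument can be passed by keyword, as in this sketch. The sample strings are assumptions for illustration.

```python
import regex

text = "one ONE One"

# Case-insensitive substitution without embedding (?i) in the pattern.
replaced = regex.sub(r"one", "1", text, flags=regex.IGNORECASE)

# subn also reports how many substitutions were made.
replaced_n = regex.subn(r"one", "1", text, flags=regex.IGNORECASE)

# Case-insensitive split on the letter x.
parts = regex.split(r"x", "aXbxc", flags=regex.IGNORECASE)

print(replaced, replaced_n, parts)
```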

  • ‘Overlapped’ argument for regex.findall and regex.finditer

    regex.findall and regex.finditer support an ‘overlapped’ flag which permits overlapped matches.
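
The effect of the ‘overlapped’ argument can be seen by running the same pattern with and without it, as in this short sketch:

```python
import regex

# Default: each match starts where the previous one ended.
non_overlapping = regex.findall(r"..", "abcde")

# overlapped=True: the next search starts one character after the
# start of the previous match.
overlapping = regex.findall(r"..", "abcde", overlapped=True)

# finditer accepts the same argument.
starts = [m.start() for m in regex.finditer(r"..", "abcde", overlapped=True)]

print(non_overlapping, overlapping, starts)
```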

  • Unicode escapes (#3665)

    The Unicode escapes \uxxxx and \Uxxxxxxxx are supported.

  • Large patterns (#1160)

    Patterns can be much larger.

  • Zero-width match with regex.finditer (#1647489)

    regex.finditer behaves correctly when it splits at a zero-width match.

  • Zero-width split with regex.split (#3262)

    regex.split can split at a zero-width match if the zero-width flag is turned on. When the flag is turned off the current behaviour is unchanged because the BDFL thinks that some existing software might depend on it.

  • Splititer

    regex.splititer has been added. It’s a generator equivalent of regex.split.
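
A minimal sketch of regex.splititer, consuming the pieces one at a time instead of building the whole list up front. The input string is an assumption.

```python
import regex

pieces = []
# Each piece is produced lazily as the iterator is advanced.
for piece in regex.splititer(r",\s*", "a, b,c"):
    pieces.append(piece)

print(pieces)
```

This may be useful when the input is large and only the first few fields are needed.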

  • Subscripting for groups

    A match object accepts access to the captured groups via subscripting and slicing:

    >>> m = regex.search(r"(?<before>.*?)(?<num>\d+)(?<after>.*)", "pqr123stu")
    >>> print(m["before"])
    pqr
    >>> print(m["num"])
    123
    >>> print(m["after"])
    stu
    >>> print(len(m))
    4
    >>> print(m[:])
    ('pqr123stu', 'pqr', '123', 'stu')
    
  • Named groups

    Named groups can be named with (?<name>...) as well as the current (?P<name>...).

  • Group references

    Groups can be referenced within a pattern with \g<name>. This also allows there to be more than 99 groups.
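
The two naming syntaxes and the \g<name> reference can be combined as in the sketch below, which looks for a repeated word. The sample sentence is made up.

```python
import regex

sentence = "say hello hello world"

# (?<word>...) names the group; \g<word> must match the same text again.
m = regex.search(r"(?<word>\w+) \g<word>", sentence)

# The same search written with the existing (?P<name>...) and (?P=name) forms.
m_p = regex.search(r"(?P<word>\w+) (?P=word)", sentence)

print(m.group(), m["word"], m_p.group())
```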

  • Named characters

    \N{name}

    Named characters are supported.
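
As an illustration, a character can be matched by its Unicode name. The text below is an assumption, written with an escape so that the example stays ASCII-only.

```python
import regex

text = "a \u03b1 b"  # contains GREEK SMALL LETTER ALPHA

# \N{...} inside the pattern names the character to match.
found = regex.findall(r"\N{GREEK SMALL LETTER ALPHA}", text)
print(found == ["\u03b1"])
```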

  • Unicode codepoint properties, blocks and scripts

    \p{name} ; \P{name}

    Unicode properties, blocks and scripts are supported. \p{name} matches a character which has property ‘name’ and \P{name} matches a character which doesn’t have property ‘name’.

    In order to avoid ambiguity, block names should start with In and script names should start with Is. If a name lacks such a prefix and it could be a block or a script, script will take priority, for example:

    1. InBasicLatin or BasicLatin, the ‘BasicLatin’ block.
    2. IsLatin or Latin, the ‘Latin’ script.
    3. InCyrillic, the ‘Cyrillic’ block.
    4. IsCyrillic or Cyrillic, the ‘Cyrillic’ script.
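
The prefixed forms can be tried out with a sketch like the following, assuming a current release of regex. The mixed Latin and Cyrillic sample text is made up.

```python
import regex

text = "abc \u0433\u0434\u0435 xyz"  # Latin letters, Cyrillic letters, Latin letters

# Script properties, using the Is prefix.
cyrillic_script = regex.findall(r"\p{IsCyrillic}+", text)
latin_script = regex.findall(r"\p{IsLatin}+", text)

# \P{...} is the negation: runs of characters that are not Latin script.
not_latin = regex.findall(r"\P{IsLatin}+", text)

# Block property, using the In prefix.
cyrillic_block = regex.findall(r"\p{InCyrillic}+", text)

print(latin_script, len(cyrillic_script), len(not_latin), len(cyrillic_block))
```

Note that the spaces belong to neither script, so they appear in the \P{IsLatin} run together with the Cyrillic letters.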
  • POSIX character classes

    [[:alpha:]]

    POSIX character classes are supported.
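
A short sketch using two POSIX classes inside a character set, assuming a current release of regex. The sample string is an assumption.

```python
import regex

text = "abc 123 def"

# [[:alpha:]] matches letters, [[:digit:]] matches decimal digits.
letters = regex.findall(r"[[:alpha:]]+", text)
digits = regex.findall(r"[[:digit:]]+", text)

print(letters, digits)
```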

  • Search anchor

    \G

    A search anchor has been added. It matches at the position where each search started/continued and can be used for contiguous matches or in negative variable-length lookbehinds to limit how far back the lookbehind goes:

    >>> regex.findall(r"\w{2}", "abcd ef")
    ['ab', 'cd', 'ef']
    >>> regex.findall(r"\G\w{2}", "abcd ef")
    ['ab', 'cd']
    
    1. The search starts at position 0 and matches 2 letters ‘ab’.
    2. The search continues at position 2 and matches 2 letters ‘cd’.
    3. The search continues at position 4 and fails to match any letters.
    4. The anchor stops the search start position from being advanced, so there are no more results.
  • Reverse searching

    Searches can now work backwards:

    >>> regex.findall(r".", "abc")
    ['a', 'b', 'c']
    >>> regex.findall(r"(?r).", "abc")
    ['c', 'b', 'a']
    

    Note: the result of a reverse search is not necessarily the reverse of a forward search:

    >>> regex.findall(r"..", "abcde")
    ['ab', 'cd']
    >>> regex.findall(r"(?r)..", "abcde")
    ['de', 'bc']
    
  • Multithreading

    The regex module now releases the GIL when matching, enabling other Python threads to run concurrently.

  • Matching a single grapheme

    \X

    The grapheme matcher is supported. It’s equivalent to \P{M}\p{M}*.
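
To illustrate, the sketch below splits a string into graphemes so that a base letter stays together with its combining mark. The sample text is an assumption.

```python
import regex

text = "e\u0301a"  # 'e' + COMBINING ACUTE ACCENT, then 'a'

# \X keeps the base character and its combining mark in one match.
graphemes = regex.findall(r"\X", text)

# A plain dot treats every code point separately.
code_points = regex.findall(r".", text)

print(len(graphemes), len(code_points))
```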

  • Branch reset

    (?|...|...)

    Capture group numbers will be reused across the alternatives.

Project details


This version: 0.1.20100709


Download files


Filename: regex-0.1.20100709.tar.gz (755.7 kB)
File type: Source
Python version: None
Upload date: Jul 9, 2010
