
Apply multiple regex patterns and keep a change index map.

Project description


re-map

A package for applying multiple regex pattern replacements while keeping a map of where each change occurred.

This may be useful when you need the original text, the text altered by regex pattern replacements, and a map of those replacements. One scenario where this may prove useful is machine learning "text2text" problems (e.g. translation, text normalization).
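To make the idea of a replacement map concrete, here is a minimal sketch that uses only the standard `re` module. It is not the package's implementation; the helper name `replace_with_map` is hypothetical, and it handles a single pattern with one capture group and a fixed replacement string.

```python
import re


def replace_with_map(text, pattern, replacement):
    """Replace group 1 of every match and record where each change landed.

    Hypothetical helper for illustration only. Assumes group 1 takes part
    in every match and that matches do not overlap.
    """
    pieces = []    # fragments of the rewritten text
    span_map = []  # ((source_start, source_end), (target_start, target_end))
    cursor = 0     # how much of the source has been copied so far
    out_len = 0    # length of the rewritten text so far
    for match in re.finditer(pattern, text):
        start, end = match.span(1)
        unchanged = text[cursor:start]
        pieces.append(unchanged)
        out_len += len(unchanged)
        pieces.append(replacement)
        span_map.append(((start, end), (out_len, out_len + len(replacement))))
        out_len += len(replacement)
        cursor = end
    pieces.append(text[cursor:])
    return ''.join(pieces), span_map


new_text, span_map = replace_with_map('apples & pears & plums', r' (&) ', 'and')
print(new_text)  # apples and pears and plums
print(span_map)  # [((7, 8), (7, 10)), ((15, 16), (17, 20))]
```

Each entry pairs a span in the original text with the span that replaced it in the rewritten text, which is the kind of information a text2text pipeline needs to align input and output.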

install

$ pip install re-map

example

Code

from re_map import Processor

# Lookup tables used by the callable replacements below.
numbers = {5: 'five', 8: 'eight', 10: 'ten'}
ordinal_numbers = {1: 'first', 2: 'second'}

# Each modifier is (pattern, {group index: replacement}); a replacement is
# either a string or a callable that receives the matched group text.
modifiers = [
    ( r'der (G\.) Be',  { 1: 'Graham'} ),
    ( r' (&) ',  { 1: 'and'} ),
    ( r' (etc)\.',  { 1: 'et cetera'} ),
    ( r' ((\d+)((st)|(nd)|(rd)|(th))) ',  { 2: lambda x: ordinal_numbers[int(x)], 3: '' } ),
    ( r' (\d+) ',  { 1: lambda x: numbers[int(x)] } ),
]

text = 'Alexander G. Bell ate 10 apples & 8 cucumbers. The 1st apple was rotten, the 2nd was too, also the third, fourth etc.'

with Processor(text) as processor:
    for pattern, replacement_map in modifiers:
        processor.process(pattern, replacement_map)

decorated_text, decorated_processed_text = processor.decorate()

print(text)
print(decorated_text)
print(processor.processed_text)
print(decorated_processed_text)
print(processor.span_map)

Output

Alexander G. Bell ate 10 apples & 8 cucumbers. The 1st apple was rotten, the 2nd was too, also the third, fourth etc.
Alexander 00 Bell ate 11 apples 2 3 cucumbers. The 455 apple was rotten, the 677 was too, also the third, fourth 888.
Alexander Graham Bell ate ten apples and eight cucumbers. The first apple was rotten, the second was too, also the third, fourth et cetera.
Alexander 000000 Bell ate 111 apples 222 33333 cucumbers. The 44444 apple was rotten, the 666666 was too, also the third, fourth 888888888.
[((10, 12), (10, 16)), ((22, 24), (26, 29)), ((32, 33), (37, 40)), ((34, 35), (41, 46)), ((51, 52), (62, 67)), ((52, 54), (67, 67)), ((77, 78), (90, 96)), ((78, 80), (96, 96)), ((113, 116), (129, 138))]
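The output above can be read as follows: each entry of the span map appears to pair a span in the original text with the corresponding span in the processed text, and the decorated strings mark each span with its index. The sketch below checks that reading using plain Python on the values copied from the output. The helpers `aligned_pairs` and `decorate` are hypothetical and are not part of the package; in particular, the `i % 10` digit choice is an assumption inferred from this one example.

```python
# Values copied verbatim from the example output.
TEXT = 'Alexander G. Bell ate 10 apples & 8 cucumbers. The 1st apple was rotten, the 2nd was too, also the third, fourth etc.'
PROCESSED = 'Alexander Graham Bell ate ten apples and eight cucumbers. The first apple was rotten, the second was too, also the third, fourth et cetera.'
SPAN_MAP = [((10, 12), (10, 16)), ((22, 24), (26, 29)), ((32, 33), (37, 40)),
            ((34, 35), (41, 46)), ((51, 52), (62, 67)), ((52, 54), (67, 67)),
            ((77, 78), (90, 96)), ((78, 80), (96, 96)), ((113, 116), (129, 138))]


def aligned_pairs(text, processed, span_map):
    """Return (original fragment, replacement fragment) for every span pair."""
    return [(text[s0:s1], processed[t0:t1]) for (s0, s1), (t0, t1) in span_map]


def decorate(text, spans):
    """Overwrite each span with its index digit (assumed: index modulo 10)."""
    chars = list(text)
    for i, (start, end) in enumerate(spans):
        chars[start:end] = str(i % 10) * (end - start)
    return ''.join(chars)


pairs = aligned_pairs(TEXT, PROCESSED, SPAN_MAP)
for source, target in pairs:
    print(repr(source), '->', repr(target))

decorated_source = decorate(TEXT, [src for src, _ in SPAN_MAP])
decorated_target = decorate(PROCESSED, [dst for _, dst in SPAN_MAP])
```

Under this reading, zero-length target spans such as `(67, 67)` correspond to fragments that were deleted (the `st` in `1st`), which is why digits 5 and 7 appear only in the decorated original.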


