A regex library focused on information extraction that uses constant-delay algorithms.

REmatch - Python version

Interface for using the REmatch regular expression library, written in C++, from Python.

This interface is designed to have a syntax similar to the re library that ships with Python. However, some functions do not behave the same way, so reading the documentation in detail is recommended.

Note that regular expressions must be enclosed in .* to be compiled, for example:

.*correo@!dominio{gmail}.cl.*
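
The wrapping rule above could be applied mechanically before compiling. The following is a minimal sketch in plain Python; `wrap_pattern` is a hypothetical helper that is not part of pyrematch, and it only checks the literal prefix and suffix, so a pattern that ends in an escaped dot followed by `*` would be misjudged.

```python
def wrap_pattern(pattern: str) -> str:
    """Surround a pattern with '.*' on both sides unless it is already there.

    Hypothetical helper for illustration only; not part of pyrematch.
    """
    prefix = "" if pattern.startswith(".*") else ".*"
    suffix = "" if pattern.endswith(".*") else ".*"
    return prefix + pattern + suffix


# The example pattern from the text, written without the surrounding '.*'.
wrapped = wrap_pattern("correo@!dominio{gmail}.cl")
print(wrapped)  # .*correo@!dominio{gmail}.cl.*
```

The result could then be handed to `compile` as described in the module contents.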

Usage

To use this interface, first compile the source code in the REmatchEngine folder using SWIG/Python (see the README in that folder for build instructions).

Assuming rematch.py and _rematchswiglib.so are already available, place those files in the same folder as REmatch.py. The interface can then be imported by creating a .py file and adding:

import pyrematch as re

Version history

0.1

Functions implemented:
- find
- findall
- finditer
- search
- match
- fullmatch

Module contents

REmatch.compile(pattern, flags)

Compiles a regular expression into a Regex object, which can be used for matching with the methods described below.

Regular Expression Object

Regex.find(string)

Scans the string from left to right until it finds the first position where the regular expression matches, and returns the corresponding Match object. Returns None if no match is found in the string.

>> pattern = re.compile('.*d.*')
>> pattern.find('dog') # Match at index 0
<REmatch.Match object at 0x7f374c2e2bd0>

Regex.search(string)

Same behavior as find. Provided for syntax compatibility with the re library.

Regex.match(string)

If zero or more characters at the beginning of the string match the regular expression, returns the corresponding Match object. Returns None if the beginning of the string does not match.

>> pattern = re.compile('!x{.*a...s}.*')
>> pattern.match('abyssal') # Match at index 0
<REmatch.Match object at 0x7fa1080fd7f0>
>> pattern.match('abyssal').group("x")
abyss

Regex.fullmatch(string)

If the regular expression matches the entire string, returns the corresponding Match object. Returns None otherwise.

>> pattern = re.compile('.*!x{a...s}.*')
>> pattern.fullmatch('abyssal')
None
>> pattern.fullmatch('abyss')
<REmatch.Match object at 0x7fa1080fd7f0>
>> pattern.fullmatch('abyss').group("x")
abyss
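
For comparison, the same distinction between match and fullmatch can be reproduced with Python's standard `re` module. This sketch does not use pyrematch: the capture is written with the standard `(?P<x>...)` syntax instead of `!x{...}`, and the module is imported as `std_re` (an arbitrary alias) to keep it apart from the `re` alias used in this document.

```python
import re as std_re  # standard library, not pyrematch

pattern = std_re.compile(r"(?P<x>a...s)")

# match only requires the pattern to fit at the start of the string.
prefix_hit = pattern.match("abyssal")

# fullmatch requires the pattern to cover the whole string.
full_miss = pattern.fullmatch("abyssal")
full_hit = pattern.fullmatch("abyss")

print(prefix_hit.group("x"))  # abyss
print(full_miss)              # None
print(full_hit.group("x"))    # abyss
```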

Regex.findall(string)

Scans the string from left to right and finds all substrings that match the regular expression. Returns a list of Match objects in the order in which they were found. Returns an empty list if there are no matches.

>> pattern = re.compile('.*!x{teen}.*')
>> matches = pattern.findall('fifteen, sixteen, seventeen,...')
>> matches
[<REmatch.Match object at 0x7f163ba14b10>, <REmatch.Match object at 0x7f163ba1e150>, <REmatch.Match object at 0x7f163ba2abd0>]
>>
>> for match in matches:
>>     print(match.span('x'), match.group('x'))
(3, 7) teen
(12, 16) teen
(23, 27) teen
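
This is one place where the interface differs from the standard library: as described above, findall here returns Match objects, whereas the standard `re.findall` returns plain strings. The sketch below uses only the standard `re` module (imported as `std_re`, an arbitrary alias) with the capture rewritten as `(?P<x>...)`; it shows that spans like those above are obtained from the standard library through finditer rather than findall.

```python
import re as std_re  # standard library, not pyrematch

text = "fifteen, sixteen, seventeen,..."
pattern = std_re.compile(r"(?P<x>teen)")

# Standard-library findall: a list of strings, with no position information.
strings = pattern.findall(text)

# Standard-library finditer: match objects that expose span() and group().
spans = [(m.span("x"), m.group("x")) for m in pattern.finditer(text)]

print(strings)  # ['teen', 'teen', 'teen']
for span, value in spans:
    print(span, value)
```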

Regex.finditer(string)

Same behavior as findall, but returns an iterator of Match objects in the order in which they were found. Returns an empty iterator if there are no matches.

>> pattern = re.compile('.*!x{teen}.*')
>> matches = pattern.finditer('fifteen, sixteen, seventeen,...')
>> matches
<generator object Regex.finditer at 0x7f08c46d3850>
>>
>> for match in matches:
>>     print(match.span('x'), match.group('x'))
(3, 7) teen
(12, 16) teen
(23, 27) teen
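
Because finditer returns a generator, its results can be consumed only once. The sketch below illustrates this with the standard library's `re.finditer`, which is also a single-pass iterator; it does not use pyrematch, and it assumes the generator shown above behaves like any other Python generator in this respect.

```python
import re as std_re  # standard library, not pyrematch

text = "fifteen, sixteen, seventeen,..."
matches = std_re.finditer(r"teen", text)

first_pass = [m.span() for m in matches]
second_pass = [m.span() for m in matches]  # the iterator is already exhausted

# Materialize the results with list() when they are needed more than once.
reusable = list(std_re.finditer(r"teen", text))

print(first_pass)     # [(3, 7), (12, 16), (23, 27)]
print(second_pass)    # []
print(len(reusable))  # 3
```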

Match Objects

Note that all functionality of Match objects requires the use of captures.

A capture (specified beforehand when compiling the regular expression) can be referred to by its name, as a string, or by its index (starting at 1), as an integer. The syntax for a capture is !capture_name{regular_expression}.
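
To relate the `!name{...}` capture syntax to the named groups of Python's standard `re` module, the following sketch rewrites simple captures as `(?P<name>...)`. `to_stdlib_pattern` is a hypothetical helper, not part of pyrematch; it assumes captures are not nested and contain no braces of their own, and it says nothing about how the REmatch engine evaluates the pattern.

```python
import re as std_re  # standard library, not pyrematch

# Matches '!name{body}' where body contains no braces (simple captures only).
CAPTURE = std_re.compile(r"!(\w+)\{([^{}]*)\}")


def to_stdlib_pattern(rematch_pattern: str) -> str:
    """Rewrite '!name{body}' captures as standard '(?P<name>body)' groups.

    Hypothetical helper for illustration only.
    """
    return CAPTURE.sub(r"(?P<\1>\2)", rematch_pattern)


converted = to_stdlib_pattern(".*!var1{fan}..!var2{stick}.*")
match = std_re.search(converted, "fantastick")

print(converted)            # .*(?P<var1>fan)..(?P<var2>stick).*
print(match.group("var1"))  # by name
print(match.group(2))       # by index, starting at 1
```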

Match.start(capture) / Match.end(capture)

Returns the start/end index of the substring matched by the given capture in the regular expression.

>> pattern = re.compile('.*!var{stick}.*')
>> match = pattern.find("fantastick")
>> match.start('var')
5
>> match.end('var')
10

Match.span(capture)

For a match m, returns the tuple (m.start(capture), m.end(capture)).

>> pattern = re.compile('.*!var{stick}.*')
>> match = pattern.find("fantastick")
>> match.span('var')
(5, 10)
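
The values in the example suggest that spans are half-open, with the end index pointing one past the last captured character, as in Python slicing. A small check in plain Python, reusing the numbers reported above and not calling pyrematch:

```python
text = "fantastick"
start, end = 5, 10  # values reported by match.span('var') in the example above

# Slicing with the span recovers the captured substring.
captured = text[start:end]

print(captured)                      # stick
print(end - start == len(captured))  # True
```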

Match.group(capture)

Returns the substring associated with the capture. Note that if the save_anchors flag is enabled, match.group(0) is also available and returns the complete string that was matched.

>> pattern = re.compile('.*!var1{fan}..!var2{stick}.*')
>> match = pattern.find("fantastick")
>> match.group('var1')
fan
>> match.group(1)
fan
>> match.group('var2')
stick
>> match.group(2)
stick

Match.groups(default=None)

Returns a tuple of strings containing all the groups of the match.

>> pattern = re.compile('.*!var1{fan}..!var2{stick}.*')
>> match = pattern.find("fantastick")
>> match.groups()
('fan', 'stick')

Match.groupdict(default=None)

Returns a dictionary of all the groups in the match. Each entry has the group name as its key and the string matched by the group as its value.

>> pattern = re.compile(".*!name{.*}, !city{.*}.*")
>> matches = pattern.findall("Erick, Santiago")
>> name_length = 0
>> city_length = 0
>> complete_match = None
>> for m in matches:
>>   if len(m.group("name")) >= name_length and len(m.group("city")) >= city_length:
>>     name_length = len(m.group("name"))
>>     city_length = len(m.group("city"))
>>     complete_match = m
>> complete_match.groupdict()
{'city': 'Santiago', 'name': 'Erick'}

In this last example, note that findall returns every possible combination of the letters in the words being searched for, so all results are traversed to find the longest ones, which correspond to the complete name we are interested in.
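
The selection step can also be written with `max`. The sketch below does not call pyrematch: `candidates` is a hand-written stand-in for a few of the dictionaries that `groupdict()` might return for this input, and `total_length` is a hypothetical scoring function that prefers the match whose captures are longest overall.

```python
# Hand-written stand-ins for some groupdict() results; not produced by pyrematch.
candidates = [
    {"name": "k", "city": "S"},
    {"name": "ick", "city": "Santi"},
    {"name": "Erick", "city": "Santiago"},
    {"name": "Erick", "city": "San"},
]


def total_length(groups: dict) -> int:
    """Score a candidate by the combined length of its captured strings."""
    return sum(len(value) for value in groups.values())


best = max(candidates, key=total_length)
print(best)  # {'name': 'Erick', 'city': 'Santiago'}
```

With real Match objects, the same idea would apply to `m.groupdict()` for each `m` in the results.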

Usage examples

More complex examples will be added once the library is ready.

Coming soon...

Contact

Oscar Cárcamo oscar.carcamoz@uc.cl
Nicolás Van Sint Jan nicovsj@uc.cl

Source Distributions

No source distribution files are available for this release.

Built Distributions

- pyrematch-0.1.2-cp39-cp39-win_amd64.whl (180.2 kB): CPython 3.9, Windows x86-64
- pyrematch-0.1.2-cp39-cp39-win32.whl (150.5 kB): CPython 3.9, Windows x86
- pyrematch-0.1.2-cp39-cp39-macosx_10_13_x86_64.whl (345.3 kB): CPython 3.9, macOS 10.13+ x86-64
- pyrematch-0.1.2-cp38-cp38-win_amd64.whl (180.1 kB): CPython 3.8, Windows x86-64
- pyrematch-0.1.2-cp38-cp38-win32.whl (150.6 kB): CPython 3.8, Windows x86
- pyrematch-0.1.2-cp38-cp38-macosx_10_13_x86_64.whl (345.5 kB): CPython 3.8, macOS 10.13+ x86-64
- pyrematch-0.1.2-cp37-cp37m-win_amd64.whl (180.2 kB): CPython 3.7m, Windows x86-64
- pyrematch-0.1.2-cp37-cp37m-win32.whl (150.7 kB): CPython 3.7m, Windows x86
- pyrematch-0.1.2-cp37-cp37m-macosx_10_13_x86_64.whl (345.4 kB): CPython 3.7m, macOS 10.13+ x86-64
- pyrematch-0.1.2-cp36-cp36m-win_amd64.whl (180.2 kB): CPython 3.6m, Windows x86-64
- pyrematch-0.1.2-cp36-cp36m-win32.whl (150.5 kB): CPython 3.6m, Windows x86
- pyrematch-0.1.2-cp36-cp36m-macosx_10_13_x86_64.whl (345.4 kB): CPython 3.6m, macOS 10.13+ x86-64
- pyrematch-0.1.2-cp35-cp35m-win_amd64.whl (180.1 kB): CPython 3.5m, Windows x86-64
- pyrematch-0.1.2-cp35-cp35m-win32.whl (150.5 kB): CPython 3.5m, Windows x86
