An information extraction focused regex library that uses constant-delay algorithms.
Project description
REmatch - Python version
Interface for using the REmatch regular expression library, written in C++, from Python.
This interface is designed to have a syntax similar to the re library that ships with Python; however, some functions do not behave the same way, so reading the documentation carefully is recommended.
Note that regular expressions must be enclosed in .*
to be compiled, for example:
.*correo@!dominio{gmail}.cl.*
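As a small illustration of the wrapping rule, the helper below adds the leading and trailing .* when they are missing. The name wrap_pattern is hypothetical and not part of pyrematch; the sketch only manipulates strings and never calls the library. Whether anchored methods such as match also require the leading .* is not stated here, so treat this as a convenience for the general case only.

```python
def wrap_pattern(pattern: str) -> str:
    """Return pattern enclosed in '.*' on both sides (hypothetical helper)."""
    if not pattern.startswith(".*"):
        pattern = ".*" + pattern
    if not pattern.endswith(".*"):
        pattern = pattern + ".*"
    return pattern


print(wrap_pattern("correo@!dominio{gmail}.cl"))
# .*correo@!dominio{gmail}.cl.*
```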
Usage
To use this interface, first compile the source code in the REmatchEngine folder using SWIG/Python (see the README in that folder for build instructions).
Once rematch.py and _rematchswiglib.so are available, place them in the same folder as REmatch.py. The interface can then be imported by creating a .py file and adding:
import pyrematch as re
Version history
0.1
Implementation of the functions:
- find
- findall
- finditer
- search
- match
- fullmatch
Module contents
REmatch.compile(pattern, flags)
Compiles a regular expression into a Regex object, which can be used for matching with the methods described below.
Regular Expression Object
Regex.find(string)
Scans the string from left to right until it finds the first position where the regular expression produces a match, and returns the corresponding Match object. Returns None if no match is found in the string.
>> pattern = re.compile('.*d.*')
>> pattern.find('dog') # Match at index 0
<REmatch.Match object at 0x7f374c2e2bd0>
Regex.search(string)
Same behavior as find. Provided to keep the syntax consistent with the re library.
Regex.match(string)
If zero or more characters at the beginning of the string match the regular expression, returns the corresponding match object. Returns None if the start of the string does not match.
>> pattern = re.compile('!x{.*a...s}.*')
>> pattern.match('abyssal') # Match at index 0
<REmatch.Match object at 0x7fa1080fd7f0>
>> pattern.match('abyssal').group("x")
abyss
Regex.fullmatch(string)
If the regular expression matches the entire string, returns the corresponding match object. Returns None otherwise.
>> pattern = re.compile('.*!x{a...s}.*')
>> pattern.fullmatch('abyssal')
None
>> pattern.fullmatch('abyss')
<REmatch.Match object at 0x7fa1080fd7f0>
>> pattern.fullmatch('abyss').group("x")
abyss
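For comparison, the same match / fullmatch distinction can be reproduced with Python's standard re module. This is only a reference point: the standard library uses (?P<x>...) for named groups instead of !x{...}, needs no surrounding .*, and is imported here as std_re (an alias chosen for this sketch) so it does not clash with import pyrematch as re.

```python
import re as std_re  # standard library, aliased to avoid clashing with `pyrematch as re`

pattern = std_re.compile(r"(?P<x>a...s)")

at_start = pattern.match("abyssal")    # anchored at index 0, may stop early
whole = pattern.fullmatch("abyssal")   # must consume the entire string
exact = pattern.fullmatch("abyss")

print(at_start.group("x"))  # abyss
print(whole)                # None
print(exact.group("x"))     # abyss
```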
Regex.findall(string)
Scans the string from left to right and finds every substring that matches the regular expression. Returns a list of match objects in the order in which they were found. If there is no match, returns an empty list.
>> pattern = re.compile('.*!x{teen}.*')
>> matches = pattern.findall('fifteen, sixteen, seventeen,...')
[<REmatch.Match object at 0x7f163ba14b10>, <REmatch.Match object at 0x7f163ba1e150>, <REmatch.Match object at 0x7f163ba2abd0>]
>>
>> for match in matches:
>>     print(match.span('x'), match.group('x'))
(3, 7) teen
(12, 16) teen
(23, 27) teen
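One difference worth keeping in mind when coming from the standard library: the standard re module's findall returns plain strings, not match objects, so spans are only available through finditer. The sketch below uses the standard module (aliased as std_re for this example) on the same text and arrives at the same spans as the output above.

```python
import re as std_re  # standard library, not pyrematch

text = "fifteen, sixteen, seventeen,..."
pattern = std_re.compile(r"(?P<x>teen)")

strings = pattern.findall(text)                         # list of str
spans = [m.span("x") for m in pattern.finditer(text)]   # match objects give spans

print(strings)  # ['teen', 'teen', 'teen']
print(spans)    # [(3, 7), (12, 16), (23, 27)]
```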
Regex.finditer(string)
Same behavior as findall, but returns an iterator of match objects in the order in which they were found. If there is no match, returns an empty iterator.
>> pattern = re.compile('.*!x{teen}.*')
>> matches = pattern.finditer('fifteen, sixteen, seventeen,...')
<generator object Regex.finditer at 0x7f08c46d3850>
>>
>> for match in matches:
>>     print(match.span('x'), match.group('x'))
(3, 7) teen
(12, 16) teen
(23, 27) teen
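Because the result is an iterator, matches can be consumed a few at a time instead of building the whole list. The sketch below shows the idea with itertools.islice, using the standard re module (aliased as std_re) as a stand-in; the same pattern should apply to any Python iterator, though it has not been verified against pyrematch here.

```python
import re as std_re  # standard library stand-in for this sketch
from itertools import islice

text = "fifteen, sixteen, seventeen,..."
matches = std_re.finditer(r"teen", text)   # lazy iterator, nothing consumed yet

first_two = [m.span() for m in islice(matches, 2)]
rest = [m.span() for m in matches]         # the iterator resumes where it stopped

print(first_two)  # [(3, 7), (12, 16)]
print(rest)       # [(23, 27)]
```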
Match Objects
Note that all functionality of match objects requires the use of captures.
A capture (specified beforehand when the regular expression is compiled) can be referenced either by its name, as a string, or by its index (starting at 1), as an integer. The syntax for a capture is !capture_name{regular_expression}.
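To make the name/index correspondence concrete, the sketch below lists the capture names that appear in a pattern and numbers them from 1. The helper capture_names is hypothetical and not part of pyrematch, and it assumes that indices follow the order in which captures appear in the pattern, which is consistent with the group(1) / group(2) example in this document but is not confirmed by it.

```python
import re as std_re  # standard library, used only to scan the pattern text


def capture_names(pattern: str) -> dict:
    """Map 1-based position -> capture name for each `!name{` in pattern.

    Hypothetical helper; assumes indices follow order of appearance.
    """
    names = std_re.findall(r"!(\w+)\{", pattern)
    return {index: name for index, name in enumerate(names, start=1)}


print(capture_names(".*!var1{fan}..!var2{stick}.*"))
# {1: 'var1', 2: 'var2'}
```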
Match.start(capture) / end(capture)
Returns the start/end index of the substring matched by the given capture in the regular expression.
>> pattern = re.compile('.*!var{stick}.*')
>> match = pattern.find("fantastick")
>> match.start('var')
5
>> match.end('var')
10
Match.span(capture)
For a match m, returns the tuple (m.start(capture), m.end(capture)).
>> pattern = re.compile('.*!var{stick}.*')
>> match = pattern.find("fantastick")
>> match.span('var')
(5, 10)
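The span values in the example above behave like Python slice bounds: the start index is included and the end index is excluded. A quick check in plain Python, with the numbers copied from that output:

```python
text = "fantastick"
start, end = 5, 10          # values reported by match.span('var') above

captured = text[start:end]  # half-open slice: index 10 itself is excluded
print(captured)             # stick
print(end - start)          # 5
```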
Match.group(capture)
Returns the substring associated with the capture. Note that if the save_anchors flag is enabled, match.group(0) is also available and returns the complete string that was matched.
>> pattern = re.compile('.*!var1{fan}..!var2{stick}.*')
>> match = pattern.find("fantastick")
>> match.group('var1')
fan
>> match.group(1)
fan
>> match.group('var2')
stick
>> match.group(2)
stick
Match.groups(default=None)
Returns a tuple of strings containing all the groups of the match.
>> pattern = re.compile('.*!var1{fan}..!var2{stick}.*')
>> match = pattern.find("fantastick")
>> match.groups()
('fan', 'stick')
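For readers used to the standard library, the equivalent calls in Python's re module look like this. One difference to note: in the standard module group(0) is always available, whereas the text above says REmatch only provides it when save_anchors is enabled. The alias std_re is chosen for this sketch only.

```python
import re as std_re  # standard library, not pyrematch

m = std_re.search(r"(?P<var1>fan)..(?P<var2>stick)", "fantastick")

print(m.group("var1"), m.group(1))  # fan fan
print(m.groups())                   # ('fan', 'stick')
print(m.group(0))                   # fantastick
```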
Match.groupdict(default=None)
Returns a dictionary of all groups in the match. Each key is a group name and each value is the string matched by that group.
>> pattern = re.compile(".*!name{.*}, !city{.*}.*")
>> matches = pattern.findall("Erick, Santiago")
>> name_length = 0
>> city_length = 0
>> complete_match = None
>> for m in matches:
>>     # keep the match whose captures are at least as long as the best seen so far
>>     if len(m.group("name")) >= name_length and len(m.group('city')) >= city_length:
>>         complete_match = m
>>         name_length = len(m.group("name"))
>>         city_length = len(m.group("city"))
>> complete_match.groupdict()
{'city': 'Santiago', 'name': 'Erick'}
In this last example, note that findall returns every possible combination of the letters in the words being searched, so all results are scanned to find the longest ones, which correspond to the complete name and city we are interested in.
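To see why so many results come back, the sketch below enumerates the candidate (name, city) pairs in plain Python, without calling pyrematch. It assumes that name may be any stretch of text ending right before ", " and city any stretch starting right after it, including empty ones; whether the library actually reports empty captures is not stated here, so the exact count is illustrative only.

```python
text = "Erick, Santiago"
sep = text.index(", ")   # position of the separator
city_start = sep + 2     # first character after ", "

# Every way to pick where `name` starts and where `city` ends.
candidates = [
    {"name": text[i:sep], "city": text[city_start:j]}
    for i in range(sep + 1)
    for j in range(city_start, len(text) + 1)
]

# The pair with the longest captures is the complete name and city.
best = max(candidates, key=lambda d: (len(d["name"]), len(d["city"])))

print(len(candidates))  # 54
print(best)             # {'name': 'Erick', 'city': 'Santiago'}
```

Using max with a length-based key avoids tracking the running lengths by hand.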
Usage examples
More complex examples will be added once the library is ready...
Coming soon...
Code
Contact
Oscar Cárcamo | oscar.carcamoz@uc.cl |
Nicolás Van Sint Jan | nicovsj@uc.cl |