An information-extraction-focused regex library that uses constant-delay algorithms.

An implementation of the constant-delay algorithm for regular document spanners

This implementation is based on the paper "Constant delay algorithms for regular document spanners" by Fernando Florenzano, Cristian Riveros, Martín Ugarte, Stijn Vansummeren and Domagoj Vrgoč.

Directory structure

The C++ implementation is in the /src folder.

The /exp folder contains experiments that compare our library with others.

The /tests folder contains all the automated tests for our code.

Build instructions

Python/SWIG

On a Debian-based distribution, first install the following dependencies:

sudo apt install g++ cmake swig libboost-dev python3-dev

Then, from the repository root, run:

mkdir -pv build && cd build
cmake -DSWIG=true ..
make

After compilation, build/bin/SWIG will contain rematch.py (the bindings interface) and _rematchswiglib.so (the shared library binary), which you can use to call REmatch from Python.
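As a minimal sketch of how the generated bindings could be made importable, the helper below adds build/bin/SWIG to Python's module search path. The function name add_bindings_to_path and its build_root parameter are hypothetical, introduced here for illustration only. The sketch checks just that rematch.py exists and makes no assumption about the API the module exposes.

```python
import sys
from pathlib import Path


def add_bindings_to_path(build_root="build"):
    """Put <build_root>/bin/SWIG on sys.path so the bindings can be imported.

    Hypothetical helper: only the directory layout and the file name
    rematch.py come from the build instructions above.
    """
    swig_dir = (Path(build_root) / "bin" / "SWIG").resolve()
    if not (swig_dir / "rematch.py").is_file():
        raise FileNotFoundError(
            f"rematch.py not found in {swig_dir}; was the SWIG build run?"
        )
    # Insert at the front so this build wins over any other copy on the path.
    if str(swig_dir) not in sys.path:
        sys.path.insert(0, str(swig_dir))
    return swig_dir
```

After calling add_bindings_to_path() from the repository root, a plain import of the rematch module should resolve to the freshly built bindings, provided _rematchswiglib.so sits in the same directory.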

CLI tool

cmake -H. -Bbuild/Release
cmake --build build/Release

If you want to use a debugger such as gdb, add -DCMAKE_BUILD_TYPE=Debug to the first CMake command.

Command line use

After building, the binary is located in the build/Release/bin folder. To try it, run:

build/Release/bin/rematch --help

Examples:

Get all spans corresponding to a single letter a:

build/Release/bin/rematch -d document.txt -e '.*!x{a}.*'

Get all spans corresponding to a pattern in a file:

build/Release/bin/rematch -d document.txt -r regex.txt

Get benchmark stats (execution time, number of outputs, memory usage, etc.):

build/Release/bin/rematch -d document.txt -r regex.txt -o benchmark
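The invocations above can also be scripted. The following is a rough sketch, not part of the library: build_rematch_command and run_rematch are hypothetical helper names. It uses only the flags shown in the examples (-d, -e, -r, -o), and it assumes that exactly one of -e or -r is given per call.

```python
import subprocess


def build_rematch_command(document, expression=None, regex_file=None,
                          output=None, binary="build/Release/bin/rematch"):
    """Assemble the argument list for the rematch CLI (hypothetical helper)."""
    # Assumption: a call supplies the pattern either inline (-e) or
    # from a file (-r), never both and never neither.
    if (expression is None) == (regex_file is None):
        raise ValueError("pass exactly one of expression or regex_file")
    cmd = [binary, "-d", str(document)]
    if expression is not None:
        cmd += ["-e", expression]
    else:
        cmd += ["-r", str(regex_file)]
    if output is not None:
        cmd += ["-o", output]  # e.g. "benchmark", as in the example above
    return cmd


def run_rematch(*args, **kwargs):
    """Run the CLI and return its standard output as text."""
    cmd = build_rematch_command(*args, **kwargs)
    # A list of arguments avoids shell quoting problems with patterns
    # such as '.*!x{a}.*'.
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return completed.stdout
```

For instance, run_rematch("document.txt", expression=".*!x{a}.*") corresponds to the first example above.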

Testing

We use Boost.Test for unit testing.

To add more tests, create a new folder under tests/ whose name starts with the prefix test, and follow the same structure (same file names) as the existing folders.
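The naming and structure convention can be checked mechanically. The sketch below is an illustration only: check_test_folder is a hypothetical helper, and it interprets "same structure" as "contains the same file names as an existing test folder", which is an assumption about the layout.

```python
from pathlib import Path


def check_test_folder(new_folder, reference_folder):
    """Return a list of problems with a new test folder (empty if none).

    Hypothetical helper: compares the new folder against an existing
    test folder that is known to follow the convention.
    """
    new_folder = Path(new_folder)
    reference_folder = Path(reference_folder)
    problems = []
    if not new_folder.name.startswith("test"):
        problems.append(f"folder name {new_folder.name!r} does not start with 'test'")
    expected = {p.name for p in reference_folder.iterdir() if p.is_file()}
    actual = {p.name for p in new_folder.iterdir() if p.is_file()}
    # Report every file present in the reference folder but absent here.
    for name in sorted(expected - actual):
        problems.append(f"missing file: {name}")
    return problems
```

An empty return value means the new folder matches the reference folder's file names and carries the test prefix.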

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

pyrematch-0.1.0-cp39-cp39-win_amd64.whl (178.2 kB): CPython 3.9, Windows x86-64
pyrematch-0.1.0-cp39-cp39-win32.whl (148.0 kB): CPython 3.9, Windows x86
pyrematch-0.1.0-cp38-cp38-win_amd64.whl (178.5 kB): CPython 3.8, Windows x86-64
pyrematch-0.1.0-cp38-cp38-win32.whl (148.1 kB): CPython 3.8, Windows x86
pyrematch-0.1.0-cp37-cp37m-win_amd64.whl (178.5 kB): CPython 3.7m, Windows x86-64
pyrematch-0.1.0-cp37-cp37m-win32.whl (148.1 kB): CPython 3.7m, Windows x86
pyrematch-0.1.0-cp36-cp36m-win_amd64.whl (178.7 kB): CPython 3.6m, Windows x86-64
pyrematch-0.1.0-cp36-cp36m-win32.whl (148.1 kB): CPython 3.6m, Windows x86
pyrematch-0.1.0-cp35-cp35m-win_amd64.whl (178.2 kB): CPython 3.5m, Windows x86-64
pyrematch-0.1.0-cp35-cp35m-win32.whl (147.9 kB): CPython 3.5m, Windows x86