Text processing tool for detecting Danish CPR-numbers.

os2ds-rules: Next-generation, high-performance rule system for OS2datascanner

os2ds-rules is the next-generation rule system for OS2datascanner by Magenta ApS. It aims to deliver high performance in both processing speed and detection accuracy.

The project consists of several components:

  • A shared-library backend written in modern C++20.
  • A Python C/C++ extension that exposes functionality from the backend to Python.
  • A Python library that provides a safe, easy-to-use interface to the aforementioned extension.

WARNING: This project is at a very early stage of development and frequently undergoes substantial changes, so it is not yet ready for production use.

Getting Started

You can install the C++ library, the Python extension, or both.

C++ backend library: libos2dsrules

For the C++ library you need a few different things:

  • A compiler that supports C++20. We recommend using either g++ (GCC) or clang (LLVM).
  • cmake>=3.20: Primary (meta) build system.
  • ninja: Cross-platform backend for cmake.
  • gtest (Google Test): For building and running the test suite.

For development, you additionally want:

  • cppcheck or clang-analyzer: For static code analysis.
  • clang-format: For formatting C++ code. We adhere to the LLVM style guide.
  • clang-tidy: For C++ code linting.
  • gdb or lldb: A suitable debugger.
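Before configuring a build, it can help to confirm that the tools listed above are on your PATH. The following is a minimal sketch, not part of os2ds-rules: the function names find_tools and report are made up for illustration, and the script only checks that each executable exists. It does not verify versions (for example cmake>=3.20) or C++20 support.

```python
import shutil

# Tool names taken from the lists above; any one compiler is enough.
REQUIRED = ["cmake", "ninja"]
COMPILERS = ["g++", "clang++"]
OPTIONAL = ["cppcheck", "clang-format", "clang-tidy", "gdb", "lldb"]


def find_tools(names):
    """Map each tool name to its path on PATH, or None if it is missing."""
    return {name: shutil.which(name) for name in names}


def report():
    """Return a list of human-readable problems; empty means all good."""
    problems = []
    for name, path in find_tools(REQUIRED).items():
        if path is None:
            problems.append(f"missing required tool: {name}")
    if not any(find_tools(COMPILERS).values()):
        problems.append("no C++ compiler found (tried g++ and clang++)")
    for name, path in find_tools(OPTIONAL).items():
        if path is None:
            problems.append(f"missing optional tool: {name}")
    return problems


if __name__ == "__main__":
    for line in report() or ["all tools found"]:
        print(line)
```

gtest and clang-analyzer are left out because they are not necessarily exposed as a single executable of that name; check those through your package manager instead.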

To make a debug build on Linux, run the following from the project root:

# Configure, then build, using the debug presets
cmake . --preset linux-debug
cmake --build --preset linux-build-debug

This will build the shared library libos2dsrules.so and the test suite testsuite.

To install the library, pass the build directory to cmake --install:

sudo cmake --install build_cmake/debug

By default, this will install headers into /usr/include and shared objects to /usr/lib.

To run the test suite:

ctest --preset linux-test-debug

Currently, this has only been tested on Linux. It remains to be tested on Windows and macOS.

Using CMake preset workflows

There are four preconfigured workflows that have been automated with cmake:

  • linux-debug-workflow: Configures, builds and tests the debug version of the library for Linux.
  • linux-release-workflow: Configures, builds and packages the release version of the library for Linux.
  • windows-debug-workflow: Configures, builds and tests the debug version of the library for Windows.
  • windows-release-workflow: Configures, builds and packages the release version of the library for Windows.

For example, to run the debug workflow on Linux, use:

cmake --workflow --preset linux-debug-workflow
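If you drive builds from a script, the preset name can be derived from the current platform. The sketch below is illustrative only: workflow_command is a hypothetical helper, not something shipped with the project, and it assumes the preset names follow the pattern of the four listed above.

```python
import platform

# Preset names as listed above; only these four combinations exist.
KNOWN_PRESETS = {
    "linux-debug-workflow",
    "linux-release-workflow",
    "windows-debug-workflow",
    "windows-release-workflow",
}


def workflow_command(build_type="debug", system=None):
    """Return the cmake argv for the workflow matching this platform."""
    system = (system or platform.system()).lower()
    preset = f"{system}-{build_type}-workflow"
    if preset not in KNOWN_PRESETS:
        raise ValueError(f"no workflow preset named {preset!r}")
    return ["cmake", "--workflow", "--preset", preset]


if __name__ == "__main__":
    print(" ".join(workflow_command("debug", system="Linux")))
```

The returned list can be passed directly to subprocess.run. Platforms without a preset, such as macOS, raise a ValueError rather than guessing a name.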

The Python extension: os2ds-rules

You need the following:

  • A compiler that supports C++20. We recommend using either g++ (GCC) or clang (LLVM).
  • The CPython development headers and libraries.
  • setuptools: For building the extension.
  • pytest: For Python tests.
  • pytest-benchmark: For Python benchmarks.

NOTE: Depending on what OS you use and how CPython was installed on your system, the development headers and libraries may or may not be installed.

The development headers and libraries can be installed using a package manager on the following systems:

  • Ubuntu/Debian: sudo apt install python3-dev
  • Fedora: sudo dnf install python3-devel
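To find out whether the development headers are already present, you can ask the interpreter where it expects Python.h to be. This is a small sketch using only the standard library; the helper names python_header_path and has_dev_headers are made up for this example, and a found header does not by itself guarantee that the matching libraries are installed.

```python
import os
import sysconfig


def python_header_path():
    """Return the path where this interpreter expects Python.h to live."""
    include_dir = sysconfig.get_paths()["include"]
    return os.path.join(include_dir, "Python.h")


def has_dev_headers():
    """True if Python.h exists for the running interpreter."""
    return os.path.isfile(python_header_path())


if __name__ == "__main__":
    status = "found" if has_dev_headers() else "missing"
    print(f"{python_header_path()}: {status}")
```

Run it with the same python3 you intend to build the extension with, since each interpreter has its own include directory.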

To build the extension:

# From the project root.
python3 -m setup build

To install the extension locally:

# From the project root.
# You may want to use the `-e` option during development.
python3 -m pip install .

Uninstalling is as easy as running:

pip uninstall os2ds-rules

Running the benchmark

After having installed the extension as described above, run the benchmarks with:

python3 -m pytest --benchmark-only test/benchmarks/

Note: the benchmarks currently only run against an installed copy of the extension, so build and install it first. This limitation will remain until it gets fixed.

Python Interpreter support

The Python 3 extension uses the CPython C API, which CPython supports as standard.

We aim to support the PyPy interpreter as well.
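When reporting a problem, it is useful to state which interpreter you are running. A minimal sketch using only the standard library follows; interpreter_summary is a name chosen for this example.

```python
import platform
import sys


def interpreter_summary():
    """Return (implementation, 'major.minor') for the running interpreter."""
    impl = platform.python_implementation()  # e.g. "CPython" or "PyPy"
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return impl, version


if __name__ == "__main__":
    impl, version = interpreter_summary()
    print(f"Running on {impl} {version}")
```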

Usage Examples

In Python

Let us scan a Python str for occurrences of CPR-Numbers:

from os2ds_rules import CPRDetector


detector = CPRDetector()

matches = detector.find_matches('This is a fake, but valid CPR-Number: 1111111118')

for m in matches:
    print(m)
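The number in the example is described as fake but valid. One common notion of validity for CPR numbers is the modulus-11 check, and the sketch below shows why 1111111118 passes it. This is a plain Python illustration, not the library's implementation: whether CPRDetector applies exactly this check is an assumption, and passes_modulus11 is a hypothetical name.

```python
# Weights of the modulus-11 check, one per digit of a 10-digit CPR number.
WEIGHTS = (4, 3, 2, 7, 6, 5, 4, 3, 2, 1)


def passes_modulus11(cpr: str) -> bool:
    """Return True if the 10-digit string satisfies the modulus-11 check."""
    digits = cpr.replace("-", "")
    if len(digits) != 10 or not (digits.isascii() and digits.isdigit()):
        return False
    total = sum(int(d) * w for d, w in zip(digits, WEIGHTS))
    return total % 11 == 0


if __name__ == "__main__":
    # 4+3+2+7+6+5+4+3+2 = 36, plus 8*1 = 44, and 44 is divisible by 11.
    print(passes_modulus11("1111111118"))
```

Keep in mind that a passing checksum says nothing about whether the number has been issued to a person, and that some genuinely issued CPR numbers do not satisfy the modulus-11 rule, so a check like this is a heuristic rather than proof.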

In C++

Consider this simple file, test.cpp:

#include <os2dsrules.hpp>
#include <name_rule.hpp>

#include <iostream>
#include <string>

using namespace OS2DSRules::NameRule;

int main(void) {
    NameRule rule;
    std::string s = "This is my friend, John.";
    auto results = rule.find_matches(s);

    std::cout << "Found matches: \n";
    for (auto m : results) {
        std::cout << m.match() << '\n';
    }

    return 0;
}

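To compile with clang, run the following. The library is listed after the source file, since some linkers resolve symbols in command-line order:

clang++ -std=c++20 test.cpp -o test -los2dsrules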

This will produce an executable test in the current working directory.

Running it:

$ ./test
Found matches:
John

Download files

Download the file for your platform.

Source distribution:

  • os2ds-rules-0.1.0.tar.gz (25.4 kB)

Built distributions:

  • os2ds_rules-0.1.0-pp39-pypy39_pp73-win_amd64.whl (1.2 MB): PyPy, Windows x86-64
  • os2ds_rules-0.1.0-pp39-pypy39_pp73-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (1.7 MB): PyPy, manylinux (glibc 2.24+ and 2.28+) x86-64
  • os2ds_rules-0.1.0-cp311-cp311-win_amd64.whl (1.2 MB): CPython 3.11, Windows x86-64
  • os2ds_rules-0.1.0-cp311-cp311-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (2.3 MB): CPython 3.11, manylinux (glibc 2.24+ and 2.28+) x86-64
  • os2ds_rules-0.1.0-cp310-cp310-win_amd64.whl (1.2 MB): CPython 3.10, Windows x86-64
  • os2ds_rules-0.1.0-cp310-cp310-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (2.3 MB): CPython 3.10, manylinux (glibc 2.24+ and 2.28+) x86-64
  • os2ds_rules-0.1.0-cp39-cp39-win_amd64.whl (1.2 MB): CPython 3.9, Windows x86-64
  • os2ds_rules-0.1.0-cp39-cp39-manylinux_2_24_x86_64.manylinux_2_28_x86_64.whl (2.3 MB): CPython 3.9, manylinux (glibc 2.24+ and 2.28+) x86-64
