Framework for building NLP information extraction systems using regular expressions.

konsepy

Framework for building NLP information extraction systems using regular expressions. konsepy can then use the resulting NLP system to create a silver standard for fine-tuning a transformer model.

Installation

  • konsepy is designed to be used with the konsepy_nlp_template
    • See the README there for current installation instructions.
  • To use konsepy as a standalone entity:
    • Install with pip:
      • pip install konsepy[all]
      • To split corpora into sentences for fine-tuning a sentence-based transformer, spacy must also be installed and configured.

Usage

The package provides a single centralized CLI tool, konsepy.

Building your NLP Package

To use konsepy, you need to create an NLP package (e.g., my_nlp_package) with the following structure. The best way to get this format is to clone the konsepy_nlp_template:

my_nlp_package/
├── __init__.py
└── concepts/
    ├── __init__.py
    └── my_concept.py

Each concept file (e.g., my_concept.py) must define:

  • REGEXES: A list of regex-category pairs (and optional context functions).
  • RUN_REGEXES_FUNC: A function that executes the regexes and returns categories/matches.
  • CategoryEnum: An Enum defining the possible categories for the concept.
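
A quick way to catch a misconfigured concept file is to check for these three names before running anything. The helper below is a hypothetical sketch and is not part of konsepy; `missing_concept_names` and the `SimpleNamespace` stand-in for an imported module are names invented for illustration.

```python
from enum import Enum
from types import SimpleNamespace

# The three names every concept file is expected to define (per the list above).
REQUIRED_NAMES = ('REGEXES', 'RUN_REGEXES_FUNC', 'CategoryEnum')


def missing_concept_names(module):
    """Return the required names that `module` does not define."""
    return [name for name in REQUIRED_NAMES if not hasattr(module, name)]


class CategoryEnum(Enum):
    MENTION = 1


# Stand-in for an imported concept module that forgot RUN_REGEXES_FUNC.
incomplete = SimpleNamespace(REGEXES=[], CategoryEnum=CategoryEnum)
print(missing_concept_names(incomplete))  # ['RUN_REGEXES_FUNC']
```

In a real package, the module would come from something like `importlib.import_module('my_nlp_package.concepts.my_concept')` rather than a `SimpleNamespace`.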

Search Functions

konsepy provides several pre-built search functions in konsepy.rxsearch:

Some simple ones:

  • search_all_regex: Finds all occurrences of each regex in the list.
  • search_first_regex: Finds only the first occurrence of each regex.
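
The difference between the two simple strategies can be illustrated with the standard `re` module alone. This is a sketch of the idea, not konsepy's implementation; `all_occurrences` and `first_occurrences` are hypothetical names, and the real functions take the REGEXES list format described above.

```python
import re

patterns = [re.compile(r'cough', re.I), re.compile(r'fever', re.I)]
text = 'Cough for 3 days. Fever today; cough worse at night.'


def all_occurrences(patterns, text):
    """Every match of every pattern (the 'search all' idea)."""
    return [m.group() for rx in patterns for m in rx.finditer(text)]


def first_occurrences(patterns, text):
    """At most one match per pattern (the 'search first' idea)."""
    return [m.group() for rx in patterns if (m := rx.search(text))]


print(all_occurrences(patterns, text))    # ['Cough', 'cough', 'Fever']
print(first_occurrences(patterns, text))  # ['Cough', 'Fever']
```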

Probably the most useful:

  • search_and_replace_regex_func: Prevents double-matching by replacing found text with dots before proceeding to the next regex.
  • search_all_regex_func: Supports "sentinel" values (None) to stop processing if a match was found earlier.
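
Both strategies address the same problem: a general pattern re-matching text that a more specific pattern has already claimed. The sketch below shows the two ideas in plain Python. It is an illustration under assumptions, not konsepy's code: the function names are hypothetical, categories are plain strings, and the sentinel is assumed to be a `None` entry placed in the regex list.

```python
import re

SPECIFIC = (re.compile(r'no chest pain', re.I), 'NEGATED')
GENERAL = (re.compile(r'chest pain', re.I), 'MENTION')


def search_with_masking(regexes, text):
    """Yield (category, matched_text), masking each match with dots."""
    for regex, category in regexes:
        # Materialize matches first; masking keeps the length, so offsets stay valid.
        for m in list(regex.finditer(text)):
            yield category, m.group()
            text = text[:m.start()] + '.' * len(m.group()) + text[m.end():]


def search_with_sentinel(regexes, text):
    """Yield (category, matched_text); stop at a None entry if anything matched."""
    found = False
    for entry in regexes:
        if entry is None:
            if found:
                return  # an earlier regex matched, so skip the rest
            continue
        regex, category = entry
        for m in regex.finditer(text):
            found = True
            yield category, m.group()


text = 'Patient reports no chest pain.'
print(list(search_with_masking([SPECIFIC, GENERAL], text)))
# [('NEGATED', 'no chest pain')]
print(list(search_with_sentinel([SPECIFIC, None, GENERAL], text)))
# [('NEGATED', 'no chest pain')]
print(list(search_with_sentinel([SPECIFIC, None, GENERAL], 'Chest pain on exertion.')))
# [('MENTION', 'Chest pain')]
```

Masking works per match, so later regexes still see the rest of the text. The sentinel is coarser: once anything before it has matched, everything after it is skipped.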

Example of my_concept.py:

import re
from enum import Enum
from konsepy.rxsearch import search_all_regex_func
from konsepy.context.negation import check_if_negated
from konsepy.context.other_subject import check_if_other_subject


class CategoryEnum(Enum):
  MENTION = 1
  NO = 0
  OTHER = 3


REGEXES = [
  (re.compile(r'my pattern', re.I),
   CategoryEnum.MENTION,
   [
     lambda **kwargs: check_if_negated(neg_concept=CategoryEnum.NO, **kwargs),
     lambda **kwargs: check_if_other_subject(other_concept=CategoryEnum.OTHER, **kwargs),
   ]
   ),
]

# Choose one of the two options below.
# word_window sets how many words of context are passed to the context functions (instead of a character count):
RUN_REGEXES_FUNC = search_all_regex_func(REGEXES, word_window=5)
# Alternatively, change the size of the character-based window (defaults to 30):
# RUN_REGEXES_FUNC = search_all_regex_func(REGEXES, window=50)
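
To make the difference between a character window and a word window concrete, here is a standalone sketch of how context around a match could be extracted each way. The helpers `char_context` and `word_context` are hypothetical and only illustrate the concept; how konsepy builds the context it passes to context functions may differ.

```python
import re


def char_context(text, m, window=30):
    """Text within `window` characters on either side of the match."""
    return text[max(0, m.start() - window):m.end() + window]


def word_context(text, m, word_window=5):
    """Up to `word_window` whitespace-separated words on either side of the match."""
    before = text[:m.start()].split()
    before = before[max(0, len(before) - word_window):]
    after = text[m.end():].split()[:word_window]
    return ' '.join(before + [m.group()] + after)


text = 'The patient denies any history of chest pain or shortness of breath at rest.'
m = re.search(r'chest pain', text)
print(word_context(text, m, word_window=2))  # history of chest pain or shortness
print(char_context(text, m, window=10))      # may cut words in half at the edges
```

A word window never splits a token, which can matter when a negation cue such as "denies" sits right at the edge of the context.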

Custom Search Functions

You can create your own search function by defining a factory that takes the regex list and returns a generator function:

def my_custom_search(regexes):
    def _search(text, include_match=False):
        for regex, category, *other in regexes:
            for m in regex.finditer(text):
                yield (category, m) if include_match else category
    return _search
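
A short usage sketch of the function above, repeated here so the example runs on its own. The sample text and the single-category `CategoryEnum` are made up for illustration.

```python
import re
from enum import Enum


class CategoryEnum(Enum):
    MENTION = 1


def my_custom_search(regexes):
    def _search(text, include_match=False):
        # *other absorbs any optional context functions in the tuple
        for regex, category, *other in regexes:
            for m in regex.finditer(text):
                yield (category, m) if include_match else category
    return _search


REGEXES = [(re.compile(r'my pattern', re.I), CategoryEnum.MENTION)]
RUN_REGEXES_FUNC = my_custom_search(REGEXES)

text = 'My pattern appears here, and my pattern appears again.'
print([c.name for c in RUN_REGEXES_FUNC(text)])  # ['MENTION', 'MENTION']
for category, m in RUN_REGEXES_FUNC(text, include_match=True):
    print(category.name, m.group())
```

Note that this simple version ignores any context functions supplied as a third tuple element; it only unpacks them into `other`.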

Running konsepy

# Run all concepts in a package against input files
konsepy run-all --package-name my_nlp_package --input-files data.csv --outdir output/

# Extract snippets for manual review
konsepy run4snippets --package-name my_nlp_package --input-files data.csv --outdir snippets/

# Generate BIO tagged data for model training
konsepy bio-tag --package-name my_nlp_package --input-files data.csv --outdir bio_data/

For more detailed documentation and a template, see konsepy_nlp_template.

Roadmap

  • Change labels to some metadata object to allow more diverse input sources and run info

Download files

Source Distribution

  • konsepy-0.3.2.tar.gz (30.8 kB, Source)

Built Distribution

  • konsepy-0.3.2-py3-none-any.whl (34.8 kB, Python 3)

File details

Details for the file konsepy-0.3.2.tar.gz.

File metadata

  • Download URL: konsepy-0.3.2.tar.gz
  • Upload date:
  • Size: 30.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for konsepy-0.3.2.tar.gz

Algorithm    Hash digest
SHA256       dea35792d1d041c88aef76494723bcfd27e08a58f6d535923ac067cf4be023f4
MD5          acf918c55bf1e17882a2eea8bb9fc91b
BLAKE2b-256  1ac6208c9c8a9cec17d1fd40282b7d203b77f4dbcd6ff0ab940ff5c923e412e5

File details

Details for the file konsepy-0.3.2-py3-none-any.whl.

File metadata

  • Download URL: konsepy-0.3.2-py3-none-any.whl
  • Upload date:
  • Size: 34.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for konsepy-0.3.2-py3-none-any.whl

Algorithm    Hash digest
SHA256       6ccbf16689b142b2c1a2ccd677cf523f1b2f11fa5560eda3715aedd1702c9333
MD5          45ba724746520aad19bf85c736665abc
BLAKE2b-256  e3f1393c249beb210527c9a1d1db4a4de2fd6e3d471ceeacab840624f1919d28
