konsepy
Framework for building NLP information extraction systems using regular expressions. konsepy can then use the resulting NLP system to create a silver standard for fine-tuning a transformer model.
Installation
- konsepy is designed to be used with the konsepy_nlp_template. See the README there for current installation instructions.
- To use konsepy as a standalone entity:
  - Install with pip: pip install konsepy[all]
  - To split corpora into sentences for fine-tuning a sentence-based transformer, spacy will also need to be installed and configured.
Usage
The package provides a single, centralized CLI tool: konsepy.
Building your NLP Package
To use konsepy, you need to create an NLP package (e.g., my_nlp_package) with the following structure. The best way to get this format is to clone the konsepy_nlp_template:
my_nlp_package/
├── __init__.py
└── concepts/
    ├── __init__.py
    └── my_concept.py
Each concept file (e.g., my_concept.py) must define:
- REGEXES: A list of regex-category pairs (and optional context functions).
- RUN_REGEXES_FUNC: A function that executes the regexes and returns categories/matches.
- CategoryEnum: An Enum defining the possible categories for the concept.
Search Functions
konsepy provides several pre-built search functions in konsepy.rxsearch:
Some simple ones:
- search_all_regex: Finds all occurrences of each regex in the list.
- search_first_regex: Finds only the first occurrence of each regex.
Probably the most useful:
- search_and_replace_regex_func: Prevents double-matching by replacing found text with dots before proceeding to the next regex.
- search_all_regex_func: Supports "sentinel" values (None) to stop processing if a match was found earlier.
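The two strategies described above can be pictured with a small standalone sketch. This is not konsepy's implementation: the names mask_and_search and search_with_sentinel are hypothetical, and only Python's re module is used. The first function masks each match with dots so a later, more general pattern cannot match the same text; the second treats a None entry in the pattern list as a checkpoint and stops if anything matched before it.

```python
import re


def mask_and_search(patterns, text):
    """Return (label, matched_text) pairs, masking each match with dots.

    Hypothetical illustration only, not konsepy's code.
    """
    found = []
    for regex, label in patterns:
        for m in regex.finditer(text):
            found.append((label, m.group()))
        # Replace matched spans with dots of equal length so that later
        # patterns cannot match the same characters again.
        text = regex.sub(lambda m: '.' * len(m.group()), text)
    return found


def search_with_sentinel(patterns, text):
    """Return labels, stopping at a None entry if anything matched before it."""
    found = []
    for entry in patterns:
        if entry is None:
            if found:
                break  # an earlier pattern matched, so skip the rest
            continue
        regex, label = entry
        found.extend(label for _ in regex.finditer(text))
    return found


text = 'history of chest pain'
specific = (re.compile(r'chest pain', re.I), 'CHEST_PAIN')
general = (re.compile(r'pain', re.I), 'PAIN')

print(mask_and_search([specific, general], text))
print(search_with_sentinel([specific, None, general], text))
print(search_with_sentinel([specific, None, general], 'back pain'))
```

In both cases the order of the pattern list matters: placing the most specific regex first keeps the general one from also firing on the same text.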
Example of my_concept.py:
import re
from enum import Enum

from konsepy.rxsearch import search_all_regex_func
from konsepy.context.negation import check_if_negated
from konsepy.context.other_subject import check_if_other_subject


class CategoryEnum(Enum):
    MENTION = 1
    NO = 0
    OTHER = 3


REGEXES = [
    (re.compile(r'my pattern', re.I),
     CategoryEnum.MENTION,
     [
         lambda **kwargs: check_if_negated(neg_concept=CategoryEnum.NO, **kwargs),
         lambda **kwargs: check_if_other_subject(other_concept=CategoryEnum.OTHER, **kwargs),
     ]
     ),
]

# word_window specifies the number of words (instead of characters) to retrieve for context functions:
RUN_REGEXES_FUNC = search_all_regex_func(REGEXES, word_window=5)
# alternatively, to alter the character-based window (defaults to 30):
# RUN_REGEXES_FUNC = search_all_regex_func(REGEXES, window=50)
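To make the difference between the word_window and window arguments concrete, here is a minimal sketch of word-based versus character-based context extraction around a match. The helpers char_context and word_context are hypothetical and written only for illustration; how konsepy actually builds the context it passes to context functions (for example, whether the window applies to each side of the match) is an assumption here.

```python
import re


def char_context(text, m, window=30):
    """Return up to `window` characters on each side of match `m`."""
    start = max(0, m.start() - window)
    end = min(len(text), m.end() + window)
    return text[start:m.start()], text[m.end():end]


def word_context(text, m, word_window=5):
    """Return up to `word_window` whitespace-separated words on each side."""
    before = text[:m.start()].split()
    after = text[m.end():].split()
    return before[-word_window:], after[:word_window]


text = 'The patient denies any history of my pattern in the family.'
m = re.search(r'my pattern', text, re.I)

# A character window can cut words in half; a word window cannot.
print(char_context(text, m, window=10))
print(word_context(text, m, word_window=3))
```

A word-based window tends to be easier to reason about for negation terms such as "denies", since a character window may truncate the very word a context function is looking for.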
Custom Search Functions
You can create your own search function by defining a function that returns a generator:
def my_custom_search(regexes):
    def _search(text, include_match=False):
        for regex, category, *other in regexes:
            for m in regex.finditer(text):
                yield (category, m) if include_match else category
    return _search
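As a quick check of how such a function behaves, the sketch below repeats my_custom_search and calls it directly on a sample string. The one-member CategoryEnum and the sample text are made up for the example, and the call signature is taken from the definition above rather than from konsepy's internals.

```python
import re
from enum import Enum


class CategoryEnum(Enum):
    MENTION = 1  # toy category for this example only


def my_custom_search(regexes):
    def _search(text, include_match=False):
        # Each entry is (regex, category, *optional extras such as context functions).
        for regex, category, *other in regexes:
            for m in regex.finditer(text):
                yield (category, m) if include_match else category
    return _search


REGEXES = [(re.compile(r'my pattern', re.I), CategoryEnum.MENTION)]
RUN_REGEXES_FUNC = my_custom_search(REGEXES)

text = 'My pattern here, and my pattern there.'

# Categories only (the default).
categories = list(RUN_REGEXES_FUNC(text))
# Categories together with the match objects, reduced to character offsets.
spans = [(cat.name, m.start(), m.end())
         for cat, m in RUN_REGEXES_FUNC(text, include_match=True)]

print(categories)
print(spans)
```

Because _search is a generator, nothing is evaluated until the result is iterated, which is why the example wraps the calls in list(...) and a list comprehension.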
Running konsepy
# Run all concepts in a package against input files
konsepy run-all --package-name my_nlp_package --input-files data.csv --outdir output/
# Extract snippets for manual review
konsepy run4snippets --package-name my_nlp_package --input-files data.csv --outdir snippets/
# Generate BIO tagged data for model training
konsepy bio-tag --package-name my_nlp_package --input-files data.csv --outdir bio_data/
For more detailed documentation and a template, see konsepy_nlp_template.
Roadmap
- Change labels to some metadata object to allow more diverse input sources and run info