Building blocks for spaCy Matcher patterns
corpus-patterns
Create a custom tokenizer
import spacy

from corpus_patterns import set_tokenizer

nlp = spacy.blank("en")
nlp.tokenizer = set_tokenizer(nlp)
The tokenizer:
- Removes dashes from infixes
- Adds prefix/suffix rules for parentheses and brackets
- Adds special-case exceptions to treat dotted text as a single token
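The behaviors above can be sketched directly with spaCy's tokenizer API. This is an illustration of the technique, not the package's actual implementation; the example strings ("state-of-the-art", "R.A.") are made up for demonstration:

```python
import spacy
from spacy.util import compile_infix_regex

nlp = spacy.blank("en")

# Drop any default infix pattern containing a hyphen, so hyphenated
# words like "state-of-the-art" are kept as a single token.
infixes = [p for p in nlp.Defaults.infixes if "-" not in p]
nlp.tokenizer.infix_finditer = compile_infix_regex(infixes).finditer

# Treat dotted text as a single token via a special-case rule.
nlp.tokenizer.add_special_case("R.A.", [{"ORTH": "R.A."}])
```

With these rules, `nlp("state-of-the-art")` yields one token instead of several, and `"R.A."` is no longer split at its periods.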
Add .jsonl files to a directory
Each file should contain lines of spaCy Matcher patterns.
from pathlib import Path

from corpus_patterns import create_rules

create_rules(folder=Path("location-here"))  # directory containing the .jsonl pattern files
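As a sketch, a .jsonl pattern file (one JSON array of token specs per line) can be written with the standard library. The attribute names (`LOWER`, `IS_PUNCT`, `LIKE_NUM`, `OP`) are standard spaCy Matcher keys; the specific pattern and the `sections.jsonl` filename are illustrative assumptions:

```python
import json
import tempfile
from pathlib import Path

# Hypothetical pattern: matches "Sec. 5" / "sec 12" style citations.
pattern = [{"LOWER": "sec"}, {"IS_PUNCT": True, "OP": "?"}, {"LIKE_NUM": True}]

folder = Path(tempfile.mkdtemp())  # stand-in for your patterns directory
with (folder / "sections.jsonl").open("w", encoding="utf-8") as f:
    f.write(json.dumps(pattern) + "\n")  # one Matcher pattern per line
```

The resulting folder is what you would pass to `create_rules(folder=...)`.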