replaCy = spaCy Matcher + pyInflect. Create rules, correct sentences.
replaCy: match & replace with spaCy
We found that in multiple projects we had duplicate code for using spaCy’s blazing fast matcher to do the same thing: Match-Replace-Grammaticalize. So we wrote replaCy!
- Match - spaCy’s matcher is great, and lets you match on text, shape, POS, dependency parse, and other features. We extended this with “match hooks”, predicates that get used in the callback function to further refine a match.
- Replace - Not built into spaCy’s matcher syntax, but easily added. You often want to replace a matched word with some other term.
- Grammaticalize - If you match on "LEMMA": "dance" and replace with suggestions: ["sing"], but the actual match is "danced", you need to conjugate "sing" to match (here, "sang"). This is the "killer feature" of replaCy.
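The Grammaticalize step can be illustrated with a small, self-contained sketch. It does not use replaCy or pyInflect; the hand-written table `TOY_FORMS` and the helpers `toy_tag` and `grammaticalize` are hypothetical stand-ins, assumed here only to show the idea of carrying the matched word's inflection over to the suggestion.

```python
# Toy illustration only: in replaCy the inflection is delegated to pyInflect
# and the tag comes from spaCy. This hand-written table is a hypothetical
# stand-in that covers just two verbs.
TOY_FORMS = {
    "dance": {"VB": "dance", "VBZ": "dances", "VBD": "danced", "VBG": "dancing"},
    "sing": {"VB": "sing", "VBZ": "sings", "VBD": "sang", "VBG": "singing"},
}


def toy_tag(lemma, surface):
    """Return the Penn Treebank tag whose toy form equals the surface word."""
    for tag, form in TOY_FORMS[lemma].items():
        if form == surface.lower():
            return tag
    raise ValueError(f"{surface!r} is not a known form of {lemma!r}")


def grammaticalize(matched_word, matched_lemma, suggestion_lemma):
    """Inflect the suggestion so it carries the same tag as the matched word."""
    tag = toy_tag(matched_lemma, matched_word)
    return TOY_FORMS[suggestion_lemma][tag]


print(grammaticalize("danced", "dance", "sing"))  # sang
```

A real tagger is needed to tell apart forms that look identical (for example past tense and past participle "danced"), which is one reason the actual library relies on spaCy's analysis rather than string lookup.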
Requirements
spacy >= 2.0
(not installed by default, but replaCy needs to be instantiated with an nlp object)
Installation
pip install replacy
Quick start
from replacy import ReplaceMatcher
from replacy.db import load_json
import spacy
match_dict = load_json('/path/to/your/match/dict.json')
# load nlp spacy model of your choice
nlp = spacy.load("en_core_web_sm")
rmatcher = ReplaceMatcher(nlp, match_dict=match_dict)
# get inflected suggestions
# look up the first suggestion
span = rmatcher("She extracts revenge.")[0]
span._.suggestions
# >>> ['exacts']
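Once a suggestion has been chosen, it still has to be spliced back into the sentence. The following is a minimal sketch of that step in plain Python. The helper `apply_suggestion` is hypothetical and not part of replaCy; the character offsets stand in for the `start_char` and `end_char` attributes of a spaCy span, under the assumption that the match covers only the word "extracts".

```python
def apply_suggestion(text, start_char, end_char, suggestion):
    """Replace text[start_char:end_char] with the chosen suggestion."""
    return text[:start_char] + suggestion + text[end_char:]


text = "She extracts revenge."

# Assumed offsets of the matched word; with a spaCy span these would be
# span.start_char and span.end_char.
start = text.index("extracts")
end = start + len("extracts")

print(apply_suggestion(text, start, end, "exacts"))  # She exacts revenge.
```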
Input
ReplaceMatcher accepts either a text string or a spaCy doc.
# text is ok
span = rmatcher("She extracts revenge.")[0]
# doc is ok too
doc = nlp("She extracts revenge.")
span = rmatcher(doc)[0]
match_dict.json format
Here is a minimal match_dict.json:
{
  "extract-revenge": {
    "patterns": [
      {
        "LEMMA": "extract",
        "TEMPLATE_ID": 1
      }
    ],
    "suggestions": [
      [
        {
          "TEXT": "exact",
          "FROM_TEMPLATE_ID": 1
        }
      ]
    ],
    "match_hook": [
      {
        "name": "succeeded_by_phrase",
        "args": "revenge",
        "match_if_predicate_is": true
      }
    ],
    "test": {
      "positive": [
        "And at the same time extract revenge on those he so despises?",
        "Watch as Tampa Bay extracts revenge against his former Los Angeles Rams team."
      ],
      "negative": ["Mother flavours her custards with lemon extract."]
    }
  }
}
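As a rough aid for writing rules, the structure above can be inspected with the standard `json` module before handing it to replaCy. The sketch below assumes that each `FROM_TEMPLATE_ID` in a suggestion is meant to refer to a pattern token carrying the same `TEMPLATE_ID`, which the example suggests but does not state. The helper `unresolved_template_ids` is hypothetical and not part of replaCy.

```python
import json

# A trimmed copy of the example rule (the "test" block is omitted for brevity).
MATCH_DICT_JSON = """
{
  "extract-revenge": {
    "patterns": [{"LEMMA": "extract", "TEMPLATE_ID": 1}],
    "suggestions": [[{"TEXT": "exact", "FROM_TEMPLATE_ID": 1}]],
    "match_hook": [
      {"name": "succeeded_by_phrase", "args": "revenge", "match_if_predicate_is": true}
    ]
  }
}
"""


def unresolved_template_ids(rule):
    """Return FROM_TEMPLATE_ID values that no pattern token declares."""
    declared = {
        token["TEMPLATE_ID"] for token in rule["patterns"] if "TEMPLATE_ID" in token
    }
    referenced = {
        part["FROM_TEMPLATE_ID"]
        for suggestion in rule["suggestions"]
        for part in suggestion
        if "FROM_TEMPLATE_ID" in part
    }
    return sorted(referenced - declared)


match_dict = json.loads(MATCH_DICT_JSON)
for name, rule in match_dict.items():
    hooks = [hook["name"] for hook in rule.get("match_hook", [])]
    print(name, "hooks:", hooks, "unresolved ids:", unresolved_template_ids(rule))
```

An empty list of unresolved ids means every suggestion points at a token that the pattern actually declares, under the assumption stated above.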
For more information on how to compose a match_dict, see our wiki.
Citing
If you use replaCy in your research, please cite it with the following BibTeX:
@misc{havens2019replacy,
title = {SpaCy match and replace, maintaining conjugation},
author = {Sam Havens and Aneta Stal and Manhal Daaboul},
url = {https://github.com/Qordobacode/replaCy},
year = {2019}
}