
ReplaCy = spaCy Matcher + pyInflect. Create rules, correct sentences.


replaCy: match & replace with spaCy

We found that in multiple projects we had duplicate code for using spaCy’s blazing fast matcher to do the same thing: Match-Replace-Grammaticalize. So we wrote replaCy!

  • Match - spaCy’s matcher is great, and lets you match on text, shape, POS, dependency parse, and other features. We extended this with “match hooks”, predicates that get used in the callback function to further refine a match.
  • Replace - Not built into spaCy’s matcher syntax, but easily added. You often want to replace a matched word with some other term.
  • Grammaticalize - If you match on "LEMMA": "dance" and replace with suggestions: ["sing"], but the actual match is "danced", you need to conjugate "sing" to "sang" so the suggestion fits the sentence. This is the "killer feature" of replaCy.
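
The three steps above can be sketched in plain Python. This is a toy illustration only, not replaCy's implementation: the hand-written FORMS table stands in for spaCy's lemmatizer and pyInflect's inflection lookup, and the names FORMS, analyze, and suggest are hypothetical.

```python
# Toy illustration only: a hand-written table stands in for spaCy's
# lemmatizer and pyInflect. None of these names come from replaCy.
FORMS = {
    "dance": {"VB": "dance", "VBZ": "dances", "VBD": "danced", "VBG": "dancing"},
    "sing": {"VB": "sing", "VBZ": "sings", "VBD": "sang", "VBG": "singing"},
}


def analyze(word):
    """Return (lemma, tag) for a known surface form, or None."""
    for lemma, forms in FORMS.items():
        for tag, form in forms.items():
            if form == word.lower():
                return lemma, tag
    return None


def suggest(sentence, match_lemma, replacement_lemma):
    """Match on a lemma, then inflect the replacement to the matched tag."""
    suggestions = []
    for word in sentence.rstrip(".").split():
        analysis = analyze(word)
        if analysis and analysis[0] == match_lemma:  # Match
            tag = analysis[1]
            # Replace + Grammaticalize: reuse the tag of the matched word
            suggestions.append((word, FORMS[replacement_lemma][tag]))
    return suggestions


print(suggest("They danced all night.", "dance", "sing"))
# [('danced', 'sang')]
```

The point of the sketch is that the replacement is chosen by lemma but emitted in the form of the word that was actually matched.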


Requirements

  • spacy >= 2.0 (not installed by default, but replaCy needs to be instantiated with an nlp object)

Installation

pip install replacy

Quick start

from replacy import ReplaceMatcher
from replacy.db import load_json
import spacy


match_dict = load_json('/path/to/your/match/dict.json')
# load nlp spacy model of your choice
nlp = spacy.load("en_core_web_sm")

rmatcher = ReplaceMatcher(nlp, match_dict=match_dict)

# get inflected suggestions
# look up the first suggestion
span = rmatcher("She extracts revenge.")[0]
span._.suggestions
# >>> ['exacts']

Input

A ReplaceMatcher instance accepts either a plain string or a spaCy Doc.

# text is ok
span = rmatcher("She extracts revenge.")[0]

# doc is ok too
doc = nlp("She extracts revenge.")
span = rmatcher(doc)[0]

match_dict.json format

Here is a minimal match_dict.json:

{
  "extract-revenge": {
    "patterns": [
      {
        "LEMMA": "extract",
        "TEMPLATE_ID": 1
      }
    ],
    "suggestions": [
      [
        {
          "TEXT": "exact",
          "FROM_TEMPLATE_ID": 1
        }
      ]
    ],
    "match_hook": [
      {
        "name": "succeeded_by_phrase",
        "args": "revenge",
        "match_if_predicate_is": true
      }
    ],
    "test": {
      "positive": [
        "And at the same time extract revenge on those he so despises?",
        "Watch as Tampa Bay extracts revenge against his former Los Angeles Rams team."
      ],
      "negative": ["Mother flavours her custards with lemon extract."]
    }
  }
}
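
To make the role of "match_hook" concrete, here is a minimal sketch of how such an entry could filter candidate matches. It assumes a hook name maps to a function that takes "args" and returns a predicate, and that a match is kept only when the predicate's result equals "match_if_predicate_is". The functions below are hypothetical stand-ins that operate on plain token lists rather than spaCy objects; they are not replaCy's own code.

```python
import json

# The "match_hook" part of the rule above, parsed from JSON.
rule = json.loads("""
{
  "match_hook": [
    {
      "name": "succeeded_by_phrase",
      "args": "revenge",
      "match_if_predicate_is": true
    }
  ]
}
""")


def succeeded_by_phrase(phrase):
    """Hypothetical stand-in hook: build a predicate over (tokens, start, end)."""
    wanted = phrase.lower().split()

    def predicate(tokens, start, end):
        # Compare the tokens right after the match with the wanted phrase.
        following = [t.lower() for t in tokens[end:end + len(wanted)]]
        return following == wanted

    return predicate


# Assumed lookup from hook name to hook factory.
HOOKS = {"succeeded_by_phrase": succeeded_by_phrase}


def keep_match(rule, tokens, start, end):
    """Keep a candidate match only if every hook returns the required value."""
    for hook in rule["match_hook"]:
        predicate = HOOKS[hook["name"]](hook["args"])
        if predicate(tokens, start, end) != hook["match_if_predicate_is"]:
            return False
    return True


positive = ["She", "extracts", "revenge", "."]
negative = ["Mother", "flavours", "her", "custards", "with", "lemon", "extract", "."]

print(keep_match(rule, positive, 1, 2))  # True: "revenge" follows the match
print(keep_match(rule, negative, 6, 7))  # False: only "." follows the match
```

Under these assumptions, the pattern alone would match "extract" in both sentences, and the hook is what keeps the "lemon extract" sentence from being flagged.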

For more information on how to compose a match_dict, see our wiki.

Citing

If you use replaCy in your research, please cite it with the following BibTeX entry:

@misc{havens2019replacy,
    title  = {SpaCy match and replace, maintaining conjugation},
    author = {Sam Havens and Aneta Stal and Manhal Daaboul},
    url    = {https://github.com/Qordobacode/replaCy},
    year   = {2019}
}
