Skip to main content

Reverse engineer patterns for use with the SpaCy DependencyTreeMatcher

Project description

SpaCy Pattern Builder

Use training examples to build and refine patterns for use with SpaCy's DependencyTreeMatcher.

Motivation

Generating patterns programmatically from training data is more efficient than creating them manually.

Installation

With pip:

pip install spacy-pattern-builder

Usage

# Import a SpaCy model, parse a string to create a Doc object
import en_core_web_sm

text = 'We introduce efficient methods for fitting Boolean models to molecular data.'
nlp = en_core_web_sm.load()
doc = nlp(text)

from spacy_pattern_builder import build_dependency_pattern

# Provide a list of tokens we want to match.
match_tokens = [doc[i] for i in [0, 1, 3]]  # [We, introduce, methods]

''' Note that these tokens must be fully connected. That is,
all tokens must have a path to all other tokens in the list,
without needing to traverse tokens outside of the list.
Otherwise, spacy-pattern-builder will raise a TokensNotFullyConnectedError.
You can get a connected set that includes your tokens with the following: '''
from spacy_pattern_builder import util
connected_tokens = util.smallest_connected_subgraph(match_tokens, doc)
assert match_tokens == connected_tokens

# Specify the token attributes / features to use
feature_dict = {  # This here is equal to the default feature_dict
    'DEP': 'dep_',
    'TAG': 'tag_'
}

# Build the pattern
pattern = build_dependency_pattern(doc, match_tokens, feature_dict=feature_dict)

from pprint import pprint
pprint(pattern)  # In the format consumed by SpaCy's DependencyTreeMatcher:
'''
[{'PATTERN': {'DEP': 'ROOT', 'TAG': 'VBP'}, 'SPEC': {'NODE_NAME': 'node1'}},
 {'PATTERN': {'DEP': 'nsubj', 'TAG': 'PRP'},
  'SPEC': {'NBOR_NAME': 'node1', 'NBOR_RELOP': '>', 'NODE_NAME': 'node0'}},
 {'PATTERN': {'DEP': 'dobj', 'TAG': 'NNS'},
  'SPEC': {'NBOR_NAME': 'node1', 'NBOR_RELOP': '>', 'NODE_NAME': 'node3'}}]
'''

# Create a matcher and add the newly generated pattern
from spacy.matcher import DependencyTreeMatcher

matcher = DependencyTreeMatcher(doc.vocab)
matcher.add('pattern', None, pattern)

# And match away
matches = matcher(doc)
for match_id, token_idxs in matches:
    tokens = [doc[i] for i in token_idxs]
    tokens = sorted(tokens, key=lambda w: w.i)
    print(tokens)  # [We, introduce, methods]

Acknowledgements

Uses:

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

spacy-pattern-builder-0.0.2.tar.gz (7.6 kB view hashes)

Uploaded Source

Built Distribution

spacy_pattern_builder-0.0.2-py2.py3-none-any.whl (8.1 kB view hashes)

Uploaded Python 2 Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page