Skip to main content

Reverse engineer patterns for use with the SpaCy DependencyTreeMatcher

Project description

SpaCy Pattern Builder

Use training examples to build and refine patterns for use with SpaCy's DependencyMatcher.

Motivation

Generating patterns programmatically from training data is more efficient than creating them manually.

Installation

With pip:

pip install spacy-pattern-builder

Usage

# Import a SpaCy model, parse a string to create a Doc object
import en_core_web_sm

text = 'We introduce efficient methods for fitting Boolean models to molecular data.'
nlp = en_core_web_sm.load()
doc = nlp(text)

from spacy_pattern_builder import build_dependency_pattern

# Provide a list of tokens we want to match.
match_tokens = [doc[i] for i in [0, 1, 3]]  # [We, introduce, methods]

''' Note that these tokens must be fully connected. That is,
all tokens must have a path to all other tokens in the list,
without needing to traverse tokens outside of the list.
Otherwise, spacy-pattern-builder will raise a TokensNotFullyConnectedError.
You can get a connected set that includes your tokens with the following: '''
from spacy_pattern_builder import util
connected_tokens = util.smallest_connected_subgraph(match_tokens, doc)
assert match_tokens == connected_tokens  # In this case, the tokens we provided are already fully connected

# Specify the token attributes / features to use
feature_dict = {  # This is equal to the default feature_dict
    'DEP': 'dep_',
    'TAG': 'tag_'
}

# Build the pattern
pattern = build_dependency_pattern(doc, match_tokens, feature_dict=feature_dict)

from pprint import pprint
pprint(pattern)  # In the format consumed by SpaCy's DependencyMatcher:
'''
[{'PATTERN': {'DEP': 'ROOT', 'TAG': 'VBP'}, 'SPEC': {'NODE_NAME': 'node1'}},
 {'PATTERN': {'DEP': 'nsubj', 'TAG': 'PRP'},
  'SPEC': {'NBOR_NAME': 'node1', 'NBOR_RELOP': '>', 'NODE_NAME': 'node0'}},
 {'PATTERN': {'DEP': 'dobj', 'TAG': 'NNS'},
  'SPEC': {'NBOR_NAME': 'node1', 'NBOR_RELOP': '>', 'NODE_NAME': 'node3'}}]
'''

# Create a matcher and add the newly generated pattern
from spacy.matcher import DependencyMatcher

matcher = DependencyTreeMatcher(doc.vocab)
matcher.add('pattern', None, pattern)

# And get matches
matches = matcher(doc)
for match_id, token_idxs in matches:
    tokens = [doc[i] for i in token_idxs]
    tokens = sorted(tokens, key=lambda w: w.i)  # Make sure tokens are in their original order
    print(tokens)  # [We, introduce, methods]

Acknowledgements

Uses:

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

spacy-pattern-builder-0.0.7.tar.gz (8.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

spacy_pattern_builder-0.0.7-py3-none-any.whl (10.7 kB view details)

Uploaded Python 3

File details

Details for the file spacy-pattern-builder-0.0.7.tar.gz.

File metadata

  • Download URL: spacy-pattern-builder-0.0.7.tar.gz
  • Upload date:
  • Size: 8.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.20.1 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.28.1 CPython/3.6.8

File hashes

Hashes for spacy-pattern-builder-0.0.7.tar.gz
Algorithm Hash digest
SHA256 9be924640bd4d72d09da2647f39c82a73db535724a8bd609f633835a3efcb2a6
MD5 3028383bee550f5fac1a1067a3159f97
BLAKE2b-256 ab8b4a96ae0f1e936a600a4083bf7725c1380bd79e97ddf4ab9a64fb566b5b74

See more details on using hashes here.

File details

Details for the file spacy_pattern_builder-0.0.7-py3-none-any.whl.

File metadata

  • Download URL: spacy_pattern_builder-0.0.7-py3-none-any.whl
  • Upload date:
  • Size: 10.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.20.1 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.28.1 CPython/3.6.8

File hashes

Hashes for spacy_pattern_builder-0.0.7-py3-none-any.whl
Algorithm Hash digest
SHA256 6de306a7fde4ec40753c7943eb0aff4cd870eddd3892224b9e756ea6a20c329b
MD5 aa1423641e40f6c1fb3e161399113a54
BLAKE2b-256 5fbe04836f7353cd6b610eb437cba90572dc32acf63a851c1c93901195b2118e

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page