Lightweight Python script that transforms documents by replacing every mention in a coreference cluster with the cluster's first non-pronominal mention. Importantly, it explicitly handles nested coreferent mentions.
Project description
nested_coref_resolver
nested_coref_resolver is a lightweight Python script that transforms documents by replacing every mention in a coreference cluster with the cluster's first non-pronominal mention. Importantly, it explicitly handles nested coreferent mentions, which are a very common phenomenon and for which no open source solutions currently exist.
Setup
pip install nested_coref_resolver
The only dependency is NumPy, which is used sparingly; any recent version should work.
Example Usage
The script was built to post-process the outputs of AllenNLP's coreference resolution system. It is agnostic, however, to the coreference predictor used.
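As a hedged illustration of that predictor-agnostic design, the sketch below adapts any predictor whose `predict(document=...)` call returns a dictionary with `document` and `clusters` keys, which is the shape AllenNLP's coreference predictor produces. The `coref_inputs` helper and the `StubPredictor` class are hypothetical names introduced here; the stub only stands in for a real model so the snippet runs without downloading one.

```python
def coref_inputs(predictor, text):
    """Run a coreference predictor and return (tokens, clusters)."""
    output = predictor.predict(document=text)
    return output['document'], output['clusters']


class StubPredictor:
    """Hypothetical stand-in for a real predictor; returns a fixed analysis."""

    def predict(self, document):
        tokens = document.split()
        # Pretend the model linked 'She' (token 3) back to 'Ann' (token 0).
        return {'document': tokens, 'clusters': [[[0, 0], [3, 3]]]}


tokens, clusters = coref_inputs(StubPredictor(), 'Ann smiled . She waved .')
print(tokens)
print(clusters)
```

The returned pair has the same shape as the two arguments that `resolve` expects.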
The main function, replace_corefs.resolve, takes two inputs:
- document: a list of tokens.
- clusters: a list of lists, where each outer list represents a cluster of coreferent mentions and each nested list is the span of one mention, given as [start, end] token indices with an inclusive end.
from ncr.replace_corefs import resolve

example = {
    'document': [
        'Paul', 'Allen', 'was', 'born', 'on', 'January', '21', ',', '1953', ',', 'in', 'Seattle', ',', 'Washington',
        ',', 'to', 'Kenneth', 'Sam', 'Allen', 'and', 'Edna', 'Faye', 'Allen', '.', 'Allen', 'attended', 'Lakeside',
        'School', ',', 'a', 'private', 'school', 'in', 'Seattle', ',', 'where', 'he', 'befriended', 'Bill',
        'Gates', ',', 'two', 'years', 'younger', ',', 'with', 'whom', 'he', 'shared', 'an', 'enthusiasm', 'for',
        'computers', '.', 'Paul', 'and', 'Bill', 'used', 'a', 'teletype', 'terminal', 'at', 'their', 'high',
        'school', ',', 'Lakeside', ',', 'to', 'develop', 'their', 'programming', 'skills', 'on', 'several',
        'time', '-', 'sharing', 'computer', 'systems', '.'
    ],
    'clusters': [
        [[0, 1], [24, 24], [36, 36], [47, 47], [54, 54]],
        [[11, 14], [33, 33]],
        [[38, 52], [56, 56]],
        [[54, 56], [62, 62], [70, 70]],
        [[26, 34], [62, 67]]
    ]
}

resolved_toks = resolve(example['document'], example['clusters'])
print(' '.join(resolved_toks))
This will produce the following output:
Paul Allen was born on January 21 , 1953 , in Seattle , Washington , to Kenneth Sam Allen and Edna Faye Allen .
Paul Allen attended Lakeside School , a private school in Seattle , Washington , , where Paul Allen befriended
Bill Gates , two years younger , with whom Paul Allen shared an enthusiasm for computers . Paul Allen and Bill Gates
, two years younger , with whom Paul Allen shared an enthusiasm for computers used a teletype terminal at
Lakeside School , a private school in Seattle , Washington , , to develop Paul Allen and Bill Gates ,
two years younger , with whom Paul Allen shared an enthusiasm for computers programming skills on several time -
sharing computer systems .
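The span convention in the example input can be checked with a few lines of plain Python. The sketch below assumes, as the example suggests, that each span is a `[start, end]` pair of token indices with an inclusive end. `mention_text` and `cluster_mentions` are hypothetical helper names, not part of the package.

```python
def mention_text(tokens, span):
    """Return the surface text of one mention (end index is inclusive)."""
    start, end = span
    return ' '.join(tokens[start:end + 1])


def cluster_mentions(tokens, clusters):
    """Return, for every cluster, the text of each of its mentions."""
    return [[mention_text(tokens, span) for span in cluster]
            for cluster in clusters]


tokens = ['Paul', 'Allen', 'attended', 'Lakeside', '.', 'He', 'liked', 'it', '.']
clusters = [[[0, 1], [5, 5]], [[3, 3], [7, 7]]]
for mentions in cluster_mentions(tokens, clusters):
    print(mentions)
# ['Paul Allen', 'He']
# ['Lakeside', 'it']
```

Printing the mentions this way is a quick sanity check on a predictor's clusters before resolving them.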
Functionality
Handles
- Nested coreferent entities, which are very common and not handled by any pre-existing coreference replacement script.
- Head mention selection: the first non-pronominal mention in each cluster is used as the replacement. In practice, this is very effective.
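To make these two points concrete, here is a minimal sketch of one way nested replacement with first-non-pronominal head selection could work. It is not the package's implementation: the `PRONOUNS` list, the helper names, the choice to leave all-pronoun clusters untouched, and the rule that the longest mention wins when several start at the same token are all assumptions made for illustration.

```python
# Assumed pronoun list for illustration; the package may use a different one.
PRONOUNS = {
    'i', 'me', 'my', 'mine', 'you', 'your', 'yours', 'he', 'him', 'his',
    'she', 'her', 'hers', 'it', 'its', 'we', 'us', 'our', 'ours',
    'they', 'them', 'their', 'theirs', 'who', 'whom', 'whose',
}


def is_pronoun(tokens, span):
    """A mention counts as pronominal if it is a single pronoun token."""
    start, end = span
    return start == end and tokens[start].lower() in PRONOUNS


def first_non_pronominal(tokens, cluster):
    """Return the first non-pronominal span of a cluster, or None."""
    for span in cluster:
        if not is_pronoun(tokens, span):
            return tuple(span)
    return None  # all-pronoun cluster: left unresolved in this sketch


def resolve_nested(tokens, clusters):
    """Replace every non-head mention with its (recursively resolved) head."""
    head_of = {}  # non-head mention span -> head span of its cluster
    for cluster in clusters:
        head = first_non_pronominal(tokens, cluster)
        if head is None:
            continue
        for span in cluster:
            if tuple(span) != head:
                head_of.setdefault(tuple(span), head)

    def expand(start, end, active):
        out, i = [], start
        while i <= end:
            # Mentions starting here that fit in the current span; spans
            # already being replaced are skipped to avoid infinite recursion.
            candidates = [s for s in head_of
                          if s[0] == i and s[1] <= end and s not in active]
            if candidates:
                span = max(candidates, key=lambda s: s[1])  # outermost wins
                head_start, head_end = head_of[span]
                out.extend(expand(head_start, head_end, active | {span}))
                i = span[1] + 1
            else:
                out.append(tokens[i])
                i += 1
        return out

    return expand(0, len(tokens) - 1, frozenset())


tokens = ['Ann', 'lost', 'her', 'keys', '.', 'They', 'were', 'new', '.']
clusters = [[[0, 0], [2, 2]], [[2, 3], [5, 5]]]
print(' '.join(resolve_nested(tokens, clusters)))
# Ann lost Ann keys . Ann keys were new .
```

The head mention "her keys" itself contains the mention "her", so it is resolved before being substituted for "They". The ungrammatical "Ann keys" in the result is an instance of the syntax conflicts listed under Does NOT Handle.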
Does NOT Handle
- Syntax conflicts when replacing text. For example, if the head mention is a possessive noun, it is substituted as-is regardless of the context in which it is placed. Please see AllenNLP's function for guidance. (This will be incorporated in later releases.)
- More sophisticated head reference selection. Custom logic for head reference selection can be incorporated into the script to replace the current naive approach of selecting the first non-pronominal mention.
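Both limitations could be addressed with small, separate helpers. The sketch below is purely illustrative and not part of the package: `longest_head` is a hypothetical alternative head selector that prefers the longest non-pronominal mention, and `fit_possessive` is a hypothetical heuristic that adds or drops a possessive marker so the head matches the mention it replaces. The word lists are assumptions, and the possessive check assumes the tokenizer emits `'s` as its own token.

```python
# Assumed word lists for illustration only.
PRONOUNS = {'i', 'me', 'you', 'he', 'him', 'she', 'her', 'it', 'we', 'us',
            'they', 'them', 'my', 'your', 'his', 'its', 'our', 'their'}
# Note: 'her' is ambiguous (object or possessive); it is treated as possessive.
POSSESSIVE_PRONOUNS = {'my', 'your', 'his', 'her', 'its', 'our', 'their'}


def longest_head(tokens, cluster):
    """Pick the longest non-pronominal mention (earliest one on ties)."""
    candidates = [
        tuple(span) for span in cluster
        if not (span[0] == span[1] and tokens[span[0]].lower() in PRONOUNS)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda s: s[1] - s[0])


def fit_possessive(mention_tokens, head_tokens):
    """Make the head agree with the mention on possessive marking."""
    mention_is_possessive = (len(mention_tokens) == 1 and
                             mention_tokens[0].lower() in POSSESSIVE_PRONOUNS)
    head_is_possessive = head_tokens[-1] == "'s"
    if mention_is_possessive and not head_is_possessive:
        return list(head_tokens) + ["'s"]
    if head_is_possessive and not mention_is_possessive:
        return list(head_tokens[:-1])
    return list(head_tokens)


tokens = ['Gates', 'spoke', '.', 'He', 'is', 'Bill', 'Gates', '.']
print(longest_head(tokens, [[0, 0], [3, 3], [5, 6]]))
# (5, 6)
print(fit_possessive(['their'], ['Paul', 'and', 'Bill']))
# ['Paul', 'and', 'Bill', "'s"]
print(fit_possessive(['he'], ['Paul', "'s"]))
# ['Paul']
```

A part-of-speech tagger would be more reliable than fixed word lists for both checks; the lists are used here only to keep the sketch dependency-free.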