nested_coref_resolver

nested_coref_resolver is a lightweight Python script that transforms documents by replacing every mention in a coreference cluster with the cluster's first non-pronominal mention. Importantly, it explicitly handles nested coreferent mentions, a very common phenomenon for which no open source solutions currently exist.

Setup

pip install nested_coref_resolver

The only dependency is NumPy, which is used sparingly; any recent version should work.

Example Usage

The script was built to post-process the outputs of AllenNLP's coreference resolution system. It is agnostic, however, to the coreference predictor used. The main function, replace_corefs.resolve, takes two inputs: a document, which is a list of tokens, and clusters, which is a list of lists. Each outer list represents one cluster of coreferent mentions, and each nested list is the [start, end] token span of one mention in that cluster. As the example shows, both indices are inclusive: [0, 1] covers 'Paul', 'Allen'.

from ncr.replace_corefs import resolve
example = {
        'document': [
            'Paul', 'Allen', 'was', 'born', 'on', 'January', '21', ',', '1953', ',', 'in', 'Seattle', ',', 'Washington',
            ',', 'to', 'Kenneth', 'Sam', 'Allen', 'and', 'Edna', 'Faye', 'Allen', '.', 'Allen', 'attended', 'Lakeside',
            'School', ',', 'a', 'private', 'school', 'in', 'Seattle', ',', 'where', 'he', 'befriended', 'Bill',
            'Gates', ',', 'two', 'years', 'younger', ',', 'with', 'whom', 'he', 'shared', 'an', 'enthusiasm', 'for',
            'computers', '.', 'Paul', 'and', 'Bill', 'used', 'a', 'teletype', 'terminal', 'at', 'their', 'high',
            'school', ',', 'Lakeside', ',', 'to', 'develop', 'their', 'programming', 'skills', 'on', 'several',
            'time', '-', 'sharing', 'computer', 'systems', '.'
        ],
        'clusters': [
            [[0, 1], [24, 24], [36, 36], [47, 47], [54, 54]],
            [[11, 14], [33, 33]],
            [[38, 52], [56, 56]],
            [[54, 56], [62, 62], [70, 70]],
            [[26, 34], [62, 67]]
        ]
}

resolved_toks = resolve(example['document'], example['clusters'])
print(' '.join(resolved_toks))

This will produce the following output:

Paul Allen was born on January 21 , 1953 , in Seattle , Washington , to Kenneth Sam Allen and Edna Faye Allen .
Paul Allen attended Lakeside School , a private school in Seattle , Washington , , where Paul Allen befriended
Bill Gates , two years younger , with whom Paul Allen shared an enthusiasm for computers . Paul Allen and Bill Gates
, two years younger , with whom Paul Allen shared an enthusiasm for computers used a teletype terminal at 
Lakeside School , a private school in Seattle , Washington , , to develop Paul Allen and Bill Gates ,
two years younger , with whom Paul Allen shared an enthusiasm for computers programming skills on several time -
sharing computer systems .
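
To see why the output repeats long phrases, it can help to check which mention spans sit inside other mention spans. The following is a minimal sketch and is not part of nested_coref_resolver: mention_text and nested_pairs are hypothetical helper names, and spans are assumed to be inclusive [start, end] token indices, as in the example above.

```python
def mention_text(document, span):
    """Return the text of an inclusive [start, end] token span."""
    start, end = span
    return ' '.join(document[start:end + 1])


def nested_pairs(clusters):
    """List (outer, inner) span pairs where inner is contained in outer
    and is not the same span."""
    spans = [tuple(span) for cluster in clusters for span in cluster]
    pairs = []
    for outer in spans:
        for inner in spans:
            if inner != outer and outer[0] <= inner[0] and inner[1] <= outer[1]:
                pairs.append((outer, inner))
    return pairs


# A shortened, made-up document for illustration only.
document = ['Paul', 'and', 'Bill', 'used', 'their', 'high', 'school', 'terminal']
clusters = [
    [[0, 2], [4, 4]],   # "Paul and Bill", "their"
    [[4, 6]],           # "their high school"
]

for outer, inner in nested_pairs(clusters):
    print(mention_text(document, inner), 'is nested in', mention_text(document, outer))
```

In the full example above, the span [62, 62] ("their") sits inside [62, 67] ("their high school , Lakeside ,"), so both the inner and the outer mention are replaced by their respective head mentions.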

Functionality

Handles

  1. Nested coreferent mentions, which are very common and are not handled by any pre-existing coreference replacement scripts.
  2. Head mention selection: the first non-pronominal mention in each cluster is used as the replacement. In practice, this is very effective.
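
The head mention rule can be sketched as follows. This is an illustration under stated assumptions, not the package's actual code: select_head and the PRONOUNS set are hypothetical, the pronoun list is hand-written and incomplete, and a mention counts as pronominal only when it is a single pronoun token.

```python
# Assumption: a small hand-written pronoun list; the package may use a different one.
PRONOUNS = {
    'i', 'me', 'my', 'mine', 'you', 'your', 'yours',
    'he', 'him', 'his', 'she', 'her', 'hers', 'it', 'its',
    'we', 'us', 'our', 'ours', 'they', 'them', 'their', 'theirs',
    'who', 'whom', 'whose', 'this', 'that', 'these', 'those',
}


def select_head(document, cluster):
    """Return the earliest span in the cluster that is not a single pronoun.

    Spans are inclusive [start, end] token indices. Returns None when
    every mention in the cluster is a pronoun.
    """
    for start, end in sorted(cluster):  # sort so "first" means document order
        tokens = document[start:end + 1]
        is_pronoun = len(tokens) == 1 and tokens[0].lower() in PRONOUNS
        if not is_pronoun:
            return [start, end]
    return None


# A made-up document for illustration only.
document = ['She', 'said', 'Ada', 'Lovelace', 'wrote', 'it', '.']
cluster = [[0, 0], [2, 3]]  # "She", "Ada Lovelace"
print(select_head(document, cluster))
```

Here the pronoun "She" comes first in the document, so the sketch skips it and picks the span for "Ada Lovelace".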

Does NOT Handle

  1. Syntax conflicts when replacing text. For example, if the head mention is a possessive noun, it is substituted for every mention in the cluster regardless of the grammatical context of each mention. Please see AllenNLP's function for guidance. (This will be incorporated in later releases.)
  2. More sophisticated head mention selection. Custom selection logic can be incorporated into the script to replace the current naive approach of choosing the first non-pronominal mention.
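
One way to reduce the possessive mismatch described in the first item is to adjust the replacement tokens before substituting them. The sketch below is an assumption about how this could be done, not behaviour the package provides: replacement_tokens and POSSESSIVE_PRONOUNS are hypothetical names, head_tokens is assumed to be non-empty, and without part-of-speech tags the ambiguous pronoun "her" is left out of the list.

```python
# Assumption: unambiguous possessive pronouns only. "her" is omitted because
# it can also be an object pronoun, which needs a POS tagger to resolve.
POSSESSIVE_PRONOUNS = {'my', 'your', 'his', 'its', 'our', 'their'}


def replacement_tokens(head_tokens, mention_tokens):
    """Return the tokens to substitute for a mention.

    If the mention is a possessive pronoun and the head mention is not
    already possessive, append an "'s" token so the result stays grammatical.
    """
    mention_is_possessive = (
        len(mention_tokens) == 1
        and mention_tokens[0].lower() in POSSESSIVE_PRONOUNS
    )
    head_is_possessive = head_tokens[-1].endswith("'s")
    if mention_is_possessive and not head_is_possessive:
        return list(head_tokens) + ["'s"]
    return list(head_tokens)


print(' '.join(replacement_tokens(['Paul', 'and', 'Bill'], ['their'])))
print(' '.join(replacement_tokens(['Paul', 'Allen'], ['he'])))
```

With this adjustment, "develop their programming skills" would become "develop Paul and Bill 's programming skills" rather than "develop Paul and Bill programming skills".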
