
Neural coreference resolution


DeCOFre


Detecting Coreferences for Oral French¹.

This was developed for application to spoken French as part of my PhD thesis, but it is relatively easy to apply to other languages and genres.

Installation

  1. Install with pip

    python -m pip install decofre
    
  2. Install the additional dependencies

    python -m spacy download fr_core_news_lg
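
To check that both installation steps took effect, a minimal sketch like the following can help. It assumes that the importable package is named decofre (not confirmed here, though the repository layout suggests it) and relies on the fact that spaCy models downloaded with spacy download are installed as regular Python packages. The helper name missing_packages is made up for this example.

```python
import importlib.util


def missing_packages(names):
    """Return the names that cannot be found as importable top-level packages."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: "decofre" is the import name of the package installed in step 1.
    # "fr_core_news_lg" is the spaCy model installed in step 2.
    missing = missing_packages(["decofre", "fr_core_news_lg"])
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Both packages appear to be installed.")
```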
    

Running a pretrained model

Use decofre-infer, e.g.

decofre-infer path/to/detector.model path/to/coref.model path/to/raw_text.txt

Its output is still rather crude and mostly meant for demonstration purposes.
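
To run the command above over several text files, one option is a small wrapper script. This is only a sketch: it uses nothing but the three positional arguments shown above, makes no assumption about where or how decofre-infer writes its output, and the function names and the *.txt pattern are choices made for this example.

```python
import subprocess
from pathlib import Path


def infer_command(detector, coref, text):
    """Build the decofre-infer command line for one raw text file."""
    return ["decofre-infer", str(detector), str(coref), str(text)]


def run_on_directory(detector, coref, text_dir):
    """Call decofre-infer once per .txt file in text_dir, in sorted order."""
    for text in sorted(Path(text_dir).glob("*.txt")):
        # check=True stops at the first file for which decofre-infer fails.
        subprocess.run(infer_command(detector, coref, text), check=True)


if __name__ == "__main__":
    # Placeholder paths, to be replaced with your own.
    print(" ".join(infer_command(
        "path/to/detector.model", "path/to/coref.model", "path/to/raw_text.txt"
    )))
```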

Training a model

Downloading ANCOR

ANCOR is so far the only corpus we officially support (more are in preparation, along with an easier bootstrap procedure).

  • Clone this repo: git clone https://github.com/LoicGrobol/decofre && cd decofre
  • Ensure you are in an environment where DeCOFre has been installed (to be sure that all the dependencies are correct)
  • Run the bootstrap script: python -m doit run -f datasets/ancor/ancor.py

Actual training

Use decofre-train, e.g.

decofre-train --config tests/sanity-check.jsonnet --model-config decofre/models/default.jsonnet --out-dir /path/to/an/output/directory

This will put detector.model and coref.model files in the selected output directory, which you can then load with decofre-infer.

The sanity-check training config is, well, a sanity check: it is meant to see if DeCOFre actually runs in your environment, and it uses a tiny training set to make it fast. The resulting models will therefore be awful. This is normal, don't be alarmed.

You probably want to substitute your own config files; see also the ANCOR config files in datasets/ancor/. The config files are not really documented right now, but you can take inspiration from the provided examples. See also decofre-train --help for other options.

This is by no means fast, you have been warned.
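
If you want to script a training run and then check that both model files were produced, a sketch along these lines may be useful. It uses only the --config, --model-config and --out-dir options and the detector.model and coref.model file names mentioned above; the helper names are made up for this example, and the output directory is a placeholder.

```python
import subprocess
from pathlib import Path


def train_command(config, model_config, out_dir):
    """Build the decofre-train command line with the three options shown above."""
    return [
        "decofre-train",
        "--config", str(config),
        "--model-config", str(model_config),
        "--out-dir", str(out_dir),
    ]


def expected_models(out_dir):
    """Paths of the two model files that training should leave in out_dir."""
    return [Path(out_dir) / "detector.model", Path(out_dir) / "coref.model"]


def train_and_check(config, model_config, out_dir):
    """Run training, then return the expected model files that are missing."""
    subprocess.run(train_command(config, model_config, out_dir), check=True)
    return [path for path in expected_models(out_dir) if not path.exists()]


if __name__ == "__main__":
    # Placeholder output directory, to be replaced with your own.
    print(" ".join(train_command(
        "tests/sanity-check.jsonnet",
        "decofre/models/default.jsonnet",
        "/path/to/an/output/directory",
    )))
```

The paths returned by expected_models are the ones to pass to decofre-infer afterwards.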

Citation

@inproceedings{grobol2019NeuralCoreferenceResolution,
  author = {Grobol, Loïc},
  date = {2019-06},
  eventtitle = {Proceedings of the {{Second Workshop}} on {{Computational Models}} of {{Reference}}, {{Anaphora}} and {{Coreference}}},
  pages = {8-14},
  title = {Neural {{Coreference Resolution}} with {{Limited Lexical Context}} and {{Explicit Mention Detection}} for {{Oral French}}},
  url = {https://www.aclweb.org/anthology/papers/W/W19/W19-2802/},
  urldate = {2019-06-24}
}

1. Let me know if you think of a better name.

Licence

Unless otherwise specified (see below), the following licence (the so-called “MIT License”) applies to all the files in this repository. See also LICENCE.md.

Copyright 2020 Loïc Grobol <loic.grobol@gmail.com>

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and
associated documentation files (the "Software"), to deal in the Software without restriction,
including without limitation the rights to use, copy, modify, merge, publish, distribute,
sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or
substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT
NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT
OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Licence exceptions

The files listed here are distributed under different terms. When redistributing or building upon this work, you have to comply with their respective restrictions separately.

ANCOR


The following files are derived from the ANCOR Corpus and distributed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
