Script to convert files in NAF format to CoNLL format
naf2conll
Script to convert coreference data in NAF format to CoNLL format.
!! NB !! At the moment, this script only supports the following columns:
- 1: Document ID
- 3: Word number
- 4: Word itself
- 12: Coreference
The following CoNLL columns are supported by NAF, but are not (yet) processed (correctly) by this script:
- 5: POS tag
- 6: constituency tree
- ...?
- 11: named entities
See CoNLL-specification.md for an extensive description of the CoNLL format.
Usage
naf2conll.py
To automatically find all (sub)folders that contain NAF files and convert all data in those folders, run:
naf2conll.py path/to/output_dir -d path/to/some/folder [-d path/to/another/folder ...]
To only convert one file, run:
naf2conll.py path/to/output.conll path/to/input.naf
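The folder discovery behind the `-d` option can be pictured roughly as follows. This is only a minimal sketch, not the script's actual implementation: the function name `find_naf_dirs` is hypothetical, and it assumes that NAF files are recognised by a `.naf` extension.

```python
from pathlib import Path


def find_naf_dirs(root, extension=".naf"):
    """Return every directory at or below `root` that directly contains
    at least one file ending in `extension`.

    Hypothetical helper; the `.naf` extension is an assumption.
    """
    root = Path(root)
    # rglob also matches files located directly in `root` itself.
    dirs = {p.parent for p in root.rglob(f"*{extension}") if p.is_file()}
    return sorted(dirs)
```

Each returned directory could then be converted in turn, writing the results to the chosen output directory.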
Columns of CoNLL output
By default, only columns 1, 3, 4 and 12 are output.
If you choose to output more columns, the following values and placeholders are used.
Column | Description | Value | Conforms to CoNLL specification? |
---|---|---|---|
1 | Document ID | file path without extension | Yes |
2 | Part number | 0 | Yes |
3 | Word number | generated | Yes |
4 | Word itself | extracted from text layer of NAF | Yes |
5 | POS | [POS] | No |
6 | Parse bit | * | No |
7 | Predicate lemma | - | Yes |
8 | Predicate Frameset ID | - | Yes |
9 | Word sense | - | Yes |
10 | Speaker/Author | UNKNOWN | ??? |
11 | Named Entities | * | Yes |
- | Predicate Arguments | None: column(s) left out entirely | Yes, conforms to the example in CoNLL 2012 |
12 | Coreference | extracted from coreference layer of NAF (ISSUE! [1]) | Yes |
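To illustrate how the placeholders in the table combine into an output line, here is a minimal sketch. It is not taken from the script itself: the names `PLACEHOLDERS` and `conll_row` are hypothetical, and the tab separator is an assumption.

```python
# Placeholder values per column number, copied from the table above.
PLACEHOLDERS = {
    2: "0",         # Part number
    5: "[POS]",     # POS
    6: "*",         # Parse bit
    7: "-",         # Predicate lemma
    8: "-",         # Predicate Frameset ID
    9: "-",         # Word sense
    10: "UNKNOWN",  # Speaker/Author
    11: "*",        # Named Entities
}


def conll_row(doc_id, word_number, word, coref, columns=(1, 3, 4, 12)):
    """Build one output line containing only the requested columns.

    Hypothetical helper; the tab separator is an assumption.
    """
    values = dict(PLACEHOLDERS)
    values.update({1: doc_id, 3: str(word_number), 4: word, 12: coref})
    return "\t".join(values[c] for c in sorted(columns))
```

With the default `columns`, only the document ID, word number, word and coreference value are joined; passing `range(1, 13)` fills the remaining columns with the placeholders.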
[1]:
The reference spans are not closed in the correct order if they end at the same word. The following is an example of output from naf2conll.py:
(10
-
(52|(55
52)
-
10)|55)|(133)
The pedantically correct output would be:
(10
-
(55|(52
52)
-
(133)|55)|10)
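One way to obtain the nested ordering shown in the second example is to sort the spans per word: spans opening at a word are listed outermost first, single-word spans follow, and spans closing at a word are listed innermost first. The sketch below is only an illustration under these assumptions, not the script's code. The function `coref_column` and the `(set_id, start, end)` mention tuples, with inclusive word indices, are hypothetical.

```python
from collections import defaultdict


def coref_column(n_tokens, mentions):
    """Return the coreference column as a list of strings, one per word.

    `mentions` holds hypothetical (set_id, start, end) tuples with
    inclusive word indices.
    """
    opens = defaultdict(list)    # word index -> [(end, set_id)]
    singles = defaultdict(list)  # word index -> [set_id]
    closes = defaultdict(list)   # word index -> [(start, set_id)]
    for set_id, start, end in mentions:
        if start == end:
            singles[start].append(set_id)
        else:
            opens[start].append((end, set_id))
            closes[end].append((start, set_id))

    column = []
    for i in range(n_tokens):
        # Outermost span (latest end) opens first.
        parts = [f"({sid}" for _, sid in
                 sorted(opens[i], key=lambda m: m[0], reverse=True)]
        # Single-word spans sit inside everything open at this word.
        parts += [f"({sid})" for sid in singles[i]]
        # Innermost span (latest start) closes first.
        parts += [f"{sid})" for _, sid in
                  sorted(closes[i], key=lambda m: m[0], reverse=True)]
        column.append("|".join(parts) if parts else "-")
    return column
```

Called as `coref_column(6, [(10, 0, 5), (52, 2, 3), (55, 2, 5), (133, 5, 5)])`, this yields the six values of the second example, ending in `(133)|55)|10)`.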
Issues
- 'on_missing' config key is not validated before use
- Raise an error when there is no coref layer in extract_coref_sets