Various functions to manipulate CONLL files
CONLL Transform - functions to manipulate CONLL data
This package contains several functions to manipulate conll data:
read_files
: Read one or several conll files and return a dictionary of documents.
read_file
: Read a conll file and return a dictionary of documents.
write_file
: Write a conll file.
compute_mentions
: Compute mentions from the raw last column of the conll file.
compute_chains
: Compute and return the chains from the conll data.
sentpos2textpos
: Transform mentions [SENT, START, STOP] to [TEXT_START, TEXT_STOP].
textpos2sentpos
: Transform mentions [TEXT_START, TEXT_STOP] to [SENT, START, STOP].
write_chains
: Convert a list of chains to a conll coreference column.
replace_coref_col
: Replace the last column of tar_docs by the last column of src_docs.
remove_singletons
: Remove the singletons of the conll file infpath, and write the version without singletons to the conll file outfpath.
filter_pos
: Filter mentions that have POS in unwanted_pos, return a new mention list.
check_no_duplicate_mentions
: Return True if there are no duplicate mentions.
merge_boundaries
: Add the mentions of boundary_docs to coref_docs if they don't already exist, as singletons.
remove_col
: Remove columns from all tokens in docs.
write_mentions
: Opposite of compute_mentions(). Write the last column in sent.
compare_coref_cols
: Build a conll file that merges the coref columns of several other files.
to_corefcol
: Write the conll file outfpath with just the last column (coref) of the conll file infpath.
get_conll_2012_key_pattern
: Return a compiled pattern object to match the conll2012 key format.
merge_amalgams
: Add amalgams back into the documents from which they have been removed.
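The two position formats mentioned for sentpos2textpos and textpos2sentpos can be illustrated with a small standalone sketch. It does not call the package and makes no claim about its actual signatures or conventions: the helper names (`sentence_offsets`, `to_text_positions`, `to_sentence_positions`) are hypothetical, a document is assumed to be a list of sentences that are each a list of tokens, and positions are assumed to be token indexes counted from 0.

```python
from bisect import bisect_right
from itertools import accumulate


def sentence_offsets(sentences):
    """Return, for each sentence, the text-level index of its first token."""
    lengths = [len(sentence) for sentence in sentences]
    # Cumulative lengths shifted by one: the first sentence starts at 0.
    return [0] + list(accumulate(lengths))[:-1]


def to_text_positions(mentions, sentences):
    """Map (SENT, START, STOP) triples to (TEXT_START, TEXT_STOP) pairs."""
    offsets = sentence_offsets(sentences)
    return [
        (offsets[sent] + start, offsets[sent] + stop)
        for sent, start, stop in mentions
    ]


def to_sentence_positions(mentions, sentences):
    """Map (TEXT_START, TEXT_STOP) pairs back to (SENT, START, STOP) triples."""
    offsets = sentence_offsets(sentences)
    result = []
    for text_start, text_stop in mentions:
        # The mention belongs to the last sentence starting at or before it.
        sent = bisect_right(offsets, text_start) - 1
        result.append((sent, text_start - offsets[sent], text_stop - offsets[sent]))
    return result


# Hypothetical two-sentence document (4 tokens, then 3 tokens).
sentences = [["The", "cat", "sleeps", "."], ["It", "purrs", "."]]
sentence_level = [(0, 0, 2), (1, 0, 1)]

text_level = to_text_positions(sentence_level, sentences)
print(text_level)  # [(0, 2), (4, 5)]
print(to_sentence_positions(text_level, sentences))  # [(0, 0, 2), (1, 0, 1)]
```

Because the same offset is added to START and STOP, the sketch works whether STOP is inclusive or exclusive; which convention the package uses is not stated here.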
To use the package, import the function you need from conll_transform, for example:
from conll_transform import read_files
documents = read_files("myfile.conll", "myfile2.conll")
print(documents)
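For readers unfamiliar with the coreference column that several of these functions work on, the following standalone sketch shows how mentions and chains might be recovered from it. It assumes the CoNLL-2012 bracket notation (`(3` opens a mention of chain 3, `3)` closes it, `(3)` is a one-token mention, `|` separates several markers on one token, `-` or `_` means no marker). The function names are hypothetical and this is not the package's own implementation of compute_mentions or compute_chains.

```python
import re
from collections import defaultdict

# One marker: optional "(", a chain number, optional ")".
MARKER = re.compile(r"(\()?(\d+)(\))?")


def mentions_from_coref_column(column):
    """Return sorted (start, stop, chain_id) triples, with stop inclusive.

    `column` holds one coreference cell per token of a single sentence.
    """
    open_starts = defaultdict(list)  # chain id -> stack of open start indexes
    mentions = []
    for index, cell in enumerate(column):
        if cell in ("-", "_", ""):
            continue
        for part in cell.split("|"):
            match = MARKER.fullmatch(part)
            if match is None or not (match.group(1) or match.group(3)):
                raise ValueError(f"unexpected coreference marker: {part!r}")
            chain_id = int(match.group(2))
            if match.group(1):  # "(" opens a mention at this token
                open_starts[chain_id].append(index)
            if match.group(3):  # ")" closes the most recently opened one
                if not open_starts[chain_id]:
                    raise ValueError(f"unmatched closing marker: {part!r}")
                mentions.append((open_starts[chain_id].pop(), index, chain_id))
    return sorted(mentions)


def chains_from_mentions(mentions):
    """Group (start, stop, chain_id) triples into {chain_id: [(start, stop), ...]}."""
    chains = defaultdict(list)
    for start, stop, chain_id in mentions:
        chains[chain_id].append((start, stop))
    return dict(chains)


# Hypothetical coreference column for a seven-token sentence.
column = ["(0", "0)", "-", "(1)", "(2|(0", "-", "0)|2)"]
mentions = mentions_from_coref_column(column)
print(mentions)  # [(0, 1, 0), (3, 3, 1), (4, 6, 0), (4, 6, 2)]
print(chains_from_mentions(mentions))
# {0: [(0, 1), (4, 6)], 1: [(3, 3)], 2: [(4, 6)]}
```

A chain with a single mention, such as chain 1 above, is what the function list calls a singleton.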
The source can be found at GitHub.