Various functions to manipulate CONLL files

Project description

CONLL Transform - functions to manipulate CONLL data

This package contains several functions to manipulate conll data:

  • read_files: Read one or several conll files and return a dictionary of documents.
  • read_file: Read a conll file and return a dictionary of documents.
  • write_file: Write a conll file.
  • compute_mentions: Compute mentions from the raw last column of the conll file.
  • compute_chains: Compute and return the chains from the conll data.
  • sentpos2textpos: Transform mentions [SENT, START, STOP] to [TEXT_START, TEXT_STOP].
  • textpos2sentpos: Transform mentions [TEXT_START, TEXT_STOP] to [SENT, START, STOP].
  • write_chains: Convert a list of chains to a conll coreference column.
  • replace_coref_col: Replace the last column of tar_docs by the last column of src_docs.
  • remove_singletons: Remove the singletons from the conll file infpath, and write the version without singletons to the conll file outfpath.
  • filter_pos: Filter mentions that have POS in unwanted_pos, return a new mention list.
  • check_no_duplicate_mentions: Return True if there are no duplicate mentions.
  • merge_boundaries: Add the mentions of boundary_docs to coref_docs if they don't already exist, as singletons.
  • remove_col: Remove columns from all tokens in docs.
  • write_mentions: Opposite of compute_mentions(). Write the last column in sent.
  • compare_coref_cols: Build a conll file that merges the coref columns of several other files.
  • to_corefcol: Write the conll file outfpath with just the last column (coref) of the conll file infpath.
  • get_conll_2012_key_pattern: Return a compiled pattern object to match conll2012 key format.
  • merge_amalgams: Add amalgams back into the documents from which they were removed.
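
The two mention formats named in the list, [SENT, START, STOP] and [TEXT_START, TEXT_STOP], can be related with a running token offset per sentence. The sketch below is only an illustration of that idea, not the package's implementation: the helper names sent_to_text and text_to_sent are hypothetical, and it assumes 0-based indices with an exclusive STOP, which may not match the conventions used by sentpos2textpos and textpos2sentpos.

```python
from itertools import accumulate


def sent_to_text(mention, sent_lengths):
    """Map (sent, start, stop) to (text_start, text_stop).

    Hypothetical helper. Assumes 0-based indices and an exclusive stop.
    """
    sent, start, stop = mention
    # offsets[i] is the number of tokens that precede sentence i.
    offsets = [0, *accumulate(sent_lengths)]
    return (offsets[sent] + start, offsets[sent] + stop)


def text_to_sent(mention, sent_lengths):
    """Map (text_start, text_stop) back to (sent, start, stop)."""
    text_start, text_stop = mention
    offsets = [0, *accumulate(sent_lengths)]
    for sent, length in enumerate(sent_lengths):
        # The mention belongs to the sentence that contains its first token.
        if offsets[sent] <= text_start < offsets[sent] + length:
            return (sent, text_start - offsets[sent], text_stop - offsets[sent])
    raise ValueError("mention start is outside the text")


# Three sentences of 4, 3 and 5 tokens.
sent_lengths = [4, 3, 5]
text_mention = sent_to_text((1, 1, 3), sent_lengths)
round_trip = text_to_sent(text_mention, sent_lengths)
print(text_mention, round_trip)
```

Only the sentence lengths are needed for the conversion, which is why both helpers take them instead of the full token data.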

To use a function, import it from conll_transform, for example:

from conll_transform import read_files

documents = read_files("myfile.conll", "myfile2.conll")
print(documents)
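
For readers unfamiliar with the coreference column that functions such as compute_mentions, compute_chains and remove_singletons operate on, here is a minimal, self-contained sketch of how such a column can be read. It does not call the package and is not its implementation: parse_coref_column is a hypothetical name, and the sketch assumes the CoNLL-2012 notation, where "(3" opens a mention of chain 3, "3)" closes it, "(3)" is a one-token mention, "|" separates several markers on one token, and "-" means no mention.

```python
import re
from collections import defaultdict


def parse_coref_column(column):
    """Return {chain_id: [(start, stop), ...]} with inclusive token indices.

    Hypothetical helper illustrating the CoNLL-2012 coreference notation.
    """
    open_starts = defaultdict(list)  # chain id -> stack of open start indices
    chains = defaultdict(list)
    for index, cell in enumerate(column):
        if cell == "-":
            continue
        for part in cell.split("|"):
            match = re.fullmatch(r"(\()?(\d+)(\))?", part)
            if match is None or not (match.group(1) or match.group(3)):
                raise ValueError(f"unexpected coreference value: {part!r}")
            chain_id = int(match.group(2))
            if match.group(1):  # "(" opens a mention on this token
                open_starts[chain_id].append(index)
            if match.group(3):  # ")" closes the most recently opened mention
                if not open_starts[chain_id]:
                    raise ValueError(f"unmatched closing bracket at token {index}")
                chains[chain_id].append((open_starts[chain_id].pop(), index))
    return dict(chains)


# One value per token, as found in the last column of a conll file.
column = ["(0", "0)", "-", "(1)", "(0)", "-"]
chains = parse_coref_column(column)

# A singleton is a chain with exactly one mention.
without_singletons = {cid: m for cid, m in chains.items() if len(m) > 1}
print(chains)
print(without_singletons)
```

A stack per chain id is used so that nested mentions of the same chain close in the right order.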

The source can be found at GitHub.

Download files

Source Distribution

conll_transform-0.1.0.tar.gz (8.4 kB)

Uploaded Source

Built Distribution

conll_transform-0.1.0-py3-none-any.whl (8.6 kB)

Uploaded Python 3

File details

Details for the file conll_transform-0.1.0.tar.gz.

File metadata

  • Download URL: conll_transform-0.1.0.tar.gz
  • Upload date:
  • Size: 8.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.12

File hashes

Hashes for conll_transform-0.1.0.tar.gz
Algorithm Hash digest
SHA256 26ed32f55f20aef06b39a4af40360ea30948491c6f9daf206e85d8a995e0c395
MD5 118cf0a8c19b1f1f7c21403c484f3c0c
BLAKE2b-256 c51141a298bbce94cc05f607766666ffbaccaec6ba5f8ff4973afb3f191a359e

File details

Details for the file conll_transform-0.1.0-py3-none-any.whl.

File hashes

Hashes for conll_transform-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c06c97b3cb25673d40b66ed25c91ecb91adbb4aae4c229fecda0bfe22ce5d324
MD5 1c9fea99b60512d8a251e23b2a6d7724
BLAKE2b-256 28670fcd538ffc5622029dc9c42060d9fba2656b2c74d6a10b376e54bee3bae0
