Tools for dkpro cassis

DKPro cassis tools

Toolkit for managing UIMA CAS XMI files.

Install

pip install dkpro-cassis-tools

Load cas from a zip file

from dkpro_cassis_tools import load_cas_from_zip_file
with open('cas.zip', 'rb') as f:
    cas = load_cas_from_zip_file(f)
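
Before loading, it can help to check what an archive actually contains. The source does not document the archive layout, so the sketch below relies only on the standard `zipfile` module. `list_zip_members`, `build_demo_zip`, and the member names are hypothetical and are not part of dkpro-cassis-tools.

```python
import io
import zipfile
from typing import BinaryIO, List


def list_zip_members(file_obj: BinaryIO) -> List[str]:
    """Return the names of all members in a zip archive, in archive order."""
    with zipfile.ZipFile(file_obj) as archive:
        return archive.namelist()


def build_demo_zip() -> io.BytesIO:
    """Build a small in-memory archive; the member names are placeholders."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("typesystem.xml", "<typeSystemDescription/>")
        archive.writestr("cas.xmi", "<xmi:XMI/>")
    buffer.seek(0)  # rewind so the archive can be read from the start
    return buffer
```

The same helper works on a real file opened with `open('cas.zip', 'rb')`, since it only needs a binary file object.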

Save cas to a zip file

from dkpro_cassis_tools import dump_cas_to_zip_file


# Open in binary write mode; `cas` is the object loaded in the previous example
with open('cas.zip', 'wb') as f:
    dump_cas_to_zip_file(cas, f)

Restore cas segmentation by newline

from dkpro_cassis_tools import load_cas_from_zip_file
from dkpro_cassis_tools import restore_cas_segmentation_by_newline
from dkpro_cassis_tools import dump_cas_to_zip_file


# Open the cas
with open('cas.zip', 'rb') as f:
    cas = load_cas_from_zip_file(f)

# Restore segmentation  
re_segmented_cas = restore_cas_segmentation_by_newline(cas)

# Save the re-segmented cas (binary write mode)
with open('re_segmented_cas.zip', 'wb') as f:
    dump_cas_to_zip_file(re_segmented_cas, f)
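
The source does not describe how `restore_cas_segmentation_by_newline` works internally. One plausible reading of the name is that each non-blank line of the document text becomes one sentence. The sketch below shows that idea on a plain string only; `newline_sentence_spans` is a hypothetical helper, not the library's implementation.

```python
from typing import List, Tuple


def newline_sentence_spans(text: str) -> List[Tuple[int, int]]:
    """Return (begin, end) character offsets for each non-blank line."""
    spans = []
    offset = 0
    for line in text.split("\n"):
        stripped = line.strip()
        if stripped:
            # Skip leading whitespace so the span covers only visible text
            begin = offset + line.index(stripped)
            spans.append((begin, begin + len(stripped)))
        offset += len(line) + 1  # +1 for the newline removed by split
    return spans
```

Offsets of this kind are what a sentence annotation's begin and end would need to hold if segmentation follows line breaks.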

Combine sentences from one or more cas

from dkpro_cassis_tools import load_cas_from_zip_file
from dkpro_cassis_tools import dump_cas_to_zip_file
from dkpro_cassis_tools import create_cas_from_sentences
from dkpro_cassis_tools import SENTENCE_NS


sentences = []

# Extract some sentences from cas1 
with open('cas1.zip', 'rb') as f:
    cas1 = load_cas_from_zip_file(f)
for sentence in cas1.select(SENTENCE_NS):
    if len(sentence.get_covered_text()) > 10:
        sentences.append((cas1, sentence))

# Extract some sentences from cas2 
with open('cas2.zip', 'rb') as f:
    cas2 = load_cas_from_zip_file(f)
for sentence in cas2.select(SENTENCE_NS):
    if len(sentence.get_covered_text()) > 10:
        sentences.append((cas2, sentence))

# Create the new cas
new_cas = create_cas_from_sentences(sentences) 

# Save it
with open('new_cas.zip', 'wb') as f:
    dump_cas_to_zip_file(new_cas, f)
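
Combining sentences from several documents means each sentence ends up at a new position in the combined text, so its offsets have to be recomputed. The sketch below illustrates that bookkeeping on plain strings. It assumes sentences are joined with a single newline, which the source does not confirm, and `join_sentences` is a hypothetical helper rather than what `create_cas_from_sentences` does internally.

```python
from typing import List, Tuple


def join_sentences(sentences: List[str]) -> Tuple[str, List[Tuple[int, int]]]:
    """Join sentence texts with newlines; return the text and the new spans."""
    parts = []
    spans = []
    offset = 0
    for sentence in sentences:
        spans.append((offset, offset + len(sentence)))
        parts.append(sentence)
        offset += len(sentence) + 1  # one newline separator after each sentence
    return "\n".join(parts), spans
```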

Tokenize cas

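from typing import List

import MeCab  # provided by the mecab-python3 package, installed separately

from dkpro_cassis_tools import load_cas_from_zip_file
from dkpro_cassis_tools import tokenize_cas


# MeCab tagger in wakati mode: outputs space-separated tokens
wakati = MeCab.Tagger("-Owakati")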

def tokenize(text: str) -> List[str]:
    return wakati.parse(text).split()

with open('data/cas_tokenize.zip', 'rb') as f:
    cas = load_cas_from_zip_file(f)
    mecab_tokenized_cas = tokenize_cas(cas, tokenize)
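
The MeCab example suggests that `tokenize_cas` takes a callable mapping a string to a list of token strings. Assuming any callable of that shape is accepted, a dependency-free tokenizer for whitespace-delimited languages might look like the sketch below. `regex_tokenize` is a hypothetical name, and the pattern is one simple choice among many.

```python
import re
from typing import List

# A token is either a run of word characters or a single punctuation mark
_TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def regex_tokenize(text: str) -> List[str]:
    """Split text into word tokens and individual punctuation marks."""
    return _TOKEN_PATTERN.findall(text)
```

Under that assumption it would be passed in place of the MeCab wrapper, as in `tokenize_cas(cas, regex_tokenize)`.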
