Tools for dkpro cassis
Dkpro cassis tools
Toolkit for managing UIMA CAS XMI files.
Install
pip install dkpro-cassis-tools
Load a CAS from a zip file
from dkpro_cassis_tools import load_cas_from_zip_file
with open('cas.zip', 'rb') as f:
    cas = load_cas_from_zip_file(f)
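The README does not document what a CAS zip file contains. The sketch below inspects an archive with the standard zipfile module, assuming a layout of one .xmi member (the serialized CAS) plus one .xml member (the type system). find_cas_members is a hypothetical helper, not part of dkpro_cassis_tools. It also shows why the file is opened in binary mode: zipfile needs a seekable binary stream.

```python
import io
import zipfile
from typing import BinaryIO, Optional, Tuple


def find_cas_members(f: BinaryIO) -> Tuple[Optional[str], Optional[str]]:
    """Return the names of the first .xmi member and the first .xml member.

    Hypothetical helper: assumes the archive holds one XMI file (the CAS)
    and one XML file (the type system).
    """
    xmi_name = None
    typesystem_name = None
    with zipfile.ZipFile(f) as archive:
        for name in archive.namelist():
            lower = name.lower()
            if lower.endswith(".xmi") and xmi_name is None:
                xmi_name = name
            elif lower.endswith(".xml") and typesystem_name is None:
                typesystem_name = name
    return xmi_name, typesystem_name


# Build a small archive in memory so the example runs without a real cas.zip.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("document.xmi", "<xmi:XMI/>")
    archive.writestr("TypeSystem.xml", "<typeSystemDescription/>")
buffer.seek(0)

print(find_cas_members(buffer))  # ('document.xmi', 'TypeSystem.xml')
```

With a real file, pass the object returned by open('cas.zip', 'rb') instead of the in-memory buffer.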
Save a CAS to a zip file
from dkpro_cassis_tools import dump_cas_to_zip_file
# The target file must be opened in binary *write* mode
with open('cas.zip', 'wb') as f:
    dump_cas_to_zip_file(cas, f)
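As a rough illustration of what writing a CAS archive involves, the sketch below writes two members into a zip stream with the standard zipfile module and reads them back. The member names cas.xmi and TypeSystem.xml and the helper write_cas_zip are assumptions for illustration; the actual layout produced by dump_cas_to_zip_file may differ.

```python
import io
import zipfile
from typing import BinaryIO


def write_cas_zip(f: BinaryIO, xmi: str, typesystem: str) -> None:
    """Write an XMI string and a type-system string into a zip archive.

    Hypothetical helper; the member names below are assumptions.
    """
    with zipfile.ZipFile(f, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("cas.xmi", xmi)
        archive.writestr("TypeSystem.xml", typesystem)


# In-memory stand-in for open("cas.zip", "wb").
buffer = io.BytesIO()
write_cas_zip(buffer, "<xmi:XMI/>", "<typeSystemDescription/>")

# Read the archive back to check the round trip.
buffer.seek(0)
with zipfile.ZipFile(buffer) as archive:
    print(archive.namelist())                        # ['cas.xmi', 'TypeSystem.xml']
    print(archive.read("cas.xmi").decode("utf-8"))   # <xmi:XMI/>
```

Opening the target with 'rb' instead of 'wb' makes the write fail, which is why the save examples use binary write mode.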
Restore CAS segmentation by newline
from dkpro_cassis_tools import load_cas_from_zip_file
from dkpro_cassis_tools import restore_cas_segmentation_by_newline
from dkpro_cassis_tools import dump_cas_to_zip_file
# Open the cas
with open('cas.zip', 'rb') as f:
    cas = load_cas_from_zip_file(f)
# Restore segmentation
re_segmented_cas = restore_cas_segmentation_by_newline(cas)
# Save the re-segmented CAS (binary write mode)
with open('re_segmented_cas.zip', 'wb') as f:
    dump_cas_to_zip_file(re_segmented_cas, f)
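The README does not say how restore_cas_segmentation_by_newline decides where sentences go. A minimal sketch of the likely idea, assuming each non-blank line of the document text becomes one sentence: compute character offsets for every line, trimming surrounding whitespace. newline_sentence_spans is a hypothetical helper and only produces offsets; it does not create annotations in a CAS.

```python
from typing import List, Tuple


def newline_sentence_spans(text: str) -> List[Tuple[int, int]]:
    """Return (begin, end) offsets of each non-blank line, whitespace trimmed."""
    spans = []
    start = 0
    for line in text.split("\n"):
        if line.strip():
            lead = len(line) - len(line.lstrip())
            trail = len(line) - len(line.rstrip())
            spans.append((start + lead, start + len(line) - trail))
        # +1 skips the "\n" separator removed by split()
        start += len(line) + 1
    return spans


text = "First sentence.\n  Second one.  \n\nThird."
for begin, end in newline_sentence_spans(text):
    print(begin, end, repr(text[begin:end]))
# 0 15 'First sentence.'
# 18 29 'Second one.'
# 33 39 'Third.'
```

Because rstrip() also removes "\r", Windows line endings do not leak into the sentence spans.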
Combine sentences from one or more CASes
from dkpro_cassis_tools import load_cas_from_zip_file
from dkpro_cassis_tools import dump_cas_to_zip_file
from dkpro_cassis_tools import create_cas_from_sentences
from dkpro_cassis_tools import SENTENCE_NS
sentences = []
# Extract some sentences from cas1
with open('cas1.zip', 'rb') as f:
    cas1 = load_cas_from_zip_file(f)
for sentence in cas1.select(SENTENCE_NS):
    if len(sentence.get_covered_text()) > 10:
        sentences.append((cas1, sentence))
# Extract some sentences from cas2
with open('cas2.zip', 'rb') as f:
    cas2 = load_cas_from_zip_file(f)
for sentence in cas2.select(SENTENCE_NS):
    if len(sentence.get_covered_text()) > 10:
        sentences.append((cas2, sentence))
# Create the new cas
new_cas = create_cas_from_sentences(sentences)
# Save it
with open('new_cas.zip', 'wb') as f:
    dump_cas_to_zip_file(new_cas, f)
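Combining sentences from several documents means building a new document text and recomputing every sentence's offsets relative to it. The sketch below shows that bookkeeping on plain strings, assuming sentences are joined with a newline; how create_cas_from_sentences actually lays out the new text is not documented here, and combine_sentences is a hypothetical helper. The list comprehension mirrors the length filter (more than 10 characters) used above.

```python
from typing import List, Tuple


def combine_sentences(
    sentences: List[str], separator: str = "\n"
) -> Tuple[str, List[Tuple[int, int]]]:
    """Join sentences into one text and return each sentence's new offsets."""
    parts = []
    spans = []
    offset = 0
    for i, sentence in enumerate(sentences):
        if i > 0:
            parts.append(separator)
            offset += len(separator)
        parts.append(sentence)
        spans.append((offset, offset + len(sentence)))
        offset += len(sentence)
    return "".join(parts), spans


candidates = ["Short one.", "This sentence is long enough.", "Another long sentence here."]
# Keep only sentences longer than 10 characters, as in the example above.
sentences = [s for s in candidates if len(s) > 10]
text, spans = combine_sentences(sentences)
print(spans)  # [(0, 29), (30, 57)]
```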
Tokenize a CAS
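from typing import List
import MeCab
from dkpro_cassis_tools import load_cas_from_zip_file
from dkpro_cassis_tools import tokenize_cas
# MeCab in "wakati" mode outputs space-separated surface forms
wakati = MeCab.Tagger("-Owakati")
def tokenize(text: str) -> List[str]:
    return wakati.parse(text).split()
with open('data/cas_tokenize.zip', 'rb') as f:
    cas = load_cas_from_zip_file(f)
# tokenize_cas takes the CAS and a str -> List[str] tokenizer function
mecab_tokenized_cas = tokenize_cas(cas, tokenize)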
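The tokenizer passed to tokenize_cas returns plain token strings, so something has to map those strings back to character offsets before they can become annotations. The sketch below shows one way to do that, assuming every token is a verbatim substring of the text (true for MeCab surface forms in typical use). It is not the library's implementation; simple_tokenize is a hypothetical regex stand-in for MeCab so the example runs without MeCab installed, and align_tokens and tokenize_to_spans are hypothetical helpers.

```python
import re
from typing import Callable, List, Tuple


def simple_tokenize(text: str) -> List[str]:
    """Stand-in for the MeCab tokenizer: words and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def align_tokens(text: str, tokens: List[str]) -> List[Tuple[int, int]]:
    """Map each token string back to its (begin, end) character offsets.

    Searches forward from the end of the previous token, so repeated tokens
    are matched in order. Raises ValueError if a token cannot be found.
    """
    spans = []
    cursor = 0
    for token in tokens:
        begin = text.find(token, cursor)
        if begin < 0:
            raise ValueError(f"token {token!r} not found after offset {cursor}")
        end = begin + len(token)
        spans.append((begin, end))
        cursor = end
    return spans


def tokenize_to_spans(
    text: str, tokenize: Callable[[str], List[str]]
) -> List[Tuple[int, int]]:
    """Run any str -> List[str] tokenizer and return token offsets."""
    return align_tokens(text, tokenize(text))


text = "I like cats. I like dogs."
print(tokenize_to_spans(text, simple_tokenize))
# [(0, 1), (2, 6), (7, 11), (11, 12), (13, 14), (15, 19), (20, 24), (24, 25)]
```

The forward-moving cursor is the important design choice: searching from the start of the text every time would map the second "I" and "like" onto the first occurrences.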
Download files
Source Distribution
Built Distribution
File details
Details for the file dkpro_cassis_tools-0.0.6.tar.gz.
File metadata
- Download URL: dkpro_cassis_tools-0.0.6.tar.gz
- Upload date:
- Size: 4.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/50.3.2 requests-toolbelt/0.9.1 tqdm/4.54.0 CPython/3.7.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5719bed13e723e475a4e3efb6b4d115b3ef941a276408c9d0d567998fcab89a8
MD5 | af80f7a9d6d7a10cab8df7b25511bd50
BLAKE2b-256 | ada4eabfbf524722553a324ef652bb84b4c4fc1230e8799f9447e74ce64097ab
File details
Details for the file dkpro_cassis_tools-0.0.6-py3-none-any.whl.
File metadata
- Download URL: dkpro_cassis_tools-0.0.6-py3-none-any.whl
- Upload date:
- Size: 9.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/50.3.2 requests-toolbelt/0.9.1 tqdm/4.54.0 CPython/3.7.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | f94326c310d0405a75e9bcf3949998f7f2fafcaee552aeb6d2d6c655b938c6c6
MD5 | c49a383b368f7b9e2cbe2aa32995ee53
BLAKE2b-256 | 30c1153ebe66ecae7cb77301111987ef891d460fbf736d3efd935f73dd9df1ab