
UIMA CAS processing library in Python


DKPro cassis (pronunciation: [ka.sis]) provides a pure-Python implementation of the Common Analysis System (CAS) as defined by the UIMA framework. The CAS is a data structure representing an object to be enriched with annotations (the so-called Subject of Analysis, SofA for short).

This library enables the creation and manipulation of CAS objects and their associated type systems, as well as loading and saving CAS objects in the CAS XMI XML representation, in Python programs. In particular, this can ease the integration of Python-based Natural Language Processing libraries (e.g. spacy or NLTK) and Machine Learning libraries (e.g. scikit-learn or Keras) into UIMA-based text analysis workflows.

An example of cassis in action is the spacy recommender for INCEpTION, which wraps the spacy NLP library as a web service. It can be used in conjunction with the INCEpTION text annotation platform to automatically generate annotation suggestions.


Currently supported features are:

  • Text SofAs
  • Deserializing/serializing UIMA CAS from/to XMI
  • Deserializing/serializing type systems from/to XML
  • Selecting annotations, selecting covered annotations, adding annotations
  • Type inheritance
  • Multiple SofA support
  • Type system can be changed after loading
  • Reference, array and list features

Some features are still under development, e.g.

  • Feature encoding as XML elements (right now only XML attributes work)
  • Proper type checking
  • XML/XMI schema validation
  • Type unmarshalling from string to the actual type specified in the type system


To install the package with pip, just run

pip install dkpro-cassis


Example CAS XMI and type system files can be found under tests\test_files.

Loading a CAS

A CAS can be deserialized from XMI, either from a file or from a string, using load_cas_from_xmi.

from cassis import *

with open('typesystem.xml', 'rb') as f:
    typesystem = load_typesystem(f)

with open('cas.xml', 'rb') as f:
    cas = load_cas_from_xmi(f, typesystem=typesystem)
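
To give a rough idea of what such an XMI file contains, the following sketch reads a hand-written fragment with the standard library only. It does not use cassis. The fragment is a simplified assumption and not one of the files under tests\test_files, and the namespace URI for the cassis package is likewise assumed. Note that features such as begin, end and pos appear as XML attributes and arrive as strings.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified XMI fragment (an assumption for illustration only).
XMI = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
         xmlns:cas="http:///uima/cas.ecore"
         xmlns:cassis="http:///cassis.ecore"
         xmi:version="2.0">
  <cas:NULL xmi:id="0"/>
  <cassis:Token xmi:id="2" sofa="1" begin="0" end="3" id="0" pos="NNP"/>
  <cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView"
            mimeType="text/plain" sofaString="Joe waited for the train ."/>
  <cas:View sofa="1" members="2"/>
</xmi:XMI>
"""

# Prefix-to-URI mapping used for the lookups below (assumed URIs).
NS = {"cas": "http:///uima/cas.ecore", "cassis": "http:///cassis.ecore"}

root = ET.fromstring(XMI)

# The Sofa element carries the document text as an attribute.
sofa = root.find("cas:Sofa", NS)
text = sofa.get("sofaString")

# Feature values are plain attribute strings, so offsets need converting.
token = root.find("cassis:Token", NS)
begin, end = int(token.get("begin")), int(token.get("end"))
covered = text[begin:end]

print(covered, token.get("pos"))
```

In practice, load_cas_from_xmi handles this parsing; the sketch is only meant to make the file layout less opaque.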

Adding annotations

Given a type system with a type cassis.Token that has an id and a pos feature, annotations can be added as follows:

from cassis import *

with open('typesystem.xml', 'rb') as f:
    typesystem = load_typesystem(f)

with open('cas.xml', 'rb') as f:
    cas = load_cas_from_xmi(f, typesystem=typesystem)

Token = typesystem.get_type('cassis.Token')

tokens = [
    Token(begin=0, end=3, id='0', pos='NNP'),
    Token(begin=4, end=10, id='1', pos='VBD'),
    Token(begin=11, end=14, id='2', pos='IN'),
    Token(begin=15, end=18, id='3', pos='DT'),
    Token(begin=19, end=24, id='4', pos='NN'),
    Token(begin=25, end=26, id='5', pos='.'),
]

for token in tokens:
    cas.add_annotation(token)

Selecting annotations

from cassis import *

with open('typesystem.xml', 'rb') as f:
    typesystem = load_typesystem(f)

with open('cas.xml', 'rb') as f:
    cas = load_cas_from_xmi(f, typesystem=typesystem)

for sentence in cas.select('cassis.Sentence'):
    for token in cas.select_covered('cassis.Token', sentence):
        # Annotation values can be accessed as properties
        print('Token: begin={0}, end={1}, id={2}, pos={3}'.format(token.begin, token.end, token.id, token.pos))
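
The idea behind selecting covered annotations can be sketched without cassis. The example below assumes that "covered" means an annotation lies entirely within the offsets of the covering annotation. Span and the select_covered function shown here are hypothetical stand-ins, not the library's classes or implementation.

```python
from collections import namedtuple

# Hypothetical stand-in for an annotation; not the cassis class.
Span = namedtuple("Span", ["begin", "end", "label"])


def select_covered(spans, covering):
    """Return the spans lying entirely within `covering`, ordered by offset."""
    inside = (
        s for s in spans
        if covering.begin <= s.begin and s.end <= covering.end
    )
    return sorted(inside, key=lambda s: (s.begin, s.end))


text = "Joe waited for the train . The train was late ."

# Two sentence spans and a handful of token spans over the same text.
sentences = [Span(0, 26, "s1"), Span(27, 47, "s2")]
tokens = [
    Span(0, 3, "NNP"),
    Span(4, 10, "VBD"),
    Span(27, 30, "DT"),
    Span(31, 36, "NN"),
]

covered_first = [text[t.begin:t.end] for t in select_covered(tokens, sentences[0])]
covered_second = [text[t.begin:t.end] for t in select_covered(tokens, sentences[1])]

print(covered_first)
print(covered_second)
```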

Creating types and adding features

from cassis import *

typesystem = TypeSystem()

parent_type = typesystem.create_type(name='example.ParentType')
typesystem.add_feature(type_=parent_type, name='parentFeature', rangeTypeName='String')

child_type = typesystem.create_type(name='example.ChildType', supertypeName=parent_type.name)
typesystem.add_feature(type_=child_type, name='childFeature', rangeTypeName='Integer')

annotation = child_type(parentFeature='parent', childFeature='child')

When new features are added, the changes are propagated. For example, adding a feature to a parent type makes it available to its child types. Therefore, the type system does not need to be frozen for consistency: unlike in UIMAj, it can be changed even after loading.
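
One way such propagation can work is to resolve features through the parent chain at lookup time instead of copying them when a type is created. The sketch below illustrates that idea with a hypothetical SketchType class; it is not how cassis is necessarily implemented.

```python
class SketchType:
    """Hypothetical minimal type whose features are resolved via its parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.own_features = {}

    def add_feature(self, name, range_type):
        # Only the declaring type stores the feature.
        self.own_features[name] = range_type

    @property
    def all_features(self):
        # Inherited features are collected on every access, so later
        # additions to a parent are visible to existing children.
        inherited = self.parent.all_features if self.parent else {}
        return {**inherited, **self.own_features}


parent = SketchType("example.ParentType")
child = SketchType("example.ChildType", parent=parent)
child.add_feature("childFeature", "Integer")

before = sorted(child.all_features)
parent.add_feature("parentFeature", "String")  # added after the child exists
after = sorted(child.all_features)

print(before)
print(after)
```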

Sofa support

A Sofa represents some form of an unstructured artifact that is processed in a UIMA pipeline. It contains for instance the document text. Currently, new Sofas can be created. This is automatically done when creating a new view. Basic properties of the Sofa can be read and written:

cas = Cas()
cas.sofa_string = "Joe waited for the train . The train was late ."
cas.sofa_mime = "text/plain"


Managing views

A view into a CAS contains a subset of feature structures and annotations. One view corresponds to exactly one Sofa. It can also be used to query and alter information about the Sofa, e.g. the document text. Annotations added to one view are not visible in another view. Views can be created and changed. A view has the same methods and attributes as a Cas.

from cassis import *

with open('typesystem.xml', 'rb') as f:
    typesystem = load_typesystem(f)
Token = typesystem.get_type('cassis.Token')

# This automatically creates the view `_InitialView`
cas = Cas()
cas.sofa_string = "I like cheese ."

cas.add_annotations([
    Token(begin=0, end=1),
    Token(begin=2, end=6),
    Token(begin=7, end=13),
    Token(begin=14, end=15)
])

print([x.get_covered_text() for x in cas.select_all()])

# Create a new view and work on it.
view = cas.create_view('testView')
view.sofa_string = "I like blackcurrant ."

view.add_annotations([
    Token(begin=0, end=1),
    Token(begin=2, end=6),
    Token(begin=7, end=19),
    Token(begin=20, end=21)
])

print([x.get_covered_text() for x in view.select_all()])
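
Because each view has its own Sofa, the same offsets can cover different text in different views. The following sketch checks the offsets used above with plain string slicing. It assumes that get_covered_text corresponds to the Sofa string sliced from begin (inclusive) to end (exclusive); the covered_text helper is hypothetical and not part of cassis.

```python
def covered_text(sofa_string, begin, end):
    """Hypothetical helper: the text between begin (inclusive) and end (exclusive)."""
    return sofa_string[begin:end]


initial_view_text = "I like cheese ."
test_view_text = "I like blackcurrant ."

# Offsets copied from the Token annotations in the example above.
initial_offsets = [(0, 1), (2, 6), (7, 13), (14, 15)]
test_offsets = [(0, 1), (2, 6), (7, 19), (20, 21)]

initial_tokens = [covered_text(initial_view_text, b, e) for b, e in initial_offsets]
test_tokens = [covered_text(test_view_text, b, e) for b, e in test_offsets]

print(initial_tokens)
print(test_tokens)
```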

Merging type systems

Sometimes, it is desirable to merge two type systems. With cassis, this can be achieved via the merge_typesystems function. The detailed rules of merging can be found here.

from cassis import *

with open('typesystem.xml', 'rb') as f:
    typesystem = load_typesystem(f)

ts = merge_typesystems([typesystem, load_dkpro_core_typesystem()])

DKPro Core Integration

A CAS using the DKPro Core Type System can be created via

from cassis import *

cas = Cas(typesystem=load_dkpro_core_typesystem())

for t in cas.typesystem.get_types():
    print(t)


If your type system defines a feature called self or type, then it will be made available as a member variable self_ or type_ on the respective type:

from cassis import *

typesystem = TypeSystem()

ExampleType = typesystem.create_type(name='example.Type')
typesystem.add_feature(type_=ExampleType, name='self', rangeTypeName='String')
typesystem.add_feature(type_=ExampleType, name='type', rangeTypeName='String')

annotation = ExampleType(self_="Test string1", type_="Test string2")
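
The renaming described above can be pictured as a simple mapping from feature names to Python attribute names. The sketch below is a hypothetical illustration: the python_name helper and the RESERVED set are assumptions based only on the two names mentioned in the text, not the library's code.

```python
# The two clashing names mentioned in the text; the real list may differ.
RESERVED = {"self", "type"}


def python_name(feature_name):
    """Hypothetical helper: append an underscore to clashing feature names."""
    return feature_name + "_" if feature_name in RESERVED else feature_name


mapped = {name: python_name(name) for name in ["self", "type", "pos"]}
print(mapped)
```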



The required dependencies are managed by pip. A virtual environment containing all needed packages for development and production can be created and activated by

virtualenv venv --python=python3 --no-site-packages
source venv/bin/activate
pip install -e ".[test, dev, doc]"

The tests can be run in the current environment by invoking

make test

or in a clean environment via


