Python modules and scripts for working with Concrete
Copyright 2012-2017 Johns Hopkins University HLTCOE. All rights reserved. This software is released under the 2-clause BSD license. Please see LICENSE for more information.
Concrete-Python
Concrete-Python is the Python interface to Concrete, an HLT data specification defined using Thrift.
Concrete-Python contains generated Python classes and additional utilities. It does not contain the Thrift schema for Concrete, which can be found in the Concrete GitHub repository.
Requirements
Concrete-Python requires Python 2.7 and the Thrift Python library, among other Python libraries. These are installed automatically by setup.py or pip. The Thrift compiler is not required.
Installation
You can install Concrete-Python using the pip package manager:
pip install concrete
or by cloning the repository and running setup.py:
git clone https://github.com/hltcoe/concrete-python.git
cd concrete-python
python setup.py test
python setup.py install
Useful Scripts
The Concrete Python package comes with three scripts:
- concrete_inspect.py
reads in a Concrete Communication and prints out human-readable information about the Communication’s contents (such as tokens, POS and NER tags, Entities, Situations, etc.) to stdout. This script is a command-line wrapper around the functionality in the concrete.inspect library.
- concrete2json.py
reads in a Concrete Communication and prints a JSON version of the Communication to stdout. The JSON is “pretty printed” with indentation and whitespace, which makes the JSON easier to read and to use for diffs.
- validate_communication.py
reads in a Concrete Communication file and prints out information about any invalid fields. This script is a command-line wrapper around the functionality in the concrete.validate library.
Use the --help flag for details about the scripts’ command-line arguments.
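Why pretty-printed JSON diffs better: with one field per line, a change touches only the lines for that field, while compact JSON puts the whole document on one line, so any change rewrites that line. The sketch below shows this using only the standard library. The two dicts are hypothetical stand-ins for the JSON form of a Communication, not real concrete2json.py output, and the indent width is arbitrary.

```python
import difflib
import json

# Hypothetical stand-ins for the JSON form of two Communications.
before = {"id": "doc-1", "text": "hello world", "type": "tweet"}
after = {"id": "doc-1", "text": "hello there", "type": "tweet"}


def pretty(obj):
    # One key per line, stable key order, no trailing spaces after commas.
    return json.dumps(obj, indent=2, sort_keys=True, separators=(",", ": "))


def changed_lines(a, b):
    """Return only the removed/added lines of a unified diff of two texts."""
    diff = difflib.unified_diff(a.splitlines(), b.splitlines(), lineterm="")
    return [line for line in diff
            if line.startswith(("+", "-"))
            and not line.startswith(("+++", "---"))]


# Compact JSON: the single changed line is the entire document.
compact = changed_lines(json.dumps(before, sort_keys=True),
                        json.dumps(after, sort_keys=True))
# Pretty-printed JSON: only the "text" field shows up in the diff.
indented = changed_lines(pretty(before), pretty(after))

for line in compact + indented:
    print(line)
```

In the compact diff both changed lines repeat the unchanged "id" and "type" fields; in the indented diff only the "text" line appears.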
Using the code in your project
Concrete types are located under the ttypes module of their respective namespace in the schema. To import and use Communication, for example:
from concrete.communication.ttypes import Communication

foo = Communication()
foo.text = 'hello world'
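Because every type lives in the ttypes module of its namespace, a type can also be looked up by name at runtime. This is a minimal sketch; it assumes the concrete.<namespace>.ttypes naming pattern shown by the Communication import above holds for the other namespaces, and the helper names ttypes_module_path and load_concrete_type are hypothetical, not part of Concrete-Python.

```python
import importlib


def ttypes_module_path(namespace):
    """Build the module path for a schema namespace (hypothetical helper).

    Assumes the pattern concrete.<namespace>.ttypes, e.g.
    'communication' -> 'concrete.communication.ttypes'.
    """
    return "concrete.%s.ttypes" % namespace


def load_concrete_type(namespace, type_name):
    """Import concrete.<namespace>.ttypes and return the named class.

    Raises ImportError if the module cannot be imported (for example, when
    concrete-python is not installed) and AttributeError if the module has
    no attribute with that name.
    """
    module = importlib.import_module(ttypes_module_path(namespace))
    return getattr(module, type_name)
```

With Concrete-Python installed, load_concrete_type('communication', 'Communication') should return the same class as the direct import above.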
Validating Concrete Communications
The Python Thrift library does not validate Thrift objects when reading or writing them. Call the validate_communication() function after reading and before writing a Concrete Communication:
from concrete.util import read_communication_from_file
from concrete.validate import validate_communication

comm = read_communication_from_file('tests/testdata/serif_dog-bites-man.concrete')

# Returns True|False, logs details using Python stdlib 'logging' module
validate_communication(comm)
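Because validate_communication() reports problems through the logging module rather than in its return value, one way to collect those details and refuse to write an invalid Communication is to attach a temporary handler to the root logger. This is a minimal sketch: validate_with_log, write_if_valid and fake_validate are hypothetical names, fake_validate is a stand-in so the example runs without Concrete-Python installed, and no particular logger names used inside concrete.validate are assumed.

```python
import logging


class ListHandler(logging.Handler):
    """Collect formatted log messages in a list for later inspection."""

    def __init__(self):
        logging.Handler.__init__(self)
        self.messages = []

    def emit(self, record):
        self.messages.append(self.format(record))


def validate_with_log(validate, comm):
    """Run a True/False validator and return (is_valid, log_messages).

    The handler sits on the root logger, so it sees records from any
    logger that propagates to root.
    """
    handler = ListHandler()
    root = logging.getLogger()
    old_level = root.level
    root.addHandler(handler)
    root.setLevel(logging.DEBUG)
    try:
        is_valid = validate(comm)
    finally:
        root.removeHandler(handler)
        root.setLevel(old_level)
    return is_valid, handler.messages


def write_if_valid(comm, validate, write):
    """Call write(comm) only if validation passes; otherwise raise."""
    ok, messages = validate_with_log(validate, comm)
    if not ok:
        raise ValueError("invalid Communication: " + "; ".join(messages))
    write(comm)


def fake_validate(comm):
    # Hypothetical stand-in for validate_communication, operating on a dict.
    if not comm.get("id"):
        logging.getLogger("example").error("Communication is missing 'id'")
        return False
    return True


ok, messages = validate_with_log(fake_validate, {"text": "hi"})
print("valid: %s, details: %s" % (ok, messages))
try:
    write_if_valid({"text": "hi"}, fake_validate, print)
except ValueError as err:
    print("refused to write: %s" % err)
```

With Concrete-Python installed, you would pass validate_communication and a real Communication instead; for the write step, concrete.util appears to provide write_communication_to_file(comm, file_name), but check the signature in your installed version.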
Thrift fields have three levels of requiredness:
- explicitly labeled as required
- explicitly labeled as optional
- no requiredness label given (“default required”)
Other Concrete tools will raise an exception if a required field is missing on deserialization or serialization, and will raise an exception if a “default required” field is missing on serialization. By default, Concrete-Python does not perform any validation of Thrift objects on serialization or deserialization. The Python Thrift classes do provide shallow validate() methods, but they only check for explicitly required fields (not “default required” fields) and do not validate nested objects.
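The rules in the paragraph above can be summarized as a small lookup table. This sketch encodes only what the paragraph states: when other Concrete tools require a field to be set, and which fields the generated Python validate() methods check. The constant and function names are hypothetical.

```python
REQUIRED = "required"
OPTIONAL = "optional"
DEFAULT = "default"  # no requiredness label ("default required")


def must_be_set(requiredness, phase):
    """Return True if other Concrete tools require the field in this phase.

    Explicitly required: checked on deserialization and serialization.
    Default required: checked on serialization only.
    Optional: never checked.
    """
    if phase not in ("serialize", "deserialize"):
        raise ValueError("unknown phase: %r" % (phase,))
    if requiredness == REQUIRED:
        return True
    if requiredness == DEFAULT:
        return phase == "serialize"
    if requiredness == OPTIONAL:
        return False
    raise ValueError("unknown requiredness: %r" % (requiredness,))


def shallow_validate_checks(requiredness):
    """The generated validate() methods check explicitly required fields only."""
    return requiredness == REQUIRED


for level in (REQUIRED, DEFAULT, OPTIONAL):
    print("%-9s deserialize=%-5s serialize=%-5s validate()=%s" % (
        level,
        must_be_set(level, "deserialize"),
        must_be_set(level, "serialize"),
        shallow_validate_checks(level)))
```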
The validate_communication() function recursively checks a Communication object for required fields, plus additional checks for UUID mismatches.
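The difference between a shallow check and a recursive one shows up as soon as a missing field sits inside a nested object. The sketch below uses a hypothetical two-level schema (Container and Item are not Concrete types) in which each class lists its fields and their requiredness. It does not model validate_communication()'s UUID checks, and it is not that function's implementation.

```python
class Node(object):
    """Hypothetical stand-in for a generated Thrift struct.

    FIELDS maps field name -> 'required', 'default' or 'optional'.
    """
    FIELDS = {}

    def __init__(self, **values):
        for name in self.FIELDS:
            setattr(self, name, values.get(name))


class Item(Node):
    FIELDS = {"index": "required", "text": "default"}


class Container(Node):
    FIELDS = {"id": "required", "items": "default", "note": "optional"}


def shallow_missing(obj):
    """Like validate(): explicitly required fields on this object only."""
    return [name for name, level in sorted(obj.FIELDS.items())
            if level == "required" and getattr(obj, name) is None]


def recursive_missing(obj, path="root"):
    """Check required and default-required fields, descending into children."""
    missing = []
    for name, level in sorted(obj.FIELDS.items()):
        value = getattr(obj, name)
        where = path + "." + name
        if value is None:
            if level in ("required", "default"):
                missing.append(where)
        elif isinstance(value, Node):
            missing.extend(recursive_missing(value, where))
        elif isinstance(value, list):
            for i, item in enumerate(value):
                if isinstance(item, Node):
                    missing.extend(recursive_missing(item, "%s[%d]" % (where, i)))
    return missing


# The second Item is missing its required "index" field.
doc = Container(id="c1", items=[Item(index=0, text="hi"), Item(text="there")])
print(shallow_missing(doc))
print(recursive_missing(doc))
```

The shallow check finds nothing wrong with doc, while the recursive check reports root.items[1].index.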
Development
Please see CONTRIBUTING.rst for information about contributing to Concrete-Python.
Contributors
Craig Harman
Low Kian Seong
Frank Ferraro
Max Thomas
Adrian Benton
Joel Coffman
Chandler May
Tom Lippincott
Please contact us if you have contributed to Concrete-Python but are not on this list.