
Python package that powers the TextStudio development environment where users can explore, process, model, and visualize textual data.

Project description

TextStudio

TextStudio is a text processing architecture composed of the text-studio Python package, a software development kit (SDK), and a desktop application. Together, these components create the development environment where users can explore, process, model, and visualize textual data.

The following documentation pertains specifically to text-studio, the Python package that supports the software development kit (SDK).

Installation

pip install text-studio

OR

git clone https://github.com/tevnpowers/text-studio

Data Loader

A Data Loader is responsible for loading data that exists outside of the application into a canonical TextStudio data set. It must also provide the inverse functionality: writing a TextStudio data set to an external location.

A DataLoader plugin is a subclass of the text_studio.DataLoader abstract class, and implements the following class methods:

  • load: Create the list of data instances for a text_studio.Dataset from data that exists outside of the application.
    • Parameters:
      • file_path: The path to a data file or directory which contains data files to be loaded into a data set.
      • **kwargs: Additional keyword arguments that the author can optionally require in order to configure the load logic.
    • Return:
      • List of dictionary objects, each of which represents a single data instance in the data set.
  • save: Export the list of data instances in a text_studio.Dataset to a storage system outside of the application.
    • Parameters:
      • file_path: The path to a data file or directory where the data set should be exported.
      • **kwargs: Additional keyword arguments that the author can optionally require in order to configure the save logic.
    • Return:
      • Boolean value that is True if a data set is successfully exported and False if the save failed for any reason.
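
The sketch below shows one way a DataLoader plugin could satisfy the load/save contract above, using a JSON Lines file (one JSON object per line, one data instance per object). It is illustrative only: so that it runs without the package installed, it defines a local stand-in for text_studio.DataLoader, and the JsonLinesLoader class is hypothetical. The documented save signature lists only file_path and **kwargs, so the sketch assumes the instances to export arrive through a hypothetical `instances` keyword argument; the real package may pass them differently.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class DataLoader(ABC):
    """Local stand-in for text_studio.DataLoader, mirroring the documented interface."""

    @abstractmethod
    def load(self, file_path: str, **kwargs) -> List[Dict[str, Any]]:
        ...

    @abstractmethod
    def save(self, file_path: str, **kwargs) -> bool:
        ...


class JsonLinesLoader(DataLoader):
    """Hypothetical plugin: each line of the file is one JSON-encoded data instance."""

    def load(self, file_path: str, **kwargs) -> List[Dict[str, Any]]:
        encoding = kwargs.get("encoding", "utf-8")  # optional configuration via kwargs
        instances = []
        with open(file_path, encoding=encoding) as f:
            for line in f:
                line = line.strip()
                if line:  # skip blank lines
                    instances.append(json.loads(line))
        return instances

    def save(self, file_path: str, **kwargs) -> bool:
        # Assumption: the data instances are supplied via a hypothetical
        # `instances` keyword argument; the documented signature does not name it.
        instances = kwargs.get("instances", [])
        encoding = kwargs.get("encoding", "utf-8")
        try:
            with open(file_path, "w", encoding=encoding) as f:
                for instance in instances:
                    f.write(json.dumps(instance) + "\n")
        except (OSError, TypeError, ValueError):
            # Report failure with False instead of raising, per the documented contract.
            return False
        return True


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "data.jsonl")
    loader = JsonLinesLoader()
    ok = loader.save(path, instances=[{"text": "Hello world"}, {"text": "Goodbye"}])
    print(ok, loader.load(path))
```

Catching serialization and I/O errors inside save keeps the Boolean return meaningful: callers can check the result rather than wrap every export in a try block.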

Pipeline

A text processing pipeline is any combination of Annotator and Action components that run in sequence on an input data set. Pipelines themselves are implemented by text_studio.Pipeline. In general, pipelines are instantiated by the TextStudio desktop application rather than by developers.

However, developers may write plugins for each of the pipeline component types described below.
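
As a conceptual illustration of the sequencing described above, and not the actual implementation of text_studio.Pipeline, the following sketch runs components in order: each Annotator's output becomes the input of the next component, while each Action produces an artifact and passes the data through unchanged. The Annotator and Action base classes, the Lowercase and CountDocs components, and run_pipeline are all hypothetical stand-ins so the example runs without the package installed.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Tuple

Doc = Dict[str, Any]


class Annotator(ABC):
    """Stand-in base: returns augmented versions of the documents."""

    @abstractmethod
    def process_batch(self, docs: List[Doc]) -> List[Doc]:
        ...


class Action(ABC):
    """Stand-in base: returns an artifact and leaves the documents untouched."""

    @abstractmethod
    def process_batch(self, docs: List[Doc]) -> Any:
        ...


class Lowercase(Annotator):
    """Hypothetical annotator: adds a lowercased copy of the 'text' field."""

    def process_batch(self, docs: List[Doc]) -> List[Doc]:
        return [{**doc, "lower": doc["text"].lower()} for doc in docs]


class CountDocs(Action):
    """Hypothetical action: reports how many documents it saw."""

    def process_batch(self, docs: List[Doc]) -> Dict[str, int]:
        return {"num_docs": len(docs)}


def run_pipeline(components: List[Any], docs: List[Doc]) -> Tuple[List[Doc], List[Any]]:
    """Run components in order and return the final data plus collected artifacts."""
    artifacts = []
    for component in components:
        if isinstance(component, Annotator):
            docs = component.process_batch(docs)  # output feeds the next component
        elif isinstance(component, Action):
            artifacts.append(component.process_batch(docs))  # data passes through
        else:
            raise TypeError(f"unsupported component: {component!r}")
    return docs, artifacts


if __name__ == "__main__":
    data = [{"text": "Hello World"}, {"text": "TextStudio"}]
    docs, artifacts = run_pipeline([CountDocs(), Lowercase(), CountDocs()], data)
    print(docs)
    print(artifacts)
```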

Annotator

An Annotator runs a process that augments the input data it is given. That is, given a data instance object (a Python dictionary), an Annotator adds a new key-value pair to the dictionary (e.g., tokenization output, part-of-speech tags, or a lemmatized version of the raw text).

An Annotator plugin is a subclass of the text_studio.Annotator abstract class, and implements the following class methods:

  • __init__: Configure the settings needed for the Annotator module to properly function.
    • Parameters:
      • keys: the list of keys (strings) in the data instance dictionary whose values the Annotator reads as input during execution.
      • annotations: the list of keys (strings) that an Annotator should add to the data instance object dictionary, where the corresponding value(s) are computed by the Annotator when executed.
      • Additional Named Arguments: A plugin author may require any arbitrary named arguments that are necessary to configure the module's execution.
  • process_single: Annotate a single data instance with a new value.
    • Parameters:
      • doc: A dictionary representing a single data instance.
    • Return:
      • A dictionary that is the augmented version of the input object, now annotated with additional information.
  • process_batch: Annotate a collection of data instances with new values.
    • Parameters:
      • docs: An iterable containing dictionaries, which each represent a single data instance.
    • Return:
      • A collection of dictionaries, where each is an augmented version of an input object, now annotated with additional information.
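
The following minimal sketch shows what an Annotator plugin following the interface above might look like: a whitespace tokenizer that reads the value under its first input key and writes the tokens under its first annotation key. It is illustrative, not the package's own code: text_studio.Annotator is replaced by a local stand-in base class so the example runs on its own, and WhitespaceTokenizer and its `lowercase` option are hypothetical (the option shows how an extra named argument can configure execution).

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterable, List

Doc = Dict[str, Any]


class Annotator(ABC):
    """Local stand-in for text_studio.Annotator, mirroring the documented interface."""

    def __init__(self, keys: List[str], annotations: List[str], **kwargs):
        self.keys = keys  # input keys read from each data instance
        self.annotations = annotations  # output keys added to each data instance

    @abstractmethod
    def process_single(self, doc: Doc) -> Doc:
        ...

    @abstractmethod
    def process_batch(self, docs: Iterable[Doc]) -> List[Doc]:
        ...


class WhitespaceTokenizer(Annotator):
    """Hypothetical plugin: splits the text under keys[0] into tokens stored under annotations[0]."""

    def __init__(self, keys: List[str], annotations: List[str], lowercase: bool = False):
        super().__init__(keys, annotations)
        self.lowercase = lowercase  # extra named argument configuring execution

    def process_single(self, doc: Doc) -> Doc:
        text = doc[self.keys[0]]
        if self.lowercase:
            text = text.lower()
        annotated = dict(doc)  # shallow copy so the caller's dict is not mutated
        annotated[self.annotations[0]] = text.split()
        return annotated

    def process_batch(self, docs: Iterable[Doc]) -> List[Doc]:
        return [self.process_single(doc) for doc in docs]


if __name__ == "__main__":
    tokenizer = WhitespaceTokenizer(keys=["text"], annotations=["tokens"], lowercase=True)
    print(tokenizer.process_batch([{"text": "Hello World"}, {"text": "TextStudio rocks"}]))
```

This sketch returns a copy of each dictionary; mutating doc in place and returning it would also fit the description above, so which behavior the desktop application expects is worth confirming against the package.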

Action

An Action consumes input data either individually or in bulk in order to produce an artifact about the input data, while not modifying or augmenting the input data instance(s). In this case, an artifact may be a visualization, a summary report, or any other insights that can be extracted from the provided data.

An Action plugin is a subclass of the text_studio.Action abstract class, and implements the following class methods:

  • __init__: Configure the settings needed for the Action module to properly function.
    • Parameters:
      • keys: the list of keys (strings) in the data instance dictionary whose values the Action reads as input during execution.
      • Additional Named Arguments: A plugin author may require any arbitrary named arguments that are necessary to configure the module's execution.
  • process_single: Process a single data instance to produce insights.
    • Parameters:
      • doc: A dictionary representing a single data instance.
  • process_batch: Process a collection of data instances to produce insights.
    • Parameters:
      • docs: An iterable containing dictionaries, which each represent a single data instance.
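
As an illustration of the Action interface above, the sketch below builds a simple word-frequency report: it reads the text under its first key and returns the most common words without modifying any data instance. A local stand-in replaces text_studio.Action so the example is self-contained, and WordFrequencyReport with its `top_n` argument is hypothetical. The documentation does not specify what Action methods return, so returning the top-N (word, count) list is an assumption made for this example.

```python
from abc import ABC, abstractmethod
from collections import Counter
from typing import Any, Dict, Iterable, List, Tuple

Doc = Dict[str, Any]


class Action(ABC):
    """Local stand-in for text_studio.Action, mirroring the documented interface."""

    def __init__(self, keys: List[str], **kwargs):
        self.keys = keys  # input keys read from each data instance

    @abstractmethod
    def process_single(self, doc: Doc) -> Any:
        ...

    @abstractmethod
    def process_batch(self, docs: Iterable[Doc]) -> Any:
        ...


class WordFrequencyReport(Action):
    """Hypothetical plugin: reports the most common words under keys[0]."""

    def __init__(self, keys: List[str], top_n: int = 3):
        super().__init__(keys)
        self.top_n = top_n  # extra named argument configuring execution

    def process_single(self, doc: Doc) -> List[Tuple[str, int]]:
        return Counter(doc[self.keys[0]].split()).most_common(self.top_n)

    def process_batch(self, docs: Iterable[Doc]) -> List[Tuple[str, int]]:
        counts: Counter = Counter()
        for doc in docs:
            counts.update(doc[self.keys[0]].split())  # read-only access; docs are not modified
        return counts.most_common(self.top_n)


if __name__ == "__main__":
    report = WordFrequencyReport(keys=["text"], top_n=2)
    print(report.process_batch([{"text": "to be or not to be"}, {"text": "be quick"}]))
```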



Download files


Source Distribution

text-studio-0.0.1a0.tar.gz (8.7 kB)

Uploaded Source

Built Distribution

text_studio-0.0.1a0-py3-none-any.whl (9.2 kB)

Uploaded Python 3

File details

Details for the file text-studio-0.0.1a0.tar.gz.

File metadata

  • Download URL: text-studio-0.0.1a0.tar.gz
  • Size: 8.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.0.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.8.0

File hashes

Hashes for text-studio-0.0.1a0.tar.gz
  • SHA256: 5f83f18c06ae179e1559da497758a7dc664ae355b938a55c9d9d3cea2ffff106
  • MD5: 63a6d756cf448094c286ba2499bbe170
  • BLAKE2b-256: 3793806d5182b1783fda2e4fad0078cdca29fa3f7fd2df9771d95f560d5a2e89


File details

Details for the file text_studio-0.0.1a0-py3-none-any.whl.

File metadata

  • Download URL: text_studio-0.0.1a0-py3-none-any.whl
  • Size: 9.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.0.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.8.0

File hashes

Hashes for text_studio-0.0.1a0-py3-none-any.whl
  • SHA256: b8e286404c41ebb7bfd5dbc43160a2a0e09d52672b75de390e7397071db2abe4
  • MD5: d37b409fbc03914a42a05848ca0614e3
  • BLAKE2b-256: 8ac5cc7e58d30bce376279e121bfee84dcaa0beb7cec390ae9d8663a51a8abd5

