
ConText

A browser-based concordancer and language analysis application.

This repository builds on work done in 2020 on a Python library, Jupyter Notebook and Dash application for the Mapping LAWS project. This work prototyped a browser-based alternative to desktop applications for corpus analysis. Ideas for this tool originated during my PhD thesis, which developed a browser-based analysis tool around a corpus of parliamentary discourse, enabling rapid queries, new forms of analysis and browsable connections between different levels of analysis.

ConText builds on Conc, a Python library for corpus analysis.

Acknowledgements

Conc is developed by Dr Geoff Ford.

Work to create this Python library has been made possible by funding/support from:

  • “Mapping LAWS: Issue Mapping and Analyzing the Lethal Autonomous Weapons Debate” (Royal Society of New Zealand’s Marsden Fund Grant 19-UOC-068)
  • “Into the Deep: Analysing the Actors and Controversies Driving the Adoption of the World’s First Deep Sea Mining Governance” (Royal Society of New Zealand’s Marsden Fund Grant 22-UOC-059)
  • Sabbatical, University of Canterbury, Semester 1 2025.

Thanks to the Mapping LAWS project team for their support and feedback as first users of ConText.

Dr Ford is a researcher with Te Pokapū Aronui ā-Matihiko | UC Arts Digital Lab (ADL). Thanks to the ADL team and the ongoing support of the University of Canterbury’s Faculty of Arts who make work like this possible.

Design principles

Embed ConText

A key principle is to embed context from the texts, corpus and beyond into the application. This includes design choices to make the text, metadata and origin of the text visible and accessible. The text corpus can be navigated (and read) via a concordancer that sits alongside the text. To aid the researcher in interpretation, quantifications are directly linked to the texts they relate to.
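The core operation of a concordancer can be illustrated with a minimal keyword-in-context (KWIC) sketch in plain Python. This is illustrative only, not ConText's implementation; the function name, window size and sample sentence are all invented for the example.

```python
def kwic(tokens, keyword, window=4):
    """Return (left context, keyword, right context) tuples for each match."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines

tokens = "the cat sat on the mat while the dog slept".split()
for left, kw, right in kwic(tokens, "the"):
    print(f"{left:>20} | {kw} | {right}")
```

Each printed line centres the keyword with its surrounding context, which is the reading view a concordancer builds on.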

Efficiency

The software prioritises speed through pre-processing via Conc. Intensive processing (tokenising, creating indexes, pre-computing useful counts) happens once, when the corpus is first built, and the results are stored. This speeds up subsequent queries and statistical calculations. The frontend is minimal and lightweight and uses web technologies for interactivity. The interface opens pathways for analysis by enabling navigation between levels of analysis and letting researchers quickly switch corpora and reference corpora for intuitive, comparative exploration.
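The build-once idea can be sketched with the standard library alone (this is a toy illustration of the principle, not Conc's implementation; the file name and functions are invented): counts are computed and persisted at build time, and later queries read the stored index instead of re-tokenising.

```python
import json
from collections import Counter
from pathlib import Path

def build_index(texts, index_path):
    """Tokenise once at build time and persist frequency counts."""
    counts = Counter(tok.lower() for text in texts for tok in text.split())
    Path(index_path).write_text(json.dumps(counts))

def frequency(index_path, token):
    """Answer a frequency query from the stored index; no re-tokenising."""
    counts = json.loads(Path(index_path).read_text())
    return counts.get(token.lower(), 0)

build_index(["The cat sat", "the mat"], "index.json")
print(frequency("index.json", "the"))  # → 2
```

The expensive step runs once; every subsequent query is a cheap lookup, which is what makes fast interactive switching between corpora feasible.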

Installation

ConText launches a web interface. You will need Chromium (or Chrome) installed.

ConText is currently released as a pip-installable package. Other installation methods are coming soon.

To install via pip, set up a new Python 3.11+ environment and run the following command:

pip install contextapp

ConText/Conc requires installation of a spaCy model. For example, for English:

python -m spacy download en_core_web_sm

Note: Conc installs the Polars library. If you are using an older (pre-2013) machine, you will need to install Polars without optimisations for modern CPUs, typically via the polars-lts-cpu package; see the Conc installation documentation for details.

Using ConText

Currently, to use ConText you first need to build your corpora with Conc from text files or CSV sources. You should have both a corpus and a reference corpus. Conc provides sample corpora to download and build.
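Conc builds corpora from plain text files or CSV sources; a minimal stdlib sketch of assembling a source directory of one text file per document (the directory name, file names and contents are all illustrative):

```python
from pathlib import Path

# Illustrative layout: one plain-text file per document.
source_dir = Path("my_corpus_source")
source_dir.mkdir(exist_ok=True)

documents = {
    "speech_001.txt": "Mr Speaker, I rise to address the house...",
    "speech_002.txt": "The honourable member raises a fair point...",
}
for name, text in documents.items():
    (source_dir / name).write_text(text, encoding="utf-8")

print(sorted(p.name for p in source_dir.glob("*.txt")))
# → ['speech_001.txt', 'speech_002.txt']
```

A directory like this is then passed to Conc's build step; see the Conc documentation for the exact build commands.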

Store the created corpora in the same parent directory so that ConText can find them when it starts up.
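For example, two built corpora might sit side by side under one parent directory, and ConText is pointed at that parent. The corpus names below are invented for illustration; a quick stdlib check of such a layout:

```python
from pathlib import Path

# Illustrative parent directory holding the built corpora.
parent = Path("corpora")
for name in ["parliament.corpus", "reference.corpus"]:
    (parent / name).mkdir(parents=True, exist_ok=True)

# ConText would be started with --corpora pointing at `parent`;
# listing its subdirectories shows what the app can discover.
found = sorted(p.name for p in parent.iterdir() if p.is_dir())
print(found)  # → ['parliament.corpus', 'reference.corpus']
```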

Run ConText like this:

ConText --corpora /path/to/directory/with/processed/corpora/

To run the application in debug mode (relevant for development or diagnosing problems), use the following command:

ConText --corpora /path/to/directory/with/processed/corpora/ --mode development

A video tutorial on how to use ConText is coming soon.

Credit

  • Prototype styling is based on a Plotly Dash Stylesheet (MIT License)
  • Icons are via Ionicons

Coming soon

  • PyPI release
  • Video tutorial
  • Run as an application on Windows/Linux/Mac
  • Allow configuration of settings for all reports
  • Updates of corpus/reference corpus will only refresh the current page, to allow comparing token-level results between corpora
  • JSON settings file for ConText to preserve state between loads
  • Update HTML title on URL changes
  • Loading indicator via hx-indicator
  • Record session in a JSON file per session
  • Tooltips for buttons and other functionality
  • Preferences (e.g. remember reference corpus expansion across sessions, stored in JSON)
  • Highlighting interface
  • Make concordance plot lines clickable through to text view
  • Add n-gram frequencies
  • Links in collocation report --> Conc: contextual restriction for concordances with +
