
A toolkit that generates a variety of features for team conversation data.

Project description


The Team Communication Toolkit

The Team Communication Toolkit is a research project and Python package that aims to make it easier for social scientists to explore text-based conversational data.


Getting Started

If you are new to this repository, welcome! Please follow the steps below to get started.

Step 1: Clone the Repo

First, clone this repository into your local development environment:

git clone https://github.com/Watts-Lab/team_comm_tools.git

Step 2: Download Dependencies

Python Version: This repository requires Python 3.10 or later.
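If you run the toolkit from your own script or notebook, a small guard can stop early with a clear message on an interpreter older than 3.10. This is a minimal sketch; check_python_version is a hypothetical helper and is not part of the package.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in this README


def check_python_version(version_info=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """Return True if the interpreter meets `minimum`, else raise RuntimeError."""
    if tuple(version_info[:2]) < minimum:
        found = ".".join(str(part) for part in version_info[:2])
        required = ".".join(str(part) for part in minimum)
        raise RuntimeError(f"Python {required}+ is required; found Python {found}.")
    return True


# Call once at the top of your script or notebook:
# check_python_version()
```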

We strongly recommend using a virtual environment to install the dependencies required for the project.

Running the following script will install all required packages and dependencies:

./setup.sh

Step 3: Run the Featurizer

At this point, you should be ready to run the featurizer! Navigate to the examples folder and run the following command:

python3 featurize.py

This runs featurize.py, which declares a FeatureBuilder object for each dataset of interest and featurizes it using our framework. The featurize.py file provides an end-to-end worked example of how to declare a FeatureBuilder and call it on data; equally, you can replace this file with any file or notebook of your choosing, as long as you import the FeatureBuilder module.
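If your conversations are not yet in tabular form, the sketch below flattens them into one row per chat and writes a CSV. It assumes the same column names used by the test CSVs (conversation_num, speaker_nickname, message); check the documentation for the exact input columns your installed version of FeatureBuilder expects. The helpers chat_rows and write_chat_csv are hypothetical and are not part of the package. You could then load the resulting file (for example, with pandas) and pass it to a FeatureBuilder as featurize.py does.

```python
import csv
from typing import Dict, List, Tuple

# Assumed input columns, mirroring the test CSVs.
FIELDNAMES = ["conversation_num", "speaker_nickname", "message"]


def chat_rows(conversations: Dict[int, List[Tuple[str, str]]]) -> List[dict]:
    """Flatten {conversation_num: [(speaker, message), ...]} into one dict per chat."""
    rows = []
    for conversation_num, chats in conversations.items():
        for speaker, message in chats:  # keep the original chat order
            rows.append({"conversation_num": conversation_num,
                         "speaker_nickname": speaker,
                         "message": message})
    return rows


def write_chat_csv(rows: List[dict], path: str) -> None:
    """Write chat rows to a UTF-8 CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)


# Example (file name is illustrative):
# rows = chat_rows({1: [("alice", "Hi team!"), ("bob", "Hello.")]})
# write_chat_csv(rows, "my_chats.csv")
```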

Contributing Code and Automated Unit Testing

If you would like to contribute to the repository, we provide a Pull Request Template with a basic checklist to review whenever you contribute changes (e.g., improving documentation or developing a new feature).

We have also set up automated unit tests for the codebase, which run on every push to GitHub, so that we can confirm new features work as expected and do not break existing ones. The steps below describe how to use the automated test suite.

  1. Draft test inputs (conversation_num, speaker, message) and expected outputs for your feature.
  • For example, "This is a test message." should return 5 for num_words at the chat level (note that conversation_num and speaker have no effect on the ultimate result, so they can be chosen arbitrarily).
  • Testing a conversation-level feature, such as discursive_diversity, requires a series of chats rather than a single chat. For example, the four chats "This is a test message." (speaker 1), "This is a test message." (speaker 1), "This is a test message." (speaker 2), and "This is a test message." (speaker 2), all within the same conversation, should return 0. Note that each new test should use a conversation_num distinct from every previous conversation_num, even if it tests a different feature.
  2. Once you have test inputs, add each chat (with its associated conversation_num and speaker) as a separate row in either test_chat_level.csv or test_conv_level.csv, within ./tests/data/cleaned_data. The CSV has the following columns: id, conversation_num, speaker_nickname, message, expected_column, expected_value, where expected_column is the feature name (e.g., num_words).

  3. Push all your changes to GitHub, including both the feature code and the test dataset additions. Then open the "Actions" tab in the repository toolbar, where a new job called "Testing-Features" should be running. A green checkmark when this job finishes indicates that all new tests have passed; a red cross means at least one test has failed. In that case, open the uploaded "Artifact" (near the bottom of the job's status page) for a list of failed tests and their associated inputs and outputs.

  4. Debug and iterate!
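The steps for adding test cases can be sketched in code. This is a minimal, hypothetical helper, not part of the repository: it builds rows in the documented column order, repeats the expected value on every row of a multi-chat case, and treats id as a running row number (both are assumptions about how the test files are encoded). It also refuses to reuse a conversation_num. naive_num_words shows how the expected value of 5 for "This is a test message." arises from a simple whitespace split; the toolkit's own tokenization may differ.

```python
import csv
from typing import List, Sequence, Set, Tuple

# Column order documented for test_chat_level.csv and test_conv_level.csv.
TEST_COLUMNS = ["id", "conversation_num", "speaker_nickname", "message",
                "expected_column", "expected_value"]


def naive_num_words(message: str) -> int:
    """Whitespace word count; the toolkit's own tokenization may differ."""
    return len(message.split())


def make_test_rows(start_id: int, conversation_num: int,
                   chats: Sequence[Tuple[str, str]],
                   expected_column: str, expected_value,
                   used_conversation_nums: Set[int]) -> List[dict]:
    """Build one row per chat for a single test case.

    Raises ValueError if conversation_num was already used by an earlier test,
    then records it in used_conversation_nums.
    """
    if conversation_num in used_conversation_nums:
        raise ValueError(f"conversation_num {conversation_num} is already in use")
    used_conversation_nums.add(conversation_num)
    return [
        {"id": start_id + i, "conversation_num": conversation_num,
         "speaker_nickname": speaker, "message": message,
         "expected_column": expected_column, "expected_value": expected_value}
        for i, (speaker, message) in enumerate(chats)
    ]


def append_test_rows(rows: List[dict], path: str) -> None:
    """Append rows to a test CSV that already has a header and ends with a newline."""
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.DictWriter(f, fieldnames=TEST_COLUMNS).writerows(rows)


# Example (conversation numbers 101 and 102 are arbitrary placeholders):
# used = set()
# msg = "This is a test message."
# chat_case = make_test_rows(1, 101, [("speaker1", msg)], "num_words", 5, used)
# conv_case = make_test_rows(1, 102, [("speaker1", msg)] * 2 + [("speaker2", msg)] * 2,
#                            "discursive_diversity", 0, used)
# append_test_rows(chat_case, "tests/data/cleaned_data/test_chat_level.csv")
```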
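Why should the discursive_diversity example return 0? The sketch below assumes a common definition of discursive diversity: the average pairwise cosine distance between speakers' centroids, where a centroid is the mean of a speaker's message embeddings. The toolkit's implementation and embedding model may differ in detail, the toy vectors stand in for real sentence embeddings, and returning 0.0 for fewer than two speakers is a choice made for this sketch. Because all four messages are identical, they share one embedding, so both speakers' centroids coincide and the distance between them is 0.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    """Element-wise mean of one speaker's message embeddings."""
    n = len(vectors)
    return [sum(values) / n for values in zip(*vectors)]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; raises ValueError if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / norm


def discursive_diversity(chats: List[Tuple[str, Vector]]) -> float:
    """Mean pairwise cosine distance between speaker centroids.

    `chats` is a list of (speaker, embedding) pairs for one conversation.
    Returns 0.0 when fewer than two speakers are present.
    """
    by_speaker: Dict[str, List[Vector]] = {}
    for speaker, embedding in chats:
        by_speaker.setdefault(speaker, []).append(embedding)
    centroids = [centroid(vs) for vs in by_speaker.values()]
    pairs = list(combinations(centroids, 2))
    if not pairs:
        return 0.0
    return sum(cosine_distance(a, b) for a, b in pairs) / len(pairs)
```

For example, with a toy embedding v = [0.2, 0.5, 0.1], discursive_diversity([("speaker1", v), ("speaker1", v), ("speaker2", v), ("speaker2", v)]) evaluates to 0 up to floating-point rounding, while two speakers with orthogonal centroids give 1.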



Download files

Download the file for your platform.

Source Distribution

team_comm_tools-0.1.0.tar.gz (285.8 kB view details)

Uploaded Source

Built Distribution


team_comm_tools-0.1.0-py3-none-any.whl (316.0 kB view details)

Uploaded Python 3

File details

Details for the file team_comm_tools-0.1.0.tar.gz.

File metadata

  • Download URL: team_comm_tools-0.1.0.tar.gz
  • Size: 285.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.0

File hashes

Hashes for team_comm_tools-0.1.0.tar.gz
Algorithm Hash digest
SHA256 1b212b30279b0127ed7c61758588a1d1fb6c127dac83c193734a405667997cc0
MD5 74138aebade008b113fa1ba90f936e04
BLAKE2b-256 839ecf6a5ff8f2ac9496611e1351306e6daf6b68aca598673837406110c504e8
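To check a downloaded file against the digests listed here, you can compute its SHA256 with Python's standard hashlib module. This is a minimal sketch; the helper names are illustrative, and the file name in the usage comment assumes the archive sits in the current directory.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with a published hex digest (case-insensitive)."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())


# Example:
# EXPECTED = "1b212b30279b0127ed7c61758588a1d1fb6c127dac83c193734a405667997cc0"
# print(matches_expected("team_comm_tools-0.1.0.tar.gz", EXPECTED))
```

pip can also enforce hashes during installation through its hash-checking mode (--require-hashes).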


File details

Details for the file team_comm_tools-0.1.0-py3-none-any.whl.


File hashes

Hashes for team_comm_tools-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 0aa534769923861ae76288e7b759ccbd66f41f181d2fdc96c7d5c356ac2fa41e
MD5 27aee1b32de7241e9e3f9aff756ef6a1
BLAKE2b-256 aa530f3f492068b89301a3ff7b70384e08b9ebb94b1f999e75b5e5e8c6e2b1f8

