Project description

speech-analytics is a simple module for processing speech data collected as part of the Calpy project.

Documentation

class ConversationAnalysis

Parameters:

filename (str): The name of a calpy-style data file to analyse.

model_type (Optional[str]): The name of the spaCy model to use in the analysis. Default is 'en_core_web_sm'.

Methods:

add_analysis(analysis_type: str)
Adds the requested type of analysis to the data. Options are:

  • TOKENIZE: Tokenizes the utterances. Each token records its raw text, part-of-speech tag, lemma, dependency information, and whether it is a stop word.
  • UTTERANCE_LENGTH: Adds the number of words and the number of tokens in each utterance.
  • TURNS: Combines utterances into turns (i.e. multiple consecutive utterances by the same speaker are treated as one turn).
  • PREPROCESS: Runs TOKENIZE, UTTERANCE_LENGTH, and TURNS. Doing so ensures that all other methods work.
  • REMOVE_AUX_VERBS: Removes anything classified as an auxiliary verb (based on the POS tagging done during tokenization). If tokenization has not yet been run, add_analysis(TOKENIZE) is called first.
  • GRAMMAR_CORRECTION: Adds attempted grammar corrections. Note that this analysis does not remove the original text (both the original text and the suggested corrections will be available). Utterances will always have corrections suggested, but turns will only have suggested corrections if this is called after add_analysis(TURNS).

The name of each analysis type is a constant provided in the module.
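The TURNS option above describes merging consecutive utterances by the same speaker. The following is a minimal sketch of that idea only, not the module's implementation: the Utterance record, its fields, and merge_into_turns are hypothetical names chosen for illustration, and the real calpy data format may differ.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Utterance:
    # Hypothetical record for illustration; not the calpy file format.
    speaker: str
    start: float  # seconds
    end: float    # seconds
    text: str


def merge_into_turns(utterances):
    """Group consecutive utterances by the same speaker into turns."""
    turns = []
    # groupby only merges *adjacent* items with the same key, which is
    # exactly the "consecutive utterances by the same speaker" rule.
    for speaker, group in groupby(utterances, key=lambda u: u.speaker):
        turns.append((speaker, list(group)))
    return turns


utterances = [
    Utterance("A", 0.0, 1.2, "Hello."),
    Utterance("A", 1.5, 2.8, "How are you?"),
    Utterance("B", 3.0, 3.9, "Fine, thanks."),
    Utterance("A", 4.1, 4.6, "Good."),
]
turns = merge_into_turns(utterances)
print([(speaker, len(group)) for speaker, group in turns])
# prints [('A', 2), ('B', 1), ('A', 1)]
```

Speaker A appears in two separate turns here because B speaks in between; only adjacent utterances are merged.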

get_tokens()
Returns the raw token information. If no token information is available, this method will call add_analysis(TOKENIZE) in order to derive it.

get_utterances()
Returns the raw utterances. This information will not include utterance length unless add_analysis(UTTERANCE_LENGTH) is called first.

get_turns()
Returns the raw turns. If turns have not been processed, this method will call add_analysis(TURNS) first.

get_turn_info()
Returns the raw turn information. If no turn information is available, this method will call add_analysis(TURNS) in order to derive it.

get_grammar_corrections(by_turn=True)
Returns a list of tuples each containing original text and corrected text. By default, this method will return grammar corrections based on turns (calling add_analysis(GRAMMAR_CORRECTION) where necessary). If by_turn is set to False, grammar corrections for utterances will be returned instead.

get_pos_tags(by_turn=True)
Returns the POS tags for each turn (if by_turn is True) or for each utterance (if by_turn is False). The return value is a list of lists, where each inner list consists of (token, pos_tag) tuples.
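To make the list-of-lists shape concrete, here is a small sketch using hand-written data rather than real output from the module. It also shows how auxiliary verbs could be filtered from such data, in the spirit of REMOVE_AUX_VERBS. The tags assume the Universal POS scheme, where auxiliaries are tagged "AUX"; remove_aux is a hypothetical helper, not part of the module.

```python
# Hand-written (token, pos_tag) pairs, one inner list per turn or utterance.
# These values are illustrative and were not produced by the library.
pos_tags = [
    [("I", "PRON"), ("have", "AUX"), ("finished", "VERB")],
    [("She", "PRON"), ("is", "AUX"), ("running", "VERB"), ("home", "ADV")],
]


def remove_aux(tagged_units):
    """Drop every (token, tag) pair whose tag is "AUX" (assumed tag name)."""
    return [
        [(token, tag) for token, tag in unit if tag != "AUX"]
        for unit in tagged_units
    ]


without_aux = remove_aux(pos_tags)
for unit in without_aux:
    print(unit)
```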

get_turn_length(turn, words=True)
Returns the number of words in a turn. If words is set to False, the method instead returns the number of tokens in the turn.

get_turn_duration(turn)
Returns the duration of a turn in seconds.

get_utterance_length(utterance, words=True)
Returns the number of words in an utterance. If words is set to False, the method instead returns the number of tokens in the utterance.
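The word count and the token count can differ because tokens typically include punctuation. The sketch below illustrates the distinction with a simple regular-expression tokenizer; this is a stand-in chosen for illustration and will not match spaCy's tokenization exactly (spaCy, for example, splits contractions). count_words_and_tokens is a hypothetical helper.

```python
import re


def count_words_and_tokens(text):
    """Return (word_count, token_count) using a simple regex tokenizer."""
    # A token is either a run of word characters (optionally with one
    # apostrophe-joined suffix, e.g. "don't") or a single punctuation mark.
    tokens = re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)
    # Assumption: a "word" is any token that starts with a letter or digit.
    words = [t for t in tokens if t[0].isalnum()]
    return len(words), len(tokens)


counts = count_words_and_tokens("Well, I don't know.")
print(counts)  # prints (4, 6)
```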

get_utterance_duration(utterance)
Returns the duration of an utterance in seconds.

get_pause_length(turn)
Returns the total number of seconds between utterances in a turn.
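As a sketch of what "total number of seconds between utterances" means, the example below sums the gaps between consecutive utterances in one turn. It assumes each utterance is a (start, end) pair in seconds and that overlapping utterances contribute no pause; both are assumptions for illustration, and pause_length is a hypothetical function rather than the module's code.

```python
def pause_length(utterance_times):
    """Sum the gaps between consecutive utterances in a single turn.

    utterance_times: list of (start, end) pairs in seconds, in time order.
    """
    return sum(
        # Clamp at zero so overlapping utterances do not yield negative pauses.
        max(0.0, next_start - prev_end)
        for (_, prev_end), (next_start, _) in zip(
            utterance_times, utterance_times[1:]
        )
    )


turn = [(0.0, 1.0), (1.5, 2.5), (3.0, 4.0)]
total_pause = pause_length(turn)
print(total_pause)  # prints 1.0
```

A turn with a single utterance has no gaps, so its pause length is zero.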

get_average_turn_length()
Returns the average turn length for each speaker, as a dictionary mapping speaker codes to average turn length.
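The shape of the returned dictionary can be illustrated with a short sketch. It assumes turn length is measured in words and that each turn is given as a (speaker, word_count) pair; average_turn_length here is a hypothetical stand-alone function, not the method itself.

```python
from collections import defaultdict


def average_turn_length(turns):
    """Map each speaker code to the mean length of that speaker's turns.

    turns: list of (speaker, word_count) pairs (assumed input shape).
    """
    totals = defaultdict(lambda: [0, 0])  # speaker -> [sum of lengths, turn count]
    for speaker, length in turns:
        totals[speaker][0] += length
        totals[speaker][1] += 1
    return {
        speaker: total / count for speaker, (total, count) in totals.items()
    }


turns = [("A", 4), ("B", 10), ("A", 8)]
averages = average_turn_length(turns)
print(averages)  # prints {'A': 6.0, 'B': 10.0}
```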

get_speaker_turns(speaker)
Returns a list of all turns taken by the speaker.

get_speaker_utterances(speaker)
Returns a list of all utterances spoken by the speaker.

get_speaker_names()
Returns the names (ids) of all speakers in the conversation.

Download files

Source Distribution

speech-analytics-0.1.8.tar.gz (8.6 kB)

Uploaded Source

Built Distribution

speech_analytics-0.1.8-py3-none-any.whl (7.5 kB)

Uploaded Python 3