speech-analytics
speech-analytics is a simple module for processing speech data collected as part of the Calpy project.
Documentation
class ConversationAnalysis
Parameters:
`filename (str)`: The name of a calpy-style data file to analyse
`model_type (Optional[str])`: The name of the spaCy model to use in the analysis. Default is 'en_core_web_sm'.
Methods:
add_analysis(analysis_type: str)
Adds the requested type of analysis to the data. Options are:
- TOKENIZE: Tokenize the data in the utterances. The tokens created include raw text, part-of-speech tags, lemma, dependency information, and whether each word is a stop word.
- UTTERANCE_LENGTH: Adds information about the number of words and number of tokens in an utterance.
- TURNS: Combines utterances into turns (i.e. multiple consecutive utterances by the same speaker would be considered one turn).
- PREPROCESS: Runs analysis with TOKENIZE, UTTERANCE_LENGTH, TURNS. Doing so will ensure all other methods work.
- REMOVE_AUX_VERBS: Removes anything classified as an auxiliary verb (based on the POS tagging done in tokenization). Note: you will need to run `add_analysis` with the TOKENIZE option before running this.
- GRAMMAR_CORRECTIONS: Adds attempted grammar corrections. This analysis does not remove the original text: both the original text and the suggested corrections will be available. Utterances will have grammatical corrections suggested, but turns will only have suggested corrections if this is called after `add_analysis` with TURNS.
The name of each analysis type is provided as a constant in the module.
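To make the TURNS option concrete, here is a minimal sketch of grouping consecutive utterances by the same speaker into turns. It is not the module's own implementation: the `Utterance` dataclass, its field names, and the `combine_into_turns` helper are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List


@dataclass
class Utterance:
    # Hypothetical record; real calpy-style data may use different fields.
    speaker: str
    text: str
    start: float  # seconds
    end: float    # seconds


def combine_into_turns(utterances: List[Utterance]) -> List[List[Utterance]]:
    """Group consecutive utterances by the same speaker into one turn."""
    # groupby only merges adjacent items, so a speaker who talks again
    # later in the conversation starts a new turn.
    return [list(group) for _, group in groupby(utterances, key=lambda u: u.speaker)]


utterances = [
    Utterance("A", "Hello.", 0.0, 0.5),
    Utterance("A", "How are you?", 0.8, 1.6),
    Utterance("B", "Fine, thanks.", 2.0, 2.9),
    Utterance("A", "Good.", 3.1, 3.4),
]
turns = combine_into_turns(utterances)
print([[u.text for u in turn] for turn in turns])
```

In this sketch, speaker A's first two utterances form a single turn, while A's final utterance is a separate turn because speaker B intervenes.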
get_tokens()
Returns the raw token information. If no token information is available, this method will call `add_analysis(TOKENIZE)` in order to derive it.
get_utterance_info()
Returns the raw utterance information. This information will not include utterance length unless `add_analysis(UTTERANCE_LENGTH)` is called first.
get_turn_info()
Returns the raw turn information. If no turn information is available, this method will call `add_analysis(TURNS)` in order to derive it.
get_grammar_corrections(by_turn=True)
Returns a list of tuples, each containing the original text and the corrected text. By default, this method returns grammar corrections based on turns (calling `add_analysis(TURNS)` where necessary). If `by_turn` is set to False, grammar corrections for utterances are returned instead.
get_pos_tags(by_turn=True)
Returns the POS tags for each turn (if `by_turn` is True, else for each utterance). The return value is formatted as a list of lists, where each internal list consists of tuples of (token, pos_tag).
get_turn_length(turn, words=True)
Returns the number of words in a turn. If words is set to False, the method instead
returns the number of tokens in the turn.
get_turn_duration(turn)
Returns the duration of a turn in seconds.
get_utterance_length(utterance, words=True)
Returns the number of words in an utterance. If words is set to False, the method instead
returns the number of tokens in the utterance.
get_utterance_duration(utterance)
Returns the duration of an utterance in seconds.
get_pause_length(turn)
Returns the number of seconds between utterances in a turn.
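The relationship between duration and pause length can be sketched as follows. This is an illustration under stated assumptions, not the module's code: utterances are represented as hypothetical `(start, end)` pairs in seconds, the helper names are invented, and the pause length is assumed to be the total of all gaps within the turn (the description above does not say whether a total or a per-gap value is returned).

```python
from typing import List, Tuple

# Hypothetical representation: each utterance in a turn is a (start, end) pair in seconds.
Span = Tuple[float, float]


def pause_length(turn: List[Span]) -> float:
    """Sum the silent gaps between consecutive utterances in one turn."""
    gaps = (next_start - prev_end for (_, prev_end), (next_start, _) in zip(turn, turn[1:]))
    # Overlapping utterances would give a negative gap; count those as zero.
    return sum(max(0.0, gap) for gap in gaps)


def duration(turn: List[Span]) -> float:
    """Seconds from the start of the first utterance to the end of the last."""
    return turn[-1][1] - turn[0][0]


turn = [(0.0, 1.5), (2.0, 3.0), (3.25, 4.0)]
print(pause_length(turn))  # 0.75
print(duration(turn))      # 4.0
```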
get_average_turn_length()
Returns the average turn length for each speaker, as a dictionary mapping
speaker codes to average turn length.
get_average_utterance_length()
Returns the average utterance length for each speaker, as a dictionary mapping
speaker codes to average utterance length.
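The shape of the dictionaries returned by the two averaging methods can be illustrated with a short sketch. The `average_length_by_speaker` helper and the `(speaker, text)` input format are hypothetical, and length is assumed here to mean whitespace-separated words; the module itself may count words differently.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical input: (speaker code, text) pairs, one per turn or utterance.
Segment = Tuple[str, str]


def average_length_by_speaker(segments: List[Segment]) -> Dict[str, float]:
    """Map each speaker code to the mean number of whitespace-separated words."""
    lengths: Dict[str, List[int]] = defaultdict(list)
    for speaker, text in segments:
        lengths[speaker].append(len(text.split()))
    return {speaker: sum(counts) / len(counts) for speaker, counts in lengths.items()}


segments = [
    ("A", "Hello there"),
    ("B", "Hi"),
    ("A", "How are you doing today"),
]
averages = average_length_by_speaker(segments)
print(averages)  # {'A': 3.5, 'B': 1.0}
```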
get_speaker_turns(speaker)
Returns a list of all turns taken by the speaker.
get_speaker_utterances(speaker)
Returns a list of all utterances spoken by the speaker.
get_speaker_names()
Returns the names (ids) of all speakers in the conversation.