
sociaML - the Swiss Army knife for audiovisual and textual video feature extraction.


sociaML is a Python package designed for the automatic analysis of videos. It facilitates the extraction of audiovisual and textual features from videos, offering a comprehensive toolkit for researchers and developers working with multimedia data. With sociaML you can extract features relevant to downstream research (e.g., in the social sciences) with little knowledge of machine learning or even Python.

Features

  • Transcription and Diarization: Utilizes WhisperX for transcription and diarization.
  • Anonymization: Incorporates Presidio for automatic anonymization of data.
  • Audio Features Extraction: Extracts various audio features like emotions, MFCCs, speaking times, and silent times.
  • Visual Features Extraction: Analyzes facial emotions, Facial Action Coding System, and facial posture.
  • Textual Features Extraction: Provides analysis on Ekman emotions, sentiment, word/token counts, sentence embeddings, and more.

Attention: sociaML has only been tested on Linux and macOS.

Usage

Installation

Make sure you have FFmpeg (https://www.ffmpeg.org/) installed on your system. It is a prerequisite for Whisper.
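Because FFmpeg is an external prerequisite, it can be worth checking that it is on your PATH before starting a long run. The following is a minimal sketch using only the Python standard library; the helper name `ffmpeg_available` is hypothetical and not part of sociaML.

```python
import shutil


def ffmpeg_available(executable: str = "ffmpeg") -> bool:
    """Return True if the given executable can be found on the PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found")
    else:
        print("ffmpeg not found; install it before running sociaML")
```

The check only confirms that an executable with that name can be located; it does not verify the FFmpeg version or build options.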

Because the API is not yet stable, please install the package directly from GitHub:

pip install git+https://github.com/davidrzs/sociaML

General Architecture

sociaML's pipeline can best be summarized by the following graphic:

[Figure: sociaML pipeline]

Huggingface API Key

To run the pipeline you might need a Hugging Face API key (access token), which you can create in your Hugging Face account settings.

You can make the Hugging Face token available as follows:

import os
access_token = "<your-token>"  # placeholder: paste your own Hugging Face token here
os.environ["HUGGINGFACE_TOKEN"] = access_token
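If you would rather not write the token into a script, one option is to export `HUGGINGFACE_TOKEN` in your shell and read it at runtime. The sketch below assumes that setup; `get_hf_token` is a hypothetical helper, not a sociaML function.

```python
import os


def get_hf_token(var_name: str = "HUGGINGFACE_TOKEN") -> str:
    """Read the token from the environment and fail early if it is missing."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"Environment variable {var_name} is not set; "
            "export it in your shell before running the pipeline."
        )
    return token
```

Failing early with a clear message tends to be easier to debug than an authentication error raised later, when the models are downloaded.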

Preprocessing

sociaML offers a preprocessing pipeline that converts videos into an intermediate JSON representation for efficient analysis. This step involves transcription, diarization, and anonymization.

import os

from sociaML.preprocessing import TranscriberAndDiarizer, Anonymizer, AudioExtractor

# Placeholder paths: replace with your own input video and output audio file
video_path = "video.mp4"
audio_path = "audio.wav"

# Initialize components
transcriber = TranscriberAndDiarizer(pyannote_api_key=os.getenv('HUGGINGFACE_TOKEN'))
anonymizer = Anonymizer()
audio_extractor = AudioExtractor()

# Process video: extract the audio track, then transcribe/diarize and anonymize
audio_extractor.process(video_path, audio_path=audio_path)
transcript = transcriber.process(video_path)
transcript = anonymizer.process(transcript)
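Since preprocessing is the expensive step, you may want to store the intermediate representation on disk and reload it for later analyses. The sketch below assumes the transcript is JSON-serializable; the `example_transcript` dictionary and its field names are invented for illustration and may not match sociaML's actual structure. `save_json` and `load_json` are hypothetical helpers.

```python
import json
import tempfile
from pathlib import Path


def save_json(obj, path):
    """Write a JSON-serializable object to disk as UTF-8."""
    text = json.dumps(obj, ensure_ascii=False, indent=2)
    Path(path).write_text(text, encoding="utf-8")


def load_json(path):
    """Read a JSON file back into Python objects."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Hypothetical stand-in for a transcript; the real structure may differ.
example_transcript = {
    "contributions": [
        {"speaker": "SPEAKER_00", "start": 0.0, "end": 2.5, "text": "Who's there?"},
    ]
}

# Round-trip through a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "transcript.json"
    save_json(example_transcript, target)
    restored = load_json(target)
```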

Analysis

sociaML provides a flexible analysis framework, allowing for the extraction of various features at different levels: Contribution, Participant, and Global.

from sociaML.analysis import (
    Analysis,
    GlobalAudioEmotionAnalyzer,
    ParticipantAudioEmotionAnalyzer,
    ParticipantSentimentAnalyzer,
    GlobalEkmanEmotionAnalyzer,
    GlobalNLTKTokenCountAnalyzer,
    ContributionAudioEmotionAnalyzer,
)

# Initialize Analysis with desired Analyzers
analyzer = Analysis(
    GlobalAudioEmotionAnalyzer(),
    ParticipantAudioEmotionAnalyzer(),
    ParticipantSentimentAnalyzer(),
    GlobalEkmanEmotionAnalyzer(),
    GlobalNLTKTokenCountAnalyzer(),
    ContributionAudioEmotionAnalyzer()
)

# Run analyses. The inputs are not defined in this snippet: data_json is presumably
# the preprocessed transcript, audio and sr the audio signal and its sampling rate,
# and video the video input.
global_feat, participant_feat, contribution_feat = analyzer.analyze(data_json, audio, sr, video)

Explanation of Concepts

When analyzing multimedia content with sociaML, understanding the context of the interaction is as crucial as the content itself. To provide a nuanced analysis, sociaML collects features at three different levels: Global, Participant, and Contribution. Below, we explain these concepts with an example from Shakespeare's Hamlet.

See the figure below for an illustration of these concepts:

Global Features

Global features are derived by aggregating data across the entire video, without distinguishing between different participants. This level provides an overall summary of the video's characteristics, such as the general sentiment or mood throughout the play. For example, in a performance of Hamlet, global features would analyze the cumulative emotional tone of the entire play, providing insights into the overarching emotional landscape.

Participant Features

At the participant level, sociaML examines the data attributed to individual characters or speakers within the video. By focusing on each participant's contributions as a whole, we can compare and contrast different characters. For instance, in Hamlet, we could evaluate whether Hamlet exhibits a generally more negative sentiment compared to other characters like Horatio or Marcellus, or we might analyze the range of emotions that each character displays throughout the play.

Contribution Features

The most granular level of analysis, contribution features, focuses on individual blocks of speech or action by a single participant. Each time a character speaks or performs an action uninterrupted, it's considered a single contribution. In our Hamlet example, this means analyzing specific speeches or soliloquies to determine the sentiment, emotions, and other features of that particular moment. For instance, we can analyze the emotional intensity of Hamlet's famous "To be, or not to be" soliloquy independently of the rest of the play.
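To make the three levels concrete, here is a toy sketch in plain Python that uses word counts as a stand-in feature on the opening lines of Hamlet. It does not call sociaML and is not how the package is implemented; the data layout and the `word_count` helper are assumptions made for illustration.

```python
from collections import defaultdict

# Toy transcript: each entry is one uninterrupted contribution.
contributions = [
    {"speaker": "Bernardo", "text": "Who's there?"},
    {"speaker": "Francisco", "text": "Nay, answer me: stand, and unfold yourself."},
    {"speaker": "Bernardo", "text": "Long live the king!"},
]


def word_count(text: str) -> int:
    """Count whitespace-separated words (a stand-in for a real feature)."""
    return len(text.split())


# Contribution level: one value per block of speech.
contribution_feat = [word_count(c["text"]) for c in contributions]

# Participant level: aggregate each speaker's contributions.
participant_feat = defaultdict(int)
for contribution, count in zip(contributions, contribution_feat):
    participant_feat[contribution["speaker"]] += count

# Global level: aggregate over the whole recording.
global_feat = sum(contribution_feat)

print(contribution_feat)       # [2, 7, 4]
print(dict(participant_feat))  # {'Bernardo': 6, 'Francisco': 7}
print(global_feat)             # 13
```

The same pattern applies to richer features such as sentiment or emotion scores, where the aggregation step would typically be an average rather than a sum.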

[Figure: illustration of concepts]

Collaborating and Getting Involved

If you have feature requests or want to co-develop this package please do not hesitate to reach out!

Collaborators

Developer

Previous Developers:

Technical guidance by

Sources

This project stands on the shoulders of giants and merely provides a convenient wrapper for them. If you use sociaML in your research please cite the original models below:

Jochen Hartmann, "Emotion English DistilRoBERTa-base". https://huggingface.co/j-hartmann/emotion-english-distilroberta-base/, 2022.

Cheong, J. H., et al. "Py-Feat: Python facial expression analysis toolbox." arXiv [cs.CV], 2021.

Plaquet, Alexis, and Hervé Bredin. "Powerset multi-class cross entropy loss for neural speaker diarization." arXiv preprint arXiv:2310.13025 (2023).

Bredin, H. (2023) pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe. Proc. INTERSPEECH 2023, 1983-1987, doi: 10.21437/Interspeech.2023-105

Reimers, N., and I. Gurevych. "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks." arXiv preprint arXiv:1908.10084 (2019).

Daniel Loureiro, Francesco Barbieri, Leonardo Neves, Luis Espinosa Anke, and Jose Camacho-collados. 2022. TimeLMs: Diachronic Language Models from Twitter. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 251–260, Dublin, Ireland. Association for Computational Linguistics.


License

Code is licensed under the permissive MIT license. Certain modules we depend on have different licenses though!
