Project description

TimeCoder

timecoder.py is a pipeline that divides uploaded subtitles into blocks based on a cosine similarity threshold.
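The threshold-based division can be illustrated with a minimal sketch. It assumes that each subtitle segment already has an embedding vector and that a new block starts whenever the cosine similarity between neighbouring segments drops below the threshold. The names `cosine_similarity` and `split_into_blocks`, the default threshold of 0.5, and the toy vectors are hypothetical and are not taken from timecoder.py.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def split_into_blocks(embeddings, threshold=0.5):
    """Group consecutive segment indices into blocks.

    A new block is opened when the similarity between a segment and
    the previous one is below `threshold` (an assumed rule, not
    necessarily the one used by the package).
    """
    if not embeddings:
        return []
    blocks = [[0]]
    for i in range(1, len(embeddings)):
        if cosine_similarity(embeddings[i - 1], embeddings[i]) < threshold:
            blocks.append([i])
        else:
            blocks[-1].append(i)
    return blocks


# Toy 2-D "embeddings": two segments on one topic, then two on another.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
blocks = split_into_blocks(toy_embeddings, threshold=0.5)
print(blocks)  # [[0, 1], [2, 3]]
```

In practice the vectors would come from a sentence-similarity model, and the threshold would need tuning for the subtitles at hand.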

The script implements two approaches:

  1. summarize the subtitles first, then calculate cosine similarity
  2. calculate cosine similarity first, divide the subtitles into blocks, then summarize each block
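The difference between the two orderings can be sketched as follows. This is an illustration under assumptions, not the code in timecoder.py: the function names are hypothetical, the summarizer, embedder, and splitter are passed in as callables, and the toy stand-ins below replace the real models. The sketch also assumes that approach 1 summarizes each subtitle segment individually.

```python
def summarize_then_split(sentences, summarize, embed, split):
    """Approach 1: summarize each segment, then block the summaries."""
    summaries = [summarize(s) for s in sentences]
    blocks = split([embed(s) for s in summaries])
    return [[summaries[i] for i in block] for block in blocks]


def split_then_summarize(sentences, summarize, embed, split):
    """Approach 2: block the raw segments, then summarize each block."""
    blocks = split([embed(s) for s in sentences])
    return [summarize(" ".join(sentences[i] for i in block)) for block in blocks]


# Toy stand-ins (hypothetical) so that the sketch runs without any model.
def toy_summarize(text):
    return " ".join(text.split()[:3])  # keep the first three words


def toy_embed(text):
    words = text.lower().split()
    return (words.count("cat"), words.count("car"))  # keyword counts


def toy_split(embeddings):
    # Stand-in for similarity thresholding: open a new block whenever
    # the vector differs from the previous one.
    blocks = []
    for i, vec in enumerate(embeddings):
        if i > 0 and vec == embeddings[i - 1]:
            blocks[-1].append(i)
        else:
            blocks.append([i])
    return blocks


sentences = [
    "the cat sleeps on the sofa",
    "a cat eats fish every day",
    "the car is very fast",
    "my car needs new tires",
]
result_1 = summarize_then_split(sentences, toy_summarize, toy_embed, toy_split)
result_2 = split_then_summarize(sentences, toy_summarize, toy_embed, toy_split)
```

With these stand-ins, the first approach yields two blocks of per-segment summaries, while the second yields one summary per block.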

parse_subs.py is a parser for YouTube subtitles that converts them to a pd.DataFrame.

sentence_similarity.py is a script for calculating cosine similarity.

gpt_shortening.py is a script for summarization.

Several models for summarization and sentence similarity were compared. For summarization we currently use "IlyaGusev/mbart_ru_sum_gazeta". For sentence similarity we use 'symanto/sn-xlm-roberta-base-snli-mnli-anli-xnli'.

Download files

Source Distribution

blockdivision-0.1.0.tar.gz (3.6 kB)

Uploaded Source

Built Distribution

blockdivision-0.1.0-py3-none-any.whl (5.0 kB)

Uploaded Python 3
