Project description
TimeCoder
timecoder.py - a pipeline that divides uploaded subtitles into blocks based on a cosine similarity threshold
The script implements two approaches:
- summarize the subtitles first, then calculate cosine similarity
- calculate cosine similarity first, divide the subtitles into blocks, then summarize each block
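The block-division step shared by both approaches can be sketched as follows. This is a minimal illustration, not the code in timecoder.py: the function names `cosine_similarity` and `split_into_blocks`, the comparison of adjacent sentences only, and the toy 2-D vectors are all assumptions made for the example. In the real pipeline the vectors would presumably come from the sentence similarity model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def split_into_blocks(embeddings, threshold):
    """Group consecutive sentence indices into blocks.

    A new block starts whenever the similarity between a sentence and
    the previous one falls below `threshold` (hypothetical rule).
    """
    if not embeddings:
        return []
    blocks = [[0]]
    for i in range(1, len(embeddings)):
        if cosine_similarity(embeddings[i - 1], embeddings[i]) < threshold:
            blocks.append([i])      # topic shift: open a new block
        else:
            blocks[-1].append(i)    # same topic: extend the current block
    return blocks


# Toy embeddings: sentences 0-1 point one way, sentences 2-3 another.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
blocks = split_into_blocks(vectors, threshold=0.5)
print(blocks)
```

A higher threshold produces more, shorter blocks; a lower one merges more subtitles into each block.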
parse_subs.py - a parser of YouTube subtitles that converts them to a pd.DataFrame
sentence_similarity.py - a script for calculating cosine similarity
gpt_shortening.py - a script for summarization
Different models for summarization and sentence similarity were compared. For summarization we currently use "IlyaGusev/mbart_ru_sum_gazeta". For sentence similarity we use "symanto/sn-xlm-roberta-base-snli-mnli-anli-xnli".
Source Distribution
blockdivision-0.1.0.tar.gz (3.6 kB)
Built Distribution
Hashes for blockdivision-0.1.0-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 3e6c468b1bc394659467226c35a09cb75f0c839974d96461086dc563c3ccb31f |
MD5 | a82aa72f545e3fc0e5d359afe98e5370 |
BLAKE2b-256 | 7bc139b1efa46c70539f3e38144bca580096fabca96ea26b4d41934e88d74465 |