Cohesion measurement to evaluate partition
Project description
Topic Cohesion
The topic-detection field deals mainly with naming given divisions of documents. It lacks a quality measure that rates a division in a way that reflects a human-subjective score.
Given a division, topic_cohesion calculates the human-subjective score and a topic name for each label in the division.
A proof of concept for this approach can be found in the colab-notebook or in the "Topic Cohesion Project- Full report".
A usage example can also be found in the colab-notebook-usage-example.
Installation
pip install topic-cohesion
Usage Example
The input to the topic cohesion process must be a csv, txt, or tsv file with a tab ('\t') separator, and it must have 'label' and 'text' columns. The 'text' column holds the corpus sentences as strings, while the 'label' column holds integers that assign each sentence to a group in the division. In the following example, sentences 1-3 belong to group 1 and sentences 4 and 5 belong to group 2.
import pandas as pd
from cohesion import topic_cohesion
data = {
    'text': [
        "we like to play football",
        "I'm playing football better than neymar and cristano ronaldo",
        "I like Fifa more than I like football, My Fav team is #RealMadrid Hala Madrid",
        "Hamburger or Pizza? what would i choose? I will eat both of them, it so tasty!",
        "banana pancakes with syrup maple, thats my favorite meal",
    ],
    'label': [1, 1, 1, 2, 2],
}
df = pd.DataFrame(data)
score, topic_names = topic_cohesion.run_df(df)
print("Cohesion Final score is: ", score)
print("Cohesion Topics are: ", topic_names)
Expected output
Cohesion Final score is: 0.99
Cohesion Topics are: ['like football play ronaldo playing', 'tasty pizza hamburger eat choose']
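When the division is stored in a tab-separated file rather than built in memory, it may help to load and check it with pandas before scoring. The following is a minimal sketch under stated assumptions: `load_partition` and `REQUIRED_COLUMNS` are hypothetical names that are not part of topic-cohesion, and the in-memory sample stands in for a real file path.

```python
import io

import pandas as pd

# Column names required by the input format described above.
REQUIRED_COLUMNS = {"text", "label"}


def load_partition(source):
    """Read a tab-separated file and check it has 'text' and 'label' columns.

    Hypothetical helper for illustration; not part of topic-cohesion.
    """
    df = pd.read_csv(source, sep="\t")
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"missing required column(s): {sorted(missing)}")
    # Labels are integers identifying the group each sentence belongs to.
    df["label"] = df["label"].astype(int)
    df["text"] = df["text"].astype(str)
    return df


# In-memory stand-in for a .tsv file on disk.
sample = io.StringIO(
    "text\tlabel\n"
    "we like to play football\t1\n"
    "banana pancakes with syrup maple\t2\n"
)
df = load_partition(sample)

# Number of sentences per group, as a quick sanity check of the division.
group_sizes = df.groupby("label").size().to_dict()
print(group_sizes)

# The validated frame could then be scored as in the usage example:
# score, topic_names = topic_cohesion.run_df(df)
```

Checking the columns up front gives a clear error message for a malformed file instead of a failure deeper in the scoring step.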