Cohesion measurement to evaluate partition

Project description

Topic Cohesion

The topic-detection field deals mainly with naming given divisions of documents. It lacks a quality measure that rates a division in a way that reflects a human-subjective score.

Given a division, topic_cohesion calculates the human-subjective score and the topic name associated with each label in the division.

A proof of concept for this approach can be found in the colab-notebook, or in the "Topic Cohesion Project- Full report".

A usage example can also be found in the colab-notebook-usage-example.

Installation

pip install topic-cohesion

Usage Example

The input to the topic cohesion process must be a csv, txt, or tsv file with a tab ('\t') separator, and it must have 'label' and 'text' columns. The 'text' column is a list of strings holding all the corpus sentences, while the 'label' column is a list of integers representing the corpus division. In the next example, sentences 1-3 belong to group 1 and sentences 4 and 5 belong to group 2.

import pandas as pd
from cohesion import topic_cohesion

# 'text' holds the corpus sentences; 'label' assigns each sentence to a group.
data = {'text':
            ["we like to play football",
             "I'm playing football better than neymar and cristano ronaldo",
             "I like Fifa more than I like football, My Fav team is #RealMadrid Hala Madrid",
             "Hamburger or Pizza? what would i choose? I will eat both of them, it so tasty!",
             "banana pancakes with syrup maple, thats my favorite meal"],
        'label':
            [1, 1, 1, 2, 2]}
df = pd.DataFrame(data)

# run_df returns the overall cohesion score and one topic name per label.
score, topic_names = topic_cohesion.run_df(df)
print("Cohesion Final score is: ", score)
print("Cohesion Topics are: ", topic_names)

Expected output

Cohesion Final score is: 0.99
Cohesion Topics are: ['like football play ronaldo playing', 'tasty pizza hamburger eat choose']
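
The file format described above (tab-separated, with 'text' and 'label' columns) can be prepared and checked before it is handed to the package. The following is a minimal sketch that uses only the Python standard library. The helper names write_tsv and read_tsv are hypothetical and are not part of topic_cohesion, and the sketch does not call the package itself.

```python
import csv
import io

REQUIRED_COLUMNS = {"text", "label"}  # column names required by the input format


def write_tsv(rows, handle):
    """Write (text, label) pairs as a tab-separated table with a header row."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["text", "label"])
    for text, label in rows:
        writer.writerow([text, int(label)])


def read_tsv(handle):
    """Read the table back, checking the required columns and integer labels."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    texts, labels = [], []
    for row in reader:
        texts.append(row["text"])
        labels.append(int(row["label"]))  # raises ValueError for non-integer labels
    return texts, labels


# Hypothetical two-row corpus, written to an in-memory buffer instead of a file.
rows = [
    ("we like to play football", 1),
    ("banana pancakes with syrup maple, thats my favorite meal", 2),
]
buffer = io.StringIO()
write_tsv(rows, buffer)
buffer.seek(0)
texts, labels = read_tsv(buffer)
print(labels)
```

To produce a real file, the buffer can be replaced with a handle from open(path, "w", newline=""). Checking the columns up front turns a malformed file into a clear error before any scoring is attempted.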

Download files

Source Distribution

topic_cohesion-0.1.1.tar.gz (10.0 kB)

Uploaded Source

Built Distribution

topic_cohesion-0.1.1-py3-none-any.whl (10.2 kB)

Uploaded Python 3

File details

Details for the file topic_cohesion-0.1.1.tar.gz.

File metadata

  • Download URL: topic_cohesion-0.1.1.tar.gz
  • Upload date:
  • Size: 10.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.8

File hashes

Hashes for topic_cohesion-0.1.1.tar.gz
Algorithm Hash digest
SHA256 9d307274b5b4af82dcf91d65d99baa2e56725ae09f9a5d11aae8214e3a1ee457
MD5 a626b9850d35c927a87dd2de6236e14b
BLAKE2b-256 5ea830e1d9910c7a24b40b126a1560c65a178e85671086ef8675ac9c16290e3a

File details

Details for the file topic_cohesion-0.1.1-py3-none-any.whl.

File hashes

Hashes for topic_cohesion-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 cb2e2b151af109636da31e6a7bf3f3298ebcab491b0bf285aa74e22df70b6eab
MD5 b60a9c4d3a70c5d3c9a7cb3d47aabfd8
BLAKE2b-256 1d77788f4adda97c64b13e756973546fddea30d3473369968572de93013d7d99
