
An Indonesian Headline Detection Python API.

Project description

headline_detector


This Python library provides APIs for detecting headlines in text, especially posts from social media platforms such as Twitter. Its models were developed and trained on a dataset of Twitter posts containing both headline and non-headline texts, with journalism professionals assisting to ensure data quality.

$ pip install headline-detector

Available scenarios and their performance

Model         Scenario 1  Scenario 2  Scenario 3  Scenario 4  Scenario 5  Scenario 6
Fasttext      0.8766      0.8714      0.8793      0.8714      0.8714      0.8661
CNN           0.9081      0.9081      0.8950      0.8898      0.8950      0.8898
IndoBERTweet  0.9895      0.9921      0.9738      0.9580      0.9843      0.9685

All values are accuracy scores.
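To compare the scenarios programmatically, the accuracy figures above can be put into a small lookup. This is only an illustrative sketch: the `ACCURACY` dictionary and the `best_scenario` helper are hypothetical names and are not part of headline_detector. The numbers are copied from the table.

```python
# Accuracy per scenario (1 to 6), copied from the table above.
# ACCURACY and best_scenario are hypothetical helpers, not library APIs.
ACCURACY = {
    "Fasttext": [0.8766, 0.8714, 0.8793, 0.8714, 0.8714, 0.8661],
    "CNN": [0.9081, 0.9081, 0.8950, 0.8898, 0.8950, 0.8898],
    "IndoBERTweet": [0.9895, 0.9921, 0.9738, 0.9580, 0.9843, 0.9685],
}


def best_scenario(model):
    """Return (scenario_number, accuracy) for the model's top-scoring scenario.

    Ties are resolved in favour of the lowest scenario number.
    """
    scores = ACCURACY[model]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index + 1, scores[best_index]


if __name__ == "__main__":
    for name in ACCURACY:
        scenario, accuracy = best_scenario(name)
        print(f"{name}: scenario {scenario} ({accuracy:.4f})")
```

The resulting scenario number could then be passed to the matching detector's `load_from_scenario` method, as shown in the Usage section.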

Model Throughput

Model         Throughput (approx. texts/second)
IndoBERTweet  1.3
CNN           281.60
Fasttext      2048.41

Tested on an Intel i7-6700K with 32 GB of RAM.
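The throughput figures make it possible to estimate roughly how long a batch would take on comparable hardware. The sketch below assumes throughput stays constant regardless of batch size and text length, which is a simplification. `THROUGHPUT` and `estimated_seconds` are hypothetical names, not part of the library.

```python
# Approximate texts per second, copied from the throughput table above.
# THROUGHPUT and estimated_seconds are hypothetical helpers, not library APIs.
THROUGHPUT = {
    "IndoBERTweet": 1.3,
    "CNN": 281.60,
    "Fasttext": 2048.41,
}


def estimated_seconds(model, n_texts):
    """Estimate processing time in seconds, assuming constant throughput."""
    if n_texts < 0:
        raise ValueError("n_texts must be non-negative")
    return n_texts / THROUGHPUT[model]


if __name__ == "__main__":
    for name in THROUGHPUT:
        seconds = estimated_seconds(name, 10_000)
        print(f"{name}: about {seconds:,.1f} s for 10,000 texts")
```

On these numbers, IndoBERTweet trades a large amount of speed for its higher accuracy, so Fasttext or CNN may be a better fit for high-volume streams.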

Usage

The output for each text is either 0 (non-headline) or 1 (headline).

from headline_detector import FasttextDetector, IndoBERTweetDetector, CNNDetector

detector = FasttextDetector.load_from_scenario(1)
data = detector.predict_text(
    [
        "nama kamu siapa?",
        "Kapolda Jatim Teddy Minahasa Dikabarkan Ditangkap Terkait Narkoba  https://t.co/LD9X6VFaUR",
    ]
)
print(data)  # output: [0, 1]

detector = CNNDetector.load_from_scenario(3)
data = detector.predict_text(
    [
        "nama kamu siapa?",
        "Kapolda Jatim Teddy Minahasa Dikabarkan Ditangkap Terkait Narkoba  https://t.co/LD9X6VFaUR",
    ]
)
print(data)  # output: [0, 1]

detector = IndoBERTweetDetector.load_from_scenario(5)
data = detector.predict_text(
    [
        "nama kamu siapa?",
        "Kapolda Jatim Teddy Minahasa Dikabarkan Ditangkap Terkait Narkoba  https://t.co/LD9X6VFaUR",
    ]
)
print(data)  # output: [0, 1]

# 0 is non-headline
# 1 is headline
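A common follow-up is to keep only the texts labeled as headlines. The sketch below assumes only what the examples above show: a detector has a `predict_text` method that takes a list of strings and returns one 0/1 label per text. `keep_headlines` and `KeywordStubDetector` are hypothetical names. The stub is a trivial stand-in so the sketch runs without downloading a model, and it does not reflect how the real detectors classify text.

```python
from typing import List, Sequence


def keep_headlines(detector, texts: Sequence[str]) -> List[str]:
    """Return only the texts that the detector labels as 1 (headline).

    `detector` is any object with predict_text(list_of_str) -> labels,
    e.g. FasttextDetector.load_from_scenario(1) from the examples above.
    """
    texts = list(texts)
    labels = detector.predict_text(texts)
    return [text for text, label in zip(texts, labels) if label == 1]


class KeywordStubDetector:
    """Hypothetical stand-in for a real detector (not part of the library)."""

    def predict_text(self, texts):
        # Toy rule for demonstration only: flag texts containing "Dikabarkan".
        return [1 if "Dikabarkan" in text else 0 for text in texts]


if __name__ == "__main__":
    samples = [
        "nama kamu siapa?",
        "Kapolda Jatim Teddy Minahasa Dikabarkan Ditangkap Terkait Narkoba",
    ]
    print(keep_headlines(KeywordStubDetector(), samples))
```

To use it with the library, replace the stub with one of the detectors loaded in the examples above.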

Paper

Coming soon.


