An Indonesian Headline Detection Python API.
headline_detector
This is a Python library that provides APIs for detecting headlines in Indonesian text, especially text from social media platforms such as Twitter. Its models were trained on a dataset of Twitter posts containing both headline and non-headline texts, built with the assistance of journalism professionals to ensure data quality.
```bash
$ pip install headline-detector
```
Available scenarios and their performance
Model | Scenario 1 | Scenario 2 | Scenario 3 | Scenario 4 | Scenario 5 | Scenario 6 |
---|---|---|---|---|---|---|
Fasttext | 0.8766 | 0.8714 | 0.8793 | 0.8714 | 0.8714 | 0.8661 |
CNN | 0.9081 | 0.9081 | 0.8950 | 0.8898 | 0.8950 | 0.8898 |
IndoBERTweet | 0.9895 | 0.9921 | 0.9738 | 0.9580 | 0.9843 | 0.9685 |
All values are accuracy scores.
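Accuracy is the fraction of texts whose predicted label matches the gold label. The evaluation data behind the table above is not described here, so the following is only a minimal sketch of computing accuracy on your own labeled texts, assuming `predict` is a callable such as `detector.predict_text` that returns one 0/1 label per input text, as in the usage examples. The helper name `accuracy` is hypothetical and not part of the library.

```python
from typing import Callable, Sequence


def accuracy(
    predict: Callable[[list[str]], Sequence[int]],
    texts: list[str],
    gold: Sequence[int],
) -> float:
    """Fraction of texts whose predicted label (0 or 1) equals the gold label."""
    if len(texts) != len(gold):
        raise ValueError("texts and gold must have the same length")
    if not texts:
        raise ValueError("need at least one labeled text")
    predicted = list(predict(texts))
    # Assumption: the predictor returns exactly one label per input text.
    if len(predicted) != len(texts):
        raise ValueError("predictor returned a different number of labels")
    correct = sum(int(p == g) for p, g in zip(predicted, gold))
    return correct / len(texts)
```

For example, `accuracy(detector.predict_text, my_texts, my_labels)` with your own `my_texts` and `my_labels` gives a score on your data; it will not necessarily match the table, which depends on the authors' evaluation set.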
Model Throughput
Model | Throughput (texts/second, approximate) |
---|---|
IndoBERTweet | 1.3 |
CNN | 281.60 |
Fasttext | 2048.41 |
Throughput was measured on an Intel Core i7-6700K CPU with 32 GB of RAM.
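Throughput is the number of texts processed divided by the wall-clock time taken. The authors' benchmarking procedure is not shown here, so this is only a sketch of how a comparable figure could be measured, assuming `predict` is a callable such as `detector.predict_text`. The helpers `measure_throughput` and `estimated_seconds` are hypothetical, and results will vary with hardware and input length.

```python
import time
from typing import Callable, Sequence


def measure_throughput(
    predict: Callable[[list[str]], Sequence[int]],
    texts: list[str],
    warmup: int = 1,
) -> float:
    """Return texts processed per second for one call of predict on texts."""
    if not texts:
        raise ValueError("texts must not be empty")
    # Warm-up calls so one-time costs (lazy initialization, caches) are not timed.
    for _ in range(warmup):
        predict(texts[:1])
    start = time.perf_counter()
    predict(texts)
    # Guard against a zero interval on very fast predictors.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return len(texts) / elapsed


def estimated_seconds(n_texts: int, texts_per_second: float) -> float:
    """Rough processing time for n_texts at a given throughput."""
    return n_texts / texts_per_second
```

As a worked example with the rates in the table, 10,000 texts take roughly 10,000 / 1.3 ≈ 7,700 s (about 2 hours) with IndoBERTweet, 10,000 / 281.60 ≈ 36 s with CNN, and 10,000 / 2048.41 ≈ 5 s with fastText on that hardware.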
Usage
`predict_text` takes a list of texts and returns one label per text: either 0 (non-headline) or 1 (headline).
```python
from headline_detector import FasttextDetector, IndoBERTweetDetector, CNNDetector

texts = [
    # "What is your name?" (non-headline)
    "nama kamu siapa?",
    # "East Java Police Chief Teddy Minahasa Reportedly Arrested in Drug Case" (headline)
    "Kapolda Jatim Teddy Minahasa Dikabarkan Ditangkap Terkait Narkoba https://t.co/LD9X6VFaUR",
]

# fastText model, scenario 1
detector = FasttextDetector.load_from_scenario(1)
data = detector.predict_text(texts)
print(data)  # output: [0, 1]

# CNN model, scenario 3
detector = CNNDetector.load_from_scenario(3)
data = detector.predict_text(texts)
print(data)  # output: [0, 1]

# IndoBERTweet model, scenario 5
detector = IndoBERTweetDetector.load_from_scenario(5)
data = detector.predict_text(texts)
print(data)  # output: [0, 1]

# 0 is non-headline
# 1 is headline
```
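Because `predict_text` appears to return labels in the same order as its input (as the `[0, 1]` outputs above suggest), its output can be zipped back with the texts. The following minimal sketch keeps only the texts predicted as headlines and processes the input in fixed-size batches, which can help with slower models such as IndoBERTweet. The helpers `batched` and `filter_headlines` and the `batch_size` parameter are hypothetical, not library APIs, and the one-label-per-text ordering is an assumption.

```python
from typing import Callable, Iterable, Iterator, Sequence

HEADLINE = 1  # label the detectors use for headline texts


def batched(items: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive lists of at most `size` items."""
    batch: list[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def filter_headlines(
    predict: Callable[[list[str]], Sequence[int]],
    texts: Iterable[str],
    batch_size: int = 256,
) -> list[str]:
    """Return the texts whose predicted label is 1 (headline), in input order."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    headlines: list[str] = []
    for batch in batched(texts, batch_size):
        labels = predict(batch)
        # Assumes one label per input text, in the same order as the batch.
        headlines.extend(t for t, y in zip(batch, labels) if y == HEADLINE)
    return headlines
```

With a loaded detector, `filter_headlines(detector.predict_text, tweets)` would return only the tweets classified as headlines.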
Paper
Coming soon.
Hashes for headline_detector-1.0.2-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | c7578add47065fa0ed07a584dc66f99c3d186d06a9fe36e8c8ec78b495813fa1 |
MD5 | 49b1cd0ec9b0c342ff25801ed43915ee |
BLAKE2b-256 | b41220298570f1023ccbfdf5972b5cac6758263c3d899110521dec6b8a3d4022 |
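These digests let you check that a downloaded wheel is intact. A minimal sketch using Python's standard `hashlib`, assuming the wheel has been saved to a local path of your choice; `sha256_of_file` and `matches_expected` are hypothetical helpers, and the expected value is the SHA256 digest listed above.

```python
import hashlib
from pathlib import Path
from typing import Union

# SHA256 digest published for headline_detector-1.0.2-py3-none-any.whl
EXPECTED_SHA256 = "c7578add47065fa0ed07a584dc66f99c3d186d06a9fe36e8c8ec78b495813fa1"


def sha256_of_file(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Union[str, Path], expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA-256 equals the expected hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected.lower()
```

For an unmodified download, `matches_expected("headline_detector-1.0.2-py3-none-any.whl")` should return `True`.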