Story Clustering Bot for Taranis-NG
Project description
Story Clustering
This code takes news items in the format provided by Taranis-NG and clusters them into Stories.
Description and Use
The approach supports the following functionality:
- Automatically detects Events.
- Clusters news items based on the detected Events.
- Clusters documents belonging to related Events into Stories.
Initial clustering
The method initial_clustering in clustering.py takes as input a dictionary of news_items_aggregate (see tests/testdata.py for the actual input format) and outputs a dictionary containing two keys:
- "event_clusters": list of lists of document ids
- "story_clusters": list of lists of document ids
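The output structure described above can be inverted into a lookup from document id to cluster index, which is often handier for downstream code. The following is a minimal sketch that works only on a dictionary of that shape. The helper name index_clusters and the sample ids are hypothetical, and the sample result is made up for illustration rather than produced by initial_clustering.

```python
from typing import Dict, Hashable, List


def index_clusters(clusters: List[List[Hashable]]) -> Dict[Hashable, int]:
    """Map each document id to the index of the cluster that contains it."""
    lookup: Dict[Hashable, int] = {}
    for cluster_idx, doc_ids in enumerate(clusters):
        for doc_id in doc_ids:
            lookup[doc_id] = cluster_idx
    return lookup


# Hypothetical result with the two keys described above (ids are invented).
example_result = {
    "event_clusters": [["a1", "a2"], ["b1"], ["c1", "c2"]],
    "story_clusters": [["a1", "a2", "b1"], ["c1", "c2"]],
}

event_of = index_clusters(example_result["event_clusters"])
story_of = index_clusters(example_result["story_clusters"])

print(event_of["b1"])  # → 1
print(story_of["b1"])  # → 0
```

In this made-up example, "b1" forms its own Event but shares a Story with "a1" and "a2", which mirrors the idea that related Events are grouped into one Story.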
Incremental clustering
The incremental clustering method takes as input a dictionary of news_items_aggregate, containing new news items to be clustered, and clustered_news_items_aggregate, containing already clustered items. It tries to assign the new documents to the existing clusters or creates new clusters for them. See tests/testdata.py for the actual input formats. This method also outputs a dictionary containing two keys:
- "event_clusters": list of lists of document ids
- "story_clusters": list of lists of document ids
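To see what an incremental run changed, one option is to separate clusters that contain previously known documents from clusters made up only of new documents. The sketch below assumes that the returned clusters list both old and new document ids, which the description above does not confirm. The helper name split_by_novelty and all ids are hypothetical.

```python
from typing import Hashable, Iterable, List, Tuple

Cluster = List[Hashable]


def split_by_novelty(
    clusters: Iterable[Cluster], known_ids: Iterable[Hashable]
) -> Tuple[List[Cluster], List[Cluster]]:
    """Split clusters into (extended existing, brand new).

    A cluster counts as "extended existing" if it contains at least one
    previously known document id, and as "brand new" otherwise.
    """
    known = set(known_ids)
    extended: List[Cluster] = []
    brand_new: List[Cluster] = []
    for cluster in clusters:
        if known.intersection(cluster):
            extended.append(list(cluster))
        else:
            brand_new.append(list(cluster))
    return extended, brand_new


# Invented ids: "a1", "a2" and "b1" were clustered in an earlier run.
previously_clustered = {"a1", "a2", "b1"}
# Hypothetical output of an incremental run (shape as described above).
incremental_result = {
    "event_clusters": [["a1", "a2", "n1"], ["b1"], ["n2", "n3"]],
    "story_clusters": [["a1", "a2", "b1", "n1"], ["n2", "n3"]],
}

extended, brand_new = split_by_novelty(
    incremental_result["story_clusters"], previously_clustered
)
print(extended)   # → [['a1', 'a2', 'b1', 'n1']]
print(brand_new)  # → [['n2', 'n3']]
```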
Installation
The requirements.txt file should list all Python libraries that story-clustering depends on. They are installed using:
pip install .
Development
pip install .[dev]
Use
See notebook\test_story_clustering.ipynb for examples of how to use the clustering methods.
License
EUROPEAN UNION PUBLIC LICENCE v. 1.2
Download files
Source Distribution
Hashes for taranis_story_clustering-0.4.6.tar.gz
Algorithm | Hash digest
---|---
SHA256 | fec539d8d3363a27deb66e69c4682c62f161b6e9d4c83cb8d24b4e8d3c6955b8
MD5 | 83f3e5282234051477984c9b2c857aa5
BLAKE2b-256 | 5d594277f94b4036674d0c60d78d00c4abedf1b6d5fa8ecbd7c20d4853762140
Built Distribution
Hashes for taranis_story_clustering-0.4.6-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | f75b86102bb25d7c4e69c6a45985decf63a8c580bad5dd4b5a6455fd6220a9d5
MD5 | 0ec506a17e061f9fcb46ccf1fce9e8fd
BLAKE2b-256 | 6ed1ebb0313872bf31e236d510e3ea9525be48a1f7df52885dfd2bf1a4bc4c3d
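A downloaded file can be checked against the SHA256 digests listed above with Python's standard hashlib module. This is a minimal sketch: the helper names sha256_of and matches_digest are hypothetical, and the local file path assumes the archive was saved to the current directory under its original name.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# Assumed local path; the expected digest is the SHA256 listed for the sdist.
archive = Path("taranis_story_clustering-0.4.6.tar.gz")
expected = "fec539d8d3363a27deb66e69c4682c62f161b6e9d4c83cb8d24b4e8d3c6955b8"

if archive.exists():
    print("digest matches:", matches_digest(archive, expected))
```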