Algorithms to track documents and build news stories from them. This package implements the Miranda et al. (2018) algorithm, as well as alternative algorithms and baselines for tracking documents.
document_tracking
This project is originally based on Miranda et al.’s implementation of a news tracking algorithm, published in the following paper (original code: )
Miranda, Sebastião, Artūrs Znotiņš, Shay B. Cohen, and Guntis Barzdins.
2018. “Multilingual Clustering of Streaming News”. In 2018
Conference on Empirical Methods in Natural Language Processing, 4535‑44.
Brussels, Belgium: Association for Computational Linguistics.
https://www.aclweb.org/anthology/D18-1483/.
This work is a reimplementation of the original authors' code in which the entire API was rewritten so that external projects can use it. The goal of this reimplementation is to offer the news tracking algorithms at an industrial standard, with attention to automation and quality.
The package also includes alternative algorithms and baselines; currently, only K-Means is provided.
Algorithm | Supervised | Main Class |
---|---|---|
Miranda et al. (2018) | Yes | document_tracking.miranda.StreamingAggregator |
K-Means | No | document_tracking.kmeans.KMeansAggregator |
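As a rough illustration of what tracking documents into stories means, the sketch below assigns each incoming document to the most similar existing story, or opens a new story when nothing is similar enough. It is not this package's API and not the Miranda et al. (2018) model, which is supervised; the function names (`bag_of_words`, `cosine`, `track`), the whitespace tokenisation, and the `threshold` value are all assumptions made for the example.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lower-case whitespace tokenisation into a term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def track(documents, threshold=0.3):
    """Assign each document, in arrival order, to a story index.

    A document joins the most similar existing story when the similarity
    reaches `threshold` (a hypothetical value); otherwise it opens a new story.
    """
    centroids = []  # one aggregated term-frequency vector per story
    labels = []
    for text in documents:
        vector = bag_of_words(text)
        scores = [cosine(vector, centroid) for centroid in centroids]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            centroids[best].update(vector)  # fold the document into the story
            labels.append(best)
        else:
            centroids.append(Counter(vector))
            labels.append(len(centroids) - 1)
    return labels


stream = [
    "earthquake hits coastal city",
    "election results announced tonight",
    "coastal city earthquake damage assessed",
]
labels = track(stream)
print(labels)
```

In this toy stream, the first and third documents share enough terms to land in the same story, while the second opens its own. A streaming scheme like this never needs the number of stories in advance, which K-Means does require.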
If you are not a developer, use the news_tracking package to access this library.
Installation
pip install document_tracking
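To confirm that the installation succeeded, one option is to query the installed distribution's version from Python. The sketch below uses the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical, and the distribution name is assumed to match the one used in the `pip` command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints a version string once the package is installed, otherwise None.
print(installed_version("document_tracking"))
```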
Licence
Some parts, provided by Miranda et al. with their original paper, remain under the BSD 3-Clause license. The code reused from the original implementation is provided under a third_party package and remains under its own license. Other parts are released under the GPLv3 licence. To make it easier to distinguish the two licenses, headers were added at the top of files. Both licenses are included in this repository.