
Algorithms to track documents and build news stories from them. The package implements the Miranda et al. (2018) algorithm, as well as alternative algorithms and baselines for document tracking.


document_tracking

This project is originally based on Miranda et al.’s implementation of a news tracking algorithm, published in the following paper (original code: ):

Miranda, Sebastião, Artūrs Znotiņš, Shay B. Cohen, and Guntis Barzdins.
2018. “Multilingual Clustering of Streaming News”. In 2018
Conference on Empirical Methods in Natural Language Processing, 4535-44.
Brussels, Belgium: Association for Computational Linguistics.
https://www.aclweb.org/anthology/D18-1483/.

This package is a reimplementation of the authors’ original work in which the entire API was rewritten so that external projects can use it. The goal of this reimplementation is to provide the news tracking algorithms to industrial standards, such as support for automation and quality control.
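The Miranda et al. (2018) algorithm processes documents as a stream, assigning each incoming document either to an existing cluster (story) or to a new one. The sketch below illustrates that single-pass idea in plain Python under simplifying assumptions: documents are bag-of-words term counts, similarity is plain cosine against each cluster’s summed term counts, and a fixed `threshold` decides when to open a new cluster, whereas the paper relies on a supervised model for these decisions. `ThresholdStreamingClusterer`, `Cluster` and `cosine` are hypothetical names and are not the `document_tracking.miranda.StreamingAggregator` API.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Cluster:
    # Summed term counts; cosine is scale-invariant, so this points in the
    # same direction as the mean vector of the cluster's documents.
    centroid: Counter = field(default_factory=Counter)
    doc_ids: list = field(default_factory=list)


class ThresholdStreamingClusterer:
    """Hypothetical single-pass clusterer (not the document_tracking API)."""

    def __init__(self, threshold: float = 0.3):
        self.threshold = threshold  # assumed fixed; the paper uses a learned model
        self.clusters = []  # Cluster objects, in creation order

    def add(self, doc_id, text: str) -> int:
        """Assign one incoming document and return its cluster index."""
        vec = Counter(text.lower().split())  # naive whitespace tokenisation
        best_idx, best_sim = -1, 0.0
        for idx, cluster in enumerate(self.clusters):
            sim = cosine(vec, cluster.centroid)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        # Open a new cluster when no existing one is similar enough.
        if best_idx == -1 or best_sim < self.threshold:
            self.clusters.append(Cluster())
            best_idx = len(self.clusters) - 1
        chosen = self.clusters[best_idx]
        chosen.centroid.update(vec)
        chosen.doc_ids.append(doc_id)
        return best_idx
```

For example, with `threshold=0.3`, "Earthquake hits city" and "Strong earthquake hits coastal city" (cosine ≈ 0.77) both land in cluster 0, while "Central bank raises interest rates" shares no terms with it and opens cluster 1. Because each document is seen once and compared only with cluster summaries, the cost per document grows with the number of clusters rather than the number of past documents.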

The package also includes alternative algorithms and baselines; currently, only K-Means is provided.

Algorithm               Supervised   Main class
Miranda et al. (2018)   Yes          document_tracking.miranda.StreamingAggregator
K-Means                 No           document_tracking.kmeans.KMeansAggregator
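K-Means, listed above as the unsupervised baseline, groups documents without labelled data. The following is a minimal sketch of Lloyd’s algorithm in plain Python, assuming documents have already been turned into dense numeric vectors (for example TF-IDF or embedding vectors; that step is not shown). `kmeans` is a hypothetical helper and is not the `document_tracking.kmeans.KMeansAggregator` API.

```python
import math
import random


def kmeans(points, k, iterations=50, seed=0):
    """Lloyd's K-Means on dense vectors; returns (labels, centroids).

    Hypothetical helper, not document_tracking.kmeans.KMeansAggregator.
    `points` is a list of equal-length numeric sequences, 1 <= k <= len(points),
    and `iterations` must be at least 1.
    """
    rng = random.Random(seed)
    # Initialise centroids with k input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids
```

Unlike the streaming approach, standard K-Means needs the number of clusters `k` up front and iterates over the whole batch of documents, which is a key difference from the streaming news setting.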

If you are not a developer, use the news_tracking package to access this library instead.

Installation

pip install document_tracking
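After installing, one way to confirm from Python which version is present is the standard library’s `importlib.metadata` module (Python 3.8+). This is a generic sketch: `installed_version` is a hypothetical helper name, and it assumes the distribution is registered under the name `document_tracking`, as in the pip command above.

```python
from importlib import metadata


def installed_version(dist_name: str = "document_tracking"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version() or "document_tracking is not installed")
```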

Licence

Some parts, provided by Miranda et al. with their original paper, remain under the BSD 3-Clause licence. Code reused from the original implementation is provided in a third_party package and remains under its own licence. The other parts are released under the GPLv3 licence. To make it easier to distinguish the two licences, headers were added at the top of each file. Both licences are included in this repository.
