Algorithms to track documents and build news stories from them. The package implements the Miranda et al. (2018) algorithm, as well as alternative algorithms and baselines.

document_tracking

This project is originally based on Miranda et al.’s implementation of a news tracking algorithm, published in the following paper (original code: )

Miranda, Sebastião, Artūrs Znotiņš, Shay B. Cohen, and Guntis Barzdins.
2018. “Multilingual Clustering of Streaming News”. In 2018 
Conference on Empirical Methods in Natural Language Processing, 4535‑44.
Brussels, Belgium: Association for Computational Linguistics. 
https://www.aclweb.org/anthology/D18-1483/.

This project reimplements the authors' original work, with the entire API rewritten so that external projects can use it. The goal is to offer the news tracking algorithms at an industrial standard, for example by supporting automation and quality checks.

The package also includes alternative algorithms and baselines. Currently, only K-Means is provided.

Algorithm              Supervised  Main Class
Miranda et al. (2018)  Yes         document_tracking.miranda.StreamingAggregator
K-Means                No          document_tracking.kmeans.KMeansAggregator
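
The two main classes listed above can be looked up by their dotted paths. The following is a minimal sketch, not part of the library: the AGGREGATORS mapping and the resolve and load_aggregator helpers are hypothetical names introduced for illustration, and only the two dotted paths come from the table. The constructors' arguments are not documented here, so the sketch stops at obtaining the class.

```python
import importlib

# Dotted paths copied from the table above; the short keys are illustrative.
AGGREGATORS = {
    "miranda": "document_tracking.miranda.StreamingAggregator",
    "kmeans": "document_tracking.kmeans.KMeansAggregator",
}


def resolve(dotted_path):
    """Import the module part of a dotted path and return the named attribute."""
    module_name, _, attribute = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


def load_aggregator(name):
    """Return the aggregator class for a short name, or None if the
    package (or one of its dependencies) cannot be imported."""
    try:
        return resolve(AGGREGATORS[name])
    except ImportError:
        return None


if __name__ == "__main__":
    for short_name in AGGREGATORS:
        cls = load_aggregator(short_name)
        status = "not installed" if cls is None else cls.__name__
        print(short_name, "->", status)
```

Returning None instead of raising keeps the lookup usable in environments where document_tracking is an optional dependency.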

If you are not a developer, use this library through the news_tracking package.

Installation

pip install document_tracking
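
After installing, one way to confirm that the distribution is visible to the current interpreter is to query its metadata. This is a small sketch using only the standard library (Python 3.8 or later); installed_version is a hypothetical helper name, and the distribution name is assumed to match the one used in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("document_tracking")
    if found is None:
        print("document_tracking is not installed in this environment")
    else:
        print("document_tracking version:", found)
```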

Licence

Some parts, provided by Miranda et al. with their original paper, remain under the BSD 3-Clause licence. Code reused from the original implementation is provided in a third_party package and remains under its own licence. Other parts are released under the GPLv3 licence. To make the two licences easier to distinguish, headers were added at the top of the files. Both licences are included in this repository.

