Algorithms to track documents and build news stories from them. It implements the Miranda et al. (2018) algorithm, as well as other alternatives and baselines to track documents.

document_tracking

This project was originally based on Miranda et al.'s implementation of the news tracking algorithm published in the following paper (original code: ):

Miranda, Sebastião, Artūrs Znotiņš, Shay B. Cohen, and Guntis Barzdins. 2018. “Multilingual Clustering of Streaming News”. In 2018 Conference on Empirical Methods in Natural Language Processing, 4535‑44. Brussels, Belgium: Association for Computational Linguistics. https://www.aclweb.org/anthology/D18-1483/.

This package is a reimplementation of the authors' original work, with the entire API rewritten so that external projects can use it. The goal is to provide the news tracking algorithms at an industrial standard, with an emphasis on automation and code quality.

The package also includes alternative algorithms and baselines. Currently, only K-Means is provided.

| Algorithm | Supervised | Main class |
| --- | --- | --- |
| Miranda et al. (2018) | Yes | document_tracking.miranda.StreamingAggregator |
| K-Means | No | document_tracking.kmeans.KMeansAggregator |
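
To give a rough idea of what tracking documents in a stream involves, here is a minimal sketch of threshold-based online clustering. It is a toy written for this description and does not use the document_tracking API: the class `ToyStreamingClusterer`, the `threshold` parameter and the bag-of-words vectors are all hypothetical. It is also unsupervised, whereas the Miranda et al. (2018) aggregator is listed as supervised, so treat it as an illustration of the general idea rather than of the published algorithm.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]  # sparse bag-of-words vector: token -> weight


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToyStreamingClusterer:
    """Hypothetical illustration only; not part of document_tracking."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold
        # Running sum of member vectors (cosine similarity ignores scale).
        self.centroids: List[Vector] = []
        # Document identifiers per cluster, in arrival order.
        self.members: List[List[str]] = []

    def add(self, doc_id: str, vector: Vector) -> int:
        """Assign one incoming document and return its cluster index."""
        best_index, best_score = -1, 0.0
        for index, centroid in enumerate(self.centroids):
            score = cosine(vector, centroid)
            if score > best_score:
                best_index, best_score = index, score
        if best_index == -1 or best_score < self.threshold:
            # No existing cluster is similar enough: start a new story.
            self.centroids.append(dict(vector))
            self.members.append([doc_id])
            return len(self.centroids) - 1
        # Otherwise merge the document into the closest story.
        centroid = self.centroids[best_index]
        for token, weight in vector.items():
            centroid[token] = centroid.get(token, 0.0) + weight
        self.members[best_index].append(doc_id)
        return best_index


# Made-up documents, processed one at a time as they "arrive".
clusterer = ToyStreamingClusterer(threshold=0.3)
stream = [
    ("d1", {"election": 1.0, "vote": 1.0}),
    ("d2", {"vote": 1.0, "ballot": 1.0}),
    ("d3", {"football": 1.0, "goal": 1.0}),
]
for doc_id, vector in stream:
    clusterer.add(doc_id, vector)

print(clusterer.members)  # [['d1', 'd2'], ['d3']]
```

Each document is compared only with the clusters that exist when it arrives, which is what distinguishes this streaming setting from a batch baseline such as K-Means that sees the whole collection at once.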

If you are not a developer, use this library through the news_tracking package instead.

Installation

pip install document_tracking
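
After installing, one way to confirm that the package is visible to your interpreter is to query its metadata with the standard library. This is a small sketch: the helper name `installed_version` is hypothetical, and it assumes the distribution is registered under the same name used in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("document_tracking"))
```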

Licence

Some parts, provided by Miranda et al. with their original paper, remain under the BSD 3-Clause license. Code reused from the original implementation is provided in a third_party package and remains under its own license. Other parts are released under the GPLv3 license. To make it easier to distinguish the two licenses, headers were added at the top of files. Both licenses are included in this repository.
