Trending news library

Project description

Royston

An open-source, real-time trend detection framework written in Python. This project uses machine learning to detect trends in text over time.

Trends are identified by detecting phrases that start occurring much more frequently than they typically do. Various natural language processing and data science techniques are used to ensure similar words are modelled together (e.g. "cycle", "cycling" and "cyclist" all reduce to a common word form, such as "cycle").
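The idea can be sketched in a few lines of plain Python. This is only an illustration and not Royston's API: the `stem`, `stem_counts` and `trend_scores` names are hypothetical, the suffix list is a toy stand-in for a proper stemmer, and the score is a simple smoothed ratio of counts that assumes the two time windows are of similar size. Note that the toy stemmer reduces words to a shared stem ("cycl") rather than a dictionary word.

```python
from collections import Counter

# Hypothetical toy stemmer: strips a few common English suffixes.
# A real pipeline would more likely use an established stemmer or lemmatiser.
SUFFIXES = ("ists", "ist", "ing", "es", "s", "e")


def stem(word):
    """Reduce a word to a crude common form, keeping at least 3 characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_counts(docs):
    """Count stemmed words across a list of whitespace-separated documents."""
    counts = Counter()
    for doc in docs:
        counts.update(stem(w) for w in doc.split())
    return counts


def trend_scores(baseline_docs, recent_docs):
    """Score each recent stem by (recent count + 1) / (baseline count + 1).

    The +1 smoothing avoids division by zero for stems never seen before.
    """
    base = stem_counts(baseline_docs)
    recent = stem_counts(recent_docs)
    return {term: (count + 1) / (base[term] + 1) for term, count in recent.items()}


baseline = ["the weather is mild", "the market is quiet"]
recent = ["cyclist wins the race", "cycling fans cheer the cyclists"]
scores = trend_scores(baseline, recent)
top_term = max(scores, key=scores.get)
print(top_term, scores[top_term])
```

Here "cyclist", "cycling" and "cyclists" are counted together, so their shared stem scores highest, while a word such as "the" that is equally common in both windows scores 1.0.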

Documents can be grouped by subject, so it is possible to detect "localised" trends. Similar phrases tend to relate to a particular trend, so hierarchical clustering is used to make sure documents related to the same trend are grouped, rather than creating two "trends" about the same thing. For example, "doping scandal" and "Tour de France" are likely to be about the same thing... allegedly.
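As a rough sketch of how such grouping could work, the example below runs single-linkage agglomerative clustering over phrases, using the Jaccard distance between the sets of documents each phrase appears in. The function names, the document ids and the 0.5 threshold are assumptions made for illustration; the distance measure and linkage that Royston actually uses may differ.

```python
def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets; 0.0 when both are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_phrases(phrase_docs, threshold=0.5):
    """Merge the closest pair of clusters until none is within the threshold.

    phrase_docs maps each phrase to the set of document ids containing it.
    """
    clusters = [[phrase] for phrase in sorted(phrase_docs)]

    def linkage(c1, c2):
        # Single linkage: distance between the closest pair of members.
        return min(
            jaccard_distance(phrase_docs[p], phrase_docs[q])
            for p in c1
            for q in c2
        )

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break  # remaining clusters are too far apart to be one trend
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical data: phrase -> ids of the documents that mention it.
phrase_docs = {
    "doping scandal": {1, 2, 3},
    "tour de france": {2, 3, 4},
    "interest rates": {7, 8},
}
print(cluster_phrases(phrase_docs))
```

With this data the two cycling phrases share half of their documents and end up in one cluster, while "interest rates" stays on its own.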

Royston is based on ramekin, but aims to take it further by doing real-time detection and maintaining models rather than recreating them each time.

Running tests

Install coverage with the following command:

pip3 install coverage

Run tests:

coverage run -m unittest royston.tests.royston_test -v
coverage report -m royston/royston.py

Contribute?

This is still in the early stages of being ported over from JavaScript, and any help would be appreciated. The issues list many of the features that are still needed. Please get in touch via LinkedIn and I can talk you through anything.

Main concerns are:

  • 100% test coverage.
  • Retain the document format.

Download files

Source Distribution

royston-0.0.1.tar.gz (10.2 kB)

Uploaded Source

Built Distribution

royston-0.0.1-py3-none-any.whl (12.0 kB)

Uploaded Python 3
