
News category classifier for news titles

Project description

Fast-topi is a framework for developing and deploying models as an API service. The service takes a news title as input and returns one of the following categories: Entertainment, Tech, Business, or Health. The framework is centered on a single configuration file, config.yaml, which stores all parameters for developing the model and deploying the REST API.

The typical use case for fast-topi is to deploy the service and then query it from the command line.

./deploy.sh
python client.py --title "Elon Musk named Time's Person of the year for 2021"

Install

To install fast-topi with pip, run:

pip install fast-topi

REST API Deployment

To deploy the REST API locally, run the following command.

./deploy.sh

Once the REST API is up and running, you can query it at the following URL:

http://127.0.0.1:80/categories/?title=Elon Musk named Time's Person of the year for 2021

For this title, the service returns the JSON object {"category":"tech"}.
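The title contains spaces and an apostrophe, so it should be percent-encoded when the request is built programmatically. The following is a minimal sketch using only the Python standard library. It assumes the host, port, path, and title parameter shown in the example above; the function names build_query_url and fetch_category are hypothetical and not part of fast-topi.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed from the example URL above: host, port, path, and the "title" parameter.
BASE_URL = "http://127.0.0.1:80/categories/"


def build_query_url(title, base_url=BASE_URL):
    """Return the request URL with the title percent-encoded."""
    return base_url + "?" + urlencode({"title": title})


def fetch_category(title, base_url=BASE_URL, timeout=5.0):
    """Query a running service and return the value of the "category" field."""
    with urlopen(build_query_url(title, base_url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["category"]


# Building the URL needs no running service.
print(build_query_url("Elon Musk named Time's Person of the year for 2021"))
```

Calling fetch_category with a title only works once the service has been deployed with ./deploy.sh.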

Command line client

To get the news category for a given title, you can use the command line client. The REST API must already be deployed.

python client.py --title "Elon Musk named Time's Person of the year for 2021"

Experiments

To train and test a new model, you can run one of the following experiments. By default, a logistic regression model with token n-gram features is used. Alternatively, pass --baseline to use a majority baseline.
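A majority baseline ignores the input and always predicts the most frequent label in the training data, which gives a lower bound for judging the classifier. The sketch below illustrates the idea in plain Python. The class name MajorityBaseline and the toy data are made up for illustration and do not reflect how fast-topi implements --baseline.

```python
from collections import Counter


class MajorityBaseline:
    """Always predicts the most frequent label seen during fitting."""

    def fit(self, titles, labels):
        # The titles are ignored; only the label distribution matters.
        self.majority_label = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, titles):
        return [self.majority_label for _ in titles]


# Made-up toy data, for illustration only.
train_titles = ["title one", "title two", "title three"]
train_labels = ["tech", "business", "tech"]

baseline = MajorityBaseline().fit(train_titles, train_labels)
print(baseline.predict(["any unseen title"]))  # ['tech']
```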

Cross validation experiments

To run an n-fold cross-validation experiment on part of the dataset, run the following command. It first sets aside a holdout set that can be used later to evaluate the model. The effectiveness of the classifier is then calculated for every value of the hyperparameter (c) listed in config.yaml. By default, the classifier is evaluated on 5 splits and the holdout set contains 10% of the whole dataset. To change these defaults, edit the parameters split_counts and holdout_perc in config.yaml.

python experiment.py --crossvalidate 
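The splitting scheme described above can be sketched in plain Python as follows. This is an illustration under assumptions, not the code in experiment.py: the function name is hypothetical, holdout_perc is assumed to be a percentage (10 meaning 10%), and the shuffling and fold assignment are one possible choice.

```python
import random


def split_holdout_and_folds(n_items, holdout_perc=10, split_counts=5, seed=0):
    """Return (holdout_indices, folds) for n_items examples.

    holdout_perc is treated as a percentage of the whole dataset (an assumption).
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)

    n_holdout = n_items * holdout_perc // 100
    holdout, rest = indices[:n_holdout], indices[n_holdout:]

    # Deal the remaining indices round-robin into split_counts folds.
    folds = [rest[i::split_counts] for i in range(split_counts)]
    return holdout, folds


holdout, folds = split_holdout_and_folds(100)
for k, validation in enumerate(folds):
    # Fold k is the validation part; the other folds form the training part.
    training = [i for j, fold in enumerate(folds) if j != k for i in fold]
    print(k, len(training), len(validation))
```

With 100 examples and the defaults, 10 examples go to the holdout set and each of the 5 folds holds 18 of the remaining 90.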

Testing on holdout set

To run a single-split experiment, use the following command. It creates a holdout set from the dataset, trains the classifier on the remainder, and evaluates it on the holdout set.

python experiment.py --test 

Training a model

To train a final model on the whole dataset and use it for the REST API, run the following command. It stores the new model under models/model.pkl. To change the default path of the model, edit config.yaml.

python experiment.py --train 

The experiment script can also run experiments on a sample of the dataset when --sample is passed. The size of the sample is set in config.yaml.

Code testing

python -m unittest tests/*.py

Configuration

The configuration for the REST API, model, experiments, and dataset is stored in the YAML file config.yaml.

Dependencies

To install the required dependencies, run the following command.

pip install -r requirements.txt
