News category classifier for news titles

Project description

Fast-topi is a framework for developing and deploying models as an API service. The service takes a news title as input and returns one of the following categories: Entertainment, Tech, Business, or Health. The framework is centered on a common configuration file, config.yaml, which stores all parameters for developing the model and deploying the REST API.

The typical use case for fast-topi is deploying the service and then querying it from the command line.

./deploy.sh
python client.py --title "Elon Musk named Time's Person of the year for 2021"

Install

To install fast-topi using pip, run the following command.

pip install fast-topi

REST API Deployment

To deploy the REST API locally, run the following command.

./deploy.sh

Once the REST API is up and running, you can query it at the following path.

http://127.0.0.1:80/categories/?title=Elon Musk named Time's Person of the year for 2021

This will return a JSON object containing {"category":"tech"} for this title.
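
The title contains spaces and an apostrophe, so it generally needs to be URL-encoded when the request is built in code. The following is a minimal sketch that uses only the Python standard library. It assumes the service is reachable at the address shown above and answers with a JSON object that has a category key. The names build_query_url and query_category are hypothetical helpers for illustration, not part of fast-topi.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the endpoint shown above, served by a local deployment.
BASE_URL = "http://127.0.0.1:80/categories/"


def build_query_url(title, base_url=BASE_URL):
    """Return the query URL with the title safely URL-encoded."""
    return f"{base_url}?{urlencode({'title': title})}"


def query_category(title, base_url=BASE_URL, timeout=5.0):
    """Send the request and return the value of the "category" key."""
    with urlopen(build_query_url(title, base_url), timeout=timeout) as response:
        payload = json.load(response)
    return payload["category"]


# Usage (requires a running service):
#   query_category("Elon Musk named Time's Person of the year for 2021")
```

Keeping the URL construction separate from the network call makes the encoding step easy to check without a running service.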

Command line client

To get the news category for a given title, you can use the command line client. As a prerequisite, the REST API must already be deployed.

python client.py --title "Elon Musk named Time's Person of the year for 2021"

Experiments

To train and test a new model, you can run one of the following experiments. By default, a logistic regression model with token n-gram features is used. Alternatively, pass --baseline to use a majority baseline instead.
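
To make the two ideas in this paragraph concrete, here is a small pure-Python sketch of token n-gram extraction and of a majority baseline, which always predicts the most frequent training category. It is an illustration only and not the code used by fast-topi. The names token_ngrams and MajorityBaseline, the lowercasing, and the whitespace tokenization are all assumptions.

```python
from collections import Counter


def token_ngrams(title, n_max=2):
    """Return all token n-grams of the title for n = 1..n_max."""
    tokens = title.lower().split()  # assumption: simple whitespace tokens
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


class MajorityBaseline:
    """Predict the most frequent training label for every input."""

    def fit(self, labels):
        # most_common(1) returns a list with one (label, count) pair
        self.majority_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, titles):
        return [self.majority_ for _ in titles]
```

A majority baseline ignores the title text entirely, so it gives a floor that any feature-based classifier should beat.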

Cross validation experiments

To run an n-fold cross-validation experiment on part of the dataset, run the following command. The experiment first sets aside a holdout set, which can later be used to evaluate the model. The effectiveness of the classifier is calculated for every value of the hyperparameter (c) listed in config.yaml. By default, the classifier is evaluated on 5 splits, and the holdout set contains 10% of the whole dataset. To change these defaults, edit the split_counts and holdout_perc parameters in config.yaml.
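
The splitting scheme described here can be sketched in plain Python as follows. This is not the implementation in experiment.py. It only illustrates how a holdout set and the folds relate to each other. The parameter names mirror split_counts and holdout_perc from the text, holdout_perc is assumed to be a whole-number percentage, and the function names and the seed parameter are hypothetical.

```python
import random


def holdout_and_folds(n_items, split_counts=5, holdout_perc=10, seed=0):
    """Return (holdout_indices, folds) for a dataset with n_items rows."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # reproducible shuffle

    # Reserve holdout_perc percent of the rows for the final evaluation.
    n_holdout = n_items * holdout_perc // 100
    holdout = indices[:n_holdout]
    remaining = indices[n_holdout:]

    # Deal the remaining rows round-robin into split_counts folds.
    folds = [remaining[i::split_counts] for i in range(split_counts)]
    return holdout, folds


def cross_validation_rounds(folds):
    """Yield one (train_indices, validation_indices) pair per fold."""
    for k, validation in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, validation
```

In each round one fold is used for validation and the others for training. The holdout rows never appear in any fold, so they stay unseen until the final evaluation.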

python experiment.py --crossvalidate 

Testing on holdout set

To run a single-split experiment, use the following command. It creates a holdout set from the dataset and uses it to evaluate the classifier.

python experiment.py --test 

Training a model

To train a final model on the whole dataset and use it for the REST API, run the following command. This will store a new model under models/model.pkl. To change the default path of the model, edit the config.yaml file.
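
If you want to inspect the stored model outside the REST API, a sketch like the following may help. It assumes that models/model.pkl is a standard pickle file and that the stored object exposes a scikit-learn style predict method that accepts a list of titles. Neither assumption is confirmed by this documentation, and load_model and classify_title are hypothetical helper names.

```python
import pickle


def load_model(path="models/model.pkl"):
    """Load a pickled object. Only unpickle files you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def classify_title(model, title):
    """Assumption: the model has predict(list_of_titles) -> list of labels."""
    return model.predict([title])[0]


# Usage (after running the training command):
#   model = load_model()
#   classify_title(model, "Elon Musk named Time's Person of the year for 2021")
```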

python experiment.py --train 

The experiment script can also run any of these experiments on a sample of the dataset using --sample. The sample size is stored in the config.yaml file.

Code testing

python -m unittest tests/*.py

Configuration

The configuration for the REST API, model, experiments, and dataset is stored in the YAML file config.yaml.

Dependencies

To install the required dependencies, use the following command.

pip install -r requirements.txt
