
News category classifier for news titles

Project description

Overview

Fast-topi is a framework for developing models and deploying them as an API service. Given a news title, the model (or the service) returns one of the following categories: Entertainment, Tech, Business, or Health. The framework is centered around a single configuration file, config.yaml, which stores all parameters for developing the model and deploying the REST API.

The typical use case for fast-topi is to deploy the service and then query it from the command line:

./deploy.sh
python client.py --title "Elon Musk named Time's Person of the year for 2021"

Install

To install fast-topi using pip, run:

pip install fast-topi

REST API Deployment

To deploy the REST API locally, run the following command:

./deploy.sh

Once the REST API is up and running, you can query it at the following URL:

http://127.0.0.1:8000/categories/?title="Elon Musk named Time's Person of the year for 2021"

For this title, the service returns the JSON object {"category":"tech"}.
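The same query can be sent from Python. The sketch below uses only the standard library and assumes the service is reachable at the address shown above; the helper names build_query_url and fetch_category are hypothetical and not part of fast-topi. Percent-encoding the title keeps the spaces and the apostrophe from breaking the URL.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the service started by ./deploy.sh listens at this address (from the example URL).
BASE_URL = "http://127.0.0.1:8000/categories/"


def build_query_url(title: str, base_url: str = BASE_URL) -> str:
    """Return the categories URL with the title percent-encoded as the `title` parameter."""
    # quote_via=quote encodes spaces as %20 instead of '+'.
    query = urllib.parse.urlencode({"title": title}, quote_via=urllib.parse.quote)
    return f"{base_url}?{query}"


def fetch_category(title: str, base_url: str = BASE_URL, timeout: float = 5.0) -> str:
    """Query the running service and return the "category" field of its JSON reply."""
    with urllib.request.urlopen(build_query_url(title, base_url), timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["category"]


if __name__ == "__main__":
    # Requires the REST API to be running.
    print(fetch_category("Elon Musk named Time's Person of the year for 2021"))
```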

Command line client

To get the news category for a given title, you can use the command line client. The REST API must already be deployed.

python client.py --title "Elon Musk named Time's Person of the year for 2021"
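A minimal sketch of what such a client could look like, assuming the endpoint above. This is a hypothetical illustration, not the actual client.py; the --url option in particular is invented for the example. Declaring --title with nargs="+" lets the parser rejoin a multi-word title passed without quotes.

```python
import argparse
import json
import urllib.parse
import urllib.request

# Hypothetical client for illustration; the real client.py may differ.
DEFAULT_URL = "http://127.0.0.1:8000/categories/"


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Query the news category service.")
    # nargs="+" collects every word after --title; they are joined back below.
    parser.add_argument("--title", nargs="+", required=True, help="news title to classify")
    # --url is a hypothetical option, not documented for the real client.
    parser.add_argument("--url", default=DEFAULT_URL, help="categories endpoint")
    args = parser.parse_args(argv)
    args.title = " ".join(args.title)
    return args


def main(argv=None) -> None:
    args = parse_args(argv)
    query = urllib.parse.urlencode({"title": args.title}, quote_via=urllib.parse.quote)
    # Requires the REST API to be running.
    with urllib.request.urlopen(f"{args.url}?{query}", timeout=5) as resp:
        print(json.load(resp)["category"])


if __name__ == "__main__":
    main()
```

Even with nargs="+", the shell still needs quotes around any title containing an apostrophe, such as Time's.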

Experiments

To train and test a new model, run one of the following experiments. By default, a logistic regression model over token n-gram features is used. Alternatively, pass --baseline to use a majority baseline.
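The two ingredients named above can be sketched in plain Python: token n-gram extraction and a majority baseline that always predicts the most frequent training label. Lower-casing, whitespace tokenization, and the n-gram range of 1 to 2 are assumptions, since the README does not specify them; the logistic regression itself is not reimplemented here and would be fit on counts of these n-grams.

```python
from collections import Counter
from typing import Iterable, List


def token_ngrams(title: str, n_max: int = 2) -> List[str]:
    """Lower-cased whitespace tokens plus all contiguous n-grams up to n_max (assumed scheme)."""
    tokens = title.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


class MajorityBaseline:
    """Predicts the most frequent training label for every input, ignoring the title."""

    def fit(self, titles: Iterable[str], labels: Iterable[str]) -> "MajorityBaseline":
        self.label_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, titles: Iterable[str]) -> List[str]:
        return [self.label_ for _ in titles]
```

The baseline is useful as a floor: a trained classifier that does not beat it has learned little from the titles.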

Cross validation experiments

To run n-fold cross-validation on part of the dataset, run the following command. It also creates a holdout set that can be used to evaluate the model. The classifier's effectiveness is computed for every value of the hyperparameter c listed in config.yaml. By default, the classifier is evaluated on 5 splits, and the holdout set contains 10% of the whole dataset. To change these defaults, edit the split_counts and holdout_perc parameters in config.yaml.

python experiment.py --crossvalidate 
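The procedure above can be sketched with the standard library: set aside holdout_perc of the data, split the rest into split_counts folds, and average a score per value of c. That the "part of the dataset" used for cross-validation is everything outside the holdout is an assumption, and the evaluate callback is a hypothetical stand-in for training the classifier with a given c and scoring it on a fold.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple


def holdout_split(n_items: int, holdout_perc: float, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle indices and set aside round(n_items * holdout_perc) of them as the holdout set."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_holdout = round(n_items * holdout_perc)
    return indices[n_holdout:], indices[:n_holdout]


def k_fold(indices: Sequence[int], split_counts: int) -> List[Tuple[List[int], List[int]]]:
    """Partition indices into split_counts folds; each fold serves as validation set once."""
    folds = [list(indices[i::split_counts]) for i in range(split_counts)]
    splits = []
    for k in range(split_counts):
        train = [idx for j, fold in enumerate(folds) if j != k for idx in fold]
        splits.append((train, folds[k]))
    return splits


def cross_validate(
    n_items: int,
    c_values: Sequence[float],
    evaluate: Callable[[float, List[int], List[int]], float],
    split_counts: int = 5,
    holdout_perc: float = 0.10,
) -> Tuple[Dict[float, float], List[int]]:
    """Mean validation score per c over the folds, plus the untouched holdout indices."""
    dev, holdout = holdout_split(n_items, holdout_perc)
    scores = {}
    for c in c_values:
        fold_scores = [evaluate(c, train, val) for train, val in k_fold(dev, split_counts)]
        scores[c] = sum(fold_scores) / len(fold_scores)
    return scores, holdout
```

Keeping the holdout indices out of every fold means the final evaluation uses examples that never influenced the choice of c.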

Testing on holdout set

Runs a single-split experiment: a holdout set is created from the dataset and used to evaluate the classifier.

python experiment.py --test 

Training a model

To train a final model on the whole dataset for use by the REST API, run the following command. The new model is stored under models/model.pkl; to change this default path, edit config.yaml.

python experiment.py --train 
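The .pkl extension suggests the model is saved with Python's pickle module; that is an assumption, and the helpers save_model and load_model below are hypothetical. The sketch writes to the documented default path and creates the models/ directory if it is missing.

```python
import pickle
from pathlib import Path
from typing import Any

# Default location named in the README; the real path comes from config.yaml.
MODEL_PATH = Path("models/model.pkl")


def save_model(model: Any, path: Path = MODEL_PATH) -> None:
    """Pickle a trained model, creating the parent directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(model, fh)


def load_model(path: Path = MODEL_PATH) -> Any:
    """Load a pickled model. Only unpickle trusted files: pickle can execute arbitrary code."""
    with path.open("rb") as fh:
        return pickle.load(fh)
```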

The experiment script can also run experiments on a sample of the dataset via --sample. The sample size is set in config.yaml.
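The flags documented for experiment.py can be mirrored with argparse, as in the hypothetical sketch below. Treating --crossvalidate, --test, and --train as mutually exclusive modes is an assumption, and the take_sample helper, which draws a reproducible random sample of the configured size, is illustrative only.

```python
import argparse
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def parse_experiment_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical parser mirroring the flags documented for experiment.py."""
    parser = argparse.ArgumentParser(description="Run fast-topi experiments.")
    # Assumption: exactly one experiment mode is chosen per run.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--crossvalidate", action="store_true", help="n-fold CV plus a holdout set")
    mode.add_argument("--test", action="store_true", help="single split, evaluate on holdout")
    mode.add_argument("--train", action="store_true", help="train on all data and save the model")
    parser.add_argument("--baseline", action="store_true", help="use the majority baseline")
    parser.add_argument("--sample", action="store_true", help="use a sample sized by config.yaml")
    return parser.parse_args(argv)


def take_sample(items: Iterable[T], size: int, seed: int = 0) -> List[T]:
    """Reproducible random sample of at most `size` items (hypothetical helper)."""
    pool = list(items)
    return random.Random(seed).sample(pool, min(size, len(pool)))
```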

Code testing

 python -m unittest tests/*.py

Configuration

The configuration for the REST API, the model, the experiments, and the dataset is stored in a single YAML file, config.yaml.



Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution

Fasttopi-0.8-py3-none-any.whl (8.1 kB, Python 3)

File details

Details for the file Fasttopi-0.8-py3-none-any.whl.

File metadata

  • Download URL: Fasttopi-0.8-py3-none-any.whl
  • Upload date:
  • Size: 8.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.8.2 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.8.12

File hashes

Hashes for Fasttopi-0.8-py3-none-any.whl
Algorithm     Hash digest
SHA256        a752948d8775f6a314f42b4aa7e2d2c1158d5745617c3bf2fc41d6c12040d04a
MD5           947773f3c7dda355bc0ed2b8749a8bc0
BLAKE2b-256   81a624bac50efe4efd54a8111a06289d3265b0d5b5104875819f1c109d0702ba

