Part 1 of the Fleksy NLP challenge

fleksychallenge

Description

This is my implementation of the Fleksy NLP challenge (part 1).

The goal of this repository is to provide an interface to:

  • Retrieve and clean a Twitter dataset for sentiment analysis
  • Train a sentiment analysis model with scikit-learn or spaCy, following best practices for the evaluation metrics so that the model can be ranked against other state-of-the-art (SOTA) models

Install

Install the package with:

pip install fleksychallenge

For development, you can instead clone the repository and install it locally in editable mode:

git clone https://github.com/astariul/fleksychallenge.git
cd fleksychallenge
pip install -e .

Usage

Prepare the dataset

To prepare the dataset, just run:

fleksychallenge prepare

It will download the dataset, preprocess it, and save the preprocessed data files locally.
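
The exact preprocessing is not specified here. As a rough illustration only, a typical tweet-cleaning step replaces user mentions and links with placeholder tokens and normalizes whitespace. The `@user` / `http` placeholders follow a convention seen in TweetEval-based models; the function name `clean_tweet` and these rules are assumptions, not the package's code.

```python
import re

# Hypothetical cleaning rules; the actual `fleksychallenge prepare` step may differ.
MENTION_RE = re.compile(r"@\w+")       # e.g. "@alice" (note: also matches inside emails)
URL_RE = re.compile(r"https?://\S+")   # e.g. "https://t.co/xyz"
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Anonymize mentions, mask links, and collapse runs of whitespace."""
    text = MENTION_RE.sub("@user", text)
    text = URL_RE.sub("http", text)
    return WHITESPACE_RE.sub(" ", text).strip()


print(clean_tweet("@alice loved it!!  https://t.co/xyz  "))
# prints: @user loved it!! http
```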


By default, files are saved in the tweet_dataset folder, but you can change this with the --dataset argument. For example:

fleksychallenge prepare --dataset ../my/folder

Train

Once the dataset is ready, you can start training the model with:

fleksychallenge train

It will train the model and, by default, save it under sentiment_model.

Training runs on GPU by default. To train on CPU instead, pass the --cpu argument:

fleksychallenge train --cpu

You can change where the model is saved with the --model argument. For example:

fleksychallenge train --model my_model

If you preprocessed your dataset into a different folder, you must specify its location with the --dataset argument (as with the prepare command):

fleksychallenge train --dataset ../my/folder

A default configuration file is provided for training, but you can also generate your own. To do so, head over to the spaCy documentation, generate a config with its training quickstart, and copy-paste it into a file called base_config.cfg.

Then, run:

python -m spacy init fill-config ./base_config.cfg ./config.cfg

This fills in all remaining settings with their default values and saves the complete config to config.cfg.

Once your config file is generated, launch the training with:

fleksychallenge train --config config.cfg

Test

After training your model, you should test it! You can do that with:

fleksychallenge test

It will load your trained model and compute several metrics (accuracy, precision, recall, and F1 score).
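
For a multi-class task such as TweetEval sentiment (negative, neutral, positive), precision, recall, and F1 are computed per class and then averaged. The sketch below uses macro-averaging in plain Python; which averaging fleksychallenge actually applies is not stated here, so treat `classification_metrics` as a hypothetical helper.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 (hypothetical helper)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives per label
    pred_count = Counter(y_pred)  # tp + fp per label
    true_count = Counter(y_true)  # tp + fn per label

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p = tp[label] / pred_count[label] if pred_count[label] else 0.0
        r = tp[label] / true_count[label] if true_count[label] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Labels: 0 = negative, 1 = neutral, 2 = positive
print(classification_metrics([0, 1, 2, 2, 1], [0, 1, 2, 1, 1]))
```

Macro-averaging gives every class the same weight, regardless of how many test samples it has.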

If you have to pick a single metric for comparing different models, pick (macro-averaged) recall, as advised in the original TweetEval paper.
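
Macro-averaged recall is the mean of the per-class recalls, so it is not inflated by class imbalance the way accuracy can be. The toy example below (made-up labels, not real results) scores a classifier that always predicts the majority class: accuracy looks reasonable, while macro recall drops to 1/3.

```python
def macro_recall(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        support = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / support)
    return sum(recalls) / len(classes)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy, imbalanced labels (0 = negative, 1 = neutral, 2 = positive)
y_true = [1] * 6 + [0] * 2 + [2] * 2
always_neutral = [1] * len(y_true)

print(accuracy(y_true, always_neutral))      # 0.6
print(macro_recall(y_true, always_neutral))  # 0.333... (per-class recall 0.0, 1.0, 0.0)
```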


As before, you can specify a different dataset for testing with the --dataset argument, or a different model to load with the --model argument.

The TweetEval test set is quite large (over 12k samples), so by default the testing script only evaluates the model on the first 100 samples. To evaluate on the full test set, pass the --full argument:

fleksychallenge test --full
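
The commands and flags documented above can be summarized as a hypothetical `argparse` layout. This is only a sketch: fleksychallenge's real entry point may be built differently, and the `--config` default (`None`, meaning "use the bundled config") and the `select_test_samples` helper are assumptions. It also shows one way the 100-sample default for `test` could be applied.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented fleksychallenge commands."""
    parser = argparse.ArgumentParser(prog="fleksychallenge")
    sub = parser.add_subparsers(dest="command", required=True)

    prepare = sub.add_parser("prepare", help="download and preprocess the dataset")
    prepare.add_argument("--dataset", default="tweet_dataset")

    train = sub.add_parser("train", help="train the sentiment model")
    train.add_argument("--dataset", default="tweet_dataset")
    train.add_argument("--model", default="sentiment_model")
    train.add_argument("--cpu", action="store_true", help="train on CPU instead of GPU")
    train.add_argument("--config", default=None, help="custom spaCy config (assumed default: None)")

    test = sub.add_parser("test", help="evaluate a trained model")
    test.add_argument("--dataset", default="tweet_dataset")
    test.add_argument("--model", default="sentiment_model")
    test.add_argument("--full", action="store_true", help="evaluate on the whole test set")
    return parser


def select_test_samples(samples, full, limit=100):
    """Keep only the first `limit` samples unless --full was given."""
    return samples if full else samples[:limit]


args = build_parser().parse_args(["test", "--model", "my_model"])
print(args.model, args.dataset, args.full)  # my_model tweet_dataset False
```

With this layout, `fleksychallenge test --full` would set `args.full` to `True`, and `select_test_samples` would return every sample.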

Contribute

To contribute, install the package locally, create your own branch, add your code, and open a PR!

Pre-commit hooks

Pre-commit hooks are set up to check your code whenever you commit.

If you have never run the hooks before, install them with:

pre-commit install

Then just try to commit your code. If your code does not meet the quality standards enforced by the linters, it will not be committed: fix the reported issues and try to commit again!

You can also run the pre-commit hooks manually with:

pre-commit run --all-files


Download files

Source Distribution

fleksychallenge-1.0.0.tar.gz (6.3 kB)

Uploaded Source

Built Distribution

fleksychallenge-1.0.0-py3-none-any.whl (9.8 kB)

Uploaded Python 3

File details

Details for the file fleksychallenge-1.0.0.tar.gz.

File metadata

  • Download URL: fleksychallenge-1.0.0.tar.gz
  • Upload date:
  • Size: 6.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.13

File hashes

Hashes for fleksychallenge-1.0.0.tar.gz
Algorithm Hash digest
SHA256 c18229051549e95c78c1302beef20ee7ffd57a14132d04ed12f040eccd0ae8d9
MD5 408d993fe2f9c4486a41f405b479beb1
BLAKE2b-256 d65adcddd4276085754b73c0d58913718c1939f84aacb2f3b062400da1ef4151
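
To check that a downloaded archive matches the published digest, you can recompute its SHA256 locally. A minimal sketch using Python's standard `hashlib`; the file path is an assumption (wherever you saved the download).

```python
import hashlib
from pathlib import Path

# Published SHA256 digest for fleksychallenge-1.0.0.tar.gz (from the table above)
EXPECTED_SHA256 = "c18229051549e95c78c1302beef20ee7ffd57a14132d04ed12f040eccd0ae8d9"


def sha256_of(path, chunk_size=65536):
    """Stream the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Compare the file's digest with the expected hex string (case-insensitive)."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("fleksychallenge-1.0.0.tar.gz")  # assumed download location
    if archive.exists():
        print("OK" if verify(archive) else "MISMATCH")
```

Alternatively, pip can enforce the digest itself in hash-checking mode, via `--require-hashes` and a `--hash=sha256:...` entry in a requirements file.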

File details

Details for the file fleksychallenge-1.0.0-py3-none-any.whl.

File metadata

File hashes

Hashes for fleksychallenge-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4c5cd100a3b21449f0fca64fb66f7078da449cdbdfe916c4da666e859addddb8
MD5 932c0b933ff01ddc3c265b7c27547ee1
BLAKE2b-256 d38c02a9787507682f40cfbfdd79f54e8541e340f43e37bfd4c62410f542c655
