Part 1 of the Fleksy NLP challenge
fleksychallenge
Description
This is my implementation for the Fleksy NLP challenge (part 1).
The goal of this repository is to provide an interface to:
- Retrieve and clean a Twitter dataset for sentiment analysis
- Train a sentiment analysis model using Scikit-learn or spaCy, following best practices for the metrics (so the model can be ranked against other SOTA models)
Install
Install the package with:
pip install fleksychallenge
For development, you can install it locally by first cloning the repository:
git clone https://github.com/astariul/fleksychallenge.git
cd fleksychallenge
pip install -e .
Usage
Prepare the dataset
To prepare the dataset, run:
fleksychallenge prepare
It will download the dataset, preprocess it, and save the preprocessed data files locally.
By default, files are saved under the folder tweet_dataset, but you can change that behavior with the --dataset argument. For example:
fleksychallenge prepare --dataset ../my/folder
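The README does not list the exact preprocessing steps, so the following is only a minimal sketch of what cleaning a tweet for sentiment analysis might look like. The function name clean_tweet and the placeholder tokens are assumptions made for illustration (they mirror a convention common in Twitter datasets) and are not taken from the package:

```python
import re

# Hypothetical placeholder tokens (an assumption, not the package's settings).
USER_TOKEN = "@user"
URL_TOKEN = "http"

_URL_RE = re.compile(r"https?://\S+")
_MENTION_RE = re.compile(r"@\w+")
_SPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Mask URLs and user mentions, then collapse runs of whitespace."""
    text = _URL_RE.sub(URL_TOKEN, text)       # hide links
    text = _MENTION_RE.sub(USER_TOKEN, text)  # hide user handles
    return _SPACE_RE.sub(" ", text).strip()   # normalize spacing


if __name__ == "__main__":
    print(clean_tweet("@alice  check https://example.com/x now!\n"))
```

Masking handles and links keeps the model from memorizing specific accounts or URLs. Note that this simple mention pattern would also rewrite the domain part of an e-mail address, which may or may not matter for your data.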
Train
Once the dataset is ready, you can start training the model with:
fleksychallenge train
It will train the model and save it under sentiment_model by default.
By default, the model is trained on GPU. To train on CPU instead, specify the --cpu argument:
fleksychallenge train --cpu
You can change where the model is saved by specifying the --model argument. For example:
fleksychallenge train --model my_model
If you preprocessed your dataset in a different folder, you must specify its location with the --dataset argument (as with the prepare command):
fleksychallenge train --dataset ../my/folder
A default configuration file is provided for training. You can also generate your own configuration file. To do this, head over to the spaCy documentation and copy the generated config into a file called base_config.cfg.
Then, run:
python -m spacy init fill-config ./base_config.cfg ./config.cfg
It will save the full config file at config.cfg.
Once your config file is generated, you can launch the training with:
fleksychallenge train --config config.cfg
Test
After training your model, you should test it! You can do that with:
fleksychallenge test
It will load your trained model and compute several metrics (accuracy, precision, recall, F-1 score).
If you have to pick a single metric for comparing different models, pick recall (as advised in the original TweetEval paper).
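As an illustration of the metrics listed above, here is a minimal, dependency-free sketch that computes them from gold and predicted labels. Macro-averaging over classes is an assumption on my part (the README does not state which averaging the test script uses), and the function name classification_metrics is hypothetical:

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted class gets a false positive
            fn[gold] += 1  # gold class gets a false negative

    def safe_div(num, den):
        return num / den if den else 0.0

    precisions = [safe_div(tp[c], tp[c] + fp[c]) for c in labels]
    recalls = [safe_div(tp[c], tp[c] + fn[c]) for c in labels]
    f1s = [safe_div(2 * p * r, p + r) for p, r in zip(precisions, recalls)]

    # Macro average (assumption): every class weighs the same,
    # regardless of how many samples it has.
    return {
        "accuracy": safe_div(sum(tp.values()), len(y_true)),
        "precision": safe_div(sum(precisions), len(labels)),
        "recall": safe_div(sum(recalls), len(labels)),
        "f1": safe_div(sum(f1s), len(labels)),
    }
```

With macro-averaging, a model that ignores a rare class is penalized on recall even if its overall accuracy stays high, which is one reason a recall-style metric is useful for comparing models on imbalanced sentiment data.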
As before, you can specify a different dataset to use for testing with the --dataset argument, or a different model to load with the --model argument.
Also, the test set of TweetEval is quite big (over 12k samples), so by default the testing script only evaluates the model on the first 100 samples. You can change this behavior by specifying the --full argument:
fleksychallenge test --full
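The default-subset behavior described above can be pictured with the short sketch below. It is not the package's code: select_test_samples and its parameters are hypothetical names, and only the limit of 100 comes from the text:

```python
from itertools import islice

DEFAULT_SAMPLE_LIMIT = 100  # default subset size mentioned in the README


def select_test_samples(samples, full=False, limit=DEFAULT_SAMPLE_LIMIT):
    """Return every sample when full is True, else only the first `limit`."""
    if full:
        return list(samples)
    # islice stops early, so a large or lazy dataset is not fully read.
    return list(islice(samples, limit))
```

Keep in mind that metrics computed on the first 100 samples are a quick sanity check; figures meant for comparison with other models should come from the full test set.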
Contribute
To contribute, install the package locally, create your own branch, add your code, and open a PR!
Pre-commit hooks
Pre-commit hooks are set up to check the code whenever you commit something.
If you have never run the hooks before, install them with:
pre-commit install
Then you can just try to commit your code. If your code does not meet the quality required by the linters, it will not be committed. Fix your code and try to commit again!
You can also run the pre-commit hooks manually with:
pre-commit run --all-files