LanguageFlow
Data loaders and abstractions for text and NLP
Requirements
Install dependencies
$ pip install future tox
$ pip install python-crfsuite==0.9.5
$ pip install Cython
$ pip install -U fasttext --no-cache-dir --no-deps --force-reinstall
$ pip install xgboost==0.82
Installation
$ pip install languageflow
Components
Transformers: NumberRemover, CountVectorizer, TfidfVectorizer
Models: SGDClassifier, XGBoostClassifier, KimCNNClassifier, FastTextClassifier, CRF
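To give a feel for what the transformer components do, here is a minimal standard-library sketch of number removal followed by bag-of-words counting. The function names `remove_numbers` and `count_vectorize` are hypothetical and are not LanguageFlow's API; the library's NumberRemover and CountVectorizer may tokenize or normalize text differently.

```python
import re
from collections import Counter


def remove_numbers(text):
    """Drop digit runs (hypothetical stand-in for a NumberRemover-style step)."""
    without_digits = re.sub(r"\d+", " ", text)
    # Collapse the whitespace left behind by the removed numbers.
    return " ".join(without_digits.split())


def count_vectorize(documents):
    """Return (vocabulary, rows); each row holds per-term counts for one document."""
    # Assumption: lowercase, whitespace-separated tokens are good enough here.
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows


if __name__ == "__main__":
    docs = ["Transfer 500 to account 12345", "Account balance is low"]
    cleaned = [remove_numbers(doc) for doc in docs]
    vocabulary, rows = count_vectorize(cleaned)
    print(vocabulary)
    print(rows)
```

The count rows produced this way are the kind of input a linear classifier such as an SGD-based model would typically consume.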
Data
Download a dataset using the download command
$ languageflow download DATASET
List all datasets
$ languageflow list
Datasets
The datasets module currently contains:
Tagged: VLSP2018-NER, VTB-CHUNK*, VLSP2016-NER*, VLSP2013-POS*, VLSP2013-WTK*
Categorized: AIVIVN2019_SA*, VLSP2018_SA*, UTS2017_BANK, VLSP2016_SA*, VNTC
Plaintext: VNESES, VNTQ_SMALL, VNTQ_BIG
Caution (*): For datasets under a closed license, you must provide the download URL yourself
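When several open datasets are needed, the download command can be scripted. The sketch below wraps `languageflow download DATASET` with `subprocess`; the helper names `build_download_command` and `download_all` are hypothetical, and it assumes the CLI is on the PATH and exits with a non-zero status on failure, which the project description does not confirm.

```python
import subprocess


def build_download_command(dataset):
    """Build the argument list for `languageflow download DATASET`."""
    return ["languageflow", "download", dataset]


def download_all(datasets, runner=subprocess.run):
    """Run the download command for each dataset; return the names that failed."""
    failed = []
    for dataset in datasets:
        # `runner` is injectable so the loop can be exercised without the CLI.
        result = runner(build_download_command(dataset))
        # Assumption: a non-zero return code signals a failed download.
        if result.returncode != 0:
            failed.append(dataset)
    return failed


# Example call (requires the languageflow CLI to be installed):
# failed = download_all(["UTS2017_BANK", "VNTC"])
```

Only datasets listed without (*) are used in the example call, since the others need a URL that you supply.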
Example
Download UTS2017_BANK dataset
$ languageflow download UTS2017_BANK
Use UTS2017_BANK dataset
>>> from languageflow.data_fetcher import DataFetcher, NLPData
>>> corpus = DataFetcher.load_corpus(NLPData.UTS2017_BANK_SA)
>>> print(corpus)
CategorizedCorpus: 1780 train + 197 dev + 494 test sentences
History
1.1.7 (2018-04-12)
Automatic deployment with Travis and PyPI
Fix dependency hell
1.1.6 (2017-12-26)
Add data module to handle data downloading and data preprocessing
Add many new models: SGDClassifier, XGBoostClassifier, FastTextClassifier, CRF
Add new feature: LanguageBoard
Automatic continuous integration with travis-ci
Build docs with readthedocs.org
1.1.5 (2017-12-11)
Refactor project to integrate with underthesea experiment
0.1.0 (2017-09-18)
First release on PyPI.