
Natural Language Processing Toolkit with support for tokenization, sentence splitting, lemmatization, tagging and parsing for more than 60 languages

Project description

NLP-Cube

NLP-Cube is an open-source Natural Language Processing Framework with support for the languages included in the UD Treebanks.

Follow the Quick Start Tutorial to get things running in no time.

Advanced users who want to create their own models will have to follow the manual installation tutorial (below).

Simple (PIP) installation

If you just want to use NLP-Cube, install the available PIP package:

pip3 install nlpcube

Usage

To use NLP-Cube programmatically (in Python), follow this tutorial
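
For quick reference, the programmatic flow from the tutorial looks roughly like this (a minimal sketch: the Cube class is imported from cube.api, and the token attribute names mirror the CoNLL-U columns but may differ slightly between versions):

    from cube.api import Cube

    cube = Cube(verbose=True)   # create the runner
    cube.load("en")             # downloads the English model on first use, then loads it

    # calling the object on raw text runs the full pipeline
    sentences = cube("NLP-Cube splits, tokenizes, lemmatizes, tags and parses raw text.")

    for sentence in sentences:      # one list per sentence
        for entry in sentence:      # one entry per token
            # attribute names follow the CoNLL-U fields (assumption; check the tutorial)
            print(entry.word, entry.lemma, entry.upos, entry.head, entry.label)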

To use NLP-Cube as a web service, you need to clone this repo, install requirements and start the server:

git clone https://github.com/adobe/NLP-Cube.git
cd NLP-Cube
pip3 install -r requirements.txt

The following command will start the server and preload languages: en, fr and de.

cd cube
python3 webserver.py --port 8080 --lang=en --lang=fr --lang=de

To test, open the following link in your browser.
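
Alternatively, you can check that the server is reachable from Python. The snippet below is only a liveness probe; the actual endpoints and query parameters are defined in webserver.py, so the root path here is an assumption:

    import requests

    # The server started above listens on port 8080; any HTTP response means
    # it is up. Consult webserver.py for the real API routes.
    response = requests.get("http://localhost:8080/", timeout=5)
    print(response.status_code)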

Manual Installation (if you want to train new models)

Cloning NLP-Cube

In order to create new models, you need to start by cloning this repo and installing its requirements.

Clone

git clone https://github.com/adobe/NLP-Cube.git
cd NLP-Cube
pip3 install -r requirements.txt

NLP-Cube depends on DyNet. In order to train your own models, you should do a custom DyNet installation with MKL and/or CUDA support.

Installing DyNet:

  1. Make sure you have Mercurial, Python, pip and CMake installed (you can also check the steps documented here)
  2. [Hard mode] Install Intel's MKL library. Download the appropriate version for your OS and follow the install script provided in the archive. MKL is an optimized math library that DyNet can use to significantly speed up training and runtime performance.

OR

  1. [Easy mode] If you run Debian (this should also work on other *nix systems), run the following commands to automatically set up MKL:
sudo wget https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB 
sudo apt-key add GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB
sudo wget https://apt.repos.intel.com/setup/intelproducts.list -O /etc/apt/sources.list.d/intelproducts.list
sudo apt-get update 
sudo apt-get install -y intel-mkl-64bit-2018.2-046

OR

  1. [Don't really care about speed mode] Do not install MKL at all. This will slow DyNet down by roughly 2.5 times, but everything will still work correctly. In this case, don't forget to run cmake in step 2 without the "-DMKL_ROOT=/opt/intel/mkl" flag.

  2. Install DyNet by using the installation steps from the manual installation page. More specifically, you should use:

    pip3 install cython
    mkdir dynet-base
    cd dynet-base
    
    git clone https://github.com/clab/dynet.git
    hg clone https://bitbucket.org/eigen/eigen -r 2355b22  # -r NUM specifies a known working revision
    
    cd dynet
    mkdir build
    cd build
    cmake .. -DEIGEN3_INCLUDE_DIR=../../eigen -DMKL_ROOT=/opt/intel/mkl -DPYTHON=`which python3`
    
    make -j 2 # replace 2 with the number of available cores
    make install
    
    cd python
    python3 ../../setup.py build --build-dir=.. --skip-build install
    

Note: sometimes cmake fails. If it does, delete the contents of the build folder and give the -DEIGEN3_INCLUDE_DIR flag the absolute path to eigen (don't use ../ or other relative paths). Also, check that cmake is updated to the latest version available.
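
Once the build completes, a quick sanity check is to evaluate a tiny expression from Python (a minimal sketch, assuming the DyNet 2.x bindings installed above):

    import dynet as dy

    # Build a small expression on the default computation graph to confirm that
    # the bindings (and MKL, if configured) load and evaluate correctly.
    x = dy.inputVector([1.0, 2.0, 3.0])
    print(dy.sum_elems(x).value())  # expected output: 6.0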

Training

Training models is easy. Use the --help command-line option to list the available options. Depending on what model you want to train, you must set the appropriate value for the --train parameter. For example, if you want to train the lemmatizer, you need to use the following command (provided that you have downloaded the training data and placed it in the corpus folder):

python3 cube/main.py --train=lemmatizer --train-file=corpus/ud_treebanks/UD_Romanian/ro-ud-train.conllu --dev-file=corpus/ud_treebanks/UD_Romanian/ro-ud-dev.conllu --embeddings=corpus/wiki.ro.vec --store=corpus/trained_models/ro/lemma/lemma --test-file=corpus/ud_test/gold/conll17-ud-test-2017-05-09/ro.conllu --batch-size=1000

Project details


Download files

Download the file for your platform.

Source Distribution

nlpcube-0.1.0.1.tar.gz (69.6 kB)


Built Distribution


nlpcube-0.1.0.1-py3-none-any.whl (91.0 kB)


File details

Details for the file nlpcube-0.1.0.1.tar.gz.

File metadata

  • Download URL: nlpcube-0.1.0.1.tar.gz
  • Upload date:
  • Size: 69.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.18.4 setuptools/39.0.1 requests-toolbelt/0.8.0 tqdm/4.19.8 CPython/2.7.10

File hashes

Hashes for nlpcube-0.1.0.1.tar.gz

  • SHA256: 1b57bd871f9bb5bc7a8a71a2bd21de6e7ecbf91bd57e94ca4450ffc207810826
  • MD5: 680e6d86c4e78f706599bfad624b5960
  • BLAKE2b-256: 6f9babbb6c9b5380240180c4f29f8d87cf2aff7920fca6ab36c052a55a5d1ea7


File details

Details for the file nlpcube-0.1.0.1-py3-none-any.whl.

File metadata

  • Download URL: nlpcube-0.1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 91.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.18.4 setuptools/39.0.1 requests-toolbelt/0.8.0 tqdm/4.19.8 CPython/2.7.10

File hashes

Hashes for nlpcube-0.1.0.1-py3-none-any.whl

  • SHA256: 88cdd6ae20efd3e1f7e51e4dee9d80e39dda1352dd0548fe006ae32d34af0f70
  • MD5: 3c0f966c2511d05dcc792aa256e9d846
  • BLAKE2b-256: 482cb563ff764d13642607d6edd0133b65fd539f64df375dd760ed1d7eb73ead

