A Python package for common Natural Language Processing tasks

NLProv: Natural Language Processing Tool

NLProv is a Python library developed by Johnson & Johnson's Advanced Analytics team for common Natural Language Processing tasks. It combines several existing open-source libraries, such as pandas, spaCy, and scikit-learn, into a pipeline that is ready to process text data. Many parameters are user-defined to suit your type of project: for example, you can choose between stemming and lemmatization, or explicitly define what to substitute for NaN text fields. Overall, it is a way to get you started on your NLP task.
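To make the two options mentioned above concrete, here is a minimal, standard-library-only sketch of a preprocessing step with a configurable NaN substitute and a choice between stemming and lemmatization. It is not NLProv's API: the function names, the `nan_sub` and `mode` parameters, the toy suffix stripper, and the tiny lemma table are all assumptions made for illustration. A real pipeline would rely on a proper stemmer or a spaCy model instead.

```python
import math
import re

# Hypothetical names throughout; NLProv's actual functions and parameters may differ.

def is_missing(value):
    """Treat None and float NaN as missing text."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def naive_stem(token):
    """Toy suffix stripper standing in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

# Tiny lookup table standing in for a real lemmatizer.
TOY_LEMMAS = {"ran": "run", "running": "run", "mice": "mouse"}

def naive_lemma(token):
    """Map a token to its dictionary form when the toy table knows it."""
    return TOY_LEMMAS.get(token, token)

def preprocess(texts, nan_sub="", mode="stem"):
    """Lowercase, tokenize, and normalize each text; replace missing entries with nan_sub."""
    if mode not in ("stem", "lemma"):
        raise ValueError("mode must be 'stem' or 'lemma'")
    normalize = naive_stem if mode == "stem" else naive_lemma
    cleaned = []
    for text in texts:
        if is_missing(text):
            cleaned.append(nan_sub)  # the placeholder is passed through untouched
            continue
        tokens = re.findall(r"[a-z]+", text.lower())
        cleaned.append(" ".join(normalize(tok) for tok in tokens))
    return cleaned
```

Calling `preprocess(texts, nan_sub="missing", mode="lemma")` switches both behaviors at once, which mirrors the kind of per-project configuration described above.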

A tutorial on how to use this package can be found here.

Installation Instructions

  • Using pip:
    pip install nlprov
    
  • For more information on installing packages using pip, click here.
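After installing, one way to confirm that pip registered the package is to query its metadata from Python. The sketch below uses only the standard library (`importlib.metadata`, available in Python 3.8 and later). The helper name `installed_version` is hypothetical, and the distribution name `nlprov` is taken from the pip command above.

```python
from importlib import metadata

def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

if __name__ == "__main__":
    version = installed_version("nlprov")
    if version is None:
        print("nlprov is not installed in this environment")
    else:
        print("nlprov", version, "is installed")
```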

Contributing

  • To help develop this package, you'll need to create the conda virtual environment defined by our dev_environment.yml file using the command below.

    conda env create -f dev_environment.yml
    
    • Then, just activate the environment when attempting to develop or run tests using the below command.

      conda activate nlp_env
      
    • When you're all done developing or testing, just deactivate the environment with the below command.

      conda deactivate
      

Docker Configuration

  • This codebase is dockerized to build, run all of the unit tests using pytest, and perform pip packaging.
    • In order to run the docker container, ensure you have Docker installed and running on your local machine.
    • To start the docker container locally, simply navigate to the root of the project directory and type:
    docker-compose up --build
    
    • Note: docker-compose is included in the Docker Desktop installation for macOS and Windows based systems. If you have issues executing docker-compose, navigate here to ensure docker-compose is supported on your system.
    • Another note: You can use docker-compose up --build during development to quickly run the tests after code changes without setting up or running a local conda environment.

GitHub Action CI Configuration

  • Every commit to this repository will trigger a build in GitHub Actions following the .github/workflows/pythonapp.yml located in the root of this project.
    • GitHub Actions is used to build and lint the NLProv package, run the tests, and perform pip packaging.
    • If the environment name or version changes, the pythonapp.yml file will need to be updated to follow the new pattern.

Our Workflow

Upcoming Features

Here is a roadmap of features to be implemented in this package. If you have any ideas for additional features, please let us know!

  • Preprocessing
    • Ability to use custom stop words
    • Incorporation of bi-grams
    • Ability for the user to choose which language detection package to use
  • Vectorization
    • spaCy pre-trained models
    • spaCy custom models
  • Similarity Metrics
    • Additional pairwise distances
    • Levenshtein Distance
    • Word Mover's Distance
  • Visualizations
    • TF-IDF
    • Jaccard
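For readers unfamiliar with the metrics named in the roadmap, the following standard-library sketch shows textbook versions of two of them: Jaccard similarity over whitespace-separated token sets and Levenshtein (edit) distance over characters. These are generic illustrations, not NLProv's implementations. The function names and the choice of whitespace tokenization are assumptions.

```python
def jaccard_similarity(text_a, text_b):
    """Size of the shared token set divided by the size of the combined token set."""
    set_a = set(text_a.lower().split())
    set_b = set(text_b.lower().split())
    if not set_a and not set_b:
        return 1.0  # convention chosen here: two empty texts are identical
    return len(set_a & set_b) / len(set_a | set_b)

def levenshtein_distance(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]
```

For example, `levenshtein_distance("kitten", "sitting")` is 3, and `jaccard_similarity("the cat sat", "the cat ran")` is 0.5 because two of the four distinct tokens are shared.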
