NLP Feature Extractors

Project

The Error Analysis NLP project enables the Error Analysis Tool to be applied to Natural Language Processing (NLP) tasks on unstructured text.

Setup

The project was developed and tested on Linux (Ubuntu 18.04).

  1. Create a virtual environment: python3 -m venv .venv
  2. Activate the environment: source .venv/bin/activate
  3. Upgrade pip: pip install pip --upgrade
  4. From the project root, run: pip install -e .
  5. Download a language pack for spaCy: python -m spacy download en_core_web_sm
  6. Set the environment variable AZURE_STORAGE_CONNECTION_STRING to the connection string for the erroranalysisnlp Azure Storage account.
  7. To run the notebooks, make sure that the environment variable from (6) has been set, then start a Jupyter Notebook server from within the notebooks folder.
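
Before starting Jupyter, it can help to confirm that the connection string is actually visible to Python. The following is a minimal sketch using only the standard library; the helper name connection_string_is_set is hypothetical and not part of this project.

```python
import os

# Variable name taken from the setup steps above.
ENV_VAR = "AZURE_STORAGE_CONNECTION_STRING"


def connection_string_is_set(environ=os.environ) -> bool:
    """Return True when the variable is present and not blank.

    Hypothetical helper for illustration; it only checks presence and
    does not validate the connection string against Azure.
    """
    return bool(environ.get(ENV_VAR, "").strip())


if __name__ == "__main__":
    if connection_string_is_set():
        print(f"{ENV_VAR} is set.")
    else:
        print(f"{ENV_VAR} is missing; set it before starting Jupyter.")
```

Run the check from the same shell session that will start the notebook server, since the server inherits its environment from that shell.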

Notes

  1. The Amulet scripts were used to run inference on GCR.
  2. Feature extraction was performed at the same time as inference, and the results of both were stored together as a single JSON blob.
  3. The datasets created in (2) were too large to check into the repository, so we stored them in Azure Blob Storage instead.
  4. The notebooks that demonstrate the application of the Error Analysis tool to NLP tasks load these datasets directly from Azure Blob Storage.
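
The notes above can be illustrated with a rough sketch of how features and inference results might be combined into one JSON blob and read back. The record layout (text, features, prediction, label), the helper names, and the container and blob arguments are all assumptions made for illustration; the actual schema and loading code used by the notebooks are not described here. The download helper assumes the azure-storage-blob package, which may differ from what the project uses.

```python
import json
import os


def to_record(text, features, prediction, label):
    """Bundle extracted features and the model output for one example.

    The field names are hypothetical, not the project's schema.
    """
    return {
        "text": text,
        "features": features,
        "prediction": prediction,
        "label": label,
    }


def dump_blob(records):
    """Serialize all records into a single JSON string."""
    return json.dumps({"records": records})


def load_blob(blob):
    """Parse a JSON blob (str or bytes) back into a list of records."""
    return json.loads(blob)["records"]


def download_blob_text(container, blob_name):
    """Fetch a blob as text, assuming the azure-storage-blob package.

    The container and blob names are placeholders supplied by the caller.
    """
    # Imported lazily so the JSON helpers above work without the Azure SDK.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(
        os.environ["AZURE_STORAGE_CONNECTION_STRING"]
    )
    client = service.get_blob_client(container=container, blob=blob_name)
    return client.download_blob().readall().decode("utf-8")
```

Keeping features and predictions in the same record means each example can be sliced by its features and compared against its label without a separate join step.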

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

Installation

  • Set up a Python virtual environment: python3 -m venv nlp-venv
  • Activate the environment: source nlp-venv/bin/activate
  • Clone the repo. This should create a folder named error-analysis-nlp.
  • From within the error-analysis-nlp folder, run: pip install -e .
  • You may optionally specify the pip cache folder for the command above with the option --cache-dir <path-to-pip-cache>
  • Once set up, change into the folder error-analysis-nlp/notebooks and run jupyter-notebook
  • Optionally, you may set the environment variables TRANSFORMERS_CACHE and HF_DATASETS_CACHE, which specify the Hugging Face caches for models and datasets, respectively.
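
The two cache variables can also be set from Python, for example at the top of a notebook, as long as this happens before the Hugging Face libraries are imported. The following is a minimal sketch; the helper name configure_hf_caches and the transformers/datasets subfolder layout are assumptions for illustration, not project conventions.

```python
import os
from pathlib import Path


def configure_hf_caches(cache_root, environ=os.environ):
    """Point the Hugging Face model and dataset caches below cache_root.

    Hypothetical helper: the subfolder names are arbitrary choices.
    Returns the mapping of variable names to the paths that were set.
    """
    root = Path(cache_root).expanduser()
    paths = {
        "TRANSFORMERS_CACHE": root / "transformers",
        "HF_DATASETS_CACHE": root / "datasets",
    }
    for name, path in paths.items():
        # Create the folder up front so a bad path fails here, not mid-download.
        path.mkdir(parents=True, exist_ok=True)
        environ[name] = str(path)
    return {name: str(path) for name, path in paths.items()}
```

Call the helper before importing transformers or datasets, since the libraries are expected to read these variables when they are first loaded.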

Download files

Source Distribution

nlp_feature_extractors-0.0.2.tar.gz (5.2 kB)

Uploaded Source

Built Distribution

nlp_feature_extractors-0.0.2-py3-none-any.whl (51.3 kB)

Uploaded Python 3

File details

Details for the file nlp_feature_extractors-0.0.2.tar.gz.

File metadata

  • Download URL: nlp_feature_extractors-0.0.2.tar.gz
  • Upload date:
  • Size: 5.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.49.0 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.8.0

File hashes

Hashes for nlp_feature_extractors-0.0.2.tar.gz
Algorithm Hash digest
SHA256 a44bb3c4d92d00ddb43cc453edbc4535b4472777d5543014973cbc609fad84f0
MD5 246e0d1a632a8f7afc27fd2da6484351
BLAKE2b-256 8abd04e5150e99d725c4de2f5e8889ed1cbb4f6fb2e7a05288f6eb04461ea272
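
A downloaded file can be checked against the SHA256 digest listed above with the standard library. This is a minimal sketch; the helper name sha256_of_file and the local file path in the usage note are assumptions for illustration.

```python
import hashlib

# SHA256 digest published above for nlp_feature_extractors-0.0.2.tar.gz.
EXPECTED_SDIST_SHA256 = (
    "a44bb3c4d92d00ddb43cc453edbc4535b4472777d5543014973cbc609fad84f0"
)


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, assuming the archive was saved to the current folder, the comparison sha256_of_file("nlp_feature_extractors-0.0.2.tar.gz") == EXPECTED_SDIST_SHA256 should be true for an unmodified download.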

File details

Details for the file nlp_feature_extractors-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: nlp_feature_extractors-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 51.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.49.0 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.8.0

File hashes

Hashes for nlp_feature_extractors-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 ebcbeadad2405092c19c9ae86c0ec5af73729f1532f9600bb4014999939f1c97
MD5 89bcaaf4de6df72cd189faf6ee5bf698
BLAKE2b-256 65b5f1a4f52a8af7d0cb76271f28df938a229f21b1ec9833e73c5e6e54ad91be
