NLP Feature Extractors
Project
The Error Analysis NLP project enables the Error Analysis Tool to be applied to Natural Language Processing (NLP) tasks on unstructured text.
Setup
The project was developed and tested on Linux (Ubuntu 18.04).
- Create a virtual environment
python3 -m venv .venv
- Activate the environment
source .venv/bin/activate
- Upgrade pip
pip install pip --upgrade
- From the project root run
pip install -e .
- Download a language pack for spaCy
python -m spacy download en_core_web_sm
- Set the environment variable
AZURE_STORAGE_CONNECTION_STRING
to the connection string for the erroranalysisnlp
Azure Storage account.
- To run the notebooks, make sure that AZURE_STORAGE_CONNECTION_STRING has been set. Then start a Jupyter Notebook server from within the
notebooks
folder.
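Before starting the notebook server, it can help to confirm that the setup steps took effect. The following is a minimal sketch, not part of the project: the helper names `missing_env_vars` and `spacy_model_installed` are hypothetical, and the check assumes the spaCy language pack is installed as a regular Python package, which is how `python -m spacy download` normally installs it.

```python
import importlib.util
import os

# Names taken from the setup steps above.
REQUIRED_ENV_VARS = ["AZURE_STORAGE_CONNECTION_STRING"]
SPACY_MODEL = "en_core_web_sm"


def missing_env_vars(env=None):
    """Return the required variables that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


def spacy_model_installed(model=SPACY_MODEL):
    """Return True if the model package can be located on the import path."""
    # Assumption: the language pack is installed as an importable package.
    return importlib.util.find_spec(model) is not None


if __name__ == "__main__":
    for name in missing_env_vars():
        print(f"Missing environment variable: {name}")
    if not spacy_model_installed():
        print(f"spaCy model not found: {SPACY_MODEL}")
```

Run from inside the activated virtual environment, the script prints nothing when both checks pass.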
Notes
- The Amulet scripts were used to run inference on GCR.
- Feature extraction was performed at the same time, and the results of feature extraction and inference were stored together as a single JSON blob.
- The resulting datasets were too large to check into the repository, so they are stored in Azure Blob Storage instead.
- The notebooks that demonstrate the application of the Error Analysis Tool to NLP tasks load these datasets directly from Azure Blob Storage.
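As an illustration of how a notebook might read one of these JSON blobs, here is a minimal sketch using the `azure-storage-blob` (v12) client library. It is an assumption about the approach, not the project's actual loading code: the function names are hypothetical, the container and blob names must be supplied by the caller, and the structure of the JSON is not documented here, so the parsed object is returned unchanged.

```python
import json
import os


def parse_results(raw: bytes):
    """Decode downloaded blob bytes as UTF-8 JSON (hypothetical helper)."""
    return json.loads(raw.decode("utf-8"))


def load_results(container: str, blob: str):
    """Download one JSON blob and return its parsed contents."""
    # Imported lazily so parse_results works without the Azure SDK installed.
    from azure.storage.blob import BlobServiceClient

    # The connection string comes from the variable set during setup.
    conn_str = os.environ["AZURE_STORAGE_CONNECTION_STRING"]
    service = BlobServiceClient.from_connection_string(conn_str)
    client = service.get_blob_client(container=container, blob=blob)
    return parse_results(client.download_blob().readall())
```

A call would look like `load_results("<container-name>", "<blob-name>")`, with both placeholders replaced by names from the storage account.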
Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
Trademarks
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.
Installation
- Set up a Python virtual environment
python3 -m venv nlp-venv
- Activate the environment
source nlp-venv/bin/activate
- Clone the repo. This should clone the repo to a folder named
error-analysis-nlp
- From within the
error-analysis-nlp
folder, run
pip install -e .
- You may optionally specify the pip cache folder for the command above with the option
--cache-dir <path-to-pip-cache>
- Once set up, change into the folder
error-analysis-nlp/notebooks
and run
jupyter-notebook
- Optionally, set the environment variables that specify the Hugging Face caches for models and datasets, respectively:
TRANSFORMERS_CACHE
and
HF_DATASETS_CACHE
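The two cache variables can also be set from Python, for example in the first cell of a notebook. The sketch below is an assumption about one convenient way to do this: `configure_hf_caches` and the `transformers` and `datasets` subfolder names are hypothetical. It should run before `transformers` or `datasets` is imported, since those libraries are generally expected to read the variables when they load.

```python
import os
from pathlib import Path


def configure_hf_caches(base_dir):
    """Point the model and dataset caches at subfolders of base_dir (hypothetical helper)."""
    base = Path(base_dir).expanduser()
    paths = {
        "TRANSFORMERS_CACHE": base / "transformers",  # model cache
        "HF_DATASETS_CACHE": base / "datasets",       # dataset cache
    }
    for name, path in paths.items():
        # Create the folder if needed, then export it for this process.
        path.mkdir(parents=True, exist_ok=True)
        os.environ[name] = str(path)
    return {name: str(path) for name, path in paths.items()}
```

For example, `configure_hf_caches("~/hf-cache")` creates both folders under the home directory and sets the variables for the current process only.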