
Code to run the Extra algorithm for unsupervised topic extraction.


extra-model

Code to run the Extra algorithm for the unsupervised topic/aspect extraction on English texts.

Read the Official Documentation here

Quick start

IMPORTANT:

  1. When running Extra inside a Docker container, make sure the Docker process has enough resources. For example, on Mac/Windows it should have at least 8 GB of RAM available to it. Read more about RAM requirements.
  2. The GitHub repo does not come with GloVe embeddings. See the Downloading Embeddings section for how to download the required embeddings.

Using docker-compose

This is the preferred way to run extra-model. Instructions for running extra-model via the CLI or as a Python package are in the official documentation.

First, build the image:

docker-compose build

Then, run the following command to make sure that extra-model was installed correctly:

docker-compose run test

Downloading Embeddings

The next step is to download the embeddings (this project uses GloVe embeddings from Stanford).

To download the required embeddings, run the following command:

docker-compose run --rm setup

The embeddings will be downloaded, unzipped and formatted into a space-efficient format. Files will be saved in the embeddings/ directory in the root of the project directory. If the process fails, it can be safely restarted. If you want to restart the process with new files, delete all files except README.md in the embeddings/ directory.
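The cleanup step before a fresh restart can be scripted. The following is a minimal sketch, not part of extra-model: the function name `reset_embeddings_dir` and its `keep` parameter are hypothetical, and the only behavior taken from the text above is "delete everything in embeddings/ except README.md".

```python
import shutil
from pathlib import Path


def reset_embeddings_dir(embeddings_dir, keep=("README.md",)):
    """Delete every entry in embeddings_dir except the names listed in keep.

    Hypothetical helper: returns the names that were removed, sorted.
    """
    removed = []
    for entry in sorted(Path(embeddings_dir).iterdir()):
        if entry.name in keep:
            continue  # README.md stays, as the instructions require
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # partially unpacked folders
        else:
            entry.unlink()  # downloaded archives and formatted files
        removed.append(entry.name)
    return removed
```

Calling `reset_embeddings_dir("embeddings")` from the project root and then re-running `docker-compose run --rm setup` would start the download from scratch.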

[Optional] Run docker-compose build again

After you've downloaded the embeddings, you may want to run docker-compose build again. This will build an image with embeddings already present inside the image.

The tradeoff here is that the image will be much bigger, but you won't spend ~2 minutes each time you run extra-model waiting for embeddings to be mounted into the container. On the other hand, building an image with embeddings in the context will increase build time from ~3 minutes to ~10 minutes.
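To make the tradeoff concrete, the break-even point can be estimated from the approximate timings quoted above. This is a back-of-the-envelope sketch: the function name is hypothetical, the figures are the rough ones from this section, and image size is ignored.

```python
import math


def break_even_runs(mount_minutes_per_run=2.0,
                    build_minutes_plain=3.0,
                    build_minutes_with_embeddings=10.0):
    """Number of runs per build after which baking embeddings into the image pays off."""
    # Extra time spent once, at build time.
    extra_build_minutes = build_minutes_with_embeddings - build_minutes_plain
    # Each run with a baked-in image saves the mount wait.
    return math.ceil(extra_build_minutes / mount_minutes_per_run)


print(break_even_runs())  # 4
```

With these numbers, rebuilding with embeddings saves time once the same image is used for four or more runs; if you rebuild the image more often than that, mounting the embeddings is likely the faster option.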

Run extra-model

Finally, running extra-model is as simple as:

docker-compose run extra-model /package/tests/resources/100_comments.csv

NOTE: when using this approach, the input file must be mounted inside the container. By default, everything in the extra-model folder is mounted to the /package/ folder. This can be changed in docker-compose.yaml.

This will produce a result.csv file in the /io/ folder (the default setting).
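Because the command takes a path inside the container, it can help to translate a host path into its mounted equivalent. The sketch below assumes the default mount described above (project folder mapped to /package/); the helper name `container_path` is hypothetical and not part of extra-model.

```python
from pathlib import Path, PurePosixPath


def container_path(host_file, project_root, mount_point="/package"):
    """Map a file under project_root on the host to its path inside the container.

    Raises ValueError if host_file is outside project_root, since such a
    file would not be visible through the default mount.
    """
    relative = Path(host_file).resolve().relative_to(Path(project_root).resolve())
    # Container paths are always POSIX-style, regardless of the host OS.
    return str(PurePosixPath(mount_point, *relative.parts))
```

For example, a file at `tests/resources/100_comments.csv` under the project root maps to `/package/tests/resources/100_comments.csv`, which is the argument used in the command above. If the mount is changed in docker-compose.yaml, pass the new location as `mount_point`.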

Learn more

Our official documentation is the best place to continue learning about extra-model:

  1. Explanation of inputs/outputs
  2. Step-by-step workflow of what happens inside of extra-model
  3. Examples of how extra-model can be used in downstream applications
  4. Detailed explanation of how to run extra-model using different interfaces (via docker-compose, via CLI, as a Python package).

Authors

extra-model was written by mbalyasin@wayfair.com, mmozer@wayfair.com.


