extra-model
Code to run the Extra algorithm for unsupervised topic/aspect extraction on English texts.
Read the Official Documentation here
Quick start
IMPORTANT:
- When running Extra inside a Docker container, make sure that the Docker process has enough resources. For example, on Mac/Windows it should have at least 8 GB of RAM available to it. Read more about RAM requirements.
- The GitHub repo does not come with GloVe embeddings. See the Downloading Embeddings section for how to download the required embeddings.
Using docker-compose
This is the preferred way to run extra-model.
You can find instructions on how to run extra-model using the CLI or as a Python package here.
First, build the image:
docker-compose build
Then, run the following command to make sure that extra-model was installed correctly:
docker-compose run test
Downloading Embeddings
The next step is to download the embeddings (this project uses GloVe from Stanford).
To download the required embeddings, run the following command:
docker-compose run --rm setup
The embeddings will be downloaded, unzipped and converted into a space-efficient format. Files will be saved in the embeddings/ directory in the root of the project directory. If the process fails, it can be safely restarted. If you want to restart the process with new files, delete all files except README.md in the embeddings/ directory.
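The cleanup step described above can be sketched in Python. This is only an illustration: the function name `clear_embeddings` is hypothetical and not part of extra-model, and it assumes that "all files" means the regular files directly inside embeddings/ (subdirectories are left untouched).

```python
from pathlib import Path
from typing import List


def clear_embeddings(directory: str, keep: str = "README.md") -> List[str]:
    """Delete every regular file in `directory` except `keep`.

    Hypothetical helper, not part of extra-model. Returns the names
    of the files that were removed, in sorted order.
    """
    removed = []
    for path in sorted(Path(directory).iterdir()):
        # Assumption: only top-level regular files are removed;
        # subdirectories and the kept file are skipped.
        if path.is_file() and path.name != keep:
            path.unlink()
            removed.append(path.name)
    return removed
```

Calling `clear_embeddings("embeddings")` from the project root would then leave only README.md (and any subdirectories) behind, after which the setup command can be run again.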
[Optional] Run docker-compose build again
After you've downloaded the embeddings, you may want to run docker-compose build again.
This builds an image with the embeddings already present inside it.
The tradeoff is that the image will be much bigger, but you won't spend ~2 minutes each time you run extra-model waiting for the embeddings to be mounted into the container.
On the other hand, building an image with the embeddings in the build context increases build time from ~3 minutes to ~10 minutes.
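To make the tradeoff concrete, the approximate timings quoted above can be turned into a rough break-even estimate. This is a back-of-the-envelope sketch: the function name `break_even_runs` is made up for illustration, and it assumes the image is rebuilt only once and that the full mount wait is saved on every run. Image size is not accounted for.

```python
import math

# Approximate figures quoted in the text, in minutes.
BUILD_WITHOUT_EMBEDDINGS = 3
BUILD_WITH_EMBEDDINGS = 10
MOUNT_WAIT_PER_RUN = 2


def break_even_runs(build_plain: float, build_baked: float, wait_per_run: float) -> int:
    """Smallest number of runs at which the extra build time is repaid."""
    extra_build = build_baked - build_plain
    return math.ceil(extra_build / wait_per_run)


print(break_even_runs(BUILD_WITHOUT_EMBEDDINGS, BUILD_WITH_EMBEDDINGS, MOUNT_WAIT_PER_RUN))  # prints 4
```

Under these assumptions, the rebuild starts to pay off in time from about the fourth run onward.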
Run extra-model
Finally, running extra-model is as simple as:
docker-compose run extra-model /package/tests/resources/100_comments.csv
NOTE: when using this approach, the input file must be mounted inside the container.
By default, everything from the extra-model folder is mounted to the /package/ folder.
This can be changed in docker-compose.yaml.
The run produces a result.csv file in the /io/ folder (the default setting).
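As a quick sanity check after a run, the output file can be inspected with Python's standard `csv` module. The sketch below deliberately makes no claims about which columns result.csv contains (the official documentation covers inputs and outputs); it only assumes a UTF-8 CSV with a header row, and `summarize_csv` is a hypothetical helper name.

```python
import csv


def summarize_csv(path: str):
    """Return (column names, number of data rows) for a CSV file.

    Assumes the first row is a header and the file is UTF-8 encoded;
    both are assumptions about the output, not documented guarantees.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        row_count = sum(1 for _ in reader)
    return header, row_count
```

For example, `summarize_csv("/io/result.csv")` could be called inside the container, or with the corresponding host path if /io/ is mounted from the host in docker-compose.yaml.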
Learn more
Our official documentation is the best place to continue learning about extra-model:
- Explanation of inputs/outputs
- Step-by-step workflow of what happens inside of extra-model
- Examples of how extra-model can be used in downstream applications
- Detailed explanation of how to run extra-model using different interfaces (via docker-compose, via CLI, as a Python package)
Authors
extra-model was written by mbalyasin@wayfair.com, mmozer@wayfair.com.
File details
Details for the file extra-model-0.2.1.tar.gz.
File metadata
- Download URL: extra-model-0.2.1.tar.gz
- Upload date:
- Size: 33.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | ea0550480a5e712e29ffbdcfa5e1a2a0a9db5ef2cb5bfb2369923990a56c10b9
MD5 | 9e274058e35f2062e2f36d41479fde6b
BLAKE2b-256 | f838d61ea463bbef06339b3fdeb28c00a7d36bda0815dc53521a58be515afbee
File details
Details for the file extra_model-0.2.1-py3-none-any.whl.
File metadata
- Download URL: extra_model-0.2.1-py3-none-any.whl
- Upload date:
- Size: 39.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.9.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | e4faba83dd10f28f9c30c7515efb7696cc5b03c97035b8c457cec715d31130b5
MD5 | e1e1366b9b3ffc95cf4795894f4369db
BLAKE2b-256 | 7609fe765844cbd7906a4ecd30877ab8015167d25f31c1d67cde8439ba8fcdbf