Takes a list of documents and returns fully automated, labeled dictionaries in which topic names are the keys and semantically similar keywords from the documents are the values.
Project description
docs2tops stands for "documents to topics".
It does the following:
- extracts n-grams from the documents
- extracts meaningful moregrams (n-grams of two or more words)
- creates a semi-automated dictionary: if the user provides candidate topics, docs2tops returns similar keywords for each of those topics
- creates a fully automated dictionary
Whether or not the user provides topics, docs2tops returns two dictionaries. If the user did not provide any topics, the first dictionary contains only a message and no topics.
The fully automated dictionary is created in all cases.
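To make the first two steps concrete, here is a minimal sketch of n-gram and moregram extraction using only the standard library. It is not the docs2tops implementation: the function names (`tokenize`, `extract_ngrams`, `frequent_moregrams`) are hypothetical, and "meaningful" is approximated here by a simple frequency threshold, which is an assumption.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def extract_ngrams(docs, n_min=1, n_max=3):
    """Count every contiguous n-gram with n_min <= n <= n_max across all documents."""
    counts = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        for n in range(n_min, n_max + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return counts


def frequent_moregrams(docs, min_count=2, n_max=3):
    """Keep n-grams of two or more words that occur at least min_count times.

    Assumption: frequency stands in for "meaningful"; docs2tops may use
    a different criterion.
    """
    counts = extract_ngrams(docs, n_min=2, n_max=n_max)
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Made-up sample documents for illustration only.
docs = [
    "Fast delivery and great taste",
    "Great taste but slow delivery",
    "The packaging was damaged, fast delivery though",
]
print(frequent_moregrams(docs))
# -> {'fast delivery': 2, 'great taste': 2}
```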
The docs2tops function takes a list of documents. Optionally, you can also provide candidate_topics_list and moregrams_sample_size.
docs2tops(docs_input_list, candidate_topics_list=None, moregrams_sample_size=None)
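Because the function expects a list of documents, it can help to clean the input first. This README does not say how docs2tops handles missing or empty entries, so the sketch below takes the cautious route and keeps only non-empty strings. The helper `clean_docs` is hypothetical and not part of the package.

```python
def clean_docs(raw_docs):
    """Keep only non-empty strings, stripped of surrounding whitespace."""
    cleaned = []
    for doc in raw_docs:
        # Skip None, NaN (a float) and any other non-string value.
        if isinstance(doc, str) and doc.strip():
            cleaned.append(doc.strip())
    return cleaned


# Made-up raw input, similar to what a CSV column with gaps might yield.
raw = ["Great taste", None, "   ", float("nan"), " Fast delivery "]
docs = clean_docs(raw)
print(docs)
# -> ['Great taste', 'Fast delivery']
```

The cleaned list can then be passed as docs_input_list.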
Installation
Run the following to install:
pip install docs2tops
Usage
from docs2tops import docs2tops
import pandas as pd
df = pd.read_csv(r"C:\Users\my_file.csv")
docs = df['my_texual_content'].to_list()
candidate_topics_list = ['smell', 'taste', 'delivery', 'packaging']
moregrams_sample_size = 100
user_input_dict, fully_auto_dict = docs2tops(docs_input_list=docs,
                                             candidate_topics_list=candidate_topics_list,
                                             moregrams_sample_size=moregrams_sample_size)
list_dicts = [user_input_dict, fully_auto_dict]
for result in list_dicts:
    print(result)
    print('number of topics: ', len(result))
    print('---')
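Once a dictionary is available, one possible next step is to label documents with the topics whose keywords they contain. The sketch below assumes each value is a list of keyword strings; the exact value type returned by docs2tops is not stated here. `tag_documents` is a hypothetical helper, and the sample dictionary is made up rather than real docs2tops output.

```python
def tag_documents(docs, topic_dict):
    """Return (document, topics) pairs, where topics are those with a keyword in the document."""
    tagged = []
    for doc in docs:
        text = doc.lower()
        # Plain case-insensitive substring matching; no stemming or tokenization.
        topics = sorted(
            topic
            for topic, keywords in topic_dict.items()
            if any(keyword.lower() in text for keyword in keywords)
        )
        tagged.append((doc, topics))
    return tagged


# Made-up sample in the documented shape: topic name -> keywords.
sample_dict = {
    "delivery": ["delivery", "shipping", "courier"],
    "taste": ["taste", "flavour"],
}
sample_docs = ["Shipping was quick", "Lovely flavour, slow courier", "No comment"]

for doc, topics in tag_documents(sample_docs, sample_dict):
    print(doc, "->", topics)
```

Substring matching is deliberately simple and will also match keywords inside longer words, so a stricter tokenizer may be preferable for real data.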
Developing docs2tops
To install docs2tops along with the tools you need to develop and run tests, run the following in your virtual environment:
pip install -e .[dev]
File details
Details for the file docs2tops-0.0.3.tar.gz.
File metadata
- Download URL: docs2tops-0.0.3.tar.gz
- Upload date:
- Size: 7.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ba00b06212541bcb6133564f23a88dfc1ef0413bf51fa174bf7e0f40a26ca845 |
| MD5 | 7f7f203d8406c23e88f08283fae74c2b |
| BLAKE2b-256 | cd3d57360316caa6a40158119b6016067a108064f8ccbf373b007c652bbf9ab3 |
File details
Details for the file docs2tops-0.0.3-py3-none-any.whl.
File metadata
- Download URL: docs2tops-0.0.3-py3-none-any.whl
- Upload date:
- Size: 7.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9dd0cffd033b3eef827e804e7d14dc6028e39481626805b1113ac90ccc030860 |
| MD5 | ef63fae8c3ac478fb41b328417b8d4d6 |
| BLAKE2b-256 | 830239eb321942c75b13c94fe0efd6f4778c50b983a6e74a89ec5ba7edf640dc |