Takes a list of documents and returns fully automated, labeled dictionaries in which topic names are the keys and semantically similar keywords from the documents are the values.

Project description

docs2tops stands for documents to topics.

In short, it:

  • extracts n-grams from the documents
  • extracts meaningful moregrams (n-grams of two or more words)
  • creates a semi-automated dictionary: if the user provides candidate topics, docs2tops returns similar keywords for each of them
  • creates a fully automated dictionary
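
The first two steps can be pictured with a small sketch. This is not the docs2tops implementation: the helper names `extract_ngrams` and `frequent_moregrams`, the whitespace tokenization, and the "seen at least twice" rule for what counts as meaningful are all assumptions made for illustration.

```python
from collections import Counter


def extract_ngrams(text, n):
    """Return all contiguous n-word sequences in text, lowercased."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequent_moregrams(docs, max_n=3, min_count=2):
    """Count 2..max_n grams across docs; keep those seen at least min_count times.

    Assumption: frequency stands in for "meaningful" here.
    """
    counts = Counter()
    for doc in docs:
        for n in range(2, max_n + 1):
            counts.update(extract_ngrams(doc, n))
    return {gram: count for gram, count in counts.items() if count >= min_count}


docs = [
    "fast delivery and nice packaging",
    "nice packaging but slow delivery",
]
print(extract_ngrams(docs[0], 2))
print(frequent_moregrams(docs))
```

With these two toy documents, only "nice packaging" occurs in both, so it is the only moregram that survives the frequency filter.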

In both cases (whether or not the user provides topics), docs2tops returns two dictionaries. If no topics are provided, the first dictionary is empty apart from a message.

The fully automated dictionary is always created.

The docs2tops function takes a list of documents. Optionally, you can also provide candidate_topics_list and moregrams_sample_size.

docs2tops(docs_input_list, candidate_topics_list=None, moregrams_sample_size=None)
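
To make the role of candidate_topics_list more concrete, here is a rough sketch of grouping keywords under user-provided topics. It is an illustration only: `group_keywords` and `contains_topic` are hypothetical names, and the similarity used is a simple word-containment check standing in for the semantic similarity docs2tops uses, whose actual method is not described here.

```python
def contains_topic(topic, keyword):
    """Toy similarity (assumption): 1.0 if the topic word appears in the keyword, else 0.0."""
    return 1.0 if topic in keyword.split() else 0.0


def group_keywords(candidate_topics, keywords, similarity=contains_topic, threshold=0.5):
    """Map each candidate topic to the keywords whose similarity meets the threshold."""
    return {
        topic: [kw for kw in keywords if similarity(topic, kw) >= threshold]
        for topic in candidate_topics
    }


keywords = ["fast delivery", "late delivery", "nice packaging", "strong smell"]
print(group_keywords(["smell", "delivery", "taste"], keywords))
```

The `similarity` argument is left pluggable so that a real semantic measure (for example, one based on word embeddings) could replace the toy check without changing the grouping logic.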

Installation

Run the following to install:

pip install docs2tops

Usage

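from docs2tops import docs2tops
import pandas as pd

# Load the documents from a CSV column into a plain list of strings
df = pd.read_csv(r"C:\Users\my_file.csv")
docs = df['my_texual_content'].to_list()

# Optional inputs: candidate topics and the moregrams sample size
candidate_topics_list = ['smell', 'taste', 'delivery', 'packaging']
moregrams_sample_size = 100

# docs2tops returns two dictionaries: user-guided and fully automated
user_input_dict, fully_auto_dict = docs2tops(
    docs_input_list=docs,
    candidate_topics_list=candidate_topics_list,
    moregrams_sample_size=moregrams_sample_size,
)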

list_dicts = [user_input_dict, fully_auto_dict]
for result in list_dicts:
    print(result)
    print('number of topics: ', len(result))
    print('---')
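
Once a topic dictionary is available, one possible next step is to tag each document with the topics whose keywords it contains. The sketch below assumes the dictionary maps topic names to lists of keyword strings, as the description states; `tag_documents` and the sample data are hypothetical and not part of docs2tops.

```python
def tag_documents(docs, topic_dict):
    """Return, for each document, the sorted topics whose keywords occur in it.

    Matching is a case-insensitive substring check, so short keywords
    may also match inside longer words.
    """
    tags = []
    for doc in docs:
        text = doc.lower()
        matched = {
            topic
            for topic, kws in topic_dict.items()
            if any(kw.lower() in text for kw in kws)
        }
        tags.append(sorted(matched))
    return tags


# Hypothetical dictionary in the topic -> keywords shape described above
sample_dict = {
    "delivery": ["fast delivery", "courier"],
    "packaging": ["box", "wrapping"],
}
docs = ["The courier was late", "Lovely wrapping and fast delivery", "Tastes great"]
print(tag_documents(docs, sample_dict))
```

Documents that match no keyword get an empty list, which makes it easy to spot text the dictionary does not yet cover.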

Developing docs2tops

To install docs2tops along with the tools you need to develop and run tests, run the following in your virtual environment:

pip install -e .[dev]

Download files

Source Distribution

docs2tops-0.0.3.tar.gz (7.9 kB)

Uploaded Source

Built Distribution

docs2tops-0.0.3-py3-none-any.whl (7.9 kB)

Uploaded Python 3

File details

Details for the file docs2tops-0.0.3.tar.gz.

File metadata

  • Download URL: docs2tops-0.0.3.tar.gz
  • Upload date:
  • Size: 7.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for docs2tops-0.0.3.tar.gz
Algorithm Hash digest
SHA256 ba00b06212541bcb6133564f23a88dfc1ef0413bf51fa174bf7e0f40a26ca845
MD5 7f7f203d8406c23e88f08283fae74c2b
BLAKE2b-256 cd3d57360316caa6a40158119b6016067a108064f8ccbf373b007c652bbf9ab3

File details

Details for the file docs2tops-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: docs2tops-0.0.3-py3-none-any.whl
  • Upload date:
  • Size: 7.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for docs2tops-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 9dd0cffd033b3eef827e804e7d14dc6028e39481626805b1113ac90ccc030860
MD5 ef63fae8c3ac478fb41b328417b8d4d6
BLAKE2b-256 830239eb321942c75b13c94fe0efd6f4778c50b983a6e74a89ec5ba7edf640dc
