Counting stop words.

Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
Topic
- Software Development :: Libraries :: Python Modules

Project description

Common Phrase Detection

This is a Python API library for detecting stop phrases.

Background
Install
API
Maintainers

Background

NLP (Natural Language Processing) techniques are very helpful in various applications such as sentiment analysis, chatbots, and other areas. Developing NLP models requires a large, clean corpus from which to learn word relations. One of the challenges in achieving a clean corpus is stop phrases. Stop phrases usually do not carry much information about the text, so they must be identified and removed.
The aim of this repo is to provide a structure for processing HTML pages (which are a valuable source of text for all languages), finding a certain number of possible combinations of words, and using human input to identify stop phrases.

Install

Make sure you have docker, docker-compose, and Python 3.8 or above installed.
Create a .env file with the desired values, based on the .env.example file.
After cloning the project, go to the project directory and run the command below.

docker-compose -f docker-compose-dev.yml build

After the images are built successfully, run the command below to start the project.

docker-compose -f docker-compose-dev.yml up -d

We need to create a database and a collection in mongo in order to use the API. First, open a bash shell in the mongo container.

docker exec -it db bash

Authenticate inside the mongo container.

mongo -u ${MONGO_INITDB_ROOT_USERNAME} -p ${MONGO_INITDB_ROOT_PASSWORD} --authenticationDatabase admin

Create the database and collection using the MONGO_PHRASE_DB and MONGO_PHRASE_COL names you set in the .env file.

// Database creation
use phrasedb
// Collection creation
db.createCollection("common_phrase")

Now you're ready to use the API.

API

This API has three endpoints.

Document Process

Here you can pass an HTML text in the request body to process it.

The process stages are:

Fetching all H1-H6 and p tags
Cleaning text
Finding bags of words (from 1 to 5 words each)
Counting the number of occurrences in the text
Integrating results in database (Updating count field of the phrase if already exists, otherwise inserting a new record)
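
The stages above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the library's actual implementation: the cleaning rules, the function names (count_phrases, merge_counts), and the document fields "phrase" and "count" are all assumptions made for the example. The collection argument is expected to behave like a pymongo collection, whose update_one method accepts upsert=True.

```python
import re
from collections import Counter
from html.parser import HTMLParser

TEXT_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "p"}


class TextTagParser(HTMLParser):
    """Collect the text inside H1-H6 and p tags (assumes well-formed markup)."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # greater than 0 while inside a tag of interest
        self.chunks = []  # one text chunk per outermost tag of interest

    def handle_starttag(self, tag, attrs):
        if tag in TEXT_TAGS:
            if self.depth == 0:
                self.chunks.append("")
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in TEXT_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks[-1] += data


def clean(text):
    """Lowercase, replace punctuation with spaces, and split into words.

    These cleaning rules are hypothetical; the real library may differ.
    """
    return re.sub(r"[^\w\s]", " ", text.lower()).split()


def count_phrases(html, max_n=5):
    """Count every run of 1 to max_n consecutive words within each tag."""
    parser = TextTagParser()
    parser.feed(html)
    parser.close()
    counts = Counter()
    for chunk in parser.chunks:
        words = clean(chunk)
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                counts[" ".join(words[i:i + n])] += 1
    return counts


def merge_counts(collection, counts):
    """Increment the count of existing phrases and insert the new ones."""
    for phrase, count in counts.items():
        collection.update_one(
            {"phrase": phrase},          # hypothetical field name
            {"$inc": {"count": count}},  # adds to count, or creates it
            upsert=True,
        )
```

Phrases are built per tag here, so a phrase never spans a heading and the paragraph that follows it. Whether the library treats tag boundaries this way is not stated above.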

Status Updater

Updates statuses.

Changing the status of a phrase to either stop or highlight.
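
A status update of this kind might look like the sketch below. The function name set_status and the fields "phrase" and "status" are hypothetical; only the two allowed values, stop and highlight, come from the description above. The collection argument is assumed to behave like a pymongo collection, whose update_one returns a result with a matched_count attribute.

```python
VALID_STATUSES = {"stop", "highlight"}


def set_status(collection, phrase, status):
    """Set a phrase's status; return True if the phrase was found."""
    if status not in VALID_STATUSES:
        raise ValueError(f"status must be one of {sorted(VALID_STATUSES)}")
    result = collection.update_one(
        {"phrase": phrase},             # hypothetical field name
        {"$set": {"status": status}},   # hypothetical field name
    )
    return result.matched_count == 1
```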

Data Fetcher

Fetches data from the database based on status. Here you can fetch phrases for four different status situations:

Stop phrases
Highlight phrases
Phrases that have status (either stop or highlight)
Phrases whose statuses are not yet determined
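
One way to picture the four situations is as MongoDB query filters, as in the sketch below. The selector names and the "status" field are assumptions made for illustration, not the API's actual parameters. In MongoDB, a filter of {"status": None} matches documents where the field is null or missing, which is used here for undetermined phrases.

```python
def status_filter(kind):
    """Return a MongoDB filter for one of the four status situations."""
    filters = {
        "stop": {"status": "stop"},
        "highlight": {"status": "highlight"},
        "with_status": {"status": {"$in": ["stop", "highlight"]}},
        "undetermined": {"status": None},  # null or missing field
    }
    try:
        return filters[kind]
    except KeyError:
        raise ValueError(f"unknown selector: {kind!r}") from None
```

With pymongo, the result could be passed straight to find, for example collection.find(status_filter("undetermined")).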

API details

API Base URL

127.0.0.1:8000

API Swagger UI

127.0.0.1:8000/docs

For further details and how to make requests to each endpoint, refer to the Swagger UI of the API.
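
If you prefer to discover the endpoints from a script, a sketch like the following may help. It assumes the service also publishes its OpenAPI schema at /openapi.json, which is the default for some frameworks that serve Swagger UI at /docs but is not confirmed for this project. The function names are hypothetical.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000"
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def list_operations(schema):
    """Return sorted (METHOD, path) pairs from an OpenAPI schema dict."""
    operations = []
    for path, item in schema.get("paths", {}).items():
        for key in item:
            if key in HTTP_METHODS:  # skip non-method keys such as "parameters"
                operations.append((key.upper(), path))
    return sorted(operations)


def fetch_schema(base_url=BASE_URL):
    """Download the schema; the /openapi.json location is an assumption."""
    with urlopen(f"{base_url}/openapi.json", timeout=5) as response:
        return json.load(response)
```

With the containers running, list_operations(fetch_schema()) would list the method and path of each endpoint, provided the schema is served at that location.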

Maintainers

Maani Beygi
Reza Shabrang

Release history

Version	Release date
0.2.6 (this version)	Jun 11, 2022
0.2.4	Jun 7, 2022
0.2.3	Jun 6, 2022
0.2.2	Jun 6, 2022
0.2.1	Jun 6, 2022
0.2.0	May 31, 2022
0.1.8	May 30, 2022
0.1.7	May 30, 2022
0.1.6	May 25, 2022
0.1.5	May 16, 2022
0.1.4	May 11, 2022
0.1.3	Apr 20, 2022
0.1.2	Apr 17, 2022
0.1.1	Mar 28, 2022
0.1.0	Mar 27, 2022

Download files

Source Distribution

phrase_counter-0.2.6.tar.gz (19.4 kB)

Uploaded Jun 11, 2022 Source

Built Distribution

phrase_counter-0.2.6-py3-none-any.whl (16.3 kB)

Uploaded Jun 11, 2022 Python 3

Hashes for phrase_counter-0.2.6.tar.gz
Algorithm	Hash digest
SHA256	`4be023e51a985d3379c07ccba02b9b85eec8e5efcf9bc5531de878c16ea4b814`
MD5	`9b2b501e8e56d3148a84db79a3b78fed`
BLAKE2b-256	`231c249830e689ab1188a153b40936676c980c95be931e7fa46b1a7d776d83a5`

Hashes for phrase_counter-0.2.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f5a869dd60d96bdc8fb303b412564b96f970a89077052f5c9238f260b4c13bbe`
MD5	`169ea34d39cdac66c45176bab5483ee6`
BLAKE2b-256	`4f5855e094310ac711acb343f98f7250427338b9097c1fd86b6c42f97bceec67`