
A multifaceted tool to simplify NLP.

Project description

Rozha

A package that simplifies and streamlines a range of natural language processing (NLP) tasks and methods for a wide variety of languages, allowing users to apply NLP to both English and non-English texts.

Rozha is named after Rozhanitsa, a goddess from Slavic mythology.

Installation

Install using pip:

pip3 install rozha

Or download the GitHub repo and then install the requirements:

pip3 install -r requirements.txt

Then begin using the package by importing the modules you plan to use. Rozha is structured into three classes: process, analyze, and visualize. If running from a local copy of the files, use the following:

from process import process
from analyze import analyze
from visualize import visualize

If you installed using pip, use this syntax:

import rozha.process as process  # or whatever name you choose
import rozha.analyze as analyze  # or whatever name you choose
import rozha.visualize as visualize  # or whatever name you choose

Full Documentation

A full list of the package's functions can be viewed at this link.

Example Pipelines

Some example pipelines for working with the package are as follows.

Open a file, perform word tokenization and remove stopwords, make the text lowercase, and then get part-of-speech tags for the text:

import rozha.process as process
import rozha.analyze as analyze

word_tokenized = process.lowerFile("your_file.txt")
pos_tags = analyze.posList(word_tokenized)
print(pos_tags)
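To make the first pipeline's preprocessing step more concrete, here is a minimal pure-Python sketch of what word tokenization, lowercasing, and stopword removal involve. It is not rozha's implementation: the regex tokenizer and the tiny `STOPWORDS` set are hypothetical stand-ins (real pipelines typically use a full, language-specific stopword list).

```python
from __future__ import annotations

import re

# Hypothetical, deliberately tiny English stopword list (illustration only).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "is", "in", "on"}


def tokenize_lower_remove_stopwords(text: str) -> list[str]:
    """Split text into word tokens, lowercase them, and drop stopwords."""
    # \w+ matches runs of Unicode letters, digits, and underscores,
    # so the same pattern also works for many non-English scripts.
    tokens = re.findall(r"\w+", text.lower())
    return [tok for tok in tokens if tok not in STOPWORDS]


if __name__ == "__main__":
    print(tokenize_lower_remove_stopwords("The cat sat on the mat."))
    # → ['cat', 'sat', 'mat']
```

The resulting token list is the kind of input a part-of-speech tagger expects.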

Open a file, perform sentence tokenization without removing stopwords, and then perform named entity recognition on each sentence using spaCy:

import rozha.process as process
import rozha.analyze as analyze

sent_tokenized = process.sentTokenizeFile("your_file.txt")
ner_tags = analyze.spacyNer(sent_tokenized, 'en')
print(ner_tags)
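Sentence tokenization, used in the second pipeline, splits text at sentence boundaries so that each sentence can be tagged separately. The sketch below uses a naive regular-expression splitter (break after `.`, `!`, or `?` when followed by whitespace). This heuristic is an assumption for illustration, not how rozha or spaCy segment sentences, and it will mis-split abbreviations such as "Dr. Smith".

```python
from __future__ import annotations

import re


def naive_sent_tokenize(text: str) -> list[str]:
    """Split text after ., !, or ? when followed by whitespace (naive heuristic)."""
    # The lookbehind keeps the punctuation attached to the preceding sentence.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty strings, e.g. when the input is empty.
    return [p for p in parts if p]


if __name__ == "__main__":
    print(naive_sent_tokenize("Rozha supports many languages. Try it! Does it work?"))
    # → ['Rozha supports many languages.', 'Try it!', 'Does it work?']
```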

Perform word tokenization and remove stopwords from a string, make the text lowercase, and graph the 10 most common words as a bar graph:

import rozha.process as process
import rozha.visualize as visualize

text = "Your text goes here."  # placeholder: the string you want to analyze
word_tokenized = process.lowerVar(text)
# pass the var, number of words to graph, the height and width of the graph, and your preferred filename
visualize.barFreq(word_tokenized, 10, 400, 400, 'my_graph')
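The third pipeline graphs the most common words. The counting step behind such a chart can be sketched with `collections.Counter` from the standard library. This is an assumed illustration of the idea, not rozha's `barFreq` code, and the plotting itself is omitted.

```python
from __future__ import annotations

from collections import Counter


def top_n_words(tokens: list[str], n: int = 10) -> list[tuple[str, int]]:
    """Return the n most frequent tokens with their counts, most common first."""
    # most_common orders by count; tied tokens keep first-encountered order.
    return Counter(tokens).most_common(n)


if __name__ == "__main__":
    tokens = ["nlp", "text", "nlp", "graph", "text", "nlp"]
    print(top_n_words(tokens, 2))
    # → [('nlp', 3), ('text', 2)]
```

Each (word, count) pair maps directly onto one bar's label and height.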

Contributing

Contributions are welcome! The following features are of particular interest:

  • Increasing the number of methods in the analyze class
  • Increasing the number of methods in the visualize class



Download files


Source Distribution

rozha-1.0.0.tar.gz (41.5 kB)

Uploaded Source

Built Distribution

rozha-1.0.0-py3-none-any.whl (35.8 kB)

Uploaded Python 3

File details

Details for the file rozha-1.0.0.tar.gz.

File metadata

  • Download URL: rozha-1.0.0.tar.gz
  • Upload date:
  • Size: 41.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.5

File hashes

Hashes for rozha-1.0.0.tar.gz
Algorithm Hash digest
SHA256 1cdf81f5959ae11e9c1f2ecd5f51c55ca562311272b4b57e81cbaa04deda64c0
MD5 e7a43ef1b91bdc7edea3ce7303b2f07c
BLAKE2b-256 0f27aa84f603dc7f400a451c9e0c3193db8e7a32c016d1bfaa5c3fd8fd569e78


File details

Details for the file rozha-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: rozha-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 35.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.5

File hashes

Hashes for rozha-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e876c90f4ca2ee03da7da60b669ecb71748b0c96ba21fe535c9f36690581ca59
MD5 cc9a457c899e2dd0180210d590c85fab
BLAKE2b-256 42df3888cd7ec892c478079e3c1764ddd5bda4c48ad4f47699d6ded897194b35
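The SHA256 digests listed above can be used to confirm that a downloaded file was not corrupted or altered. A minimal sketch with Python's standard `hashlib` module follows; the local filename in the `__main__` block is an assumption about where you saved the wheel.

```python
import hashlib
import os


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Published SHA256 digest for rozha-1.0.0-py3-none-any.whl (from the table above).
EXPECTED_WHEEL_SHA256 = "e876c90f4ca2ee03da7da60b669ecb71748b0c96ba21fe535c9f36690581ca59"

if __name__ == "__main__":
    path = "rozha-1.0.0-py3-none-any.whl"  # assumed download location
    if os.path.exists(path):
        print("OK" if sha256_of_file(path) == EXPECTED_WHEEL_SHA256 else "MISMATCH")
```

pip can enforce the same check at install time: a requirements-file entry with `--hash=sha256:<digest>` turns on pip's hash-checking mode.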

