A multifaceted tool to simplify NLP.
Project description
Rozha
A package that simplifies and streamlines a range of natural language processing tasks and methods for a wide variety of languages, so users can apply NLP to both English and non-English texts.
Rozha is named after Rozhanitsa, a goddess from Slavic mythology.
Installation
Install using pip:
pip3 install rozha
Or download the GitHub repo and install the requirements:
pip3 install -r requirements.txt
Then begin using the package by importing the modules you plan to use. Rozha is structured into three classes: process, analyze, and visualize. If running from a local copy of the files, use the following:
from process import process
from analyze import analyze
from visualize import visualize
If you installed using pip, use this syntax (the name after "as" can be whatever you choose):
import rozha.process as process
import rozha.analyze as analyze
import rozha.visualize as visualize
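If the same script needs to run both from a pip install and from a local copy, a small helper can try the two import styles shown above in turn. This is a minimal sketch: load_rozha_module is a hypothetical helper name, and it assumes the local files follow the "from process import process" layout described above.

```python
import importlib
from typing import Any


def load_rozha_module(name: str) -> Any:
    """Hypothetical helper: load rozha's `name` component from either install style.

    Tries the pip layout (`import rozha.<name>`) first, then falls back to the
    local-copy layout (`from <name> import <name>`).
    """
    try:
        # pip install: the submodule itself exposes the functions
        return importlib.import_module(f"rozha.{name}")
    except ImportError:
        # local copy: <name>.py defines an object with the same name
        local_module = importlib.import_module(name)
        return getattr(local_module, name)


# Usage (requires rozha to be installed, or the repo files to be on sys.path):
# process = load_rozha_module("process")
# analyze = load_rozha_module("analyze")
# visualize = load_rozha_module("visualize")
```

If neither layout is available, the fallback import raises ImportError, so a missing installation fails loudly rather than silently.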
Full Documentation
A full list of the package's functions can be viewed at this link.
Example Pipelines
Some example pipelines for working with the package are as follows.
Open a file, perform word tokenization and remove stopwords, make the text lowercase, and then get part-of-speech tags for the text:
import rozha.process as process
import rozha.analyze as analyze
word_tokenized = process.lowerFile("your_file.txt")
pos_tags = analyze.posList(word_tokenized)
print(pos_tags)
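The description above implies that lowerFile tokenizes, removes stopwords, and lowercases in a single call. As a rough, dependency-free illustration of those preprocessing steps only (not rozha's implementation, whose tokenizer and stopword lists are not described here and likely differ), the idea looks like this in plain Python. The tiny STOPWORDS set and the tokenize_lower_filter name are hypothetical; part-of-speech tagging is not reproduced.

```python
import re

# Hypothetical, deliberately tiny English stopword list for illustration only
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "on"}


def tokenize_lower_filter(text: str) -> list[str]:
    """Lowercase `text`, split it into word tokens, and drop stopwords."""
    # \w+ matches runs of Unicode word characters, so punctuation is discarded
    tokens = re.findall(r"\w+", text.lower())
    return [tok for tok in tokens if tok not in STOPWORDS]


print(tokenize_lower_filter("The cat sat on the mat."))
# → ['cat', 'sat', 'mat']
```

Lowercasing happens before stopword filtering here so that "The" and "the" are both removed. A \w+ regex handles accented and non-Latin letters, but it cannot segment languages written without spaces, such as Chinese or Japanese, which is one reason a dedicated multilingual tokenizer is preferable.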
Open a file, perform sentence tokenization without removing stopwords, and then perform named entity recognition on each sentence using spaCy:
import rozha.process as process
import rozha.analyze as analyze
sent_tokenized = process.sentTokenizeFile("your_file.txt")
ner_tags = analyze.spacyNer(sent_tokenized, 'en')
print(ner_tags)
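Sentence tokenization splits running text into sentences before each one is passed to the NER step. The package's own sentence tokenizer is not documented here; the sketch below is only a naive regex approximation, using a hypothetical split_sentences helper that breaks after ".", "!", or "?" followed by whitespace, to show the kind of sentence-per-item input an NER step typically works on.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naively split text into sentences at '.', '!' or '?' followed by whitespace."""
    stripped = text.strip()
    if not stripped:
        return []
    # The lookbehind keeps each punctuation mark attached to the sentence it ends
    return re.split(r"(?<=[.!?])\s+", stripped)


print(split_sentences("Rozha supports many languages. Try it! Does it work?"))
# → ['Rozha supports many languages.', 'Try it!', 'Does it work?']
```

This approach mis-splits abbreviations such as "Dr. Smith", which is why real tokenizers rely on trained models or abbreviation lists.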
Perform word tokenization and remove stopwords from a string, make the text lowercase, and graph the 10 most common words as a bar graph:
import rozha.process as process
import rozha.visualize as visualize
text = "Your text goes here."  # any string you want to analyze
word_tokenized = process.lowerVar(text)
# pass the tokenized text, the number of words to graph, the graph's height and width, and your preferred filename
visualize.barFreq(word_tokenized, 10, 400, 400, 'my_graph')
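A most-common-words bar graph starts from word counts. The counting step can be sketched with the standard library's collections.Counter; this assumes the graph shows the most frequent tokens with their counts, and it is not a description of barFreq's internals. The top_words name is hypothetical, and the plotting itself is omitted.

```python
from collections import Counter


def top_words(tokens: list[str], n: int = 10) -> list[tuple[str, int]]:
    """Return the n most common tokens with their counts, most frequent first."""
    return Counter(tokens).most_common(n)


tokens = ["nlp", "text", "nlp", "graph", "text", "nlp"]
print(top_words(tokens, 2))
# → [('nlp', 3), ('text', 2)]
```

The resulting (word, count) pairs map directly onto bar labels and bar heights.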
Contributing
Contributions are welcome! The following features are of particular interest:
- Increasing the number of methods in the analyze class
- Increasing the number of methods in the visualize class
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file rozha-1.0.0.tar.gz.
File metadata
- Download URL: rozha-1.0.0.tar.gz
- Upload date:
- Size: 41.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1cdf81f5959ae11e9c1f2ecd5f51c55ca562311272b4b57e81cbaa04deda64c0
MD5 | e7a43ef1b91bdc7edea3ce7303b2f07c
BLAKE2b-256 | 0f27aa84f603dc7f400a451c9e0c3193db8e7a32c016d1bfaa5c3fd8fd569e78
File details
Details for the file rozha-1.0.0-py3-none-any.whl.
File metadata
- Download URL: rozha-1.0.0-py3-none-any.whl
- Upload date:
- Size: 35.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | e876c90f4ca2ee03da7da60b669ecb71748b0c96ba21fe535c9f36690581ca59
MD5 | cc9a457c899e2dd0180210d590c85fab
BLAKE2b-256 | 42df3888cd7ec892c478079e3c1764ddd5bda4c48ad4f47699d6ded897194b35
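To check a downloaded file against the SHA256 digests listed above, Python's standard hashlib module is enough. This is a minimal sketch: verify_sha256 is a hypothetical helper name, and the path is wherever you saved the download.

```python
import hashlib
from pathlib import Path


def verify_sha256(path: str, expected_hex: str) -> bool:
    """Return True if the file at `path` has the given SHA256 hex digest."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # Read in 64 KiB chunks so large files are not loaded into memory at once
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()


# Usage, assuming the source distribution was saved in the current directory:
# verify_sha256(
#     "rozha-1.0.0.tar.gz",
#     "1cdf81f5959ae11e9c1f2ecd5f51c55ca562311272b4b57e81cbaa04deda64c0",
# )
```

The same call works for the wheel, using its own SHA256 digest from the second table.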