A package for text processing

Wordwright

In today's world, text is omnipresent and serves as more than just a form of communication. From the briefest tweets to in-depth blog posts, from academic papers to business emails, our digital world is full of textual content. The ability to read, analyze, and derive meaning from written content is crucial. This is where the text analysis package wordwright comes in.

Package Summary

wordwright is a Python package for text analysis and processing. It offers a range of functions, from basic text cleaning to more complex analyses such as language detection, word and sentence counting, word-frequency summaries, and keyword searching. This functionality is particularly useful in data analysis, natural language processing, and any other setting where textual data needs to be understood or transformed. The functions are designed to be self-explanatory, which is especially helpful for those new to programming or text processing. A quickstart guide is also available.

Contributors

Yi Han (@yhan178), Yingzi Jin (@jinyz8888), Yi Yan (@carrieyanyi), Hongyang Zhang (@alexzhang0825)

Installation

Install from PyPI

Run this command to install the package:

$ pip install wordwright

Install from GitHub

If installation from PyPI fails, install the package from source using the steps below. Before proceeding, ensure that Miniconda or Anaconda is installed on your system. These tools let you create and manage Conda environments.

Step 1: Clone the Repository

Start by cloning the repository to your local machine. Open your terminal and run the following command:

$ git clone git@github.com:UBC-MDS/wordwright.git

Navigate to the directory of the cloned repository.

Step 2: Create and Activate the Conda Environment

Create a new Conda environment using the environment.yaml file provided in this repository. This file lists all the necessary dependencies, including the required versions of Python and Poetry.

To create the environment, open your terminal and navigate to the directory where the environment.yaml file is located. Then, run the following command:

$ conda env create -f environment.yaml

$ conda activate wordwright

Step 3: Install the Package Using Poetry

With the Conda environment activated, run the following command to install the package using Poetry:

$ poetry install

This command reads the pyproject.toml file in the project root and installs the package together with the dependencies listed there.
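
After either installation route, one way to confirm that the package is visible to the active environment is to query its installed version. The snippet below is a minimal sketch that uses only the standard library; the helper name installed_version is hypothetical and is not part of wordwright.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("wordwright")
    # Prints the version string, or a hint that the wrong environment is active.
    print(found if found else "wordwright is not installed in this environment")
```

If the message about a missing installation appears, check that the wordwright Conda environment is activated before rerunning the install command.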

Running the tests

Navigate to the project root directory and run the following command in a terminal to test the functions defined in the project. The tests are stored in the tests directory.

$ pytest tests/*

Troubleshooting

Environment Creation Issues: If you encounter problems while creating the Conda environment, ensure that the environment.yaml file is in the correct directory and that you have the correct version of Conda installed.

Example Usage

To use the wordwright package, you can import and call its functions in your Python environment. Here is an example:

>>> from wordwright.preprocessing import clean_text
>>> from wordwright.word_frequency import frequent_words
>>> from wordwright.count_keywords import count_keywords

>>> clean_text("It's a sunny day. ,Let's GO!")
"it's a sunny day let's go"

>>> text = "The quick brown fox jumps over the lazy dog. The fox was very quick."
>>> stopwords = ["the", "over", "was", "very"]
>>> frequent_words(text, stopwords)
Counter({'quick': 2, 'fox': 2, 'brown': 1, 'jumps': 1, 'lazy': 1, 'dog': 1})

>>> count_keywords("I like cheese.", ["cheese"])
{'cheese': 1}
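
To make the behavior shown in the example above concrete, here is a minimal standard-library sketch that reproduces the two outputs for clean_text and frequent_words. It is not the package's implementation: the names clean_text_sketch and frequent_words_sketch are hypothetical, and the choices to keep apostrophes and to treat words as runs of ASCII letters are assumptions inferred from the example output.

```python
import re
from collections import Counter


def clean_text_sketch(text):
    """Lowercase the text, drop punctuation except apostrophes, collapse spaces."""
    lowered = text.lower()
    # Assumption: apostrophes are kept so that "it's" survives, as in the example.
    no_punct = re.sub(r"[^\w\s']", "", lowered)
    return " ".join(no_punct.split())


def frequent_words_sketch(text, stopwords=()):
    """Count the words in the text, skipping any word listed in stopwords."""
    excluded = {word.lower() for word in stopwords}
    # Assumption: a word is a run of ASCII letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in excluded)


print(clean_text_sketch("It's a sunny day. ,Let's GO!"))
# it's a sunny day let's go

text = "The quick brown fox jumps over the lazy dog. The fox was very quick."
print(frequent_words_sketch(text, ["the", "over", "was", "very"]))
# Counter({'quick': 2, 'fox': 2, 'brown': 1, 'jumps': 1, 'lazy': 1, 'dog': 1})
```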

Functions

load_text(file_path): Loads and returns the content of a text file. Required input is file_path, which specifies the path to the file.
clean_text(text): Cleans a text string by removing punctuation, converting to lowercase, and removing common stopwords. Required input is text, which is the string to be cleaned.
count_keywords(text, keywords): Counts the occurrences of specified keywords in the text. Required inputs are text and keywords, a list of words to look for. Returns the number of occurrences of each keyword.
count_sentences(text, punctuation): Counts the number of sentences in the text, based on the specified delimiters. Required inputs are text and punctuation.
language_detection(text): Detects whether the text is in English. Required input is text, which is the text to be checked.
frequent_words(text, number, stopwards): Analyzes a given text to find and return the most frequent words, excluding specified stopwords. Required inputs are text, number, and stopwards, which are, respectively, the cleaned text to be analyzed, the number of most frequent words to return, and a list of words to exclude from the analysis.
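
The counting functions described above can be illustrated with a short standard-library sketch. This is not the package's code: count_keywords_sketch and count_sentences_sketch are hypothetical names, and the case-insensitive whole-word matching, the default delimiters, and the treatment of punctuation as a collection of delimiter characters are all assumptions.

```python
import re
from collections import Counter


def count_keywords_sketch(text, keywords):
    """Return how often each keyword occurs as a whole word, ignoring case."""
    # Assumption: a word is a run of ASCII letters and apostrophes.
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    return {keyword: counts[keyword.lower()] for keyword in keywords}


def count_sentences_sketch(text, punctuation=".!?"):
    """Count the non-empty pieces left after splitting on the delimiters."""
    if not punctuation:
        return 1 if text.strip() else 0
    # Build a character class from the delimiters, escaping regex metacharacters.
    pattern = "[" + "".join(re.escape(mark) for mark in punctuation) + "]+"
    return sum(1 for piece in re.split(pattern, text) if piece.strip())


print(count_keywords_sketch("I like cheese.", ["cheese", "bread"]))
# {'cheese': 1, 'bread': 0}
print(count_sentences_sketch("Hello world. How are you? Fine!"))
# 3
```

Splitting on delimiters is a deliberately simple rule: abbreviations such as "e.g." would be counted as extra sentences under this sketch.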

`wordwright` Use in Python Ecosystem

While other packages offer similar functions, such as the Natural Language Toolkit (Loper & Bird, 2002) and TextBlob (Loria, 2018), wordwright distinguishes itself by its simplicity and its focus on the most essential text processing features. It is designed for ease of use, making it a good choice for those with basic programming knowledge.

Contributing

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

wordwright was created by Yi Han, Yingzi Jin, Yi Yan, and Hongyang Zhang. It is licensed under the terms of the MIT license.

Credits

wordwright was created with cookiecutter and the py-pkgs-cookiecutter template.

References

Loper, E., & Bird, S. (2002). NLTK: The Natural Language Toolkit. arXiv preprint cs/0205028.

Loria, S. (2018). textblob Documentation. Release 0.15, 2(8), 269.

Release history

3.0.1: Feb 2, 2024 (this version)
3.0.0: Feb 2, 2024
2.1.4: Feb 2, 2024
2.1.3: Feb 2, 2024
2.1.2: Feb 2, 2024
2.1.1: Jan 30, 2024
0.1.2: Jan 29, 2024
0.1.1: Jan 29, 2024
0.1.0: Jan 29, 2024

Download files

Source Distribution: wordwright-3.0.1.tar.gz (6.9 kB), uploaded Feb 2, 2024
Built Distribution: wordwright-3.0.1-py3-none-any.whl (8.8 kB), uploaded Feb 2, 2024, Python 3

File details

Details for the file wordwright-3.0.1.tar.gz.

File metadata

Download URL: wordwright-3.0.1.tar.gz
Upload date: Feb 2, 2024
Size: 6.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for wordwright-3.0.1.tar.gz
Algorithm	Hash digest
SHA256	`ba508e02667ad48d342926478fe151ca6c4ec4b9d2815c5314abc429c53e323b`
MD5	`b57dc217c90eea7b86cf0d9e8c6b773c`
BLAKE2b-256	`d55971a75f85b067a7edb57bd465c31b1eda61a78717e09a8e727c9e81e20f7b`
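
A downloaded file can be checked against the SHA256 digest listed above before it is installed. The following is a minimal sketch using Python's hashlib; the helper names sha256_of and matches_expected are hypothetical, and the local file path is whatever location the archive was saved to.

```python
import hashlib

# SHA256 digest listed above for wordwright-3.0.1.tar.gz
EXPECTED_SHA256 = "ba508e02667ad48d342926478fe151ca6c4ec4b9d2815c5314abc429c53e323b"


def sha256_of(path, chunk_size=65536):
    """Return the hexadecimal SHA256 digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's digest equals the expected digest."""
    return sha256_of(path) == expected.lower()
```

For example, matches_expected("wordwright-3.0.1.tar.gz") should return True for an intact copy of the source distribution saved in the current directory.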

File details

Details for the file wordwright-3.0.1-py3-none-any.whl.

File metadata

Download URL: wordwright-3.0.1-py3-none-any.whl
Upload date: Feb 2, 2024
Size: 8.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.11.7

File hashes

Hashes for wordwright-3.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9fe58dcc5fba07f57632cd115cfc77a0582860d47af6fd4e66455beb80a23e85`
MD5	`658148e36b23ecba9f7b2ae8f58b8810`
BLAKE2b-256	`7c63cff6696c41feeda63974ef4339994ffc1f23e2d8f2574ad1b6ab81f0685a`
