Data cleaning made easy with swachhdata
Swachhdata
Swachhdata is an open-source Python package that offers simple and efficient tools for cleaning and transforming text data. It aims to be accessible to everyone and reusable in various contexts. With Swachhdata, you can clean and preprocess your data using a collection of functions, and build pipelines to streamline your data processing tasks.
Key Features
- Data Cleaning: Swachhdata provides a comprehensive set of functions to clean and sanitize your text data. Whether you need to remove stopwords, lemmatize, or tokenize, Swachhdata has you covered.
- Flexible Input: Swachhdata supports various data types, including strings, lists of strings, Pandas DataFrames, Pandas Series, and NumPy arrays. You can pass your data to the functions or pipelines without worrying about the format.
- Pipelines: You can create data processing pipelines by chaining multiple functions together. This lets you perform a series of transformations on your data with a single command, making your workflow more efficient.
- Automatic Data Type Detection: Swachhdata detects the data type of your input and applies the appropriate cleaning method automatically. This removes the need for manual conversions.
- Multiple Backend Engines: Swachhdata provides convenient wrapper functions for tasks such as lemmatization and stemming. These functions let you choose the backend engine among NLTK, spaCy, and Gensim, so you can select the most suitable option for your requirements.
- Open Source and Commercially Usable: Swachhdata is released under the MPL-2.0 license, making it open source and commercially usable. You can freely use, modify, and distribute the package in your projects, whether they are personal, academic, or commercial.
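To illustrate the idea behind flexible input and automatic data type detection, here is a minimal sketch in plain Python. It is not Swachhdata's implementation: the function name `to_lower` is hypothetical, and only strings and lists are handled to keep the example dependency-free. It shows one common way to route different input types to the right cleaning logic, using `functools.singledispatch` from the standard library.

```python
from functools import singledispatch


@singledispatch
def to_lower(data):
    """Fallback for input types this sketch does not handle."""
    raise TypeError(f"unsupported input type: {type(data).__name__}")


@to_lower.register
def _(data: str) -> str:
    # A single string is cleaned directly.
    return data.lower()


@to_lower.register
def _(data: list) -> list:
    # A list is cleaned element by element, reusing the dispatcher.
    return [to_lower(item) for item in data]


print(to_lower("Hello World"))        # hello world
print(to_lower(["Hello", "WORLD"]))   # ['hello', 'world']
```

A real package would presumably register further handlers for Pandas and NumPy containers in the same way; that part is left out here.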
Installation
You can install swachhdata using pip:
pip install swachhdata
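To confirm the installation from Python, a small check like the following may help. It uses only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_version` is an illustrative choice, not part of Swachhdata.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed version, or None if swachhdata is not installed.
print(installed_version("swachhdata"))
```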
Usage
To use Swachhdata, import the package in your Python script or Jupyter Notebook:
import swachhdata.text as sdt
Once imported, you can start using the functions and pipelines provided by Swachhdata to clean and transform your data. Here's an example of how you can build a pipeline to clean text data:
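# Sample input for illustration; any supported type (string, list of strings, Series, ...) can be used.
text = "<p>Hey @user, I can't wait for #PyCon 2023!</p>"

# Chain the cleaning steps with `+`; they are listed in the order shown in the original example.
pipeline = (
    sdt.htmlRecast()
    + sdt.EscapeSequencesRecast()
    + sdt.MentionsRecast(process='remove')
    + sdt.ContractionsRecast()
    + sdt.CaseRecast(process='lower')
    + sdt.EmojiRecast(process='replace', space_out=True)
    + sdt.HashtagsRecast(process='remove')
    + sdt.ShortWordsRecast(min_length=3)
    + sdt.StopWordsRecast(package='nltk')
    + sdt.NumbersRecast(process='replace', seperator=',')
    + sdt.AlphabetRecast(process='all')
    + sdt.PunctuationsRecast()
    + sdt.LemmatizationRecast()
)

# Load the data into the pipeline, then run every step and collect the cleaned result.
pipeline.setup(text)
text = pipeline.recast()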
For more detailed examples and documentation, please refer to the Documentation.
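For readers curious how chaining steps with `+` can work in general, the following is a minimal, self-contained sketch. It is an assumption-laden illustration, not Swachhdata's source code: the `Step` and `Pipeline` classes and the three example steps are hypothetical, and only the `setup`/`recast` method names are borrowed from the usage example above.

```python
import re


class Step:
    """One cleaning step wrapping a str -> str function (hypothetical)."""

    def __init__(self, func):
        self.func = func

    def __add__(self, other):
        # Step + Step (or Step + Pipeline) yields a Pipeline.
        return Pipeline([self]) + other


class Pipeline:
    """An ordered list of steps applied one after another (hypothetical)."""

    def __init__(self, steps):
        self.steps = list(steps)
        self.text = None

    def __add__(self, other):
        # Accept either another Pipeline or a single Step on the right.
        extra = other.steps if isinstance(other, Pipeline) else [other]
        return Pipeline(self.steps + extra)

    def setup(self, text):
        # Store the input; nothing is processed yet.
        self.text = text

    def recast(self):
        # Feed the output of each step into the next one.
        result = self.text
        for step in self.steps:
            result = step.func(result)
        return result


lower = Step(str.lower)
strip_mentions = Step(lambda s: re.sub(r"@\w+", "", s))
squeeze_spaces = Step(lambda s: " ".join(s.split()))

pipeline = strip_mentions + lower + squeeze_spaces
pipeline.setup("Thanks @alice   for the GREAT talk")
print(pipeline.recast())  # thanks for the great talk
```

Overloading `__add__` keeps the pipeline definition declarative: the order of the operands is the order of execution.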
Contributing
Swachhdata welcomes contributions from the open-source community. If you encounter any issues, have ideas for improvements, or would like to add new features, please submit a pull request on the GitHub repository.
Before submitting a pull request, please ensure that your code adheres to the project's coding conventions and is thoroughly tested.
License
Swachhdata is released under the MPL-2.0 license. For more information, please refer to the LICENSE file.
Contact
If you have any questions, suggestions, or feedback, you can reach out to the Swachhdata team by opening an issue on the GitHub repository.
Thank you for choosing Swachhdata! We hope you find it helpful in cleaning and transforming your data.
Documentation-
- https://Swachhdata.readthedocs.io/en/latest/ (Update coming soon!)
- Examples
File details
Details for the file swachhdata-2.0.2-py3-none-any.whl.
File metadata
- Download URL: swachhdata-2.0.2-py3-none-any.whl
- Upload date:
- Size: 25.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0960da4c32ddb003789cc5f5ac45120e4307b0f88f4050c4afcd388f14c093b8
MD5 | 5593e63be2d625da2cb1ea947a41bb1b
BLAKE2b-256 | 828a464da78d66233fd27df66c7f0511a105748d2db497ab9021b96a27d4a97a
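If you download the wheel manually, you can compare its SHA256 digest with the value listed above. The sketch below uses only the standard library; the local file path is an assumption (adjust it to wherever you saved the file), and `sha256_of_file` is an illustrative helper name.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Digest copied from the file hashes table above.
EXPECTED_SHA256 = "0960da4c32ddb003789cc5f5ac45120e4307b0f88f4050c4afcd388f14c093b8"

# Assumed download location: the current working directory.
wheel = Path("swachhdata-2.0.2-py3-none-any.whl")

if wheel.exists():
    print("match" if sha256_of_file(wheel) == EXPECTED_SHA256 else "MISMATCH")
else:
    print(f"{wheel} not found; download it first")
```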