A preprocessing library for text in Spanish
A package for preprocessing text in Spanish
Preln is a Python package that speeds up development and improves the performance of applications that depend on careful data processing in the field of NLP (Natural Language Processing). The library takes into account the special characteristics of text written in Spanish. It prepares data for complex applications such as training machine-learning models, extracting content from social media, or building tools that automate language correction, lemmatization, and stemming, among many others.
📃 Latest version v0.5.1-alpha out now! 📃
💬 Contribution & Questions
Type | Platforms |
---|---|
🐞 Bug Reports | [GitHub Issue Tracker] |
📦 Feature Requests & Ideas | [GitHub Discussions] |
🛠️ Usage Questions & Discussions | [GitHub Discussions] |
💼 Features
- Apply and combine general basic operations to preprocess text in Spanish
- Connect directly to file paths, databases… to read and write data easily
- Simple, optimized implementation with ready-to-apply configuration files
- Autocorrect function to improve data quality
- Methods for privacy control that replace or remove personal data from the dataset
- Support for the Spanish and English languages
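To make the difference between replacing and removing personal data concrete, here is a minimal standard-library sketch. It is not Preln's implementation: the regular expressions, the `mask` function, and the `EMAIL` and `MENTION` placeholder tokens are all assumptions chosen for illustration.

```python
import re

# Simplified, hypothetical patterns; real data needs more robust rules.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
MENTION_RE = re.compile(r"@\w+")


def mask(text: str, mode: str = "replace") -> str:
    """Replace sensitive spans with placeholder tokens, or delete them."""
    if mode not in ("replace", "delete"):
        raise ValueError("mode must be 'replace' or 'delete'")
    email_token = "EMAIL" if mode == "replace" else ""
    mention_token = "MENTION" if mode == "replace" else ""
    # Emails go first so the mention pattern does not consume their domain part.
    text = EMAIL_RE.sub(email_token, text)
    text = MENTION_RE.sub(mention_token, text)
    # Collapse any double spaces left behind by deletions.
    return " ".join(text.split())


print(mask("Escribe a ana@example.com o a @usuario"))
# Escribe a EMAIL o a MENTION
print(mask("Escribe a ana@example.com o a @usuario", mode="delete"))
# Escribe a o a
```

Replacing keeps the sentence structure visible to downstream models, while deleting removes every trace of the span.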
💾 Install Preln
To start using Preln, run the following command:
pip install preln
Note: in a Python notebook, you might have to run this command from a code cell (for example, !pip install preln).
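After installing, you may want to confirm which version ended up in your environment. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_version` is an assumption for illustration and is not part of Preln.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version("preln")
print(found if found else "preln is not installed in this environment")
```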
The main class of the package is called Preprocessing, and it contains all the principal functions of the package. Import this class and create an object in order to use its methods:
from Preln.preprocessing import Preprocessing
preprocessor = Preprocessing(date=False, date_format=None, accents=False, lowercasing=True,
privacy=True, privacy_format="multi:replace", correction=True, media=True,
media_format="mention:delete", numbers=False, punctuation=True,
stopwords=True, tokenizer=True, debug=False)
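Since the constructor takes many keyword arguments, one option is to keep them in a dictionary (for instance loaded from a JSON file) and unpack them. The following is a sketch under assumptions: the keyword names and values are copied from the call above, while `EXAMPLE_OPTIONS` and `load_options` are hypothetical helpers, not part of Preln. The import is guarded so the snippet also runs where Preln is not installed.

```python
import json

# Keyword names and values copied from the constructor call shown above.
EXAMPLE_OPTIONS = {
    "date": False, "date_format": None, "accents": False, "lowercasing": True,
    "privacy": True, "privacy_format": "multi:replace", "correction": True,
    "media": True, "media_format": "mention:delete", "numbers": False,
    "punctuation": True, "stopwords": True, "tokenizer": True, "debug": False,
}


def load_options(json_text):
    """Merge overrides given as a JSON object into the example options."""
    overrides = json.loads(json_text)
    unknown = set(overrides) - set(EXAMPLE_OPTIONS)
    if unknown:
        raise ValueError(f"Unknown option(s): {sorted(unknown)}")
    return {**EXAMPLE_OPTIONS, **overrides}


options = load_options('{"correction": false, "numbers": true}')

try:
    from Preln.preprocessing import Preprocessing
    preprocessor = Preprocessing(**options)
except ImportError:
    preprocessor = None  # Preln is not available in this environment
```

Rejecting unknown keys early catches typos in a configuration file before they reach the constructor.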
🔧 Example of use
This basic example shows how to use the package to process a short piece of text.
sample_text = "¡Hola @usuario!, mi nombre es Preln, me han creado Adrián y Raúl. Revisa mi documentación en https://www.preln.org"
test = preprocessor.pipeline(sample_text)
print(test) # ['MENTION', 'nombre', 'ORG', 'creado', 'PERSON', 'PERSON', 'revisa', 'documentación', 'URL']
Note: The pipeline method has its parameters (which toggle the core methods) set by default. It is worth adjusting them for each text you want to process.
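When there are several texts to process, a small wrapper can apply the same pipeline to each one. This is only a sketch: `process_all` and `fake_pipeline` are hypothetical names, and the stand-in pipeline just lowercases and splits on whitespace so the snippet runs without Preln.

```python
from typing import Callable, Iterable, List


def process_all(
    texts: Iterable[str], pipeline: Callable[[str], List[str]]
) -> List[List[str]]:
    """Apply a pipeline callable to each text, skipping blank strings."""
    return [pipeline(text) for text in texts if text.strip()]


def fake_pipeline(text: str) -> List[str]:
    """Stand-in for a real pipeline: lowercase and split on whitespace."""
    return text.lower().split()


results = process_all(["Hola mundo", "   ", "Buenos días"], fake_pipeline)
print(results)  # [['hola', 'mundo'], ['buenos', 'días']]
```

With Preln installed, passing `preprocessor.pipeline` in place of `fake_pipeline` should work the same way, given that the example above calls it with a single string and prints a list of tokens.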
💳 License
Preln is licensed under the MIT License.
File details
Details for the file preln-0.5.5.tar.gz.
File metadata
- Download URL: preln-0.5.5.tar.gz
- Upload date:
- Size: 4.4 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.15
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | d9bb9f5574dd4f39082cf37cbe5dad11ff6f90e7503d97e0308d87a447ab28f1 |
MD5 | ed3b86bc0a4dbf55f73ef331ecec314c |
BLAKE2b-256 | b827301b0e54f5a0b9d1c0f710ab0f40acc6bd699ec237f22a2ef5fcbad735f3 |
File details
Details for the file preln-0.5.5-py3-none-any.whl.
File metadata
- Download URL: preln-0.5.5-py3-none-any.whl
- Upload date:
- Size: 4.4 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.15
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 5f359991df961975b2fe7ca68534abf46b2d0e52edd1bb536eed999e93248650 |
MD5 | 65f9d8692864eec50f70706615d03395 |
BLAKE2b-256 | 32d9f02671b42d370630c1878b9531edcea552c0d7e7302012b4964e81d77272 |