
A preprocessing library for Spanish text

Project description


A package for preprocessing Spanish text


Preln is a Python package that speeds up development and improves the performance of NLP (Natural Language Processing) applications that depend on well-prepared input data. The library takes into account the particular characteristics of text written in Spanish. It makes data clean and ready to use in demanding applications such as training machine-learning models, extracting content from social media, or building tools that automate language correction, lemmatization and stemming, among many others.

📃 Latest version v0.5.1-alpha out now! 📃

💬 Contribution & Questions

Type | Platforms
🐞 Bug Reports | [GitHub Issue Tracker]
📦 Feature Requests & Ideas | [GitHub Discussions]
🛠️ Usage Questions & Discussions | [GitHub Discussions]

💼 Features

  • Apply and combine basic general-purpose operations to preprocess Spanish text
  • Connect directly to file paths, databases… for easy reading and writing of data
  • Simple implementation, optimized, with ready-to-use configuration files
  • Autocorrect function to improve data quality
  • Privacy controls for replacing or removing personal data from the dataset
  • Support for Spanish and English
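As a rough illustration of what applying and combining basic operations can look like, here is a minimal, self-contained sketch in plain Python. It is not Preln's implementation: the function names and the tiny `SPANISH_STOPWORDS` set are hypothetical, chosen only for this example. One Spanish-specific detail worth noting is that accent stripping usually has to keep ñ, since "año" and "ano" are different words.

```python
import string
import unicodedata

# Hypothetical, deliberately tiny stopword list; a real list is much larger.
SPANISH_STOPWORDS = {"el", "la", "de", "mi", "es", "me", "han", "y", "en"}

def strip_accents(text: str) -> str:
    """Remove accents and diaereses (á -> a, ü -> u) but keep ñ/Ñ."""
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            # Keep the combining tilde only when it follows n/N (ñ, Ñ).
            if ch == "\u0303" and out and out[-1] in "nN":
                out.append(ch)
            continue
        out.append(ch)
    # Recompose so "n" + combining tilde becomes a single "ñ" again.
    return unicodedata.normalize("NFC", "".join(out))

def remove_punctuation(text: str) -> str:
    """Drop ASCII punctuation plus the Spanish inverted marks ¡ and ¿."""
    table = str.maketrans("", "", string.punctuation + "¡¿")
    return text.translate(table)

def preprocess(text: str) -> list[str]:
    """Lowercase, strip accents, remove punctuation, split on whitespace
    and drop stopwords, in that order."""
    text = remove_punctuation(strip_accents(text.lower()))
    return [tok for tok in text.split() if tok not in SPANISH_STOPWORDS]

print(preprocess("¡Hola! Mi nombre es Iñaki y vivo en Cádiz."))
# ['hola', 'nombre', 'iñaki', 'vivo', 'cadiz']
```

The order of the steps matters: lowercasing first means the stopword set only needs lowercase entries.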

💾 Install Preln

To start using Preln, install it with the following command:

pip install preln

Note: to install Preln from a Python notebook, run this command in a code cell (in Jupyter, for example, as %pip install preln).

The package's main class is Preprocessing, which contains all of the package's principal functions. To use its methods, import the class and create an object:

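from Preln.preprocessing import Preprocessing

# Create the preprocessor; the keyword arguments turn individual steps on or off
# (e.g. lowercasing, stopwords) or configure them (date_format, privacy_format, media_format)
preprocessor = Preprocessing(date=False, date_format=None, accents=False,
                             lowercasing=True, privacy=True,
                             privacy_format="multi:replace", correction=True,
                             media=True, media_format="mention:delete",
                             numbers=False, punctuation=True, stopwords=True,
                             tokenizer=True, debug=False)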
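The constructor is configured through keyword arguments, most of them boolean switches. One common way to build such a toggleable pipeline is sketched below. This is a hypothetical illustration of the design, not Preln's code: the `StepConfig` class, its fields and their meanings are invented for the example and do not necessarily match Preln's parameter semantics.

```python
import string
from dataclasses import dataclass
from typing import Callable, List, Union

@dataclass
class StepConfig:
    # Hypothetical switches; each one enables a single processing step.
    lowercasing: bool = True
    punctuation: bool = True  # True means "remove punctuation"
    tokenizer: bool = True    # True means "return a list of tokens"

def build_steps(cfg: StepConfig) -> List[Callable[[str], str]]:
    """Collect only the string-to-string steps whose switch is on, in a fixed order."""
    steps: List[Callable[[str], str]] = []
    if cfg.lowercasing:
        steps.append(str.lower)
    if cfg.punctuation:
        table = str.maketrans("", "", string.punctuation + "¡¿")
        steps.append(lambda s: s.translate(table))
    return steps

def run(text: str, cfg: StepConfig) -> Union[str, List[str]]:
    """Apply the enabled steps, then optionally tokenize on whitespace."""
    for step in build_steps(cfg):
        text = step(text)
    return text.split() if cfg.tokenizer else text

print(run("¡Hola, Mundo!", StepConfig()))  # ['hola', 'mundo']
print(run("¡Hola, Mundo!", StepConfig(lowercasing=False, tokenizer=False)))  # Hola Mundo
```

Building the step list once from the configuration keeps the per-text loop free of flag checks, which is convenient when the same configuration is applied to many texts.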

🔧 Example of use

This basic example shows how to use the package to process a short piece of text.

sample_text = "¡Hola @usuario!, mi nombre es Preln, me han creado Adrián y Raúl. Revisa mi documentación en https://www.preln.org"

test = preprocessor.pipeline(sample_text)

print(test) # ['MENTION', 'nombre', 'ORG', 'creado', 'PERSON', 'PERSON', 'revisa', 'documentación', 'URL']

Note: the parameters of the pipeline method (which toggle the core methods) have default values. It is worth adjusting them to suit each text you want to process.
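In the example output, the mention and the URL are replaced by the placeholder tokens MENTION and URL. A minimal regex-based sketch of that kind of substitution follows; the patterns and the `mask_media` name are assumptions for illustration, not the ones Preln uses. Tokens such as PERSON and ORG cannot be produced by simple patterns; they would typically require a named-entity recognition model, which this sketch does not attempt.

```python
import re

# Simplified, assumed patterns; real-world URLs and handles vary more.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")

def mask_media(text: str) -> str:
    """Replace URLs and @mentions with placeholder tokens."""
    # Replace URLs first, so an '@' inside a URL is not treated as a mention.
    text = URL_RE.sub("URL", text)
    return MENTION_RE.sub("MENTION", text)

sample = "¡Hola @usuario!, revisa mi documentación en https://www.preln.org"
print(mask_media(sample))
# ¡Hola MENTION!, revisa mi documentación en URL
```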

💳​ License

Preln is licensed under the MIT License.




Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

preln-0.5.5.tar.gz (4.4 MB)

Uploaded Source

Built Distribution

preln-0.5.5-py3-none-any.whl (4.4 MB)

Uploaded Python 3

File details

Details for the file preln-0.5.5.tar.gz.

File metadata

  • Download URL: preln-0.5.5.tar.gz
  • Upload date:
  • Size: 4.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.15

File hashes

Hashes for preln-0.5.5.tar.gz

Algorithm | Hash digest
SHA256 | d9bb9f5574dd4f39082cf37cbe5dad11ff6f90e7503d97e0308d87a447ab28f1
MD5 | ed3b86bc0a4dbf55f73ef331ecec314c
BLAKE2b-256 | b827301b0e54f5a0b9d1c0f710ab0f40acc6bd699ec237f22a2ef5fcbad735f3
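To check that a downloaded archive matches the published digest, you can compute its SHA256 with Python's standard hashlib module and compare it with the SHA256 value listed above. A minimal sketch; the local file name is assumed to be the one the download was saved as.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches(path: Path, expected_hex: str) -> bool:
    """Compare the computed digest with the published one (case-insensitive)."""
    return sha256_of(path) == expected_hex.lower()

# Published SHA256 for preln-0.5.5.tar.gz.
EXPECTED = "d9bb9f5574dd4f39082cf37cbe5dad11ff6f90e7503d97e0308d87a447ab28f1"

if __name__ == "__main__":
    archive = Path("preln-0.5.5.tar.gz")  # assumed local download location
    if archive.exists():
        print("OK" if matches(archive, EXPECTED) else "MISMATCH")
    else:
        print(f"{archive} not found")
```

pip can also enforce this automatically in hash-checking mode, using `--hash=sha256:<digest>` entries in a requirements file.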


File details

Details for the file preln-0.5.5-py3-none-any.whl.

File metadata

  • Download URL: preln-0.5.5-py3-none-any.whl
  • Upload date:
  • Size: 4.4 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.15

File hashes

Hashes for preln-0.5.5-py3-none-any.whl

Algorithm | Hash digest
SHA256 | 5f359991df961975b2fe7ca68534abf46b2d0e52edd1bb536eed999e93248650
MD5 | 65f9d8692864eec50f70706615d03395
BLAKE2b-256 | 32d9f02671b42d370630c1878b9531edcea552c0d7e7302012b4964e81d77272

