A simple CLI tool to decapitalize Spanish strings.

Project description

Decapitalize Spanish words.

This utility works only with Spanish words and includes a Spanish dictionary of words with diacritics.

Use case

This tool is designed to restore the correct spelling of first names and surnames recorded in the official systems of many Spanish-speaking countries. Traditionally, these systems store words in capital letters, without accents or other non-ASCII characters. Because this transformation is destructive, recovering the correct spelling from such records is not easy.
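To illustrate why the transformation is destructive, here is a minimal sketch of how a registry might flatten a name. The function name `to_registry_form` is hypothetical and is not part of decapitaliza; it only uses Python's standard `unicodedata` module.

```python
import unicodedata


def to_registry_form(name: str) -> str:
    """Uppercase a name and drop diacritics (hypothetical helper, for illustration)."""
    # NFD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", name)
    # Dropping the combining marks removes accents and the tilde of "ñ".
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.upper()


print(to_registry_form("José Núñez"))  # JOSE NUNEZ
# Two different spellings collapse to the same stored form:
print(to_registry_form("Peña") == to_registry_form("Pena"))  # True
```

Since distinct spellings map to the same stored form, the original cannot be recovered by a rule alone, which is why a dictionary of correctly spelled words is needed.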

Description

decapitaliza uses a dictionary of correctly spelled Spanish terms and a simple text-analysis mechanism adapted to processing first names and surnames. It capitalizes words according to the common rules for first and last names. decapitaliza includes a command-line tool that provides the following functions:

  • Generate custom dictionaries: import correctly spelled terms from a CSV file. If no external dictionary is specified, a pre-generated dictionary is used. The included dictionary can be extended with additional words.
  • Transform the words of a text (separated by spaces).
  • Process a CSV file, transforming the indicated columns.
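The three functions above can be sketched in a few lines of Python. This is an illustration of the dictionary-lookup idea under stated assumptions, not decapitaliza's actual implementation or API: every name here (`strip_key`, `build_dictionary`, `restore_text`, `restore_csv`, `PARTICLES`) is hypothetical, and the rule that particles such as "de" and "la" stay lowercase is an assumption about the "common rules" for names.

```python
import csv
import io
import unicodedata


def strip_key(word: str) -> str:
    """Uppercase a word and remove diacritics; used as the lookup key."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).upper()


def build_dictionary(correct_words):
    """Map each stripped key to its correctly spelled form."""
    return {strip_key(w): w for w in correct_words}


# Assumption: these particles stay lowercase inside a name.
PARTICLES = {"DE", "DEL", "LA", "LAS", "LOS", "Y"}


def restore_word(word, dictionary):
    key = strip_key(word)
    if key in PARTICLES:
        return key.lower()
    # Unknown words fall back to plain capitalization.
    return dictionary.get(key, key.capitalize())


def restore_text(text, dictionary):
    """Restore every space-separated word of a text."""
    return " ".join(restore_word(w, dictionary) for w in text.split())


def restore_csv(source, columns, dictionary):
    """Read CSV rows from a file-like object and restore the given columns."""
    rows = []
    for row in csv.DictReader(source):
        for col in columns:
            row[col] = restore_text(row[col], dictionary)
        rows.append(row)
    return rows


dictionary = build_dictionary(["María", "José", "Núñez", "Peña"])
print(restore_text("MARIA DE LA PENA NUNEZ", dictionary))  # María de la Peña Núñez

data = io.StringIO("id,nombre\n1,JOSE NUNEZ\n")
print(restore_csv(data, ["nombre"], dictionary)[0]["nombre"])  # José Núñez
```

A sketch like this lowercases a particle even at the start of a text and cannot choose between two valid spellings that share one key; a real tool would need extra rules for such cases.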

Download files

Source Distribution

decapitaliza-0.0.3.tar.gz (1.6 MB)

Built Distribution

decapitaliza-0.0.3-py2.py3-none-any.whl (1.6 MB)
