
A library to work with text data


datawords



This is a library for common and uncommon NLP tasks.

Datawords emerged after two years of work on projects that required NLP techniques such as training and saving Word2Vec models (Gensim), finding entities in text (spaCy), ranking texts (scikit-network), indexing them (Spotify Annoy), and translating them (Hugging Face).

Using those libraries also required pre-processing, post-processing, and transformation steps. For these reasons, datawords exists.

Sometimes it is very opinionated (indexing happens over text rather than over vectors, even though Annoy allows the latter), and sometimes it gives you freedom, providing helper classes and functions to use as you see fit.
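To illustrate what "indexing over text" means, here is a minimal sketch in plain Python. It is not the datawords API: `TextIndex`, `embed`, and `cosine` are hypothetical names, the bag-of-words embedding is a toy stand-in for a real model such as Word2Vec, and the brute-force search stands in for Annoy. The point is only that the caller adds and queries strings, while vectors stay an internal detail.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a real embedding: bag-of-words counts (assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TextIndex:
    """Hypothetical text-level index: callers never handle vectors directly."""

    def __init__(self, embed_fn=embed):
        self._embed = embed_fn
        self._texts = []
        self._vectors = []

    def add(self, text):
        # Vectorization happens inside the index, not in user code.
        self._texts.append(text)
        self._vectors.append(self._embed(text))

    def search(self, query, k=3):
        # Brute-force nearest neighbours; a real index would delegate to Annoy.
        q = self._embed(query)
        scored = [(cosine(q, v), t) for v, t in zip(self._vectors, self._texts)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


index = TextIndex()
for doc in [
    "word2vec trains word embeddings",
    "annoy builds nearest neighbour indexes",
    "pagerank ranks nodes in a graph",
]:
    index.add(doc)

best = index.search("nearest neighbour search", k=1)
```

The trade-off of this design is convenience over flexibility: you cannot plug in precomputed vectors, but you also never have to keep texts and vectors in sync yourself.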

Another way to see this library is as an aggregator of the excellent libraries mentioned above.

In a nutshell, Datawords lets you:

  • Train Word2Vec models (Gensim)
  • Build Indexes for texts (Annoy, SQLite)
  • Translate texts (Transformers)
  • Rank texts (PageRank)
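As an illustration of the last item, ranking texts with PageRank can be sketched as follows. This is a generic TextRank-style example in plain Python, not the datawords or scikit-network API: `jaccard` and `rank_texts` are hypothetical names, and word-overlap similarity is an assumption chosen to keep the example dependency-free.

```python
from itertools import combinations


def jaccard(a, b):
    """Word-overlap similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_texts(texts, damping=0.85, iterations=50):
    """Rank texts by PageRank over a similarity graph (illustrative sketch)."""
    n = len(texts)
    if n == 0:
        return []
    tokens = [set(t.lower().split()) for t in texts]

    # Symmetric weighted graph: edge weight = similarity between two texts.
    weights = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        weights[i][j] = weights[j][i] = jaccard(tokens[i], tokens[j])
    out_sum = [sum(row) for row in weights]

    # Power iteration. Texts with no edges keep only the teleport share,
    # so scores need not sum to 1 in this simplified version.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                scores[j] * weights[j][i] / out_sum[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return sorted(zip(texts, scores), key=lambda pair: pair[1], reverse=True)


ranked = rank_texts([
    "the cat sat on the mat",
    "the cat ate the fish",
    "stock markets fell sharply",
])
```

Texts that share vocabulary with many others rise to the top, while the unrelated third text ends up last.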

Table of Contents

Installation

pip install datawords

To use transformers from Hugging Face, install the optional extra:

pip install datawords[transformers]

Quickstart

deepnlp:

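from datawords.deepnlp import translators

mn = translators.build_model_name("es", "en")  # model name for Spanish -> English
fp = "path/to/model"  # placeholder: local path to the downloaded model
# transform_mp is assumed here to be exposed by the translators module
rsp = translators.transform_mp("es", "en", model_path=fp, texts=["hola mundo", "adios mundo", "noticias eran las de antes", "Messi es un dios para muchas personas"])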
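For context, the same kind of translation can be sketched directly against Hugging Face Transformers. This is not datawords code: `opus_mt_model_name` and `translate` are hypothetical helpers, and the Helsinki-NLP OPUS-MT naming scheme is an assumption about which models are used (`build_model_name` may or may not follow it). Calling `translate` needs the `transformers` extra, a backend such as PyTorch, `sentencepiece`, and a model download.

```python
def opus_mt_model_name(source, target):
    """Hub name of a Helsinki-NLP OPUS-MT model for a language pair.

    Assumption: this naming scheme is used for illustration only.
    """
    return f"Helsinki-NLP/opus-mt-{source}-{target}"


def translate(texts, source, target):
    """Translate a list of texts with a Transformers translation pipeline."""
    # Imported lazily so the module loads even without the optional extra.
    from transformers import pipeline

    translator = pipeline("translation", model=opus_mt_model_name(source, target))
    return [out["translation_text"] for out in translator(texts)]


model_name = opus_mt_model_name("es", "en")
# translate(["hola mundo"], "es", "en")  # downloads the model on first use
```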

License

datawords is distributed under the terms of the MPL-2.0 license.
