A Python library for training and evaluating incremental word embeddings.

RiverText

RiverText is an open-source library for modeling and training the incremental word vector architectures proposed in the state-of-the-art literature.

It brings many existing incremental word vector algorithms into a unified framework with a standardized interface, which also facilitates the development of new methods.

RiverText provides two training paradigms:

  • learn_one, which trains one instance at a time;

  • and learn_many, which trains a mini-batch of instances at a time.

This allows for more efficient training of text representation models with text data streams.
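The two paradigms above can be illustrated with a minimal sketch. The class below is a hypothetical toy model written only for this example; it is not RiverText's API, and its name, constructor parameter, and `context_counts` helper are assumptions. It counts co-occurrences within a fixed window, and its `learn_many` simply loops over `learn_one`, whereas a real mini-batch implementation would typically process the batch in one vectorized step.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


class ToyIncrementalCooccurrence:
    """Hypothetical stand-in for an incremental word-vector model.

    NOT part of RiverText; it only mirrors the learn_one / learn_many idea.
    """

    def __init__(self, window_size: int = 2) -> None:
        self.window_size = window_size
        # Maps each word to a Counter of the context words seen around it.
        self.counts: Dict[str, Counter] = defaultdict(Counter)

    def learn_one(self, text: str) -> "ToyIncrementalCooccurrence":
        """Update the counts with a single text instance."""
        tokens = text.lower().split()
        for i, target in enumerate(tokens):
            start = max(0, i - self.window_size)
            end = min(len(tokens), i + self.window_size + 1)
            for j in range(start, end):
                if j != i:
                    self.counts[target][tokens[j]] += 1
        return self

    def learn_many(self, texts: Iterable[str]) -> "ToyIncrementalCooccurrence":
        """Update the counts with a mini-batch of text instances."""
        for text in texts:
            self.learn_one(text)
        return self

    def context_counts(self, word: str) -> Dict[str, int]:
        """Return the context counts accumulated so far for `word`."""
        return dict(self.counts.get(word, {}))


stream = ["the river flows", "the river is wide", "text flows like a river"]

# One instance at a time.
one_by_one = ToyIncrementalCooccurrence()
for text in stream:
    one_by_one.learn_one(text)

# The same data as a single mini-batch.
batched = ToyIncrementalCooccurrence()
batched.learn_many(stream)

print(one_by_one.context_counts("river"))
```

In this sketch both paths end in the same state, so the choice between them is about throughput rather than about the result.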

RiverText also provides an interface similar to that of the river package, so developers familiar with river can quickly train text representation models.

The official documentation can be found at this link.

Installation

Requirements

The following packages will be installed along with RiverText if they are not already installed:

  1. nltk
  2. numpy
  3. river
  4. scikit_learn
  5. scipy
  6. torch
  7. tqdm
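To see which of the packages listed above are already present in an environment, a short check along these lines may help. It is a sketch that uses only the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical, and the distribution names are copied from the list as written.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Distribution names taken from the requirements list above.
REQUIRED = ["nltk", "numpy", "river", "scikit_learn", "scipy", "torch", "tqdm"]


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions(REQUIRED)
for name, ver in report.items():
    print(f"{name}: {ver or 'not installed'}")
```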

Contributing

Development Requirements

Testing

All unit tests are in the rivertext/tests folder. The project uses pytest as the framework to run them.

To run the tests, execute:

pytest tests

To check the coverage, run:

pytest tests --cov-report xml:cov.xml --cov rivertext

And then:

coverage report -m
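The XML report written by the coverage command above can also be read programmatically. The sketch below assumes the report follows the Cobertura-style layout, in which the root `coverage` element carries a `line-rate` attribute; the helper name `read_line_rate` is hypothetical, and the sample file is synthetic data created only so the example runs on its own.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def read_line_rate(xml_path: str) -> float:
    """Return the overall line coverage (0.0 to 1.0) from a Cobertura-style report."""
    root = ET.parse(xml_path).getroot()
    # Assumption: the root element exposes the total as a `line-rate` attribute.
    return float(root.attrib["line-rate"])


# Synthetic report used only for demonstration; a real run would pass "cov.xml".
sample = '<coverage line-rate="0.5" branch-rate="0"><packages/></coverage>'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cov.xml")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(sample)
    rate = read_line_rate(path)

print(f"line coverage: {rate:.0%}")
```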

Build the documentation

The documentation is built with mkdocs and mkdocs-material. Its sources are in the docs folder at the root of the project. First, install the required packages:

pip install mkdocs
pip install "mkdocstrings[python]"
pip install mkdocs-material

Then, to build the documentation and preview it on a local server, run:

mkdocs build
mkdocs serve

Changelog

References

@article{montiel2021river,
  title={River: machine learning for streaming data in Python},
  author={Montiel, Jacob and Halford, Max and Mastelini, Saulo Martiello and Bolmier, Geoffrey and Sourty,
    Raphael and Vaysse, Robin and Zouitine, Adil and Gomes, Heitor Murilo and Read, Jesse and Abdessalem,
    Talel and others},
  year={2021}
}

@article{bravo2022incremental,
  title={Incremental Word Vectors for Time-Evolving Sentiment Lexicon Induction},
  author={Bravo-Marquez, Felipe and Khanchandani, Arun and Pfahringer, Bernhard},
  journal={Cognitive Computation},
  volume={14},
  number={1},
  pages={425--441},
  year={2022},
  publisher={Springer}
}

@article{kaji2017incremental,
  title={Incremental skip-gram model with negative sampling},
  author={Kaji, Nobuhiro and Kobayashi, Hayato},
  journal={arXiv preprint arXiv:1704.03956},
  year={2017}
}

Team

Contact

Please write to gabrieliturrab at ug.chile.cl for inquiries about the software. You are also welcome to open a pull request or an issue in the RiverText repository on GitHub.
