Simple package for generating ngrams and bag of words representation from text.
A simple package designed to be used for demonstrating basic Natural Language Processing (NLP) feature engineering in Python.
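As a rough illustration of the two representations mentioned above, the following standard-library sketch builds n-grams and a bag of words from a short sentence. It does not use this package's API; the names `tokenize`, `ngrams`, and `bag_of_words` are hypothetical helpers chosen for the example, and the whitespace tokenizer is deliberately naive.

```python
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on whitespace (a naive tokenizer)."""
    return text.lower().split()


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(tokens):
    """Count how often each token occurs, ignoring word order."""
    return Counter(tokens)


tokens = tokenize("The cat sat on the mat")
bigrams = ngrams(tokens, 2)
bow = bag_of_words(tokens)
```

Here `bigrams` keeps local word order (`("the", "cat")`, `("cat", "sat")`, and so on), while `bow` keeps only counts, so `bow["the"]` is 2.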
## More Info:
### Practice Dataset
[Stack Exchange Data Dump](https://archive.org/details/stackexchange)
### Text Encoding
[The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky](http://www.joelonsoftware.com/articles/Unicode.html)
#### Packages
[`chardet`](https://pypi.python.org/pypi/chardet) - Universal encoding detector for Python 2 and 3
[`cchardet`](https://pypi.python.org/pypi/cchardet/1.0.0) - Universal encoding detector; faster than `chardet`
[`ftfy`](http://ftfy.readthedocs.org/en/latest/#) - fixes text for you
[`unidecode`](https://pypi.python.org/pypi/Unidecode) - ASCII transliterations of Unicode text
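The packages above address two recurring problems: guessing the encoding of raw bytes and reducing Unicode text to plain ASCII. The sketch below approximates both with the standard library only. It is a simplified stand-in, not how `chardet` or `unidecode` work internally: the helper names `decode_with_fallback` and `strip_accents` and the candidate encoding list are assumptions made for illustration.

```python
import unicodedata


def decode_with_fallback(raw, encodings=("utf-8", "cp1252")):
    """Try each candidate encoding in turn; return (text, encoding_used)."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a code point, so this decode cannot fail.
    return raw.decode("latin-1"), "latin-1"


def strip_accents(text):
    """Decompose characters (NFKD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


text, used = decode_with_fallback(b"caf\xe9")  # not valid UTF-8
ascii_text = strip_accents(text)
```

Trying encodings in a fixed order only works when the likely candidates are known in advance; a statistical detector is the better choice for arbitrary input. Likewise, stripping combining marks handles accented Latin letters but does not transliterate other scripts.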
### Natural Language Processing
[Care and Feeding of Topic Models: Problems, Diagnostics, and Improvements](http://www.people.fas.harvard.edu/~airoldi/pub/books/b02.AiroldiBleiEroshevaFienberg2014HandbookMMM/Ch12_MMM2014.pdf)
### Functional Programming in Python
[Functional Programming in Python](https://www.oreilly.com/ideas/functional-programming-in-python) by David Mertz - Examines the functional aspects of Python: which options work well and which ones you should avoid.
#### Packages
[`toolz`](http://toolz.readthedocs.org/en/latest/) - Toolz provides a set of utility functions for iterators, functions, and dictionaries.
[`functools`](https://docs.python.org/2/library/functools.html#module-functools) - Higher-order functions and operations on callable objects.
[`itertools`](https://docs.python.org/2/library/itertools.html#module-itertools) - Functions creating iterators for efficient looping.
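To show how these modules fit text feature engineering, here is a small sketch that composes functions with `functools.reduce` and produces lazy n-grams with `itertools.tee` and `islice`. The helpers `compose_left` and `sliding_window` are written out here for illustration and are assumptions of this example; `toolz` ships comparable utilities.

```python
from functools import reduce
from itertools import islice, tee


def compose_left(*funcs):
    """Chain functions left to right: compose_left(f, g)(x) == g(f(x))."""
    return lambda x: reduce(lambda acc, f: f(acc), funcs, x)


def sliding_window(iterable, n):
    """Lazily yield tuples of n consecutive items from any iterable."""
    iterators = tee(iterable, n)
    for offset, it in enumerate(iterators):
        # Advance the k-th copy by k items so the copies are staggered.
        next(islice(it, offset, offset), None)
    return zip(*iterators)


# Lowercase, split on whitespace, window into bigrams, then materialize.
bigram_pipeline = compose_left(
    str.lower,
    str.split,
    lambda tokens: sliding_window(tokens, 2),
    list,
)

result = bigram_pipeline("The cat sat")
```

Because `sliding_window` returns an iterator, the n-grams are only computed when the final `list` step consumes them, which keeps memory use low on large inputs.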