A simple package for generating n-grams and bag-of-words representations from text.
A simple package designed to be used for demonstrating basic Natural Language Processing (NLP) feature engineering in Python.
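The two representations named above can be sketched with the standard library alone. This is a minimal illustration, not this package's API: the function names `tokenize`, `ngrams`, and `bag_of_words` are hypothetical, and the whitespace tokenizer is deliberately naive.

```python
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on whitespace (a naive tokenizer)."""
    return text.lower().split()


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(tokens):
    """Count how often each token occurs, discarding word order."""
    return Counter(tokens)


# Hypothetical sample sentence, used only for illustration.
tokens = tokenize("the cat sat on the mat")
bigrams = ngrams(tokens, 2)
bow = bag_of_words(tokens)
```

Here `bigrams` keeps local word order as pairs of adjacent tokens, while `bow` keeps only the counts. A real pipeline would likely also strip punctuation and handle encoding issues first.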
## More Info:
### Practice Dataset
[Stack Exchange Data Dump](https://archive.org/details/stackexchange)
### Text Encoding
[The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky](http://www.joelonsoftware.com/articles/Unicode.html)
#### Packages
[`chardet`](https://pypi.python.org/pypi/chardet) - Universal encoding detector for Python 2 and 3
[`cchardet`](https://pypi.python.org/pypi/cchardet/1.0.0) - Universal encoding detector. This library is faster than chardet
[`ftfy`](http://ftfy.readthedocs.org/en/latest/#) - fixes text for you
[`unidecode`](https://pypi.python.org/pypi/Unidecode) - ASCII transliterations of Unicode text
### Natural Language Processing
[Care and Feeding of Topic Models: Problems, Diagnostics, and Improvements](http://www.people.fas.harvard.edu/~airoldi/pub/books/b02.AiroldiBleiEroshevaFienberg2014HandbookMMM/Ch12_MMM2014.pdf)
### Functional Programming in Python
[Functional Programming in Python by David Mertz](https://www.oreilly.com/ideas/functional-programming-in-python) - Examine the functional aspects of Python: which options work well and which ones you should avoid
#### Packages
[`toolz`](http://toolz.readthedocs.org/en/latest/) - Toolz provides a set of utility functions for iterators, functions, and dictionaries.
[`functools`](https://docs.python.org/2/library/functools.html#module-functools) - Higher-order functions and operations on callable objects.
[`itertools`](https://docs.python.org/2/library/itertools.html#module-itertools) - Functions creating iterators for efficient looping.
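As a rough sketch of how these tools apply to text features, the following uses only `functools` and `itertools` to build a lazy n-gram counter. The helper names `sliding_window` and `compose` are defined here for illustration and are assumptions, not part of this package; `toolz` ships its own `sliding_window` and `pipe` helpers that cover similar ground.

```python
from collections import Counter
from functools import partial, reduce
from itertools import islice, tee


def sliding_window(iterable, n):
    """Lazily yield n-grams from any iterable using tee and islice."""
    iterators = tee(iterable, n)
    # Advance the i-th copy by i items so zip lines them up as windows.
    shifted = [islice(it, i, None) for i, it in enumerate(iterators)]
    return zip(*shifted)


def compose(*funcs):
    """Compose functions left to right: compose(f, g)(x) == g(f(x))."""
    return reduce(lambda f, g: lambda x: g(f(x)), funcs)


# Fix n=2 with partial, then chain the steps into a single callable.
bigrams = partial(sliding_window, n=2)
count_bigrams = compose(str.lower, str.split, bigrams, Counter)

counts = count_bigrams("To be or not to be")
```

Because `sliding_window` returns an iterator, the n-grams are produced on demand and only the final `Counter` holds data in memory.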
Hashes for text2math-0.0.3.dev1-py2.py3-none-any.whl

Algorithm | Hash digest
---|---
SHA256 | d69eed570a409f7aeb94f2bd427a41f571b0e8e3d225df23a577b7923ac39c2b
MD5 | a0b7ded3e468dfb2959aaa1a400db63c
BLAKE2b-256 | 5f5d92cd839230524a572289ed1e27203949717dc5319d21c0e27a97df27c829