NLP, before and after spaCy
textacy is a Python library for performing a variety of natural language processing (NLP) tasks, built on the high-performance spaCy library. With the fundamentals (tokenization, part-of-speech tagging, dependency parsing, etc.) delegated to another library, textacy focuses on the tasks that come before and follow after.
- Provide a convenient entry point and interface to one or many documents, with the core processing delegated to spaCy
- Stream text, json, csv, spaCy binary, and other data to and from disk
- Download and explore a variety of included datasets with both text content and metadata, from Congressional speeches to historical literature to Reddit comments
- Clean and normalize raw text before analyzing it
- Access and filter basic linguistic elements, such as words, ngrams, and noun chunks; extract named entities, acronyms and their definitions, and key terms
- Flexibly tokenize and vectorize documents and corpora, then train, interpret, and visualize topic models using LSA, LDA, or NMF methods
- Compare strings, sets, and documents by a variety of similarity metrics
- Calculate common text statistics, including Flesch-Kincaid Grade Level, SMOG Index, and multilingual Flesch Reading Ease
… and more!
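To make a few of the tasks above concrete, here is a minimal sketch in plain Python using only the standard library. It does not use textacy's API: the function names (`normalize_text`, `word_ngrams`, `jaccard`, `flesch_kincaid_grade`, `flesch_reading_ease`) are hypothetical and chosen for illustration only. The sketch assumes whitespace tokenization rather than spaCy's tokenizer, and the readability functions take word, sentence, and syllable counts as inputs instead of computing them.

```python
import re
import unicodedata


def normalize_text(text):
    """Unicode-normalize, collapse whitespace runs, strip, and lowercase."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def word_ngrams(text, n):
    """Return consecutive n-word tuples from whitespace-split text."""
    words = text.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def jaccard(a, b):
    """Jaccard similarity of two collections: |A & B| / |A | B|."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(set_a & set_b) / len(set_a | set_b)


def flesch_kincaid_grade(n_words, n_sentences, n_syllables):
    """Flesch-Kincaid Grade Level from precomputed counts."""
    return 0.39 * (n_words / n_sentences) + 11.8 * (n_syllables / n_words) - 15.59


def flesch_reading_ease(n_words, n_sentences, n_syllables):
    """Flesch Reading Ease (English coefficients) from precomputed counts."""
    return 206.835 - 1.015 * (n_words / n_sentences) - 84.6 * (n_syllables / n_words)


clean = normalize_text("  The quick   brown FOX\n")  # 'the quick brown fox'
bigrams = word_ngrams(clean, 2)  # [('the', 'quick'), ('quick', 'brown'), ('brown', 'fox')]
overlap = jaccard(clean.split(), "the lazy brown dog".split())  # 2 shared of 6 distinct words
grade = flesch_kincaid_grade(n_words=100, n_sentences=5, n_syllables=150)  # about 9.91
```

A library built on spaCy would typically operate on parsed documents (so that stop words, punctuation, and part-of-speech tags can drive the filtering), which this whitespace-based sketch cannot do.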
Howdy, y’all. 👋
- Burton DeWilde (<email@example.com>)
|Filename, size|File type|Python version|Upload date|
|---|---|---|---|
|textacy-0.6.2-py2.py3-none-any.whl (142.4 kB)|Wheel|py2.py3|Jul 19, 2018|
|textacy-0.6.2.tar.gz (175.6 kB)|Source|None|Jul 19, 2018|