Higher-level text processing, built on spaCy
Project description
textacy is a Python library for performing a variety of natural language processing (NLP) tasks, built on the high-performance spaCy library. With the fundamentals (tokenization, part-of-speech tagging, dependency parsing, etc.) delegated to another library, textacy focuses on the tasks that come before and follow after.
Features
- Provide a convenient entry point and interface to one or many documents, with the core processing delegated to spaCy
- Stream text, json, csv, spaCy binary, and other data to and from disk
- Download and explore a variety of included datasets with both text content and metadata, from Congressional speeches to historical literature to Reddit comments
- Clean and normalize raw text before analyzing it
- Access and filter basic linguistic elements, such as words, ngrams, and noun chunks; extract named entities, acronyms and their definitions, and key terms
- Flexibly tokenize and vectorize documents and corpora, then train, interpret, and visualize topic models using LSA, LDA, or NMF methods
- Compare strings, sets, and documents by a variety of similarity metrics
- Calculate common text statistics, including Flesch-Kincaid Grade Level, SMOG Index, and multilingual Flesch Reading Ease
… and more!
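To make a few of the features above concrete, here is a minimal, standard-library-only sketch of n-gram extraction, set similarity, and the Flesch-Kincaid Grade Level. It does not use textacy's API: every function name is hypothetical, the regex tokenizer is a crude stand-in for spaCy's tokenization, and the syllable counter is a rough vowel-group heuristic, so the scores will differ from what a real library reports.

```python
import re


def words(text):
    """Lowercased alphabetic tokens (assumption: a crude stand-in for spaCy)."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """All contiguous n-token windows, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def jaccard(a, b):
    """Jaccard similarity of two collections treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(a & b) / len(a | b)


def count_syllables(word):
    """Heuristic (assumption): one syllable per vowel group, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text):
    """Flesch-Kincaid Grade Level from word, sentence, and syllable counts."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = words(text)
    if not sentences or not tokens:
        raise ValueError("text must contain at least one word and one sentence")
    syllables = sum(count_syllables(w) for w in tokens)
    return (0.39 * len(tokens) / len(sentences)
            + 11.8 * syllables / len(tokens)
            - 15.59)
```

For example, `jaccard(ngrams(words(doc_a), 2), ngrams(words(doc_b), 2))` compares two strings by their shared bigrams, where `doc_a` and `doc_b` are placeholder variables.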
Links
Note: ReadTheDocs builds have been failing for months, so those docs are currently out-of-date. Very sorry. As a (temporary?) workaround, docs for the latest version (v0.6.0) have been published via GitHub Pages:
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Filename | Size | File type | Python version |
---|---|---|---|
textacy-0.6.1-py2.py3-none-any.whl | 137.9 kB | Wheel | py2.py3 |
textacy-0.6.1.tar.gz | 170.3 kB | Source | None |
Hashes for textacy-0.6.1-py2.py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 339b8786c3f69fb389575d27df7bb38f8eb4ba3fea82e4cc8d61ccab020293bf |
MD5 | ca38521fe184e41110de83c8e67d4405 |
BLAKE2-256 | 419f22b9dec63bff5e6ef7fb47b2cd37025087c3995b6ca5467d78160f5b0eb3 |
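A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using Python's standard `hashlib` module; the function names and the local file path in the usage note are assumptions for illustration, not part of textacy or of any packaging tool.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns empty bytes
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """True if the file's SHA256 equals the published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Assuming the wheel was saved in the current directory, `matches_published_digest("textacy-0.6.1-py2.py3-none-any.whl", "339b8786c3f69fb389575d27df7bb38f8eb4ba3fea82e4cc8d61ccab020293bf")` should return `True` for an intact download.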