NLP, before and after spaCy

Project description

textacy is a Python library for performing a variety of natural language processing (NLP) tasks, built on the high-performance spaCy library. With the fundamentals (tokenization, part-of-speech tagging, dependency parsing, etc.) delegated to another library, textacy focuses on the tasks that come before and follow after.

Features

  • Provide a convenient entry point and interface to one or many documents, with the core processing delegated to spaCy

  • Stream text, JSON, CSV, spaCy binary, and other data to and from disk

  • Download and explore a variety of included datasets with both text content and metadata, from Congressional speeches to historical literature to Reddit comments

  • Clean and normalize raw text before analyzing it

  • Access and filter basic linguistic elements, such as words, ngrams, and noun chunks; extract named entities, acronyms and their definitions, and key terms

  • Flexibly tokenize and vectorize documents and corpora, then train, interpret, and visualize topic models using LSA, LDA, or NMF methods

  • Compare strings, sets, and documents by a variety of similarity metrics

  • Calculate common text statistics, including Flesch-Kincaid Grade Level, SMOG Index, and multilingual Flesch Reading Ease

and more!
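
To make a few of the features above concrete, here is a minimal, standard-library-only sketch of whitespace normalization, word n-grams, Jaccard similarity, and the Flesch-Kincaid Grade Level. It is not textacy's implementation or API: every function name below is hypothetical, the regex tokenizer is a crude stand-in for spaCy, and the syllable counter is a naive vowel-group heuristic assumed here only for illustration. It targets Python 3.

```python
import re


def normalize_whitespace(text):
    """Collapse runs of whitespace into single spaces and strip the ends."""
    return re.sub(r"\s+", " ", text).strip()


def words(text):
    """Lowercased word tokens; a crude stand-in for a real tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """All contiguous n-token windows, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(set_a & set_b) / len(set_a | set_b)


def count_syllables(word):
    """Naive estimate (an assumption): vowel groups, at least one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text):
    """0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = words(text)
    if not sentences or not tokens:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in tokens)
    return (0.39 * len(tokens) / len(sentences)
            + 11.8 * syllables / len(tokens)
            - 15.59)
```

A typical flow would be to normalize first, tokenize, and then compare or score: for example, `jaccard(ngrams(words(doc_a), 2), ngrams(words(doc_b), 2))` gives a rough bigram-overlap similarity between two strings. A real pipeline would take tokens and sentence boundaries from spaCy rather than from regular expressions.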

Maintainer

Howdy, y’all. 👋

Download files

Source Distribution

textacy-0.6.2.tar.gz (175.6 kB)

Built Distribution

textacy-0.6.2-py2.py3-none-any.whl (142.4 kB), for Python 2 and Python 3

