textacy: NLP, before and after spaCy

textacy is a Python library for performing a variety of natural language processing (NLP) tasks, built on the high-performance spaCy library. With the fundamentals (tokenization, part-of-speech tagging, dependency parsing, etc.) delegated to another library, textacy focuses primarily on the tasks that come before and follow after.

features

  • Access spaCy through convenient methods for working with one or many documents, and extend its functionality through custom extensions and automatic language identification, which selects the right spaCy pipeline for each text
  • Download datasets with both text content and metadata, from Congressional speeches to historical literature to Reddit comments
  • Easily stream data to and from disk in many common formats
  • Clean, normalize, and explore raw text before processing it with spaCy
  • Flexibly extract words, n-grams, noun chunks, entities, acronyms, key terms, and other elements of interest from processed documents
  • Compare strings, sets, and documents by a variety of similarity metrics
  • Tokenize and vectorize documents, then train, interpret, and visualize topic models
  • Compute a variety of text readability statistics, including Flesch-Kincaid grade level, SMOG index, and multilingual Flesch Reading Ease

... and more!
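
To make the readability statistics named in the feature list concrete, here is a minimal sketch of the standard published formulas. It is not textacy's own implementation, and the function names are hypothetical. It takes raw counts as inputs because counting sentences and syllables depends on the language and the tokenizer, and the Flesch Reading Ease coefficients shown are the English-language ones only.

```python
import math


def flesch_kincaid_grade(n_words: int, n_sents: int, n_syllables: int) -> float:
    """Flesch-Kincaid grade level (approximate U.S. school grade)."""
    return 0.39 * (n_words / n_sents) + 11.8 * (n_syllables / n_words) - 15.59


def flesch_reading_ease(n_words: int, n_sents: int, n_syllables: int) -> float:
    """Flesch Reading Ease with the English coefficients; higher is easier."""
    return 206.835 - 1.015 * (n_words / n_sents) - 84.6 * (n_syllables / n_words)


def smog_index(n_polysyllable_words: int, n_sents: int) -> float:
    """SMOG index; polysyllable words are those with three or more syllables."""
    return 1.0430 * math.sqrt(n_polysyllable_words * (30 / n_sents)) + 3.1291


# Hypothetical counts for a short passage (assumed values, not measured data).
counts = {"n_words": 100, "n_sents": 5, "n_syllables": 150}
print(round(flesch_kincaid_grade(**counts), 2))
print(round(flesch_reading_ease(**counts), 2))
print(round(smog_index(n_polysyllable_words=30, n_sents=30), 2))
```

All three scores depend only on a handful of counts, so the hard part in practice is producing reliable word, sentence, and syllable counts, which is where a parsed spaCy document helps.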

maintainer

Howdy, y'all. 👋


Download files

Source Distribution

textacy-0.10.1.tar.gz (266.6 kB)

Uploaded Source

Built Distribution

textacy-0.10.1-py3-none-any.whl (183.1 kB)

Uploaded Python 3

File details

Details for the file textacy-0.10.1.tar.gz.

File metadata

  • Download URL: textacy-0.10.1.tar.gz
  • Size: 266.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.5

File hashes

Hashes for textacy-0.10.1.tar.gz
Algorithm    Hash digest
SHA256       ff72adc6dbb85db6981324e226fff77830da57d7fe7e4adb2cafd9dc2a8bfa7d
MD5          6bb09896ca6f3e2ff537a2691b03c969
BLAKE2b-256  8c0c3958394631f55f5c9bca737d9f1cd39d07681a175828d4dcc9ad2ab6329a

File details

Details for the file textacy-0.10.1-py3-none-any.whl.

File metadata

  • Download URL: textacy-0.10.1-py3-none-any.whl
  • Size: 183.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.1.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.5

File hashes

Hashes for textacy-0.10.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       d9af47bc308ebf1e4c51b2646a6eca3906ca94d1d3862270271efbbb4c7ae4fb
MD5          2f43659a7179f059ae8bbec4f0d1b507
BLAKE2b-256  6599054efc5dea92c84a850639c490541de6cba29bc148debc3c73848c5e64c2
