docuscospacy: Support for spaCy models trained on DocuScope and the CLAWS7 tagset

The docuscospacy package provides a set of functions for processing corpora tagged with spaCy models trained on DocuScope and the CLAWS7 tagset.

The documentation for docuscospacy is available at docuscospacy.readthedocs.org, and the code repository is on GitHub at github.com/browndw/docuscospacy.

Requirements and installation

docuscospacy works with Python 3.8 or newer (tested up to Python 3.10). It also requires spacy >= 3.3.

The recommended way to install docuscospacy is with pip:

pip install docuscospacy
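
Before installing, it can be useful to confirm that the environment meets the requirements stated above (Python 3.8 or newer, spacy >= 3.3). The following is a minimal sketch using only the standard library; the helper names `version_tuple` and `check_requirements` are hypothetical and are not part of docuscospacy.

```python
import sys
from importlib import metadata
from itertools import takewhile


def version_tuple(version):
    """Turn a version string such as '3.7.2' into a tuple of ints, (3, 7, 2).

    Only the leading digits of each dot-separated piece are used, so a
    suffix such as 'rc1' is ignored.
    """
    parts = []
    for piece in version.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_requirements():
    """Report whether Python and spaCy satisfy the stated minimum versions."""
    python_ok = sys.version_info >= (3, 8)
    try:
        spacy_ok = version_tuple(metadata.version("spacy")) >= (3, 3)
    except metadata.PackageNotFoundError:
        # spaCy is not installed in this environment.
        spacy_ok = False
    return {"python": python_ok, "spacy": spacy_ok}
```

Calling `check_requirements()` returns a dictionary with one boolean per requirement; a `False` value for `spacy` means spaCy is either missing or older than 3.3.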

Features

Corpus analysis

The docuscospacy package supports the post-tagging generation of:

Outputs can be controlled either by part-of-speech tag or by DocuScope tag. Thus, for example, "can" as a noun and "can" as a verb are counted separately.
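
To illustrate why counting by tag matters, here is a minimal sketch in plain Python. It does not use the docuscospacy API; the function `frequency_by_tag` and the hand-written sample data are assumptions made for illustration. The tags shown (`VM` for a modal verb, `NN1` for a singular noun) follow the CLAWS7 tagset.

```python
from collections import Counter

# Hypothetical tagged tokens as (token, CLAWS7 tag) pairs. In practice these
# would come from a tagged corpus rather than being written by hand.
tagged = [
    ("I", "PPIS1"), ("can", "VM"), ("open", "VVI"), ("the", "AT"),
    ("can", "NN1"), ("and", "CC"), ("you", "PPY"), ("can", "VM"),
    ("too", "RR"),
]


def frequency_by_tag(pairs):
    """Count each (lowercased token, tag) combination separately."""
    return Counter((token.lower(), tag) for token, tag in pairs)


counts = frequency_by_tag(tagged)
modal_can = counts[("can", "VM")]   # "can" used as a modal verb
noun_can = counts[("can", "NN1")]   # "can" used as a noun
```

Keying the counter on the token and tag together keeps the two uses of "can" apart, whereas counting on the token alone would merge them into a single frequency.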

Additionally, tagged multi-token sequences are aggregated for analysis. For example, where "in spite of" is tagged as a token sequence, it is combined into a single token.
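
The aggregation idea can be sketched as follows. This is an illustration only: the `B`/`I` position markers (begin and inside) and the function `merge_sequences` are assumptions, and the way the package itself marks and merges sequences may differ.

```python
def merge_sequences(tokens):
    """Merge tokens marked 'I' (inside) into the preceding 'B' (begin) token.

    tokens: list of (text, tag, position) tuples, position being 'B' or 'I'.
    Returns a list of (text, tag) tuples, one per aggregated token.
    """
    merged = []
    for text, tag, position in tokens:
        if position == "I" and merged:
            # Continue the current sequence: append the text, keep its tag.
            prev_text, prev_tag = merged[-1]
            merged[-1] = (prev_text + " " + text, prev_tag)
        else:
            merged.append((text, tag))
    return merged


# Hypothetical input in which "in spite of" is tagged as one sequence.
tokens = [
    ("She", "PPHS1", "B"), ("left", "VVD", "B"),
    ("in", "II", "B"), ("spite", "II", "I"), ("of", "II", "I"),
    ("the", "AT", "B"), ("rain", "NN1", "B"),
]

merged = merge_sequences(tokens)
```

After merging, "in spite of" occupies a single position in the list, so frequency and n-gram counts treat it as one unit.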

Other features

  • KWIC tables that locate a node word in a center column with context columns on either side
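
A KWIC (key word in context) table can be sketched in a few lines of plain Python. The function `kwic` and its `width` parameter are hypothetical and do not represent the package's own interface; the sketch only shows the shape of the output, with the node word in a center column.

```python
def kwic(tokens, node, width=3):
    """Return (left context, node, right context) rows for each match.

    tokens: list of token strings
    node:   the word to locate (matched case-insensitively)
    width:  number of context tokens to keep on each side
    """
    node = node.lower()
    rows = []
    for i, token in enumerate(tokens):
        if token.lower() == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows


tokens = "the cat sat on the mat and the dog sat on the rug".split()
rows = kwic(tokens, "sat", width=2)
```

Each row holds the left context, the node word, and the right context, which maps directly onto the three columns of a KWIC table.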

Limits

  • the model that this package is designed for has only been trained on English

  • all data must reside in memory; there is no streaming of large data from disk (which Gensim, for example, supports)

License

Code licensed under Apache License 2.0. See LICENSE file.
